
The SciTech Lawyer


Objects May Be Closer Than They Appear: Uncertainty and Reliability Implications of Computer Vision Depth Estimation for Vehicular Collision Avoidance and Navigation

David K. A. Mordecai, Samantha Kappagoda, and John Y. Shin

Summary

  • Algorithmic safety-critical systems in autonomous vehicles are an emerging legal frontier that may expose companies to extensive economic risk.
  • The reliability of computer vision depth estimation for vehicular collision avoidance and navigation is uncertain.
  • Machine learning applications of computer vision to safety-critical use-cases for cyberphysical systems highlight the need for risk mitigation, reliability, safety, and security.

Introduction and Scope

Recent events accompanying increased adoption of machine learning applications of computer vision to safety-critical use-cases for cyberphysical systems have sharpened focus on the necessity of risk mitigation, reliability, safety, and security. An emergent risk domain across embedded cyberphysical systems involves the proliferation of camera-based autonomous driver assistance and vehicular navigation systems and the application of computer vision technology to perform the complex tasks of depth estimation, as well as object detection and image recognition.

→ Download figures from this article [PDF].

A 1987 Harvard Business Review (HBR) article “Product Liability: You’re More Exposed Than You Think” cites numerous instances of product liability and states (emphasis added):

Design defect litigation can be most expensive and troubling for businesses because it delves unabashedly into the gray realm of what should have been done and what would have happened if . . . Almost every court acknowledges that a design defect, as opposed to a manufacturing defect—in which case a product is clearly not what it was intended to be—is difficult to identify. One thing is clear from case history: even if your product is as safe as anyone else’s in your industry and does what customers expect it to do, if a feasible design alternative could have prevented an accident, your product is at fault. In any design decision that may affect the safety of your product, it’s important to compare the benefits of that design solution—like cost savings or speed or ease of manufacturing—with the risk of harm to customers your decision may entail.

By distinguishing the unintended consequences of design defects from manufacturing defects (e.g., fabrication flaws, operating faults), the HBR article articulates tradeoffs between cost savings and manufacturing efficiencies, on the one hand, and the risk of harm to consumers, on the other, and broadens the context of product performance and use beyond the manufacturer's intended use. The article also extends the concept of product design defects to components, packaging, warning labels, risk disclosures, operating instructions, and documentation, and across the supply chain to retailers, lessors, and maintenance service providers, and discusses untested modifications and differences across state product liability regimes. As the HBR article explains, since "[z]ero defects are seldom possible," analyzing the scope of liability from product deficiencies (e.g., design defects) and allocating resources to reasonably mitigate those most likely to cause injury can drastically reduce product liability exposure from unintended consequences.

Commentary on a recent Michigan state court decision, reportedly the first case to recognize software as a product for purposes of state product liability law, notes that the underlying fatality arose from an assembly-line accident. Although the plaintiff (as is typical in product liability cases involving machinery) undisputedly failed to comply with onsite safety policies, her estate sought to impose liability on a number of entities principally on the theory that the robotics operating software was insufficient to protect her from her own actions. The commentary further indicates that, due to nuances of Michigan law, the defendants sought to characterize the allegedly defective software as a product, the opposite of what would be expected for most software companies facing potential liability for personal injuries.

In addressing implications of the Boeing 737 Max product liability litigation, Cornell University Law School torts scholar W. Bradley Wendel argues in favor of a systems approach to accidents involving technologically advanced products, one that takes into account the relationship between product design and foreseeable carelessness by users. In so doing, he considers the premise that technology can increase safety by reducing human error, a primary argument for semiautonomous ("driverless") vehicles, i.e., a prospective reduction in driver carelessness as a cause of automobile accidents, and notes that the evolving technology may be outpacing the development of legal principles applicable to the interaction between sometimes careless users and machines with design features intended to mitigate the risks resulting from human error. In advocating a systems approach to risk management, Wendel observes that products liability analysis tends to focus on either product design or user carelessness, although product use involves a dynamic relationship between technological solutions to risk and human behavior. Attempts to design around a persistent pattern of accidents corresponding to human error might produce a new, perhaps unanticipated, and possibly even more dangerous pattern of accidents caused or exacerbated by the technology. Such unintended consequences imply that products liability law should proceed from a systems approach to risk management (rather than considering either product design or user error in isolation), in which safety is an emergent property of the interaction among users, machines, and the environment, and which focuses on the risks associated with latent errors, i.e., those made by product designers and engineers seeking to foresee the actions of human users, which sometimes introduce new and unanticipated dangers.

The focus of this series of two articles is to present a case study illustrating the technical limitations of computer vision algorithms (specifically RGB-D algorithms, as further described) as a single point of failure for depth estimation, a principal safety-critical task in automated driver assistance for automotive vehicles, across a diverse range of complex operating and environmental conditions, as well as differing spatial and temporal scales.

Depth estimation is critical to many computer vision and image analysis applications, including driver assistance and autonomous driving. In particular, RGB-D algorithms have recently been explored for computer vision tasks across indoor and outdoor settings, aerial and driving scenarios, and medical use-case domains, both for addressing classic computer vision tasks, e.g., monocular depth estimation (as described in the endnotes to this article), and for investigating generalizable machine learning models in the monocular depth estimation field. Estimating relative distance to a camera aperture is considered fundamental to collision avoidance, autonomous driving, 3D scene reconstruction, augmented reality, and robotics, depth being a key prerequisite to spatial perception, navigation, mapping, and planning tasks.

RGB-D algorithms estimate relative depth by mapping systematic pixel-wise differences in light, shadow, and contrast across a large collection of images. The algorithm infers resulting depth estimates from depth cues across the aforementioned optical and photometric properties inherent to the collected sample of images, e.g., lighting contrasts corresponding to relative position and size of image pixel groupings, contextual information from shadows and the position of objects relative to the point of contact with the ground.

Yet-unresolved impediments to reliable RGB-D estimation include correspondence matching, which can be difficult for a range of reasons (e.g., texture-less regions, occlusion, non-Lambertian surfaces, i.e., surfaces whose apparent brightness varies with the angle of incidence, and highly reflective surfaces), and resolving ambiguity, i.e., the fact that many 3D relative depths can correspond to the same 2D image plane, such that depth estimates are nonunique. In this context, depth estimation is technically characterized as an ill-posed inverse problem, e.g., due to pervasive issues like inherent scale ambiguity and projection ambiguity, each of which underlies the safety-critical field deployments for driver assistance tasks (e.g., collision avoidance) described throughout this article.
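As a concrete illustration of the scale and projection ambiguity, consider a minimal sketch under a standard pinhole camera model (notation assumed for exposition, not drawn from the article): a 3D point and any scaled copy of it project to the same pixel.

```latex
u = f\,\frac{X}{Z} + c_x, \qquad v = f\,\frac{Y}{Z} + c_y,
\qquad\text{and for any } s > 0:\quad
f\,\frac{sX}{sZ} + c_x \;=\; f\,\frac{X}{Z} + c_x \;=\; u .
```

Because every scaled scene (sX, sY, sZ) produces the identical pixel (u, v), absolute depth Z cannot be recovered from a single 2D projection alone, without additional cues or independent calibration, which is one precise sense in which monocular depth estimation is ill-posed.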

This first installment of the series will primarily focus on depth estimation tasks associated with operating and environmental conditions as well as spatial and temporal scales generally applicable to highway, rural and suburban settings. The second installment will address operating and environmental conditions and spatial and temporal scales generally applicable to depth estimation tasks within urban settings, in contrast to comparative depth estimation scales and conditions applicable to highway settings.

As previously indicated, this article, as a case study, presents edge-case artifacts, confounds and anomalies in automotive machine vision depth estimation use-case applications, in order to highlight applicable operating limitations with unintended consequences, which might be indicative of prospective cyberphysical safety risks and related liability implications, subject to further investigation.

As a primer describing fundamental principles underlying RGB machine vision and RGB-D estimation applicable to technical risk and liability mitigation practices for these safety-critical automotive systems, each example discusses a critical role for uncertainty quantification and error diagnostics methods, which may bear relevance for forensic statistical and economic analysis with public policy, regulatory, judicial, and legislative implications.

A key motivation is to introduce and conceptually illustrate relevant and applicable cyberphysical risk and reliability practice principles for automotive safety and reliability engineering in the context of the history of automotive safety and risk mitigation, the ongoing development of driver assistance technology, as well as recent events in automated vehicular safety.

It is from this frame of reference that this case study also provides general background as context regarding the role of computer vision (more specifically, RGB-D algorithms as described herein) for object detection and depth estimation functionality related to automotive safety (e.g., collision avoidance, lane departure), advanced driver assistance (e.g., adaptive cruise control) and (semi)autonomous navigation systems.

Another key objective is to present a conceptual primer on relevant technical intuition regarding computer vision depth estimation for RGB images, RGB-D neural network training and test data, as well as practical issues and tradeoffs for field applications of depth estimation, particularly in terms of reasonably reliable accuracy in the correspondence between absolute versus relative distance (i.e., depth) measurement.

Edge-case examples of image artifacts and confounds highlight the applicability and importance of anomaly detection, outlier analysis, and the role of statistical forensics, error propagation diagnostics, and uncertainty quantification to suitably robust and reliably safe automotive vision systems. The edge-cases presented also illustrate cyberphysical risks due to computer vision system limitations, related to technical challenges across operating and environmental conditions "in the wild," as a prospective single point of failure in the absence of supplementation by complementary sensor modalities (e.g., RADAR, LIDAR, SONAR).

Cyberphysical Risk and Reliability Engineering: General Practice Principles

As acknowledged by generally accepted practice principles for cyberphysical risk and reliability engineering as documented in hybrid systems technical design literatures for safety-critical settings (e.g., aviation, rail, marine, automotive safety), technical limitations as single points of failure pose implications applicable to implied warranty, due care in health and safety risk mitigation, causation, apportionment of liability and allocation of damages.

The automotive safety literature outlines a history of innovation and industry-wide adoption of vehicle safety improvements, coinciding with evolving consumer expectations, in conjunction with institutional responses, i.e., the establishment of national or international industry standards (e.g., device-under-test analytical tools, methodologies, practice principles), regulatory actions, legislative actions, and judicial decisions. Institutional and industry custom and practice, along with the associated case law addressing product liability and warranty, economic risk, and the safety and reliability requirements for automotive manufacturing, can serve as a reference for the suitably viable, robust adoption and reasonably reliable technical implementation and field deployment of state-of-the-art safety-critical systems.

Beginning with early twentieth-century innovations in vehicular braking systems, brake lights, and turn-signal indicators, followed by the wide-scale adoption of seatbelts later in the twentieth century, subsequent innovations have entailed increasingly software-intensive, fly-by-wire adaptive cruise control and automated braking, among other advanced driver assistance systems (ADAS) of increasing complexity. Throughout, both regulatory records and case law document the ongoing adoption of safety improvements, accompanied by instances of safety and design defects related to economic tradeoffs between cost and technical overhead, often associated with commonly accepted notions and practices of duty of care. The more recent history of automotive product recalls and liability incidents has involved allegations in which automated vehicular systems exhibited aberrant, unexpected, or uncontrolled longitudinal and lateral guidance, steering, acceleration, and deceleration, resulting in accidental injuries and fatalities, litigation, and enforcement proceedings.

Recent events represent anecdotal evidence of anomalies, deficiencies, and technical limitations of camera-only ADAS as a single point of failure for safety-critical field applications, as generally acknowledged across widely adopted and customary automotive reliability engineering practice principles.

As highlighted by the preponderance of existing research in the field, since no single sensor is robust across use-cases and conditions, it is generally accepted that sensor redundancy and data fusion are necessary for fault tolerance of the system overall, i.e., for mitigation of a sensor limitation or flaw impairing system reliability. As generally accepted and acknowledged as a matter of practice in the context of safety and reliability, data fusion architectures are widely adopted to mitigate risks from a specific sensor modality as a single point of failure. Although camera-based sensors tend to be more appropriate for photometric object detection such as traffic lights, signs, lane markers, road debris, and other road participants (e.g., pedestrians, bicyclists, animals, other vehicles), their robustness and reliability are limited for many conditions and use-cases. Camera-based RGB sensors tend to be sensitive to weather conditions (e.g., rain, fog) and lighting conditions, as well as being less reliable in low-light conditions or for speed detection.
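To illustrate why redundancy reduces risk from any one modality, consider a stylized example (notation assumed for exposition, not drawn from the article): two independent range measurements of the same object, z1 from a camera-based estimator and z2 from RADAR, with respective variances σ1² and σ2², can be fused by inverse-variance weighting:

```latex
\hat{z} \;=\; \frac{z_1/\sigma_1^2 + z_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2},
\qquad
\operatorname{Var}(\hat{z}) \;=\; \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1}
\;\le\; \min\!\left(\sigma_1^2,\, \sigma_2^2\right).
```

The fused variance is never larger than that of the better sensor, and a gross failure of one modality surfaces as a discrepancy between z1 and z2 rather than propagating silently, which is the essence of avoiding a single point of failure.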

As previously indicated, this series of articles focuses on depth estimation limitations of camera-only ADAS systems as single points of failure. As displayed in Exhibit A, given countervailing field-of-view (FOV) versus range limitations, as well as susceptibility to prevailing operating and environmental conditions, as a customary practice among nearly all ADAS implementations, camera-based algorithms are typically deployed in conjunction with supplemental sensor modalities with physical and functional properties to offset or otherwise compensate for inherent limitations and idiosyncrasies in the following applications: adaptive light control, adaptive cruise control, automated emergency braking, forward and rear collision warning, lane tracking, park assist, free space detection, obstacle detection.

As illustrated in Exhibit B, dilemma zone decisions (i.e., system responses related to relative velocity, acceleration, proximity, and position) pervade automotive navigation tasks, both for highway (e.g., accessing entry and exit ramps, lane changing, traffic merging, vehicle passing) and urban settings (e.g., traffic light transitions approaching intersections).

The proliferation of computer vision systems with prospective adoption and deployment of semiautonomous and autonomous vehicle navigation has precipitated technical discourse and debate regarding the reliability of such systems. Ongoing deliberations cite recent incidents (and related federal investigations) involving vehicle accidents which resulted in fatalities, severe injury, and property damage as evidence of the risks associated with inherent vulnerabilities of adopting camera-based algorithmic systems as a single point of failure, in the absence of reliable and robust data fusion with the aforementioned compensating sensor modalities.

Case Study: Instances of Risk and Liability Exposure from Undiagnosed Uncertainty and Unmitigated Error in RGB-D Estimates

The subsequent sections of this article serve as a case study in which undiagnosed uncertainty in the absence of mitigating redundancy presents instances of RGB-D anomalies with corresponding risks of unintended consequences due to erroneous depth estimation. The subsequent sections proceed as follows:

  1. an introduction to computer vision depth estimation for RGB images,
  2. a general description of neural network training and test data,
  3. practical issues and tradeoffs for field applications for depth estimation,
  4. underlying statistical tradeoffs involving uncertainty and error propagation,
  5. variance estimation of depth for uncertainty quantification and error diagnostics,
  6. illustrative tradeoffs in highway and suburban intersection depth estimation tasks,
  7. illustrative tradeoffs in urban depth estimation tasks,
  8. summation and discussion of liability implications in the context of edge case testing and unintended consequences in field applications of RGB-D depth estimation as a single point of failure.

Computer Vision Depth Estimation for RGB Images: A Primer

A 2018 accident in Mountain View, CA, is representative of recent incidents (and related federal investigations), documented across numerous Associated Press articles, involving vehicle accidents that have resulted in fatalities, severe injury, and property damage and that have precipitated ongoing active debate regarding the reliability of automotive computer vision systems for a range of driver assistance and vehicular navigation tasks, e.g., depth estimation.

Depth estimation can be described as a statistical computer vision task of estimating the depth of each pixel in a given image (typically in RGB format), where either monocular, stereo, or panoramic image processing renders three-dimensional (3D) reconstruction of a scene depicted in an image. The term RGB depth (RGB-D) estimation generally characterizes depth estimated in isolation (subsequent to initial training of an RGB-D algorithm). Therefore, post-training RGB-D estimates are typically calculated independent of measurements calibrated as objective points of reference derived from active sensing modalities (e.g., LIDAR, RADAR or SONAR), e.g., based on the timing of signals transmitted and received as reflected by objects in the environment (or alternatively changes in the structured light).

The task of depth estimation is considered an ill-posed problem, since an arbitrarily large (and perhaps infinite) number of relative depth estimates, as weighted averages of the pixels comprising the image of a given 3D scene, can be attributed to the same 2D projection. Relative depth estimates are typically inferred by comparing relative intensities attributable to contrasts in lighting conditions, color, textures, and/or edges (among other properties) across pixels within an image. Since the RGB-D algorithm does not infer clusters of pixels as objects, the depth map is solely a composite comprised of a depth estimate corresponding to each pixel of an image.
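To make the pixel-wise character of a depth map concrete, the sketch below runs a publicly available monocular depth network (MiDaS, loaded via PyTorch Hub) on a single road image. MiDaS is assumed here purely for illustration; it is not the model evaluated in this case study, and the input file name is hypothetical. Note that the output is a relative (unitless) value per pixel, with no object boundaries and no metric scale.

```python
import cv2
import torch

# Load a small pretrained monocular depth model and its matching input transform via PyTorch Hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# Hypothetical input frame (e.g., a dashcam still); the model expects RGB channel ordering.
img = cv2.cvtColor(cv2.imread("road_scene.png"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))                 # relative inverse depth, arbitrary scale
    prediction = torch.nn.functional.interpolate(      # resize back to the original resolution
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

depth_map = prediction.numpy()   # shape (H, W): one estimate per pixel, no notion of "objects"
```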

RGB-D methods estimate relative geometric relationships related to visual perspective in accordance with corresponding prototypical representations (e.g., averages). These representations correspond to respective distance or proximity inferred from comparative differences in lighting, color and/or other aforementioned properties. In the context of statistical estimates as mathematical functions comprised of such weighted averages, any of these combinations of weighted averages derived from the image of a 3D scene may be a likely solution to the identical 2D projection.

Among other methods (e.g., structured light, time-of-flight), RGB-D is being adopted for use-case applications at both indoor and outdoor spatial scales, across robotic controls (e.g., unmanned aerial vehicles (UAVs), commonly referred to as drones), automotive navigation, and lane departure and collision avoidance systems for driver assistance (as well as for prospective semi-autonomous and autonomous deployment applications). The aforementioned automotive use-case applications entail depth estimation for exterior environments across a more extensive range of spatial scales and resolutions than is typical for spatial rendering and depth estimation of images of indoor settings, and thereby present different challenges.

Despite fragility and instability that are generally acknowledged and widely documented to result from the inherent ill-posed nature of such systems in the absence of objective calibration by independent reference measurements from the aforementioned active sensor modalities, model-free RGB-D implementations based on neural networks (colloquially referred to as artificial intelligence (AI) models) continue to be widely adopted for depth estimation in use-case domains that entail safety-critical reliability tradeoffs with corresponding cyberphysical risk and liability implications. The motivation for this adoption, as customarily expressed, typically involves reductions in hardware cost and engineering or software design complexity (commonly referred to as technical overhead).

Configuration, specification, and calibration of neural network architectures applied to RGB-D, object detection and other image processing tasks primarily employ statistical approximation methods referred to as supervised learning, which typically utilize extensive data repositories comprised of enormous collections of labelled and annotated images consisting of thousands, millions and sometimes billions of sampled image frames (with corresponding metadata). However, an expanding literature documenting anecdotal evidence highlights risks to health, safety and property associated with data incompleteness and inadequate training examples.

RGB-D Neural Network Training and Test Data

There is an extensive body of literature on the importance of data completeness for reliable training of safety-critical computer vision systems in order to mitigate the risk and liability of unintended consequences. Among the key themes underlying such cyberphysical risk mitigation are uncertainty quantification and error diagnostics for assessing data limitations and algorithmic bias in these settings.

In contrast to unsupervised learning and self-supervised learning, supervised learning typically requires a labelled dataset, as previously described, in order to iteratively configure and adapt a specified network to a latent structure underlying the dataset, generally referred to as training the network. RGB-D network training datasets customarily employ RGB-D image data with corresponding depth annotations as metadata collected via a specialized RGB-D camera. It is these respective depth annotations by which the model iteratively approximates associations between RGB-D image pixel values to derive an attributable depth mapping between 2D representations and 3D spatial relationships. For safety-critical domain applications, in order to mitigate cyberphysical risk, the resulting representation must generalize in order to reliably approximate with reasonable accuracy the spatial properties of the actual 3D environment (e.g., in terms of distances between objects and respective proximities within an image).
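A minimal sketch of this supervised training step, under assumed names and a toy convolutional network (the actual architectures, losses, and datasets used in practice vary widely and are not specified in this article): the network is fit to per-pixel depth annotations, with a mask excluding pixels for which no ground-truth depth (e.g., from a depth camera or LiDAR) was recorded.

```python
import torch
import torch.nn as nn

# Toy fully convolutional depth regressor (illustrative only; real RGB-D networks are far deeper).
class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),   # one depth value per pixel
        )

    def forward(self, rgb):
        return self.body(rgb)

net = TinyDepthNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

def training_step(rgb, depth_gt, valid_mask):
    """rgb: (B,3,H,W); depth_gt: (B,1,H,W) annotated depth; valid_mask: 1.0 where an annotation exists."""
    pred = net(rgb)
    # L1 loss restricted to pixels that actually carry depth annotations (e.g., LiDAR returns).
    loss = (valid_mask * (pred - depth_gt).abs()).sum() / valid_mask.sum().clamp(min=1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```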

By employing a test set of new images, trained models are customarily evaluated by benchmarks which compare resulting representative spatial properties (e.g., relative distance measurements and proximities) reproduced for test dataset image samples versus training dataset image samples, which can be further compared to objective (ground truth) depth measurements collected from independently calibrated sensor data. Reliability testing employing ground truth depth measurements from independently calibrated sensor data is necessary (although insufficient) for mitigating cyberphysical risk in safety-critical domains. An objective of such reliability testing and calibration is validation of the estimated 3D spatial relationships (and attributed tradeoffs) based upon 2D representations inherent in the RGB image frames and the corresponding RGB-D metadata.
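Such benchmark comparisons are typically scored with a handful of error metrics that are standard in the monocular depth estimation literature (absolute relative error, root-mean-square error, and threshold accuracy); the sketch below shows these as an illustration, not as the specific benchmarks used for the figures in this article.

```python
import numpy as np

def depth_metrics(pred, gt, valid_mask):
    """Standard monocular depth benchmark metrics over pixels with ground-truth measurements."""
    p, g = pred[valid_mask], gt[valid_mask]
    abs_rel = np.mean(np.abs(p - g) / g)            # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))           # root-mean-square error (same units as depth)
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)                  # fraction of pixels within 25% of ground truth
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta<1.25": delta1}

# Hypothetical usage: compare predictions against LiDAR-derived ground truth on a test image.
# metrics = depth_metrics(pred_depth, lidar_depth, lidar_depth > 0)
```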

Practical Issues and Tradeoffs for Field Applications of Depth Estimation

As previously indicated, robust and reliable field deployment of automotive driver assistance and collision avoidance computer vision systems entails numerous tradeoffs in the statistical estimation of relative position, depth, and distance. In practice, particularly for cyberphysical risk mitigation in safety-critical field applications of depth estimation, object detection, and other image processing tasks, additional considerations must address other operative technical challenges involving a range of prospective edge-case vulnerabilities related to inherent discontinuities, disparities, and statistical confounds (e.g., non-stationarity of distributional properties, non-representativeness, sampling, and data bias).

An RGB-D network trained on a dataset comprised of image samples may not reliably attribute reasonably accurate depth to settings not adequately represented by the sample images in the training data. As a case of non-stationarity within the statistics literature, such instances have been referred to in machine learning as out-of-domain generalization and have been characterized as a type of distribution shift. By way of illustration, as further exhibited herein, image samples from highway (or rural) settings may not adequately generalize to urban settings, and vice-versa.

Similarly, an RGB-D network trained on image samples which comprise settings with a specific subset of geometric or spatial properties might misattribute depth and other spatial estimates to image samples for settings with disparate geometric or spatial properties. As a case of non-stationarity within the statistics literature, such instances have been referred to in machine learning as catastrophic forgetting.

Other cases of non-stationarity or shifts in underlying distributional properties include variability in lighting, contrast, atmospheric and weather conditions, as well as disparities, discontinuities, faults, defects and degradation of sensors, processing hardware and software, in addition to other operating conditions related to the physics and geometry intrinsic to the environment and/or sensor modalities (e.g., photogrammetry) which may alter the reliability and accuracy of depth estimates for safety-critical use-case applications in the field.

Beyond depth estimation, the aforementioned instances of non-stationarity tend to be generally applicable considerations for reliability and accuracy of statistical and machine learning tasks, particularly for safety-critical field deployment. In addition, uncertainty estimation and error propagation diagnostics tend to be generally accepted and established practices fundamental to reliability engineering, particularly across safety-critical embedded systems (e.g., automotive, aviation, satellites, electricity grid, etc.).

As a matter of practice, uncertainty and error diagnostics are essential to cyberphysical risk mitigation in the adoption and field deployment of machine vision applications, by compensating for risk exposures associated with pervasive dilemma zone tradeoffs related to machine vision anomalies, e.g., error propagation and depth estimate confounds. Dilemma zone decisions related to situational tradeoffs in system responses between velocity trajectories and relative proximity can result in unintended consequences based on unmitigated confounds, conflicts, discrepancies, or disparities from erroneous, unduly uncertain, or ambiguous image depth attributions.

Underlying Statistical Tradeoffs Involving Uncertainty and Error Propagation

The relationships between RGB-D and dilemma zone decisions entail inherent tradeoffs, primarily related to statistical uncertainty and error. Uncertainty estimation is a subdiscipline within statistical science which aims to quantify the degree of variation, dispersion, or bias exhibited by a data sample or the results of a specified model. Uncertainty estimation and error propagation diagnostics are foundational, generally accepted disciplines practiced across experimental and analytical methods in the physical, biological, and social sciences, as well as engineering.

As within the physical, biological, and social sciences, experimental analysis as well as the design, testing and deployment of engineered systems involve estimation tasks comprised of solutions for complex processes described by systems of interrelated equations. Addressing inherent systematic error propagation and quantifying uncertainty is commonly acknowledged within these disciplines as a necessary and fundamental practice to mitigate system biases, discontinuities, disparities, and discrepancies from numerous sources (e.g., measurement error, relative differences in scale and/or resolution).
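For reference, a standard first-order error propagation formula from the measurement literature (notation assumed for exposition): when a quantity is computed from several noisy inputs, the variance of the output is approximated from the input variances and the local sensitivities.

```latex
y = f(x_1, \ldots, x_n), \qquad
\sigma_y^2 \;\approx\; \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} \right)^{\!2} \sigma_{x_i}^2
\quad \text{(assuming approximately independent input errors).}
```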

Neural networks solve surrogate systems of equations which attempt to approximate functions, in order to emulate actual processes (i.e., physical, biological, sociopolitical, or engineered systems), employing connections between layers of mathematical transformations. In comparison to established and generally accepted practice across statistical science and related fields, the applicability of uncertainty estimation to machine learning (and more specifically computer vision) is relatively nascent, although increasingly acknowledged to be an important ongoing topic of research, particularly related to algorithmic robustness and reliability.

In accordance with generally accepted and established engineering practices and principles, embedded systems reliability requirements dictate addressing tradeoffs between algorithmic computational overhead versus robust and reliable uncertainty and error diagnostics, an active area of ongoing research, particularly for cyberphysical risk mitigation in safety-critical applications.

The following case study, in a series of two installments, presents illustrative examples indicative of unintended consequences for instances of dilemma zone tradeoffs, which might result from unmitigated RGB-D confounds and artifacts as a single point of failure (primarily related to undiagnosed statistical uncertainty and error). This first installment addresses lighting conditions as well as spatial and temporal scales applicable to highway and suburban road settings, in contrast to spatial and temporal scales, as well as lighting conditions and relative geometries and object densities applicable to more crowded urban settings.

Variance Estimation of Depth for Uncertainty Quantification and Error Diagnostics

In order to mitigate risks and liability from unintended consequences, generally accepted practices and principles of reliability engineering and embedded systems design posit that the widespread commercial adoption and field deployment of algorithmic architectures for computer vision systems in safety-critical applications necessitates rigorous and robust diagnostics and testing, as well as programmatic risk-mitigating practices related to algorithmic transparency, interpretability, and the recently emergent notion of explainability. Scholarship within the field of law and economics, particularly related to robotics and the statistical forensic analysis of machine testimony and machine behavior in safety-critical domains, discusses liability implications relevant to issues of due care, causal attribution, apportionment of liability, and damages.

The following examples exhibit disparities, discrepancies and edge-cases which illustrate prospective risks from RGB-D estimation outliers and anomalies (as a single point of failure) in images corresponding to outdoor driving scenarios. These examples demonstrate the critical importance of forensically reliable uncertainty quantification and error diagnostics, as well as robust data fusion with sensor modality redundancy.

In contrast to more typical RGB-D implementations which exhibit relative average depths within an image as single point estimates omitting corresponding variances, an RGB-D implementation with uncertainty estimation exhibits both relative average depths and corresponding variances for each point estimate. Although the experiments underlying these examples implemented two different models (model 1 and model 2) with two different uncertainty estimation methods (method 1 and method 2), for purposes of comparison and simplicity, this case study limits discussion to results exhibited by model 1/method 1 (without loss of generality, although with differing edge-case anomalies or statistical artifacts each corresponding to a respective pairing of model and method).
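One widely used way to produce such per-pixel variances is to treat the network stochastically and aggregate repeated predictions, e.g., Monte Carlo dropout or a small ensemble. The sketch below uses Monte Carlo dropout purely as an assumed illustration; the article's model 1/method 1 and model 2/method 2 are unspecified, and `depth_model` is a hypothetical network containing dropout layers.

```python
import torch

def mc_dropout_depth(depth_model, rgb, num_samples=20):
    """Per-pixel mean depth and variance from repeated stochastic forward passes (MC dropout)."""
    # Keep dropout active at inference time (in a careful implementation, only the dropout
    # layers would be set to train mode, leaving batch-norm statistics frozen).
    depth_model.train()
    with torch.no_grad():
        samples = torch.stack([depth_model(rgb) for _ in range(num_samples)])  # (T, B, 1, H, W)
    mean_depth = samples.mean(dim=0)    # per-pixel average depth estimate (second panel)
    var_depth = samples.var(dim=0)      # per-pixel variance of the estimate (third panel)
    return mean_depth, var_depth
```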

In order to highlight illustrative examples indicative of confounds, disparities, discrepancies, outliers, anomalies, or other artifacts across RGB images, comparisons of roadway scenarios descriptive of highway and suburban intersections versus urban settings may demonstrate instances in which these might result in unintended consequences related to undiagnosed uncertainty and/or error propagation, with risk and liability implications for safety-critical systems in these settings. Each of the following dilemma zone scenarios (first panel of each figure) exhibits image artifacts corresponding to depth estimate uncertainty, as indicated by comparing average depth estimates (second panel of each figure) with respective variances (third panel of each figure).

Illustrative Tradeoffs in Highway and Suburban Intersection Depth Estimation Tasks

As previously discussed, in these settings, safety-critical systems can be susceptible to dilemma zone tradeoffs related to relative proximity, velocity, and trajectory, which correspond to object location and depth estimation tasks. As illustrated below, the reliability and robustness of these tasks are subject to statistical uncertainty and error propagation and, in the absence of system redundancies and other mitigants, are vulnerable to misspecification and other algorithmic limitations.

Figures 1 to 5 exhibit examples of RGB-D image edge-cases for depth estimation tasks in highway settings by comparing each RGB image to its corresponding average depth map and variance map, in order to visualize regions of high uncertainty for systematic error diagnostics (as well as indicating tradeoffs between the degree of uncertainty and/or systematic error propagation conditional upon the uncertainty estimation method).

Each of the following five figures displays three panels (a minimal rendering sketch follows these panel descriptions):

Upper panel: a photometric (digital camera) RGB image sourced from either a training dataset or a test dataset,

Middle panel: corresponding mapping of average depth estimates for each pixel in the photometric RGB image displayed in the upper panel, indexed by a color key (towards the right of each middle panel) for which darker hues represent lower average depth estimates. The scale of average depth estimates varies across each of the figures, based on the corresponding spatial scales represented in the upper panel image.

Lower panel: corresponding variances for each pixel-specific depth estimate in the middle panel, indexed by a color key (towards the right of each lower panel) in which brighter hues represent higher variances of pixel-specific average depth estimates. The range and scale of the variance estimates differ across each of the figures, based upon the uncertainty associated with the pixel-specific depth estimates in the middle panel.
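The three-panel layout described above can be rendered with a short plotting routine; a minimal sketch, assuming the `mean_depth` and `var_depth` arrays from the earlier uncertainty sketch and using a logarithmic color scale for the variance panel, consistent with the ranges quoted in the figure captions:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

def plot_three_panels(rgb, mean_depth, var_depth, max_depth=75.0):
    """Render the RGB frame, the average depth map, and the (log-scaled) variance map."""
    fig, axes = plt.subplots(3, 1, figsize=(8, 12))

    axes[0].imshow(rgb)
    axes[0].set_title("RGB image")

    im1 = axes[1].imshow(mean_depth, cmap="viridis", vmin=0, vmax=max_depth)
    axes[1].set_title("Average depth estimate (m)")
    fig.colorbar(im1, ax=axes[1])

    im2 = axes[2].imshow(var_depth, cmap="magma", norm=LogNorm(vmin=1e-4, vmax=1e2))
    axes[2].set_title("Variance of depth estimate (m$^2$, log scale)")
    fig.colorbar(im2, ax=axes[2])

    for ax in axes:
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```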

Figures 1–5: Non-representativeness and Sample Bias as Illustrated by Average Depth Estimate Divergence and Inherent Uncertainty Between Training and Test Images for Suburban and Highway Settings

Figure 1: Figure 1 is an RGB image of a representative suburban intersection from the training data of the KITTI-360 dataset (i.e., those images with which the algorithm has been trained), in which the average depth estimates in the middle panel range up to 75 meters of depth, and the variances in the lower panel range between 0.0001 (i.e., 10⁻⁴) and 100 (i.e., 10²) square meters on a log scale. Sources: Top image: KITTI-360, Yiyi Liao, Jun Xie & Andreas Geiger (2021); Middle/bottom images: processed using algorithms (by authors). Reprinted with permission.

Figure 2: Figure 2 is an RGB image of a representative suburban road segment from the test data of the KITTI-360 dataset (i.e., those images with which the trained algorithm is being tested), in which the average depth estimates in the middle panel range up to 75 meters of depth, and the variances in the lower panel range between 0.0001 (i.e., 10⁻⁴) and 100 (i.e., 10²) square meters on a log scale. With respect to pixel-specific average depth estimates, higher corresponding variances represent greater uncertainty. Sources: Top image: KITTI-360, Yiyi Liao, Jun Xie & Andreas Geiger (2021); Middle/bottom images: processed using algorithms (by authors). Reprinted with permission.

Figures 3, 4 and 5 are test image frames from a Google Street View video interval for a highway segment in San Jose, CA, corresponding to a location similar to the lane barrier involved in the aforementioned 2018 Mountain View accident. Each figure displays an RGB test frame at a specified distance from the lane barrier gore point (as described below). It should be noted that Figures 3, 4 and 5 are not images from the KITTI-360 dataset, and they therefore serve as a test case for the generalizability of the algorithm beyond image data characteristic of KITTI-360 as a population of images.

Figure 3: the upper panel is the RGB image frame retrieved from the Google Street View video stream interval approaching the gore point lane barrier at approximately 75 meters, in which the gore point remains indiscernible to visual inspection of the middle panel, and for which the pixel-specific average depth estimate variances displayed in the lower panel exhibit high uncertainty. Sources: Top image: Image Capture: May 2021 © 2022 Google; Middle/bottom images: processed using algorithms (by authors). Reprinted with permission.

Figure 4: the upper panel is the RGB image frame retrieved from the Google Street View video stream interval approaching the gore point lane barrier at approximately 50 meters, in which the gore point becomes barely discernible to visual inspection of the middle panel, and for which the pixel-specific average depth estimate variances displayed in the lower panel exhibit similarly high uncertainty. Sources: Top image: Image Capture: May 2021 © 2022 Google; Middle/bottom images: processed using algorithms (by authors). Reprinted with permission.

Figure 5: the upper panel is the RGB image frame retrieved from the Google Street View video stream interval approaching the gore point lane barrier at approximately 25 meters, in which the gore point, although slightly discernible to visual inspection of the middle panel, exhibits an anomalous geometry, and for which the pixel-specific average depth estimate variances displayed in the lower panel exhibit high uncertainty. Sources: Top image: Image Capture: May 2021 © 2022 Google; Middle/bottom images: processed using algorithms (by authors). Reprinted with permission.

In general, test images exhibit higher pixel-specific average depth estimate variances than more representative training images. Other readily observable systematic anomalies and discrepancies in average depth (and corresponding variances) include such artifacts as dark horizontal bands above the horizon and along the ground contact point where depth calibrating metadata from LiDAR sensors are unavailable in the training dataset. It is also important to recognize that depth estimates being pixel-wise, object information is not explicitly recovered from the depth map but rather depth is inferred primarily from disparities in lighting, shadow, contrast, refraction, and reflection across pixels.

The upper panel of Figure 1 displays an RGB image of a road segment image in Karlsruhe, Germany retrieved from the KITTI-360 dataset employed to train the algorithm. The middle panel of Figure 1 displays the corresponding map of average depths, with an index which indicates the depth value in meters. The lower panel of Figure 1 exhibits the corresponding variances of the depth estimates in meters squared. In contrast to the average depth values as displayed in the middle panel, the variances displayed in the lower panel are plotted using a log scale, which represents an order of magnitude difference between incremental units of the index.

The color index of the middle panel exhibiting average depth estimates progresses from dark blue to light yellow as the estimate of average depth for a respective pixel increases. For example, the dark blue coloring in the middle panel corresponds to relative depth of 0 to 5 meters for the position of the red vehicle centered in the RGB image (Fig. 1: upper panel).

Depth values of vehicles traversing the intersection from left to right in the RGB image (Fig. 1: upper panel) are estimated to be approximately 25 meters, with structures in the background (e.g., building on the left of the image) estimated to be approximately 50 meters.

A horizontal band representing a region of high relative variance (ranging between approximately 10⁰ and 10² square meters) is exhibited across the midsection of the lower panel of Figure 1, corresponding to pixels with average depth values (as exhibited in the middle panel) between approximately 50 meters and 75 meters (or greater). An adjacent horizontal band adjoining the uppermost region of the lower panel of Figure 1 exhibits slightly more moderate variances corresponding to a somewhat anomalous pattern of corresponding average depth values (exhibited in the middle panel) estimated at depths ranging less than 25 meters and greater than 75 meters. For example, the average depth for the uppermost region of the image is exhibited to be less than 25 meters (approaching zero), with extremely low variances of 0.0001 square meters (i.e., 10⁻⁴ m²).

Figure 2 displays a typical image from the test dataset (on which the algorithm has not been trained) reasonably similar to images comprising the training dataset. The average depth estimate and variance mappings display similar corresponding patterns to training dataset images and mappings. However, exhibited variances approach ~10¹ to ~10² square meters in certain regions of the image, in particular, a band of high variance at the upper boundary of the horizon.

At the focal point of the Figure 2 RGB image, average depth estimates towards the center of the middle panel exceed 75 meters (with respective variances ranging between 10⁻⁴ and 10⁰ square meters, as exhibited in Fig. 2: lower panel) corresponding to the region around the vanishing point of the RGB image (Fig. 2: upper panel).

It is generally acknowledged by robotics and computer vision researchers and engineers that, in practice, reliable identification of stationary objects remains among the most difficult challenges for object detection and related tasks, e.g., pose estimation of objects in monocular and stereo images or image sequences, and simultaneous localization and mapping (SLAM). As previously indicated, as a matter of generally accepted practice for safe and reliable deployment of automotive systems, reliable depth estimation typically involves data fusion across redundant SONAR, LIDAR, and RADAR modalities for collision avoidance, navigation, and related tasks (e.g., object detection, lane detection, etc.).

As an illustration of this ongoing challenge in the field, Figures 3 through 5 exhibit a sequence of three RGB test image frames from a video segment in Google Street View which illustrate a highway driving edge-case scenario with applicable prospective liability implications corresponding to unintended consequences.

The RGB image in Figure 3, retrieved from Google Maps, displays a highway segment near San Jose, California, on which the algorithm was not trained and which exhibits characteristics unrepresentative of the roadway images comprising the RGB training dataset. In the image, the vehicle is approaching a gore point lane barrier on the left. Although the physical gore point barrier is not discernible in the depth map at this distance for this model and method, the depth values exhibited at the anticipated position of the gore point (based on the RGB camera point-of-view (POV)) in the corresponding average depth map imply an estimated depth of approximately 75 meters.

Although a lamp post and the three vehicles in closest proximity to the camera position are identifiable in the depth map (Fig. 3: middle panel), vehicles at greater distances that are clearly visible in the RGB image (Fig. 3: upper panel) are neither visually discernible in the average depth map (Fig. 3: middle panel) nor exhibit identifiable edges indicative of physical objects. Similar to Figure 2, the lower panel of Figure 3 exhibits broad horizontal bands of high variance in regions surrounding the upper boundary of the horizon.

Figure 4 shows a test image of the same highway segment as displayed in Figure 3, as the vehicle advances closer to the physical barrier; at approximately 50 meters (i.e., 164 feet), the physical lane barrier becomes slightly discernible in the depth map. A typical highway cruising speed of 65 miles per hour (equivalent to approximately 95 feet/second) implies approximately one and three-quarter seconds (i.e., ~1.73 seconds) for a camera-based collision avoidance system to reliably detect the physical barrier and modify the vehicle trajectory accordingly. This presents a dilemma zone tradeoff indicative of an unintended consequence which might involve an inadequate system response, i.e., over- or under-correction of the vehicle trajectory within the allotted time.

In Figure 5, as the vehicle approaches within 25 meters of the gore point lane barrier, although the lane barrier is much more visually identifiable in the RGB image (Fig. 5: upper panel) compared to the previous figures, the average depth estimate is less than 25 meters (~82 feet); at 65 mph (~95 feet/second), the dilemma zone time interval for a camera-based collision avoidance system is now limited to a fraction of a second (i.e., ~0.86 seconds) in which to reliably detect the lane barrier as an identifiable object and modify the vehicle trajectory accordingly.
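The time budgets quoted above follow directly from distance divided by speed; a minimal sketch reproducing the article's figures (65 mph ≈ 95 ft/s, at distances of 75, 50, and 25 meters):

```python
METERS_TO_FEET = 3.28084

def time_to_barrier(distance_m: float, speed_mph: float = 65.0) -> float:
    """Seconds available before reaching a barrier at the given distance and constant speed."""
    speed_fps = speed_mph * 5280.0 / 3600.0        # 65 mph ~ 95.3 ft/s
    return distance_m * METERS_TO_FEET / speed_fps

for d in (75.0, 50.0, 25.0):
    print(f"{d:5.1f} m  ->  {time_to_barrier(d):.2f} s")
# 75 m -> ~2.58 s, 50 m -> ~1.72 s (the article's ~1.73 s), 25 m -> ~0.86 s
```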

The scenario as described in Figures 3 to 5 anecdotally illustrates an unintended consequence symptomatic of typical dilemma zone tradeoffs which pervade rural and highway driving situations, in which velocity, time and distance interrelationships compound high degrees of depth estimate uncertainty with dilemma zone response time constraints for the ADAS.

The examples in Figures 1 to 5 are illustrative of many instances in which corresponding variance estimates tend to be orders of magnitude greater than average depths estimated for respective pixels, which in the absence of other mitigants might foreseeably impair reliability of camera-only ADAS. These dilemma zone tradeoffs exhibit characteristic patterns of spatial and temporal properties of relative proximity, distance, scale, and velocity which systematically differ from those exhibited across urban environments (to be addressed in the forthcoming second installment).

Summation: Unintended Consequences of Unmitigated Technical Limits for Safety-Critical Automotive Vision Systems as a Single Point of Failure in Suburban Road and Highway Scenarios

As illustrated by the aforementioned examples comprising this case study, unintended consequences related to unmitigated uncertainty and error propagation (with corresponding technical debt) pose health and safety risks with adverse repercussions for product liability across the supply chain for manufacturers, retailers, and lessors, as well as for insurers and reinsurers. By many accounts, such instances are becoming more pervasive.

This first of two installments discusses illustrative technical tradeoffs, discrepancies, and limitations of depth estimation by camera-only ADAS algorithms as single points of failure, particularly related to uncertainty at spatial and temporal scales, as well as across operating and environmental conditions (e.g., lighting, relative geometry, and object proximity) generally applicable to highway, rural and suburban settings. The forthcoming second of the two installments will discuss the comparable technical tradeoffs, discrepancies, and limitations of depth estimation by camera-only ADAS algorithms as single points of failure, particularly related to uncertainty at spatial and temporal scales and conditions applicable to more crowded urban settings. In addition to addressing RGB-D limitations for urban scenarios, the forthcoming second installment will introduce and discuss forensic signal processing applied to uncertainty and error propagation diagnostics, in the context of risk mitigation and liability analysis.

These highlighted instances are indicative of pervasive dilemma zone tradeoffs, symptomatic of associated risk and liability exposure from undiagnosed and/or unmitigated technical limitations of safety-critical RGB-D algorithm-based camera systems as a single point of failure. The primary emphasis has been to illustrate how undiagnosed uncertainty might result in RGB-D anomalies with corresponding risks of unintended consequences, due to erroneous depth estimation in the absence of mitigating redundancy with a particular focus on unintended consequences for automated driver assistance, e.g., lane following, collision avoidance and vehicular navigation. As a primer describing fundamental principles underlying RGB machine vision and RGB-D estimation applicable to technical risk and liability mitigation practices for these safety-critical automotive systems, each example has discussed a critical role for uncertainty quantification and error diagnostics methods which may bear relevance for forensic statistical and economic analysis with public policy, regulatory, judicial, and legislative implications.

NTSB chair Jennifer Homendy has characterized instances of such inadequately mitigated risk of unintended consequences as "the Wild West on our roads right now." According to a Bloomberg article, she has described Tesla's deployment and marketing of Autopilot and Full Self-Driving (FSD), another feature which, contrary to its name, does not enable fully autonomous driving, as artificial-intelligence experiments using untrained operators of 5,000-pound vehicles and "a disaster waiting to happen."

A federal lawsuit seeking an unspecified amount of damages from Tesla for liability, negligence, and breach of warranty, involving allegations of unintended sudden acceleration by a 2017 Tesla Model S sedan in Autopilot mode approaching a highway off-ramp, subsequently departing the roadway and colliding with a tree, asserts that the driver was fully engaged and describes the Tesla ADAS as "at best a work in progress." Also, in what is apparently the first case involving felony charges for use of a partially automated driving system, Los Angeles County prosecutors filed manslaughter charges against the driver of a Tesla Model S that ran a red light while on Autopilot in 2019, crashing into a Honda Civic and killing two people.

On July 13, 2022, Bloomberg announced the departure, after a prolonged (four-month) sabbatical, of a senior executive leading the Tesla Autopilot initiative, coinciding with a reduction of approximately 200 data-annotation staffers and the closure of an office in San Mateo, continuing a persistent pattern of turnover in leadership of the Tesla Autopilot group. This departure also coincides with NHTSA initiating a special investigation into a Florida accident involving the fatality of a 66-year-old Tesla driver and a 67-year-old passenger, in which a 2015 Tesla rear-ended a parked tractor-trailer at a rest area off Interstate 75 in the Gainesville area, according to the Florida Highway Patrol, as well as an NHTSA special investigation into a fatal pedestrian crash in California involving a 2018 Tesla Model 3 in which the ADAS was suspected of having been in use. According to Reuters, NHTSA, which typically initiates more than 100 special crash investigations annually into emerging technologies and other potential auto safety issues (and which previously helped develop safety rules on air bags), has initiated 36 special crash investigations since 2016 (including the California incident) in which the Tesla Autopilot ADAS has been suspected and in which a total of 17 collision fatalities have been reported.

According to a June 2022 account in Fortune, the Tesla partially automated driving systems may be closer to an imposed product recall due to safety defects, corresponding to increased scrutiny as NHTSA upgraded its investigation to an engineering analysis of the Tesla fleet of 830,000 Model Y, X, S, and 3 vehicles produced since the inception of the 2014 model year. As the final stage of an investigation, in most cases NHTSA decides within a year whether there should be a recall or the probe should be closed. According to Fortune, NHTSA documents expressed concerns regarding "serious issues about Tesla's Autopilot system," indicating misuse "in areas where its capabilities are limited, and that many drivers aren't taking action to avoid crashes despite warnings from the vehicle."

The forensic analysis highlighted by this case study is relevant to the analysis of foreseeability, causal attribution, and apportionment of damages in such matters. In his doctrinal and theoretical legal analysis of product liability in this context, Wendel states (emphasis added):

“Complex systems involving interactions between machines and human operators pose a challenge to the usual approach of torts and products liability law of focusing on ex ante incentive creation because of the dynamic relationship between, on the one hand, features of the product’s design and information provided by the manufacturer that are sensitive to patterns of foreseeable user error, and, on the other hand, user behavior that is shaped by the product’s design and the information provided by the manufacturer. The doctrinal and theoretical analysis developed in the context of the 737-Max accidents will be important in other emerging technology contexts, such as the liability exposure of manufacturers and drivers of semi-autonomous cars.”

As asserted by Wendel, among other legal scholars, pervasive risk of unintended consequences with the proliferation of algorithmic safety-critical systems might present an emerging legal frontier entailing extensive economic exposure and forensic statistical reliability analysis of such engineered cyberphysical systems. The subsequent installment will further explore the forensic statistical analysis of depth estimation and uncertainty diagnostics within densely crowded urban environments.
