Extending Conventional LiDAR Metrics to Better Evaluate Advanced Sensor Systems
By Luis Dussan, Blair LaCorte, Barry Behnken, and Allan Steinhardt
As the autonomous vehicle market matures, sensor and perception engineers have become increasingly sophisticated in how they evaluate system efficiency, reliability, and performance. Many industry leaders have recognized that the conventional metrics for LiDAR data collection currently used to evaluate performance (such as frame rate, full frame resolution, and detection range) no longer adequately measure a sensor's effectiveness in solving the real-world use cases that underlie autonomous driving.
First generation LiDAR sensors passively search a scene and detect objects using background patterns that are fixed in both time (no ability to enhance with a faster revisit) and space (no ability to apply extra resolution to high-interest areas like the road surface or intersections). A new class of advanced solid-state LiDAR sensors enables intelligent information capture that expands their capabilities, moving from passive “search,” or detection of objects, to active search – and in many cases to the real-time acquisition of objects’ classification attributes.
Because early generation LiDARs used fixed raster scans, the industry adopted simplistic performance metrics that did not capture all the nuances of the sensor requirements needed to enable AVs. In response, many industry leaders, including AEye, are proposing three new corresponding metrics for extending LiDAR evaluation. Specifically: extending the metric of frame rate to include intra-frame object revisit rate; extending the metric of resolution to capture instantaneous enhanced resolution; and extending detection range to reflect the more critically important object classification range.
We are proposing that these new metrics be used in conjunction with existing measurements of basic camera, radar, and passive LiDAR performance as they measure a sensor’s ability to intelligently enhance perception and create a more complete evaluation of a sensor system’s efficacy in improving the safety and performance in real-world scenarios.
Our industry has leveraged proven frameworks from advanced robotic vision research and applied them to LiDAR-specific product architectures. One that has proven to be both versatile and instructive has been work around object identification that connects search, acquisition (or classification), and action.
- Search is the ability to detect any and all objects without the risk of missing anything.
- Acquire is defined as the ability to take a search detection and enhance the understanding of an object’s attributes to accelerate classification and determine possible intent (this could be by classifying object type or by calculating velocity).
- Act defines an appropriate sensor response as trained or as recommended by the vehicle’s perception system or domain controller. Responses can largely fall into four categories:
- Continue scan for new objects (no enhanced information needed)
- Continue scan but also interrogate the object further and gather more information on an acquired object’s attributes to enable classification
- Continue scan but also continue to track an object classified as currently non-threatening
- Continue scan while the control system takes evasive action.
Within this framework, performance specifications and system effectiveness need to be assessed with an “eye” firmly on the ultimate objective: completely safe operation of the vehicle. However, as most LiDAR systems today are passive, they are only capable of basic search. Therefore, conventional metrics used for evaluating these systems’ performance relate to basic object detection capabilities – frame rate, resolution, and detection range. If safety is the ultimate goal, then search needs to be more intelligent and acquisition (and classification) done more quickly and accurately so that the sensor or the vehicle can determine how to act immediately.
Rethinking the Metrics
Makers of automotive LiDAR systems are frequently asked about their frame rate, and whether their technology can detect objects with 10 percent reflectivity at some range (often 230 meters). We believe these benchmarks are required but insufficient, as they don’t capture critical details such as the size of the target, the speed at which it needs to be detected and recognized, or the cost of collecting that information. We believe it would be productive for the industry to adopt a more holistic approach when it comes to assessing LiDAR systems for automotive use. Additionally, we argue that we must look at metrics as they relate to a perception system in general, rather than to an individual point sensor, and ask ourselves: “What information would enable a perception system to make better, faster decisions?” Below, we have outlined the three conventional LiDAR metrics with recommendations on how to extend them.
Conventional Metric #1: Frame Rate of 10Hz – 20Hz
Extended Metric: Object Revisit Rate
(The time between two shots at the same point or set of points)
Single point detection range alone is an insufficient measure because a single interrogation point (shot) rarely delivers sufficient confidence – it is only suggestive. Therefore, passive LiDAR systems need multiple interrogations/detects at the same point, or multiple interrogations/detects on the same object, to validate an object or scene. In passive LiDAR systems, the time it takes to detect an object depends on many variables – such as distance, interrogation pattern and resolution, reflectivity, and the shape of the object being interrogated – and can traditionally take several full frames.
A key factor that is missing from the conventional metric is a finer definition of time. Thus, we propose Object Revisit Rate as a new, more refined metric for automotive LiDAR because an agile LiDAR, such as AEye’s iDAR, can revisit an object within the same frame. The time between the first and second measurement of an object is critical, as shorter object revisit times help keep processing times low for advanced algorithms that must correlate multiple moving objects in a scene. The best algorithms used to associate/correlate multiple moving objects can be confused when the time elapsed between samples is high. This lengthy combined processing time, or latency, is a primary issue for the industry.
The agile AEye iDAR platform accelerates revisit rate by allowing for intelligent shot scheduling within a frame. Not only can iDAR interrogate a position or object multiple times within a conventional frame, it can maintain a background search pattern while overlaying additional intelligent shots within the same frame. For example, an
iDAR sensor can schedule two repeated shots on a point of interest in quick succession (30μsec). These multiple interrogations can then be contextually integrated with the needs of the user (either human or computer) to increase confidence, reduce latency, or extend ranging performance.
These interrogations can also be data dependent. For example, an object can be revisited if a low-confidence detection occurs and it is desirable to quickly validate or reject it with secondary data and measurement, as seen in Figure 1. A typical full frame rate for conventional sensors is approximately 10Hz, or 100 msec per frame. For such conventional sensors, this is also equivalent to the “object revisit rate.” With AEye’s flexible iDAR technology, the object revisit rate is decoupled from the frame rate and can be as low as tens of microseconds between revisits to key points/objects as the user/host requires – easily 100x to 1000x faster than fixed scan sensors.
Figure 1. Advanced Agile LiDAR Sensors enable intelligent scan patterns such as the “Foveation in Time” Intra-Frame Revisit Interval and random scan pattern of iDAR (B) compared to Revisit Interval on a typical fixed pattern LiDAR (A)
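To make the timing difference concrete, here is a minimal Python sketch (using illustrative numbers from the discussion above) comparing how far a target at highway closing speed moves between revisits at a conventional 10Hz frame rate versus a 30μsec intra-frame revisit:

```python
# Illustrative sketch: distance a target travels between two interrogations,
# for a fixed-frame sensor (~100 ms revisit) vs. an agile intra-frame
# revisit (~30 microseconds). Numbers are examples, not measured values.

def travel_between_revisits(speed_mps: float, revisit_interval_s: float) -> float:
    """Distance (meters) a target moves between two interrogations."""
    return speed_mps * revisit_interval_s

closing_speed = 55.6  # m/s: two vehicles approaching at ~100 kph each

fixed_frame = travel_between_revisits(closing_speed, 0.100)  # 10 Hz frame rate
intra_frame = travel_between_revisits(closing_speed, 30e-6)  # 30 usec revisit

print(f"Fixed-frame revisit: target moved {fixed_frame:.2f} m")
print(f"Intra-frame revisit: target moved {intra_frame * 1000:.2f} mm")
```

At a conventional frame rate the scene changes by several meters between looks; within a single agile frame it changes by millimeters, which is why the second look can be treated as a measurement of the same object.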
What this means is that a perception engineering team using dynamic object revisit capabilities can create a perception system that is at least an order of magnitude faster than what can be delivered by conventional LiDAR without disrupting the background scan patterns. We believe this capability is invaluable in delivering level 4/5 autonomy as the vehicle will need to handle significantly complex corner cases, such as identifying a pedestrian next to oncoming headlights or a semi-trailer laterally crossing the path of the vehicle.
Within the “Search, Acquire, and Act” framework, an accelerated object revisit rate, therefore, allows for faster acquisition because it can identify and automatically revisit an object, painting a more complete picture of it within the context of the scene. Ultimately, this allows for collection of object classification attributes in the sensor, as well as efficient and effective interrogation and tracking of a potential threat.
Use Case: Head-On Detection
When you’re driving, the world can change dramatically in a tenth of a second. In fact, two cars traveling towards each other at 100 kph are 5.5 meters closer to each other after 0.1 seconds. By having an accelerated revisit rate, we increase the likelihood of hitting the same target with a subsequent shot due to the decreased likelihood that the target has moved significantly in the time between shots. This helps the user solve the “Correspondence Problem” (determining which parts of one “snapshot” of a dynamic scene correspond to which parts of another snapshot of the same scene), while simultaneously enabling the user to quickly build statistical measures of confidence and generate aggregate information that downstream processors might require (such as object velocity and acceleration). The ability to selectively increase revisit rate on points of interest while lowering the revisit rate in sparse areas like the sky can significantly aid higher level inferencing algorithms, allowing perception and path planning systems to more quickly determine optimum autonomous decision making.
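The gating logic behind the Correspondence Problem described above can be sketched as follows. This is a generic nearest-neighbor association with a speed-limited gate, purely illustrative and not AEye's actual algorithm; all positions and parameters are hypothetical:

```python
# Hedged sketch of correspondence gating: match detections across two
# snapshots by assuming a maximum plausible speed. A shorter revisit
# interval shrinks the search gate and removes ambiguity.

def match_detections(prev, curr, max_speed_mps, dt_s):
    """Greedy nearest-neighbor association within a speed-limited gate.
    prev/curr are lists of (x, y) positions in meters."""
    gate = max_speed_mps * dt_s  # farthest a real object could have moved
    matches, unused = [], list(curr)
    for p in prev:
        in_gate = [c for c in unused
                   if ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) ** 0.5 <= gate]
        if in_gate:
            best = min(in_gate,
                       key=lambda c: (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)
            matches.append((p, best))
            unused.remove(best)
    return matches

prev = [(0.0, 0.0), (10.0, 0.0)]   # detections from the first shot
curr = [(0.05, 0.0), (9.95, 0.0)]  # detections 1 ms later
# With a 1 ms revisit and a 60 m/s speed bound, the gate is only 6 cm,
# so each detection has exactly one plausible partner.
pairs = match_detections(prev, curr, max_speed_mps=60.0, dt_s=0.001)
print(pairs)
```

With a 100 ms revisit the same speed bound yields a 6 m gate, large enough that nearby detections become interchangeable and the association ambiguous.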
Use Case: Lateral Detection
A vehicle entering a scene laterally is the most difficult to track. Even Doppler Radar has a difficult time with this scenario. However, selectively allocating shots to extract velocity and acceleration when detections have occurred (part of the acquisition chain) vastly reduces the required number of shots per frame. Adding a second detection, via iDAR, to build a velocity estimate on each object detection increases the overall number of shots by only 1%, whereas obtaining velocity everywhere with a fixed scan system doubles the required shots (100%, i.e., 2x increase). This speed and shot saliency makes autonomous driving much safer because it eliminates ambiguity and allows for more efficient use of processing resources.
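The shot-budget arithmetic behind the 1% versus 2x comparison above can be sketched in a few lines; the shot and detection counts are hypothetical examples, chosen only to make the ratio visible:

```python
# Illustrative shot-budget arithmetic: re-shooting only actual detections
# adds ~1% to the frame's shot count, while a fixed scan must re-shoot
# everything to measure velocity everywhere (a 2x increase).

shots_per_frame = 50_000     # assumed background scan budget (hypothetical)
detections_per_frame = 500   # assumed points that returned a detect (hypothetical)

agile_extra = detections_per_frame  # one extra shot per detection
fixed_extra = shots_per_frame       # fixed scan repeats every shot

agile_overhead = agile_extra / shots_per_frame  # fraction of extra shots
fixed_overhead = fixed_extra / shots_per_frame

print(f"Agile overhead: {agile_overhead:.0%}, fixed overhead: {fixed_overhead:.0%}")
```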
The AEye Advantage
Whereas other LiDAR systems are limited by the physics of fixed laser pulse energy, fixed dwell time, and fixed scan patterns, iDAR is a software definable system that allows perception, path and motion planning modules to dynamically customize their data collection strategy to best suit their information processing needs at design time and/or run time.
iDAR starts with a unique bore-sighted design that eliminates parallax between the camera and the LiDAR, bringing it extremely close to solving the Correspondence Problem. AEye’s software agility allows iDAR to push the limits of physics in a tailored (as opposed to a static, one-time) fashion. The achievable object revisit rate of AEye’s iDAR system for points of interest (not merely the exact point just visited) is microseconds to a few milliseconds – up to 3000x faster than conventional LiDAR systems, which require many tens or hundreds of milliseconds between revisits. This gives the unprecedented ability to calculate valuable attributes such as object velocity (both lateral and radial) faster than any other system, allowing the vehicle to act more readily on immediate threats and track them through time and space more accurately.
This ability to define the new metric, Object Revisit Rate, which is decoupled from the traditional “frame rate,” is important also for the next metric we introduce. This second metric helps to segregate the basic idea of “search” algorithms from “acquisition” algorithms: two algorithm types that should never be confused. Separation of these two basic types of algorithms provides insight into the heart of iDAR, which is the Principle of Information Quality (as opposed to Data Quantity): “more information, less data.”
Conventional Metric #2: Fixed (Angular) Resolution Over a Fixed Field-of-View
Extended Metric: Instantaneous (Angular) Resolution
(The degree to which a LiDAR sensor can apply additional resolution
to key areas within a frame)
The use of resolution as a conventional metric assumes that the Field-of-View will be scanned with a constant pattern and uniform power. This makes perfect sense for less intelligent traditional sensors that have limited or no ability to adapt their collection capabilities. Additionally, the conventional metric assumes that salient information within the scene is uniform in space and time, which we know is not true. This is especially apparent in a moving vehicle at speed. However, because of these assumptions, conventional LiDAR systems indiscriminately collect gigabytes of data from a vehicle’s surroundings, sending those inputs to the CPU for decimation and interpretation (wherein an estimated 70 to 90 percent of this data is found to be useless or redundant and thrown out). In addition, these systems apply the same level of power everywhere, such that the sky is scanned at the same power as an object directly in the path of the vehicle. It’s an incredibly inefficient process.
As humans, we don’t “take in” everything around us equally. Rather, the visual cortex filters out irrelevant information, such as an airplane flying overhead, while simultaneously (not serially) focusing our eyes on a particular point of interest. Focusing on a point of interest allows other, less important objects to be pushed to the periphery. This is called foveation, where the target of our gaze is allotted a higher concentration of retinal cones, thus, allowing it to be seen more vividly.
iDAR uses biomimicry (see AEye white paper, The Future of Autonomous Vehicles: Part I – Think Like a Robot, Perceive Like a Human) to improve on the human visual cortex. Whereas humans typically only foveate on one area, iDAR can do this on multiple areas simultaneously and in multiple ways while also maintaining a background scan to assure it never misses new objects. We describe this capability as a Region of Interest (ROI). Furthermore, since humans rely entirely on light from the sun, moon, or artificial lighting, human foveation is “receive only,” i.e., passive. iDAR, in contrast, foveates on both transmit (regions that the laser light chooses to “paint”) and receive (where/when the processing chooses to focus).
An example of this follows.
Figure 2 below shows two squares, Square A and Square B. Both squares have a similar number of shot points within them. Square A represents a uniform scan pattern, typical of conventional LiDAR sensors. These fixed scan patterns produce a fixed frame rate with no concept of an ROI. Square B shows an adjusted, unfixed scan pattern. As we can see, the shots in Square B are gathered more densely within and around the ROI (the small box) within the square. You can also see in Square B that the background scan continues to search to ensure no new objects are missed, while focusing additional resolution on a fixed area to aid in acquisition. In essence, it is using intelligence to optimize use of power and shots.
Looking at the graphs associated with Squares A and B, we see that, additionally, the unfixed scan pattern of Square B is able to produce revisits to an ROI within a much shorter interval than Square A. Square B can complete not just one ROI revisit interval but multiple within a single frame, whereas Square A cannot complete even one revisit. iDAR does what conventional LiDAR cannot: it enables dynamic perception, allowing the system to focus in on, and gather more comprehensive data about, a particular Region of Interest at unprecedented speed.
Figure 2. Region of Interest (ROI) Revisit Rate and foveation of iDAR (B) compared to conventional scan patterns (A)
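The two patterns in Figure 2 can be sketched in one dimension with a few lines of Python. All angles, steps, and the ROI bounds are hypothetical values chosen for illustration; the point is that a sparse background scan plus a dense ROI overlay uses a shot budget comparable to a uniform scan:

```python
# Minimal sketch of a foveated scan: a sparse background raster plus a
# dense overlay inside a Region of Interest (ROI). Parameters are
# hypothetical; a real pattern would cover azimuth and elevation.

def linspace_deg(lo: float, hi: float, step: float):
    """Inclusive range of angles, robust to floating-point stepping."""
    n = int(round((hi - lo) / step)) + 1
    return [round(lo + i * step, 3) for i in range(n)]

def scan_pattern(fov_deg, background_step, roi=None, roi_step=None):
    """Elevation angles to shoot: uniform background, densified in the ROI."""
    angles = set(linspace_deg(-fov_deg / 2, fov_deg / 2, background_step))
    if roi is not None:
        angles |= set(linspace_deg(roi[0], roi[1], roi_step))
    return sorted(angles)

uniform = scan_pattern(20.0, 0.5)                                # pattern A
foveated = scan_pattern(20.0, 1.0, roi=(-2.0, 2.0), roi_step=0.1)  # pattern B

print(len(uniform), len(foveated))  # similar shot budgets
```

Pattern B spends roughly the same number of shots, but places 5x the density inside the ROI while still sweeping the full Field-of-View for new objects.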
Within the “Search, Acquire, and Act” framework, instantaneous resolution allows the iDAR system to search an entire scene and acquire multiple targets, capturing additional information about them. iDAR allows for the creation of multiple simultaneous Regions of Interest (ROI) within a scene, allowing the system to focus and gather more comprehensive data about specific objects, enabling the system to interrogate them more completely and track them more effectively.
Use Case: Object Interrogation
When objects of interest have been identified, iDAR can “foveate” its scanning to gather more useful information about them and acquire additional classification attributes. For example, let’s say the system encounters a fast pedestrian who is jaywalking across the street directly in the path of the vehicle. Because iDAR enables a dynamic change in both temporal and spatial sampling density within a Region of Interest (Instantaneous Resolution), the system can focus more of its attention on this jaywalker, and less on irrelevant information, such as parked vehicles along the side of the road (which iDAR has already identified long ago and is simply tracking). Regions of Interest allow iDAR to quickly, efficiently, and accurately identify critical information about the jaywalker. The iDAR system provides the most useful, actionable data to the domain controller to help determine the best timely course of action.
We see Instantaneous Resolution being utilized in three primary ways to address different use cases.
- Fixed Region of Interest (ROI): Today, passive systems can allocate more density (lines of scan) at the horizon – a very simple foveation technique driven by their limited control over frequency, placement, and power within a frame. With second generation systems, like iDAR, that enable Instantaneous Resolution, an OEM or Tier 1 will be able to utilize advanced simulation programs to test hundreds (or even thousands) of shot patterns – varying speed, power, and other constraints to identify an optimal pattern that integrates a fixed ROI with higher Instantaneous Resolution to achieve their desired results. For example, a fixed ROI could be used to optimize the shot pattern of a unit behind a windshield with varying rakes, or in urban environments, where threats are more likely to come from the side of the road – car doors opening, pedestrians, cross traffic, etc. – or in the immediate path of the vehicle. An ROI can be defined that covers both sides of the road and the road surface at a fixed distance in front of the vehicle (see Figure 3B), instantly providing superior resolution (both vertical and horizontal) in the area of greatest concern. Once a pattern is approved, it can be fixed for functional safety.
- Triggered ROI: Triggered ROI can only be done with an intelligent system that can be programmed to accept a trigger. The perception software team may determine that when certain conditions are met, an ROI is generated within the existing scan pattern. For example, a mapping or navigation system might signal that you are approaching an intersection, which generates an appropriately targeted ROI on key areas of the scene with greater detail (See Figure 3C).
- Dynamic ROI: The highest level of intelligence requires a feedback loop and AI, and is the same technique and methodology deployed by Automatic Targeting Systems (ATS) to continuously interrogate objects of high interest over time. As these objects move closer or further away, the size and density of the ROI is varied. For example, pedestrians, bicyclists, vehicles, or other objects moving in the scene can be detected and a Dynamic ROI automatically applied to track their movements (See Figure 3D).
Figure 3. Figure 3A shows a scene as a vehicle approaches an intersection. Figure 3B shows a Fixed Region of Interest (ROI) to the sides of the road and the area immediately in front of the vehicle. Figure 3C shows a Triggered ROI where the navigation system triggered specific ROIs as the vehicle approached the intersection. Figure 3D shows a Dynamic ROI where several objects of interest have been detected and are being tracked as they move through the scene.
The AEye Advantage
A major advantage of iDAR is its agility: its main parameters do not have to be fixed, so it can take advantage of concepts like time multiplexing. It can trade off temporal sampling resolution, spatial sampling resolution, and even range against one another, simultaneously at multiple points in the “frame.” This gives the system tremendous value in perception and lets it do things no other system can, such as allowing the user to dynamically change the angular density over the entire Field-of-View, enabling the robust collection of useful, actionable information.
In a conventional LiDAR system, there is (i) a fixed Field-of-View and (ii) a fixed uniform or patterned sampling density, choreographed to (iii) a fixed laser shot schedule. AEye’s technology allows for these three parameters to vary almost independently. This leads to an almost endless stream of potential innovations and will be the topic of a later paper.
The term Instantaneous Resolution conveys that resolution is not dictated by physical constraints alone, such as beam divergence or number of shots per second. Rather, it is determined by starting with a faster, more efficient agile LiDAR and then intelligently optimizing resources. The ability to instantaneously increase resolution is a critical enabler of the next metric we introduce.
Conventional Metric #3: Object Detection Range
Extended Metric: Object Classification Range
(Range at which you have sufficient data to classify an object)
When it comes to measuring how well automotive LiDAR systems perceive the space around them, manufacturers commonly agree that it is valuable to determine their detection range. To optimize safety, the on-board computer system should detect obstacles as far ahead as possible. The speed with which they can do so theoretically determines whether control systems can plan and perform timely, evasive maneuvers. (This is how the earlier example of 230 meters at 10% reflectivity was calculated.) However, AEye believes that detection range is a required but insufficient measurement in this scenario. Ultimately, it’s the control system’s ability to classify an object (here we refer to low level classification [e.g., blob plus dimensionality]) that enables it to decide on a basic course of action.
What matters most, then, is not only how quickly an object can be detected, but how quickly it can be identified and classified, a threat-level decision made, and an appropriate response calculated. A single point detection is indistinguishable from noise. Therefore, we will use a common industry definition for detection, which involves persistence in adjacent shots per frame and/or across frames. For example, we might require 5 detects on an object per frame (5 points at the same range) and/or from frame-to-frame (1 single related point in 5 consecutive frames) to declare that a detection is a valid object. At 20Hz, that takes 0.25 seconds to define a simple detect.
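The frame-to-frame version of that persistence rule reduces to simple arithmetic, sketched below with the numbers from the example above:

```python
# Sketch of the frame-to-frame persistence rule: N related detections in
# N consecutive frames are required before declaring a valid object.

def confirmation_latency_s(frames_required: int, frame_rate_hz: float) -> float:
    """Time to confirm a detection when one shot per frame must persist."""
    return frames_required / frame_rate_hz

# Example from the text: 5 consecutive frame detections at 20 Hz.
latency = confirmation_latency_s(5, 20.0)
print(f"Confirmation latency: {latency:.2f} s")
```

An agile sensor that can schedule the confirming shots microseconds apart within one frame collapses this quarter-second confirmation window by orders of magnitude.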
Currently, classification typically takes place in the perception stack. It’s at this point that objects are labeled and eventually more clearly identified. This data is used to predict behavior patterns or trajectories. The more the sensor can provide classification attributes, the faster the perception system can confirm. AEye argues that a better measurement for assessing this critical automotive LiDAR capability is its ability to impact Object Classification Range. This metric reduces the unknowns – such as latency associated with noise suppression (e.g., N of M detections) – early in the perception stack, pinpointing the salient information.
As a relatively new field, the definition of how much data is necessary for classification in automotive LiDAR has not yet been defined. Thus, we propose that adopting perception standards used by video classification provides a valuable provisional definition. According to video standards, enabling classification begins with a 3×3 pixel grid of an object. Under this definition, an automotive LiDAR system might be assessed by how fast it is able to generate a high-quality, high-resolution 3×3 point cloud that enables the perception stack to comprehend objects and people in a scene.
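Under this provisional 3×3 definition, classification range follows directly from object size and angular sampling density. The sketch below works through that geometry; the pedestrian width and angular resolutions are hypothetical numbers, chosen only to illustrate the relationship:

```python
# Hedged sketch: the farthest range at which an object of a given size
# still spans at least 3 samples across, for a uniform angular resolution.
import math

def classification_range_m(object_size_m: float, angular_res_deg: float,
                           points_required: int = 3) -> float:
    """Range at which `points_required` samples fall across the object."""
    # The object must span (points_required - 1) angular sample spacings.
    span_rad = math.radians(angular_res_deg) * (points_required - 1)
    return object_size_m / span_rad  # small-angle approximation

# Hypothetical numbers: a 0.7 m-wide pedestrian, 0.1 deg uniform sampling.
uniform = classification_range_m(0.7, 0.1)
# A foveated ROI at 4x the density (0.025 deg) achieves the same 3x3 grid
# at 4x the range.
foveated = classification_range_m(0.7, 0.025)
print(f"Uniform: {uniform:.0f} m, foveated ROI: {foveated:.0f} m")
```

The linear relationship is the key point: every multiple of Instantaneous Resolution applied within an ROI extends the 3×3 classification range by the same multiple.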
Generating a 3×3 point cloud is a struggle for conventional LiDAR systems. While many systems tout an ability to manifest point clouds comprising half a million or more points in one second, there is a lack of uniformity in these images. Their fixed angular sampling patterns can be difficult for classification routines because the domain controller has to grapple with half a million points per second that are, in many cases, out of balance with the resolution required for critical sampling of the object in question. Such an askew “mish-mash” of points means the domain controller must perform additional interpretation, putting extra strain on CPU resources.
In Figure 4, we compare this scenario (“Scan 1”) to one in which an AEye sensor (“Scan 2”), detects the object ahead. At this point we have only one detect, so we don’t have the repeated detections that are needed to have a confident, actionable, and confirmed target presence decision, as discussed above.
Figure 4. Packing a dense 3×3 grid around a detect allows the collection of more useful data and greatly speeds up classification. In “Scan 1” on the left we have a single detect on a vehicle. Rather than wait for the next frame to resample this vehicle (as is the traditional mode in LiDAR) we instead quickly form a dedicated dense ROI, as indicated in “Scan 2” on the right. This is done almost immediately after the initial single detect, and before completing the next scan.
Returning to the “Search, Acquire, Act” framework, once we have acquired a target and determined that it is valid (and is potentially encroaching on our planned path) we can allocate more shots to it and take action if need be. Alternatively, if we determine that the target is not an immediate threat, we can more fully interrogate the object for additional classification data or simply track it with a few shots per scan.
In summary, one can achieve low-level object detection at the sensor level by employing a dense 3×3 voxel grid in real time every time a significant detection occurs. This happens before the data is sent to the central controller, allowing for higher Instantaneous Resolution than a fixed pattern system can offer and, ultimately, greater object classification range under the video-based definition above.
Use Case: Unprotected Left-Hand Turn
Different objects demand different responses. This is especially true in challenging driving scenarios such as an unprotected left-hand turn – especially when traversing across high-speed, oncoming traffic. Imagine an autonomous vehicle on a four-lane road with a speed limit of 100kph needing to make an unprotected left-hand turn across two lanes of traffic. In the oncoming traffic, one lane has a motorcycle and the other has a car. In this situation, object classification range is critical, as classifying one of the objects as a motorcycle at sufficient range would indicate that the AV should behave more cautiously in proceeding, as motorcycles are capable of traveling at higher speeds and can take more unpredictable paths.
Use Case: School Bus
The fundamental value of being able to classify objects at range is greatest in instances where the identity of the object defines a specific and immediate response from the vehicle. An excellent example of this is encountering a school bus full of children. The faster that object is classified specifically as a school bus, the faster the autonomous vehicle can initiate an appropriate protocol – slowing the vehicle and deploying other tools such as Instantaneous Resolution (Triggered ROIs) in areas around the school bus to immediately capture any movement of children toward the path of the vehicle. This would enable similarly specific responses for police cars, ambulances, fire trucks, or any vehicle whose presence would require the autonomous vehicle to alter how it interrogates the scene and/or change its speed or path.
The AEye Advantage
LiDAR sensors embedded with AI for perception are very different from those that passively collect data. AEye’s agile system can acquire targets and enable classification in far less time than conventional LiDAR systems would require to merely register a detection. With the ability to modulate revisit rate up to 3000x faster in a frame, we no longer focus on detection alone: it is now more important to gauge speed of acquisition (i.e., classification range). This brings to light the difference between detection range and basic object classification range.
Assuming that metrics like detection range are used for accurately scoring how LiDAR systems contribute to autonomous vehicle safety, then evaluators should also consider how long it takes these systems to identify hazards. Thus, Object Classification Range is a far more meaningful metric.
In this white paper, we have discussed why reducing the time between object detections within the same frame is critical. Because capturing multiple detects at the same point/object is required to fully comprehend an object or scene, object revisit rate is a more critical metric for automotive LiDAR than frame rate.
Additionally, we have argued that quantifying (angular) resolution is insufficient. It is important to further quantify instantaneous resolution because intelligent, agile resolution is more efficient and provides greater safety through faster response times, especially when pairing ROIs with convolutional neural networks (a future paper).
Last, we have shown the criticality of moving from measuring a basic detection range to measuring how quickly an object can be identified and classified. It is not simply enough to quantify a distance at which a potential object can be detected at the sensor. One must also quantify the latency from the actual event to the sensor detection – plus the latency from the sensor detection to the CPU decision. Under this framework, the more attributes a LiDAR system can provide, the faster a perception system can classify.
While groundbreaking in their time, earlier LiDAR sensors passively search with scan patterns that are fixed in both time and space. A new generation of intelligent sensors extends their capabilities and moves from passive detection of objects to active search and identification of classification attributes of objects in real time. As perception technology and sensor systems evolve, it is imperative that the metrics used to measure their capabilities also evolve.
The agile iDAR system enables the type of toolkit that reduces latency and bandwidth in a dramatic way. It allows for dynamic “Search, Acquire, Act” functions to be implemented at the sensor level, mimicking the process of human perception: new objects can be detected efficiently, classified with multiple supporting sensors, acted upon if the object is perceived as an immediate threat, fully interrogated for more information about the object, or tracked with real-time data. Therefore, equipped with iDAR, an autonomous vehicle can spot hazards sooner and respond more quickly and accurately than other sensor systems. This avoids accidents and gives credibility to the safety promise of self-driving vehicles.
With safety of paramount importance, these extended metrics should not only indicate what LiDAR systems are capable of achieving, but also how those capabilities bring vehicles closer to optimal safety conditions in real-world driving scenarios.
Throughout this series of white papers, AEye will continue to propose new, interconnected metrics that build on each other to help create a more complete and accurate picture of what makes a LiDAR system effective.