Nov. 10

Forecasting Part 1: Understanding Interaction

It's tough to make predictions, especially about the future.

- Yogi Berra

By Drew Bagnell, Chief Scientist

The Cascaded Approach

One of the greatest challenges in self-driving is the interaction with other actors on the road — whether drivers, cyclists, or pedestrians. AVs would already likely be ubiquitous if these technical challenges of interaction were not so significant. 

The ability to forecast¹ other actors well enables safe, interpretable, and responsive driving actions. Forecasting has traditionally been treated as a step to be cascaded after perception and before motion planning, providing key input to the decision-making sub-system.

Figure 1.

The traditional approach dates back to, and perhaps is best exemplified by, the pioneering Urban Challenge era self-driving vehicles and their progeny. 

Figure 2. Forecasting image from Carnegie Mellon’s Urban Challenge entry, showing both on-road and unstructured parking lot forecasts. Image courtesy of Aurora CEO and former Carnegie Mellon University Urban Challenge lead, Chris Urmson.

More recent works have expanded on this cascaded approach and allowed backpropagation between modules, particularly to improve the joint perception and forecasting stack. 

Here we see state-of-the-art cascaded forecasting systems with uncertainty estimates for multiple actors in a scene: 

Figure 3. Left: Example output of a cascaded model demonstrating forecasts with probabilistic uncertainty of an actor in the scene to aid decision making.² Right: Example predicted trajectory and goal paths by a cascaded model, demonstrating the need for multiple discrete modes to capture the uncertainty that arises when forecasting actors at an intersection.³

These approaches demonstrate the importance of reasoning about multi-modality and uncertainty when generating forecasts for actors when there are multiple possible futures. However, the motion of other actors often depends crucially on the actions of the AV, and cascaded forecasting cannot reason about these interactions.

Aurora’s clean-sheet approach to self-driving has given us the space to identify a better way: integrating forecasting within our decision making architecture (i.e., motion planning) to enable reasoning that factors in the impact of the AV’s decisions on the motions of other actors.

Cascade Forecasting Failures

Let’s start by understanding what kinds of failures we’re likely to encounter in a cascaded system. Consider Figure 4, one of a wide variety of interactions where the AV’s decision affects another actor’s behavior. 

Figure 4.

During planning, the AV forward simulates what will happen when it makes decisions and estimates how good or bad those outcomes will be. A cascaded autonomy system will conclude that if it makes the turn, the actor “Alice” in the figure above is likely to rear-end it, even when in reality this would only happen if Alice were being truly reckless! 

Why does the AV think this? If the AV’s planner doesn’t reason that Alice will slow gently in response to the AV’s decision to pull out into traffic, it will incorrectly believe that a collision is inevitable and will thus decide that a legal and, in reality, very safe maneuver should be disallowed.

The result is that a naive implementation of a cascaded system will typically be ineffective in its decision making and unable to drive in normal traffic conditions. It will likely be less safe than a good driver because the actions it takes are surprising to other drivers on the road. Now, naturally, there are ways to patch up such forecasts post hoc, but this should make us question whether forecasts are actually useful when they don't consider the context of the AV's decisions. When we attempt to build a forecasting system that runs prior to, instead of within, the decision-making loop, what, exactly, is it learning to forecast?

Well, if it’s an algorithm based on data (it’s difficult to imagine building a modern AI system without a strong data-driven approach), then it must attempt to predict what other actors are likely to do in the context of the driver’s decisions when the data was collected. Now, of course, this means we have a chicken-or-the-egg feedback problem, and we must hope that the driver was an expert, or our predictions of other actors will be biased. E.g., if the AV is exposed to data collected with an unusually aggressive human driver behind the wheel, it’s likely that the AV would learn, incorrectly, that Alice, Bob, and other drivers will always move to accommodate it, when in reality, at the time the data was collected, the other drivers were forced to behave unnaturally in response to aggressive actions.

Lost in the modes… 

Now, what this means is that any forecasts we have about other actors must also implicitly reason about the correct actions of an expert driver. This takes the motion planning system almost completely out of the driver’s seat and relegates it to trying to back out what the forecasting system has already decided.⁴

Consider for a moment a scenario like that depicted here, where a good driver can choose to move in front of the merging actor Alice or, instead, to yield to her. Let’s say that for similar scenarios, our expert vehicle operator demonstrations might establish that 80% of the time the AV should choose to take the lead position in the merge while 20% of the time it should instead choose to yield. Our predictions, then, if doing a good job, will be multi-modal as shown below. 

Figure 5. Two vehicles (AV and Alice) approach a “merge point” at approximately the same time. The dashed vehicles represent a probability density for Alice’s position seconds into the future. This forecast is multimodal because the AV’s own actions include a discrete decision to attempt to merge in front of or yield to Alice. 

Multi-modal forecasts are common and critical for self-driving — consider reasoning about a car that might make a turn or go straight at an intersection as in Figure 3. But these two particular modes illustrated in Figure 5 are the direct result of there being multiple good options for the AV. That fact is completely obscured by providing only the “marginal” forecasts of the actors themselves. Imagine now what a planner would have to do with an impoverished representation relying on marginal forecasts: the planner foresees some significant probability of collision if it chooses either to yield or to go first. Both options look “expensive” — and unsafe — to the reasoning system, even though in reality both are reasonable choices if executed correctly. 
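To make the marginalization concrete, here is a small numeric sketch of the Figure 5 situation. The 80/20 split comes from the text; the positions below are hypothetical values invented for illustration, in arbitrary units of distance along the lane:

```python
# Alice's marginal forecast is a mixture of her two *conditional* forecasts,
# weighted by the probability of the AV's own discrete decision.
# (Positions are hypothetical illustration values, not real data.)

p_av_leads = 0.8  # AV takes the lead in 80% of the expert demonstrations

# Alice's forecast position a few seconds out, conditioned on the AV's choice:
alice_if_av_leads = 40.0   # Alice slows and falls in behind the merge point
alice_if_av_yields = 60.0  # Alice proceeds and ends up ahead of it

# The marginal forecast a cascaded system must emit is the mixture:
# two distinct modes, neither of which is "the" answer on its own.
marginal_modes = {
    alice_if_av_leads: p_av_leads,
    alice_if_av_yields: 1.0 - p_av_leads,
}

mean_position = sum(pos * p for pos, p in marginal_modes.items())
print(mean_position)  # 44.0 -- a position Alice will in fact never occupy
```

The mixture mean lands between the two modes, which is exactly why a planner consuming only the marginal cannot treat either mode as avoidable: the conditioning information that would resolve the ambiguity has been thrown away.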

We can make a toy “matrix game” to capture what’s going on here. Let’s model a too-close interaction between the AV and Alice as “costing” 100⁵ and a proper interaction as the vehicles merge together as costing 0. The “real” situation leads to a good outcome when the actors coordinate, while an a priori (cascaded) forecast makes all the reasonable decisions seem untenable. This leads to poor driving decisions. Good merging requires both decisive actions to signal intent and responsiveness to the other actors to fold together smoothly. But an AV with a cascaded forecasting system is forced to hedge indecisively against multiple outcomes that will not — and cannot — both happen.

Figure 6. Left: A mock game matrix showing interaction costs of possible actions in a merge situation for both the AV and for Alice. In reality, there are two equally good coordinated actions. If the AV decides to merge ahead of Alice and she coordinates and falls behind, both actors are reasonably happy with a cost we represent as zero. But if both the AV and Alice decide to take the lead in the merge, the result is catastrophically costly. Likewise if both decide to merge behind. Right: However, if the forecasting system has to average over both actions the AV could take as it doesn’t know what planning will choose — in this hypothetical instance, the forecasting system might produce a weighted average with an 80% probability of the AV choosing “in front” — both options look expensive. 
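The game matrix makes the failure easy to compute directly. The following minimal sketch uses the costs from the text (100 for a too-close interaction, 0 for a clean coordinated merge) and compares the expected cost a planner sees under a fixed marginal forecast of Alice against one that conditions her behavior on the AV's choice:

```python
# cost[(av_action, alice_action)] from the toy matrix game in Figure 6
COST = {
    ("lead", "lead"): 100.0,    # both try to go first: too close
    ("lead", "yield"): 0.0,     # AV leads, Alice yields: clean merge
    ("yield", "lead"): 0.0,     # AV yields, Alice leads: clean merge
    ("yield", "yield"): 100.0,  # both hesitate at the merge: too close
}

def marginal_cost(av_action, p_alice):
    """Expected cost when Alice's behavior is a fixed marginal forecast,
    independent of the AV's own choice (the cascaded view)."""
    return sum(p * COST[(av_action, a)] for a, p in p_alice.items())

def conditional_cost(av_action):
    """Expected cost when Alice is forecast *conditioned* on the AV's
    decision: she coordinates by taking the complementary role."""
    alice_action = "yield" if av_action == "lead" else "lead"
    return COST[(av_action, alice_action)]

# Cascaded forecast: Alice's marginal mixes over the AV's own options
# (80% of demonstrations had the AV take the lead, so Alice yields 80%).
p_alice_marginal = {"yield": 0.8, "lead": 0.2}

for act in ("lead", "yield"):
    print(act, marginal_cost(act, p_alice_marginal), conditional_cost(act))
# Under the marginal forecast both options carry cost (20 and 80),
# while conditional reasoning correctly scores both merges at zero.
```

Under the marginal forecast, neither action is cost-free, so the planner hedges against an outcome that cannot actually occur; conditioning on the AV's own decision recovers the two zero-cost coordinated solutions.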

This reliance on marginal forecasts continues to be problematic for a very wide range of probabilities. Truncating the probabilities when they get small leads to over-confident behavior where the AV won’t correctly consider outcomes that are unlikely but important. There are cases where the AV can do valuable forecasting prior to planning — for instance, on a short time scale of less than, perhaps, two seconds, and for reasoning about intentions of other actors that are independent of the AV’s decisions. But it is precisely where our software would benefit most from understanding the likely positions of actors in the future — in the complex dance of interactions between the AV and other actors during tricky merges, lane changes, and interactions with pedestrians and cyclists — that the cascade approach breaks down. 

Figure 7a. A simulated example of an autonomous truck and another road actor reaching a merge point. In this instance, using cascaded, marginal forecasts, the autonomous system foresees two possible modes for the merging actor, and both seem necessary to avoid. So the AV decides to brake hard at the last moment to protect against both possibilities — even though the act of slowing makes the trailing mode of the probability distribution very unlikely.

Figure 7b. Top: In this instance the AV instead reasons about its two options and decides to merge behind the actor. The AV simulates slowing down before reaching the merge point, signaling its intent to the other actor. Bottom: Similarly, the AV reasons about its options and concludes that it can merge ahead of the other actor. The AV forward simulates speeding up at the merge point and the other actor is forecast to slow in response.

Interleaved Forecasting

So if a cascaded system is an engineering cul-de-sac, what works better? Interleaved forecasting. Here, the AV conditions forecasts on its future options using a form of causal reasoning⁶ to answer interventional questions: “If I were to make this decision, what are the possible — and probable — outcomes from other actors?” 

The AV can evaluate those conditional outcomes and choose the one it deems to be overall best for its own safety and progress, and for acting as an ideal citizen of the road.
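The evaluate-and-choose loop described above can be sketched as a few lines of code. This is a minimal illustration, not Aurora's implementation; the function names (`forecast_given`, `evaluate`) and the plan representation are assumptions made up for this example:

```python
def choose_plan(candidate_plans, forecast_given, evaluate):
    """Interleaved forecasting loop (hypothetical interfaces):
    - forecast_given(plan) answers the interventional question
      'if the AV committed to `plan`, what would the other actors do?'
    - evaluate(plan, forecast) scores the resulting joint outcome
      (lower cost is better).
    Returns the best plan and its cost."""
    best_plan, best_cost = None, float("inf")
    for plan in candidate_plans:
        forecast = forecast_given(plan)    # conditional, not marginal
        cost = evaluate(plan, forecast)    # score the joint outcome
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

# Toy usage with stand-in stubs for the merge scenario:
plans = ["merge_ahead", "merge_behind"]
stub_forecast = lambda p: {"alice": "slows"} if p == "merge_ahead" else {"alice": "leads"}
stub_evaluate = lambda p, f: 5.0 if p == "merge_ahead" else 3.0

print(choose_plan(plans, stub_forecast, stub_evaluate))
```

The key structural difference from the cascaded design is the order of operations: forecasting runs inside the loop over candidate decisions, so each forecast is conditioned on the decision being evaluated rather than computed once up front.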

Now building such a system is complex and computationally demanding; it keeps our engineering and operations teams running hard. But it’s the right approach to deliver the benefits of self-driving safely, quickly, and broadly. Training our AV to reason about the network of interdependencies among all of the actors in a scene, including itself, empowers us to focus on the goal of making the correct decisions, rather than forecasting other agents’ marginal probabilities of future positions. We’re not interested in how accurately we can forecast another actor’s actions in the abstract, only in how it leads the Aurora Driver to make safe, expert-driver-like decisions. 

In future articles, we’ll dive deeper into our approach to learned interleaved forecasting and interactive decision-making. Stay tuned.

¹ At Aurora, we prefer the term "forecasting" to "prediction" — prediction is a generic term in statistics and machine learning referring to the output of a statistical model like linear regression or a deep neural network. We reserve forecasting to mean predictions of other actors’ future decisions, intentions, and motions.  

² Djuric N., et al. “MultiXNet: Multiclass Multistage Multimodal Motion Prediction,” arXiv preprint arXiv:2006.02000v4, 2021.

³ Zhang L., et al. “Map-Adaptive Goal-Based Trajectory Prediction,” arXiv preprint arXiv:2009.04450v2, 2020.

⁴ While we believe that an approach deeply rooted in machine learning is essential for self-driving, we contend that a structured application including engineered reasoning elements such as “guard-rails” and powerful optimization and search techniques is essential to deliver the Aurora Driver both safely and quickly. Current forecasting technology isn’t up to the task of driving the AV itself.

⁵ In some arbitrary unit used by the planning system, which we colloquially call “arunits” in honor of one of Aurora’s earliest employees.

⁶ "Causal" in the technical sense of Glymour and Pearl’s approach to causal modeling. 
