Poster

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang · Tao Tang · Shaoxiang Chen · Sihao Lin · Zequn Jie · Lin Ma · Guangrun Wang · Xiaodan Liang

# 161
Strong Double Blind review: This paper was not made available on public preprint services during the review process.
Tue 1 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Data-driven approaches for autonomous driving (AD) have been widely adopted in the past decade but are confronted with dataset bias and uninterpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that the pretraining-finetuning paradigm of LLMs on downstream data with the Chain-of-Thought (CoT) reasoning process can enhance explainability and scene understanding. However, this popular strategy suffers from a notorious misalignment between the crafted CoTs and the consequent decision-making, a problem left untouched by previous LLM-based AD methods. To address this problem, we introduce an end-to-end decision-making model based on a multimodality-augmented LLM, which simultaneously performs CoT reasoning and produces planning results. Furthermore, we propose a reasoning-decision alignment constraint between the paired CoTs and planning results, enforcing the correspondence between reasoning and decision-making. Moreover, we redesign the CoTs to enable the model to comprehend complex scenarios and enhance decision-making performance. We dub our proposed large language planner with reasoning-decision alignment RDA-Driver. Experimental evaluations on the nuScenes and DriveLM-nuScenes benchmarks demonstrate the effectiveness of RDA-Driver in enhancing the performance of end-to-end autonomous driving systems. Specifically, RDA-Driver achieves state-of-the-art end-to-end planning performance on the nuScenes dataset with a 0.80 L2 error and a 0.32 collision rate, and also achieves leading results on the challenging DriveLM-nuScenes benchmark with a 0.82 L2 error and a 0.38 collision rate.
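
The two metrics quoted above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `l2_error` and `collision_rate` are hypothetical, the waypoints are assumed to be (x, y) pairs at matching time steps, and reporting the collision rate as a percentage is an assumption, since the abstract does not state units.

```python
import math


def l2_error(predicted, ground_truth):
    """Mean Euclidean distance between paired (x, y) waypoints.

    Hypothetical helper: assumes both trajectories are sampled at the
    same time steps, which the abstract does not specify.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def collision_rate(collided_flags):
    """Share of evaluated samples flagged as colliding, as a percentage.

    Hypothetical helper: the percentage unit is an assumption.
    """
    if not collided_flags:
        raise ValueError("need at least one evaluated sample")
    return 100.0 * sum(1 for flag in collided_flags if flag) / len(collided_flags)
```

Lower is better for both values: a smaller L2 error means the planned trajectory stays closer to the reference trajectory, and a smaller collision rate means fewer planned trajectories intersect other agents.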
