Commit 1d864db

Author: Daphne
Fix sup tags.
1 parent cb12de1

1 file changed: docs/src/pufferdrive-2.0.md (4 additions, 4 deletions)
```diff
@@ -1,8 +1,8 @@
 # PufferDrive 2.0: A fast and friendly driving simulator for training and evaluating RL agents
 
-**Daphne Cornelisse**`<sup>`1\*`</sup>`, **Spencer Cheng**`<sup>`2\*`</sup>`, Pragnay Mandavilli`<sup>`1`</sup>`, Julian Hunt`<sup>`1`</sup>`, Kevin Joseph`<sup>`1`</sup>`, Waël Doulazmi`<sup>`3, 4`</sup>`, Valentin Charraut`<sup>`4`</sup>`, Aditya Gupta`<sup>`1`</sup>`, Eugene Vinitsky`<sup>`1`</sup>`
+**Daphne Cornelisse**<sup>1*</sup>, **Spencer Cheng**<sup>2*</sup>, Pragnay Mandavilli<sup>1</sup>, Julian Hunt<sup>1</sup>, Kevin Joseph<sup>1</sup>, Waël Doulazmi<sup>3, 4</sup>, Valentin Charraut<sup>4</sup>, Aditya Gupta<sup>1</sup>, Eugene Vinitsky<sup>1</sup>
 
-`<sup>`1`</sup>` Emerge Lab at NYU Tandon School of Engineering | `<sup>`2`</sup>` [Puffer.ai](https://puffer.ai/) | `<sup>`3`</sup>` Centre for Robotics, Mines Paris - PSL | `<sup>`4`</sup>` Valeo | `<sup>`\*`</sup>` Shared first contributor
+<sup>1</sup> Emerge Lab at NYU Tandon School of Engineering | <sup>2</sup> [Puffer.ai](https://puffer.ai/) | <sup>3</sup> Centre for Robotics, Mines Paris - PSL | <sup>4</sup> Valeo | <sup>*</sup> Shared first contributor
 
 *December 30, 2025*
 
@@ -40,7 +40,7 @@
 
 Deep reinforcement learning algorithms such as [PPO](https://arxiv.org/abs/1707.06347), work effectively in the billion-sample regime. With sufficient scale and occasional successes, RL can optimize well-defined objectives even under sparse reward signals.
 
-This shifts the primary bottleneck to simulation. The rate at which high-quality experience can be generated _directly determines_ how reliably RL can be applied to challenging real-world problems, such as autonomous navigation in dynamic, multi-agent environments.`<sup>`[1](#notes)`</sup>`
+This shifts the primary bottleneck to simulation. The rate at which high-quality experience can be generated _directly determines_ how reliably RL can be applied to challenging real-world problems, such as autonomous navigation in dynamic, multi-agent environments.<sup>[1](#notes)</sup>
 
 Over the past few years, we developed a sequence of data-driven, multi-agent simulators to study large-scale self-play for autonomous driving. Agents are trained from scratch. They generate their own experience by interacting with other agents in the environment and learn from it over time. In this post, we briefly summarize this progression and show how we arrived at PufferDrive 2.0.
 
@@ -60,7 +60,7 @@ Later work explored what becomes possible once reaching scale is no longer a bot
 These results suggested that once simulation becomes cheap, self-play RL can produce robust autonomous driving policies.
 
 ![SPS comparison between sims](images/sim-comparison.png)
-**Figure 1:** _Progression of RL-based driving simulators. Left: end-to-end training throughput on an NVIDIA RTX 4080, counting only transitions collected by learning policy agents. Right: wall-clock time to reach 80 percent goal-reaching`<sup>`[2](#notes)`</sup>`. This captures both simulation speed and algorithmic efficiency._
+**Figure 1:** _Progression of RL-based driving simulators. Left: end-to-end training throughput on an NVIDIA RTX 4080, counting only transitions collected by learning policy agents. Right: wall-clock time to reach 80 percent goal-reaching<sup>[2](#notes)</sup>. This captures both simulation speed and algorithmic efficiency._
 
 | Simulator | End-to-end training SPS | Time to 80% success rate |
 | ----------- | ----------------------- | ------------------------ |
```

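The change is mechanical but easy to miss in the raw diff: wrapping an HTML tag in backticks turns it into a Markdown code span, so the tag is displayed as literal text instead of being rendered. A minimal illustration of the before/after behavior (not part of the commit):

```markdown
`<sup>`1`</sup>`   displays the tags literally, as inline code
<sup>1</sup>       renders as a superscript 1 (raw HTML passes through)
```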