KRAMPUS KART: PPO

How are cars doing?

Cars: 0/0 | Leader: 0%

Avg Reward: 0 | Best: 0

How is learning going?

PPO updates: 0

Initializing...

Actor:

Critic:

What is PPO?

Actor = the policy (decides steering)

Critic = predicts future reward

Advantage = "was this action good?"

Car Colors = Episode Reward

🔴 Just spawned 🔵 Survived longer

Separate Networks (as paper recommends)

Line colors: red=negative, blue=positive | Each trained independently!

Avg Reward

PPO • TensorFlow.js •