PPO - Ant-v4
A Proximal Policy Optimization (PPO) agent trained with stable-baselines3 on the MuJoCo Ant-v4 environment.
| Detail | Value |
|---|---|
| Environment | gymnasium==0.29 & mujoco==2.3 (Ant-v4) |
| Algorithm | PPO (stable-baselines3==2.3.0) |
| Timesteps | 100 000 |
| Policy | MlpPolicy (2 × 64 hidden, tanh) |
| Return (mean ± std) | ~ 964 |
| Seed | 0 |
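The "Return (mean ± std)" row can be reproduced roughly as follows. This is a minimal sketch, not the evaluation script used for this card: the checkpoint path `ppo_ant_v4.zip`, the episode count, and the helper names are assumptions for illustration. The standard deviation is the population standard deviation, which is what `evaluate_policy` reports when it aggregates returns itself.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def evaluate(model_path="ppo_ant_v4.zip", n_eval_episodes=10):
    """Evaluate a saved agent. model_path is a hypothetical local file."""
    # Imported lazily so the helper above works without MuJoCo installed.
    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("Ant-v4")
    model = PPO.load(model_path)
    # With return_episode_rewards=True the call returns per-episode lists.
    episode_returns, _episode_lengths = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_eval_episodes,
        deterministic=True,
        return_episode_rewards=True,
    )
    return summarize_returns(episode_returns)
```

Calling `evaluate()` with a real checkpoint yields a `(mean, std)` pair that can be compared against the value in the table; results will vary with the number of evaluation episodes.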
Hyper-parameters
{
  "n_steps": 128,
  "batch_size": 64,
  "n_epochs": 20,
  "gamma": 0.99,
  "learning_rate": 3e-4,
  "ent_coef": 0.0,
  "clip_range": 0.2
}
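A minimal sketch of how these settings might be passed to stable-baselines3, together with the update budget they imply. It assumes a single environment (`N_ENVS = 1`), which the card does not state, and the save path `ppo_ant_v4` is hypothetical. This is not necessarily the script that produced the published checkpoint.

```python
import math

# Values copied from the card.
HYPERPARAMS = {
    "n_steps": 128,
    "batch_size": 64,
    "n_epochs": 20,
    "gamma": 0.99,
    "learning_rate": 3e-4,
    "ent_coef": 0.0,
    "clip_range": 0.2,
}
TOTAL_TIMESTEPS = 100_000
SEED = 0
N_ENVS = 1  # assumption: the card does not say how many envs were used


def update_budget(hp, total_timesteps, n_envs=1):
    """Return (rollouts, minibatches per epoch, total gradient steps)."""
    rollout_size = hp["n_steps"] * n_envs
    # PPO keeps collecting full rollouts until the step budget is reached.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    minibatches_per_epoch = rollout_size // hp["batch_size"]
    grad_steps = n_rollouts * hp["n_epochs"] * minibatches_per_epoch
    return n_rollouts, minibatches_per_epoch, grad_steps


def train(save_path="ppo_ant_v4"):
    """Train and save the agent. save_path is a hypothetical file name."""
    # Imported lazily so update_budget works without MuJoCo installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("Ant-v4")
    # The default MlpPolicy for PPO uses two 64-unit tanh layers.
    model = PPO("MlpPolicy", env, seed=SEED, verbose=1, **HYPERPARAMS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(save_path)
    return model
```

Under the single-environment assumption, each rollout of 128 steps is split into 2 minibatches of 64 and reused for 20 epochs, so `update_budget(HYPERPARAMS, TOTAL_TIMESTEPS, N_ENVS)` gives 782 rollouts and 31,280 gradient steps.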