---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: -113.57 +/- 74.63
      name: mean_reward
      verified: false
---
# PPO Agent for LunarLander-v2
## Model Description
This is a Proximal Policy Optimization (PPO) agent trained on the LunarLander-v2 environment from OpenAI Gym, using a custom PyTorch implementation of the PPO algorithm.
## Model Details
- **Model Type**: Reinforcement Learning Agent (PPO)
- **Architecture**: Actor-Critic Neural Network
- **Framework**: PyTorch
- **Environment**: LunarLander-v2 (OpenAI Gym)
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Training Library**: Custom PyTorch implementation
## Training Details
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| Total Timesteps | 50,000 |
| Learning Rate | 0.00025 |
| Number of Environments | 4 |
| Steps per Environment | 128 |
| Batch Size | 512 |
| Minibatch Size | 128 |
| Number of Minibatches | 4 |
| Update Epochs | 4 |
| Discount Factor (γ) | 0.99 |
| GAE Lambda (λ) | 0.95 |
| Clip Coefficient | 0.2 |
| Value Function Coefficient | 0.5 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |
### Training Configuration
- **Seed**: 1 (for reproducibility)
- **Device**: CUDA enabled
- **Learning Rate Annealing**: Enabled
- **Generalized Advantage Estimation (GAE)**: Enabled
- **Advantage Normalization**: Enabled
- **Value Loss Clipping**: Enabled
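For convenience, the hyperparameters and toggles above can be collected into a single config object. The sketch below is illustrative only; the field names are assumptions and are not taken from the actual training script.

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    """Hypothetical config mirroring the tables above (names are illustrative)."""
    total_timesteps: int = 50_000
    learning_rate: float = 2.5e-4
    num_envs: int = 4
    num_steps: int = 128          # rollout length per environment
    num_minibatches: int = 4
    update_epochs: int = 4
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95
    clip_coef: float = 0.2
    vf_coef: float = 0.5
    ent_coef: float = 0.01
    max_grad_norm: float = 0.5
    anneal_lr: bool = True        # learning-rate annealing enabled
    seed: int = 1

    @property
    def batch_size(self) -> int:
        return self.num_envs * self.num_steps           # 4 * 128 = 512

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches  # 512 / 4 = 128
```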
## Performance
### Evaluation Results
- **Environment**: LunarLander-v2
- **Mean Reward**: -113.57 ± 74.63
The agent achieves a mean reward of -113.57 with a standard deviation of 74.63 over evaluation episodes. For context, LunarLander-v2 is considered solved at an average reward of 200, so this agent has not yet learned a consistent landing policy; see Limitations and Considerations below.
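A mean ± std figure like the one above can be reproduced with a short evaluation loop. The sketch below assumes Gym's post-0.26 `reset`/`step` API and a `policy` callable mapping an observation to a discrete action; the random baseline at the end is only a placeholder for the trained agent.

```python
import gym
import numpy as np

def evaluate(policy, n_episodes: int = 10) -> tuple[float, float]:
    """Run `policy` for n_episodes; return mean and std of episode returns."""
    env = gym.make("LunarLander-v2")
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, ep_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            done = terminated or truncated
            ep_return += reward
        returns.append(ep_return)
    env.close()
    return float(np.mean(returns)), float(np.std(returns))

# Example: a random baseline standing in for the trained agent.
mean_r, std_r = evaluate(lambda obs: np.random.randint(4))
print(f"mean_reward = {mean_r:.2f} +/- {std_r:.2f}")
```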
## Usage
This model can be used for:
- Reinforcement learning research and experimentation
- Educational purposes to understand PPO implementation
- Baseline comparison for LunarLander-v2 experiments
- Fine-tuning starting point for similar control tasks
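The checkpoint format and file name are not documented in this card, so the following loading pattern is hypothetical: it assumes the repository ships a PyTorch `state_dict` and uses an illustrative file name.

```python
import torch
from huggingface_hub import hf_hub_download

# Hypothetical file name -- check the repository's file listing for the
# actual checkpoint name and format before relying on this.
ckpt_path = hf_hub_download(repo_id="Adilbai/cLLv2", filename="agent.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")
# Load into the network class from the training script, e.g. the
# Actor-Critic sketch in the Technical Implementation section below:
# agent = Agent(); agent.load_state_dict(state_dict); agent.eval()
```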
## Technical Implementation
### Architecture Details
The model uses an Actor-Critic architecture implemented in PyTorch:
- **Actor Network**: Outputs action probabilities for the discrete action space
- **Critic Network**: Estimates state values for advantage computation
- **Shared Features**: the actor and critic may share early feature-extraction layers, depending on the implementation (a minimal separate-network sketch follows)
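As a concrete reference point (not necessarily the exact network in this checkpoint), a minimal separate-network Actor-Critic for LunarLander-v2's 8-dimensional observations and 4 discrete actions could look like this; the hidden-layer widths are assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Agent(nn.Module):
    """Minimal Actor-Critic for LunarLander-v2 (8 obs dims, 4 actions).

    Layer widths are assumptions; the actual checkpoint may differ.
    """
    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),   # action logits
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),           # state value V(s)
        )

    def get_action_and_value(self, obs, action=None):
        dist = Categorical(logits=self.actor(obs))
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), dist.entropy(), self.critic(obs)
```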
### PPO Algorithm Features
- **Clipped Surrogate Objective**: Prevents large policy updates
- **Value Function Clipping**: Stabilizes value function learning
- **Generalized Advantage Estimation**: Reduces variance in advantage estimates
- **Multiple Epochs**: Updates policy multiple times per batch of experience
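These pieces combine into the standard PPO minibatch loss. The sketch below is a generic formulation using the coefficients from the hyperparameter table, not a copy of this repository's code; the input tensors are assumed to come from a rollout buffer.

```python
import torch

def ppo_loss(logprobs, old_logprobs, advantages, returns, values, old_values,
             entropy, clip_coef=0.2, vf_coef=0.5, ent_coef=0.01):
    """Standard PPO loss: clipped surrogate + clipped value loss - entropy bonus."""
    # Advantage normalization within the minibatch, as enabled above.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Clipped surrogate objective: limits how far the new policy can move.
    ratio = (logprobs - old_logprobs).exp()
    pg_loss = torch.max(
        -advantages * ratio,
        -advantages * ratio.clamp(1 - clip_coef, 1 + clip_coef),
    ).mean()

    # Value loss with clipping, mirroring the policy clip for stability.
    v_clipped = old_values + (values - old_values).clamp(-clip_coef, clip_coef)
    v_loss = 0.5 * torch.max(
        (values - returns) ** 2,
        (v_clipped - returns) ** 2,
    ).mean()

    return pg_loss + vf_coef * v_loss - ent_coef * entropy.mean()
```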
## Environment Information
**LunarLander-v2** is a Box2D-based control task where an agent must learn to:
- Land a lunar lander safely on a landing pad
- Control thrust and rotation to manage descent
- Balance fuel efficiency with landing accuracy
- Handle continuous state space and discrete action space
**Action Space**: Discrete(4)
- 0: Do nothing
- 1: Fire left orientation engine
- 2: Fire main engine
- 3: Fire right orientation engine
**Observation Space**: Box(8) containing:
- Position (x, y)
- Velocity (x, y)
- Angle and angular velocity
- Left and right leg ground contact
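Both spaces can be checked directly from Gym (assuming the post-0.26 API):

```python
import gym

env = gym.make("LunarLander-v2")
print(env.action_space)       # Discrete(4)
print(env.observation_space)  # Box with shape (8,)

obs, info = env.reset(seed=1)
print(obs.shape)  # (8,): x, y, vx, vy, angle, angular velocity, leg contacts
env.close()
```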
## Training Environment
- **Framework**: Custom PyTorch PPO implementation
- **Parallel Environments**: 4 concurrent environments for data collection
- **Total Training Budget**: 50,000 environment timesteps, summed across all 4 environments
- **Experience Collection**: On-policy learning with trajectory batches (a minimal rollout and GAE sketch follows)
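The rollout-collection side can be sketched as follows: four synchronous environments feed a buffer, and advantages are computed with GAE. Shapes and the exact `done` bookkeeping are assumptions of this sketch, not guaranteed to match the original implementation.

```python
import gym
import torch

# Assumed setup: 4 parallel environments, as in the hyperparameter table.
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("LunarLander-v2") for _ in range(4)]
)

def compute_gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a (num_steps, num_envs) rollout.

    `dones[t]` is assumed to mark whether the transition at step t ended
    the episode, so it masks the bootstrap from the next state's value.
    """
    num_steps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    last_gae = 0.0
    for t in reversed(range(num_steps)):
        next_v = next_value if t == num_steps - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_v * next_nonterminal - values[t]
        last_gae = delta + gamma * lam * next_nonterminal * last_gae
        advantages[t] = last_gae
    returns = advantages + values   # targets for the value function
    return advantages, returns
```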
## Limitations and Considerations
- The mean reward of -113.57 is well below the commonly cited solved threshold of 200 for LunarLander-v2, and the ±74.63 standard deviation indicates highly inconsistent behavior across episodes
- Training was limited to 50,000 timesteps, which is likely insufficient for strong performance on this task
- Performance may vary significantly across different episodes due to the stochastic nature of the environment
- The model has not been tested on variations of the LunarLander environment
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{cllv2_ppo_lunarlander,
  author    = {Adilbai},
  title     = {PPO Agent for LunarLander-v2},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Adilbai/cLLv2}
}
```
## License
Please refer to the repository license for usage terms and conditions.
## Contact
For questions or issues regarding this model, please open an issue in the model repository or contact the model author.