---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: '-113.57 +/- 74.63'
      name: mean_reward
      verified: false
---
# PPO Agent for LunarLander-v2

## Model Description
This is a Proximal Policy Optimization (PPO) agent trained to play the LunarLander-v2 environment from OpenAI Gym. The model was trained using a custom PyTorch implementation of the PPO algorithm.

## Model Details
- Model Type: Reinforcement Learning Agent (PPO)
- Architecture: Actor-Critic Neural Network
- Framework: PyTorch
- Environment: LunarLander-v2 (OpenAI Gym)
- Algorithm: Proximal Policy Optimization (PPO)
- Training Library: Custom PyTorch implementation

## Training Details

### Hyperparameters
| Parameter | Value |
|---|---|
| Total Timesteps | 50,000 |
| Learning Rate | 0.00025 |
| Number of Environments | 4 |
| Steps per Environment | 128 |
| Batch Size | 512 |
| Minibatch Size | 128 |
| Number of Minibatches | 4 |
| Update Epochs | 4 |
| Discount Factor (γ) | 0.99 |
| GAE Lambda (λ) | 0.95 |
| Clip Coefficient | 0.2 |
| Value Function Coefficient | 0.5 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |

### Training Configuration
- Seed: 1 (for reproducibility)
- Device: CUDA enabled
- Learning Rate Annealing: Enabled
- Generalized Advantage Estimation (GAE): Enabled
- Advantage Normalization: Enabled
- Value Loss Clipping: Enabled
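
For orientation, the hyperparameters and flags above can be collected into a single configuration object. This is a minimal sketch with illustrative field names, not the author's actual training script; the derived `batch_size` and `minibatch_size` properties show how the values in the table relate (4 envs × 128 steps = 512, and 512 / 4 minibatches = 128).

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    # Hyperparameters from the table above
    total_timesteps: int = 50_000
    learning_rate: float = 2.5e-4
    num_envs: int = 4
    num_steps: int = 128          # rollout length per environment
    num_minibatches: int = 4
    update_epochs: int = 4
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95
    clip_coef: float = 0.2
    vf_coef: float = 0.5
    ent_coef: float = 0.01
    max_grad_norm: float = 0.5
    # Training-configuration flags listed above
    seed: int = 1
    cuda: bool = True
    anneal_lr: bool = True
    gae: bool = True
    norm_adv: bool = True
    clip_vloss: bool = True

    @property
    def batch_size(self) -> int:      # 4 envs x 128 steps = 512
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:  # 512 / 4 minibatches = 128
        return self.batch_size // self.num_minibatches
```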

## Performance

### Evaluation Results
- Environment: LunarLander-v2
- Mean Reward: -113.57 ± 74.63
The agent achieves a mean reward of -113.57 with a standard deviation of 74.63 over evaluation episodes.
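
The reported figure is simply the mean and standard deviation of per-episode returns. A small illustration of that computation (the numbers passed in are made up, only to show the reporting format):

```python
import numpy as np

def summarize_returns(episodic_returns):
    """Mean +/- std of total rewards collected over evaluation episodes."""
    returns = np.asarray(episodic_returns, dtype=np.float64)
    return returns.mean(), returns.std()

# Illustrative values only, not actual evaluation data
mean_r, std_r = summarize_returns([-30.0, -120.5, -190.2])
print(f"mean_reward = {mean_r:.2f} +/- {std_r:.2f}")
```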

## Usage
This model can be used for:
- Reinforcement learning research and experimentation
- Educational purposes to understand PPO implementation
- Baseline comparison for LunarLander-v2 experiments
- Fine-tuning starting point for similar control tasks
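
The repository's file layout is not described in this card, so the snippet below is only a hedged sketch of how the checkpoint might be fetched and inspected. The filename `ppo_lunarlander.pt` and the `Agent` class referenced in the comments are assumptions (see the architecture sketch under Technical Implementation below).

```python
import torch
from huggingface_hub import hf_hub_download

# Assumed filename; check the repository's file list for the real checkpoint name.
checkpoint_path = hf_hub_download(repo_id="Adilbai/cLLv2", filename="ppo_lunarlander.pt")
state_dict = torch.load(checkpoint_path, map_location="cpu")

# The weights must be loaded into a network with the same layout as the training
# network, e.g. the (assumed) Agent class sketched under "Architecture Details" below:
#   agent = Agent(obs_dim=8, n_actions=4)
#   agent.load_state_dict(state_dict)
#   action, _, _, _ = agent.get_action_and_value(torch.zeros(1, 8))
print(sorted(state_dict.keys()))
```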

## Technical Implementation

### Architecture Details
The model uses an Actor-Critic architecture implemented in PyTorch:
- Actor Network: Outputs action probabilities for the discrete action space
- Critic Network: Estimates state values for advantage computation
- Shared Features: Common feature extraction layers (if applicable)
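
A minimal PyTorch sketch of such an actor-critic is shown below. The hidden sizes (two layers of 64 units with Tanh activations) and the orthogonal initialization are common defaults for PPO on LunarLander, not confirmed details of this checkpoint.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

def layer_init(layer, std=2**0.5, bias_const=0.0):
    # Orthogonal weight init, a common default in PPO implementations
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, bias_const)
    return layer

class Agent(nn.Module):
    """Actor-critic with separate policy and value heads (hidden sizes are assumed)."""
    def __init__(self, obs_dim: int = 8, n_actions: int = 4):
        super().__init__()
        self.critic = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 64)), nn.Tanh(),
            layer_init(nn.Linear(64, 64)), nn.Tanh(),
            layer_init(nn.Linear(64, 1), std=1.0),
        )
        self.actor = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 64)), nn.Tanh(),
            layer_init(nn.Linear(64, 64)), nn.Tanh(),
            layer_init(nn.Linear(64, n_actions), std=0.01),
        )

    def get_value(self, x):
        return self.critic(x)

    def get_action_and_value(self, x, action=None):
        logits = self.actor(x)
        dist = Categorical(logits=logits)
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), dist.entropy(), self.critic(x)
```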

### PPO Algorithm Features
- Clipped Surrogate Objective: Prevents large policy updates
- Value Function Clipping: Stabilizes value function learning
- Generalized Advantage Estimation: Reduces variance in advantage estimates
- Multiple Epochs: Updates policy multiple times per batch of experience
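
The update these features describe can be sketched roughly as follows. Variable names and shapes are illustrative, `agent` is the actor-critic sketched above, and the coefficients come from the hyperparameter table.

```python
import torch

def compute_gae(rewards, values, dones, next_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over a rollout of shape (num_steps, num_envs).

    Convention: dones[t] == 1.0 means the episode ended at step t, so the value of
    the following state must not be bootstrapped.
    """
    num_steps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    last_gae = torch.zeros_like(next_value)
    for t in reversed(range(num_steps)):
        next_v = next_value if t == num_steps - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_v * not_done - values[t]
        last_gae = delta + gamma * gae_lambda * not_done * last_gae
        advantages[t] = last_gae
    returns = advantages + values  # bootstrapped return targets for the value function
    return advantages, returns

def ppo_loss(agent, obs, actions, old_logprobs, old_values, advantages, returns,
             clip_coef=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped surrogate objective + clipped value loss + entropy bonus (one minibatch)."""
    _, new_logprob, entropy, new_value = agent.get_action_and_value(obs, actions)
    ratio = (new_logprob - old_logprobs).exp()

    # Advantage normalization (enabled in the training configuration above)
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Clipped surrogate objective: keep the more pessimistic of the two terms
    pg_loss = torch.max(
        -advantages * ratio,
        -advantages * ratio.clamp(1 - clip_coef, 1 + clip_coef),
    ).mean()

    # Value loss clipping, mirroring the policy clipping
    new_value = new_value.view(-1)
    v_clipped = old_values + (new_value - old_values).clamp(-clip_coef, clip_coef)
    v_loss = 0.5 * torch.max((new_value - returns) ** 2, (v_clipped - returns) ** 2).mean()

    # Total loss to minimize
    return pg_loss - ent_coef * entropy.mean() + vf_coef * v_loss
```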

## Environment Information
LunarLander-v2 is a Box2D-based control task where an agent must learn to:
- Land a lunar lander safely on a landing pad
- Control thrust and rotation to manage descent
- Balance fuel efficiency with landing accuracy
- Handle continuous state space and discrete action space
Action Space: Discrete(4)
- 0: Do nothing
- 1: Fire left orientation engine
- 2: Fire main engine
- 3: Fire right orientation engine
Observation Space: Box(8) containing:
- Position (x, y)
- Velocity (x, y)
- Angle and angular velocity
- Left and right leg ground contact
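
These spaces can be inspected directly with Gym (requires the Box2D extra; note that recent releases of the maintained `gymnasium` fork expose the task as `LunarLander-v3`):

```python
import gym  # pip install "gym[box2d]"

env = gym.make("LunarLander-v2")
print(env.action_space)       # Discrete(4)
print(env.observation_space)  # Box with 8 components, as listed above

obs = env.reset()
obs = obs[0] if isinstance(obs, tuple) else obs  # reset() returns (obs, info) in newer gym
action = env.action_space.sample()               # random action, just to illustrate stepping
print(env.step(action)[:2])                      # next observation and reward
env.close()
```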

## Training Environment
- Framework: Custom PyTorch PPO implementation
- Parallel Environments: 4 concurrent environments for data collection
- Total Training Budget: 50,000 timesteps, summed across all parallel environments
- Experience Collection: On-policy learning with trajectory batches
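
A hedged sketch of how four parallel environments could be assembled for on-policy data collection, showing where the 4 × 128 = 512 batch size comes from:

```python
import gym  # pip install "gym[box2d]"
import numpy as np

num_envs, num_steps = 4, 128  # values from the hyperparameter table

def make_env():
    def thunk():
        env = gym.make("LunarLander-v2")
        env = gym.wrappers.RecordEpisodeStatistics(env)  # logs per-episode returns
        return env
    return thunk

envs = gym.vector.SyncVectorEnv([make_env() for _ in range(num_envs)])
envs.reset()

for _ in range(num_steps):  # one rollout: 4 envs x 128 steps = 512 transitions
    # Random actions stand in for the policy; a real rollout would also store
    # observations, log-probabilities, rewards, values, and done flags here.
    actions = np.array([envs.single_action_space.sample() for _ in range(num_envs)])
    envs.step(actions)

envs.close()
```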

## Limitations and Considerations
- The agent's mean reward is negative (-113.57) with high variance, well below the score of 200 at which LunarLander-v2 is conventionally considered solved
- Training was limited to 50,000 timesteps, which may be insufficient for optimal performance
- Performance may vary significantly across different episodes due to the stochastic nature of the environment
- The model has not been tested on variations of the LunarLander environment

## Citation
If you use this model in your research, please cite:
```bibtex
@misc{cllv2_ppo_lunarlander,
  author    = {Adilbai},
  title     = {PPO Agent for LunarLander-v2},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Adilbai/cLLv2}
}
```

## License
Please refer to the repository license for usage terms and conditions.

## Contact
For questions or issues regarding this model, please open an issue in the model repository or contact the model author.