---
tags:
  - LunarLander-v2
  - ppo
  - deep-reinforcement-learning
  - reinforcement-learning
  - custom-implementation
model-index:
  - name: PPO
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: LunarLander-v2
          type: LunarLander-v2
        metrics:
          - type: mean_reward
            value: '-113.57 +/- 74.63'
            name: mean_reward
            verified: false
---

# PPO Agent for LunarLander-v2

## Model Description

This is a Proximal Policy Optimization (PPO) agent trained on the LunarLander-v2 environment from OpenAI Gym, using a custom PyTorch implementation of the PPO algorithm.

## Model Details

- Model Type: Reinforcement Learning Agent (PPO)
- Architecture: Actor-Critic Neural Network
- Framework: PyTorch
- Environment: LunarLander-v2 (OpenAI Gym)
- Algorithm: Proximal Policy Optimization (PPO)
- Training Library: Custom PyTorch implementation

## Training Details

### Hyperparameters

| Parameter | Value |
| --- | --- |
| Total Timesteps | 50,000 |
| Learning Rate | 0.00025 |
| Number of Environments | 4 |
| Steps per Environment | 128 |
| Batch Size | 512 |
| Minibatch Size | 128 |
| Number of Minibatches | 4 |
| Update Epochs | 4 |
| Discount Factor (γ) | 0.99 |
| GAE Lambda (λ) | 0.95 |
| Clip Coefficient | 0.2 |
| Value Function Coefficient | 0.5 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |
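
Gathered into a single configuration object, the table above corresponds to something like the following minimal sketch (the class and field names are illustrative, not necessarily the original script's):

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    total_timesteps: int = 50_000
    learning_rate: float = 2.5e-4    # 0.00025, annealed over training
    num_envs: int = 4
    num_steps: int = 128             # rollout length per environment
    num_minibatches: int = 4
    update_epochs: int = 4
    gamma: float = 0.99              # discount factor
    gae_lambda: float = 0.95
    clip_coef: float = 0.2
    vf_coef: float = 0.5
    ent_coef: float = 0.01
    max_grad_norm: float = 0.5

    @property
    def batch_size(self) -> int:      # 4 envs * 128 steps = 512
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:  # 512 / 4 = 128
        return self.batch_size // self.num_minibatches
```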

### Training Configuration

- Seed: 1 (for reproducibility)
- Device: CUDA enabled
- Learning Rate Annealing: Enabled
- Generalized Advantage Estimation (GAE): Enabled (sketched below)
- Advantage Normalization: Enabled
- Value Loss Clipping: Enabled
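
To make the GAE setting concrete, here is a minimal sketch of the advantage computation, assuming a rollout buffer of shape `(num_steps, num_envs)` and a done flag marking the step on which an episode ended:

```python
import torch

def compute_gae(rewards, values, dones, next_value, gamma=0.99, gae_lambda=0.95):
    """rewards, values, dones: tensors of shape (num_steps, num_envs);
    next_value: critic estimate for the state after the last rollout step."""
    num_steps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    lastgaelam = torch.zeros_like(next_value)
    for t in reversed(range(num_steps)):
        next_values = next_value if t == num_steps - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]  # mask the bootstrap at episode boundaries
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_values * nonterminal - values[t]
        lastgaelam = delta + gamma * gae_lambda * nonterminal * lastgaelam
        advantages[t] = lastgaelam
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```

With advantage normalization enabled, each minibatch of advantages is additionally standardized before the policy update, e.g. `(adv - adv.mean()) / (adv.std() + 1e-8)`.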

## Performance

### Evaluation Results

- Environment: LunarLander-v2
- Mean Reward: -113.57 ± 74.63

The agent achieves a mean reward of -113.57 with a standard deviation of 74.63 over evaluation episodes. For reference, LunarLander-v2 is conventionally considered solved at an average return of 200, so this agent does not yet land reliably.
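
A result like this could be reproduced with an evaluation loop along the following lines (a sketch assuming gym's ≥ 0.26 step API; `policy` is a hypothetical stand-in for the trained agent's action selection):

```python
import gym
import numpy as np

def evaluate(policy, n_episodes=10, seed=1):
    """policy: callable mapping an 8-dim observation to an int action (assumed)."""
    env = gym.make("LunarLander-v2")
    returns = []
    for ep in range(n_episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, ep_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            done = terminated or truncated
            ep_return += float(reward)
        returns.append(ep_return)
    env.close()
    # report mean +/- std over episodes, matching the metric above
    return np.mean(returns), np.std(returns)
```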

## Usage

This model can be used for:

- Reinforcement learning research and experimentation
- Educational purposes, as a worked example of a PPO implementation
- Baseline comparison for LunarLander-v2 experiments
- Starting point for fine-tuning on similar control tasks
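
A minimal loading sketch, assuming the weights are published as a PyTorch `state_dict`; the filename and the `Agent` class are assumptions (the class is sketched under Technical Implementation below), so adapt both to the repository's actual files:

```python
import torch
from huggingface_hub import hf_hub_download

# Both names below are assumptions, not confirmed contents of this repo.
weights_path = hf_hub_download(repo_id="Adilbai/cLLv2", filename="model.pt")
agent = Agent()  # the actor-critic module from the training script
agent.load_state_dict(torch.load(weights_path, map_location="cpu"))
agent.eval()
```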

## Technical Implementation

### Architecture Details

The model uses an Actor-Critic architecture implemented in PyTorch:

- Actor Network: Outputs a probability distribution over the discrete action space
- Critic Network: Estimates state values for advantage computation
- Shared Features: The actor and critic may share early feature-extraction layers, depending on the implementation
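
A sketch of what such an actor-critic module can look like in PyTorch. The hidden sizes and the choice of separate (rather than shared) trunks are assumptions, not a description of the trained checkpoint:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Agent(nn.Module):
    def __init__(self, obs_dim=8, n_actions=4, hidden=64):
        super().__init__()
        self.critic = nn.Sequential(  # state-value head V(s)
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        self.actor = nn.Sequential(   # action logits for Discrete(4)
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def get_value(self, x):
        return self.critic(x)

    def get_action_and_value(self, x, action=None):
        dist = Categorical(logits=self.actor(x))
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), dist.entropy(), self.critic(x)
```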

### PPO Algorithm Features

- Clipped Surrogate Objective: Prevents large policy updates (see the loss sketch below)
- Value Function Clipping: Stabilizes value function learning
- Generalized Advantage Estimation: Reduces variance in advantage estimates
- Multiple Epochs: Updates the policy multiple times per batch of experience
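
Put together, those pieces give a loss of the following shape (a sketch of the standard clipped PPO objective with value clipping, not the repository's exact code):

```python
import torch

def ppo_loss(new_logprob, old_logprob, entropy, new_value, old_value,
             advantages, returns, clip_coef=0.2, vf_coef=0.5, ent_coef=0.01):
    ratio = (new_logprob - old_logprob).exp()  # pi_theta / pi_theta_old
    # clipped surrogate objective: take the pessimistic of the two terms
    pg_loss = torch.max(
        -advantages * ratio,
        -advantages * torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef),
    ).mean()
    # value loss clipped around the old estimates, mirroring the policy clip
    v_clipped = old_value + torch.clamp(new_value - old_value, -clip_coef, clip_coef)
    v_loss = 0.5 * torch.max(
        (new_value - returns) ** 2,
        (v_clipped - returns) ** 2,
    ).mean()
    return pg_loss + vf_coef * v_loss - ent_coef * entropy.mean()
```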

## Environment Information

LunarLander-v2 is a classic control task where an agent must learn to:

- Land a lunar lander safely on a landing pad
- Control thrust and rotation to manage descent
- Balance fuel efficiency with landing accuracy
- Handle a continuous state space and a discrete action space

Action Space: Discrete(4)

- 0: Do nothing
- 1: Fire left orientation engine
- 2: Fire main engine
- 3: Fire right orientation engine

Observation Space: Box(8) containing:

- Position (x, y)
- Velocity (x, y)
- Angle and angular velocity
- Left and right leg ground contact
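
Both spaces can be inspected directly (gym ≥ 0.26 / gymnasium API assumed):

```python
import gym

env = gym.make("LunarLander-v2")
print(env.action_space)       # Discrete(4)
print(env.observation_space)  # Box(...) with shape (8,)
obs, info = env.reset(seed=1)                           # obs: 8-dim state vector
obs, reward, terminated, truncated, info = env.step(2)  # action 2: fire main engine
env.close()
```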

## Training Environment

- Framework: Custom PyTorch PPO implementation
- Parallel Environments: 4 concurrent environments for data collection
- Total Training Budget: 50,000 environment timesteps, summed across all environments
- Experience Collection: On-policy learning with trajectory batches
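
A sketch of such a setup using gym's built-in synchronous vector environment (the original script may vectorize differently):

```python
import gym

# Four copies of the environment, stepped in lockstep for rollout collection.
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("LunarLander-v2") for _ in range(4)]
)
obs, info = envs.reset(seed=1)        # obs has shape (4, 8)
actions = envs.action_space.sample()  # one action per environment
obs, rewards, terminations, truncations, infos = envs.step(actions)
envs.close()
```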

## Limitations and Considerations

- The mean reward of -113.57 is well below the conventional "solved" threshold of 200, and the large standard deviation indicates inconsistent behavior across episodes
- Training was limited to 50,000 timesteps, which is typically insufficient for strong performance on this environment
- Performance may vary significantly across episodes due to the stochastic nature of the environment
- The model has not been tested on variations of the LunarLander environment

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{cllv2_ppo_lunarlander,
  author = {Adilbai},
  title = {PPO Agent for LunarLander-v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Adilbai/cLLv2}
}
```

## License

Please refer to the repository license for usage terms and conditions.

## Contact

For questions or issues regarding this model, please open an issue in the model repository or contact the model author.