Proximal Policy Optimization (PPO) Agent playing HalfCheetah-v5
This is a trained Proximal Policy Optimization (PPO) agent for the MuJoCo HalfCheetah-v5 environment.
Model Details
The model was trained using the code available here.
Usage
To load and use this model for inference:
import torch
import json
import gymnasium as gym
from agent import SimpleAgent
from environment import make_env
# Load the configuration
with open("config.json", "r") as f:
    config = json.load(f)
env_id = config["env_id"]
hidden_dim = config["hidden_dim"]
# Create the environment and get the state and action dimensions
env, state_size, action_size = make_env(
    env_id,
    render_mode="human",
)
# Instantiate the agent and load the trained policy network
agent = SimpleAgent(state_size, action_size, hidden_dim)
# map_location="cpu" lets the checkpoint load on machines without a GPU
agent.policy.load_state_dict(torch.load("model.pt", map_location="cpu"))
# Enjoy the agent!
state, _ = env.reset()
done = False
while not done:
    action = agent.select_action(state, deterministic=True)
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    env.render()
env.close()
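The `deterministic=True` flag is not documented in this card. In PPO implementations with a Gaussian policy it commonly means "return the mean of the action distribution instead of sampling from it". The sketch below illustrates that idea only. The function, its arguments, and the clipping step are assumptions for illustration, not the actual code of `SimpleAgent.select_action`.

```python
import math
import random


def select_action(mean, log_std, deterministic=False, low=-1.0, high=1.0):
    """Hypothetical Gaussian-policy action selection (not the repository's code)."""
    if deterministic:
        # Evaluation: use the distribution mean, so repeated calls agree.
        action = list(mean)
    else:
        # Exploration: sample each dimension from N(mean, exp(log_std)^2).
        action = [random.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]
    # HalfCheetah actions are bounded to [-1, 1]; clip to stay in range.
    return [min(high, max(low, a)) for a in action]
```

Under this assumption, deterministic evaluation removes sampling noise from the policy, so differences between episodes come only from the environment's initial state.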
Evaluation results
- mean_reward on HalfCheetah-v5 (self-reported): 1244.04 +/- 36.99
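A figure of this form is a mean and standard deviation of episode returns. The card does not state how many episodes were used or which variant of the standard deviation was reported, so the following is only a minimal sketch of one way to compute such a number. The `evaluate` helper, the default of 10 episodes, and the seeding scheme are assumptions, and it works with any environment that follows the Gymnasium `reset`/`step` interface.

```python
import statistics


def evaluate(env, select_action, n_episodes=10, seed=0):
    """Run n_episodes and return (mean, std) of the undiscounted episode returns."""
    returns = []
    for episode in range(n_episodes):
        # A different seed per episode keeps the runs reproducible but distinct.
        state, _ = env.reset(seed=seed + episode)
        done = False
        episode_return = 0.0
        while not done:
            action = select_action(state)
            state, reward, terminated, truncated, _ = env.step(action)
            episode_return += float(reward)
            done = terminated or truncated
        returns.append(episode_return)
    # Population standard deviation; the card does not say which variant was used.
    return statistics.mean(returns), statistics.pstdev(returns)
```

With the objects from the Usage section, this could be called as `evaluate(env, lambda s: agent.select_action(s, deterministic=True))`, ideally on an environment created without `render_mode="human"` so that evaluation is not slowed down by rendering.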