This repo is my implementation of PPO (Proximal Policy Optimization), based mainly on the HuggingFace RL Course.
The main script uses PyTorch and automatically runs on multiple GPUs if they are available.
The environments come from PyBullet-Gym. Unfortunately, I could not get the MuJoCo envs to work with any combination of Gym and PyBullet / MuJoCo versions I tried. This is a known issue, so in the meantime PyBullet will have to suffice, at least until I can test the Genesis World Model, which can run massively parallel physics sims and set up vectorized gym-like environments (see example).
HuggingFace Metrics (showing the results and a video of performance) at: huggingface.co
WandB (Weights & Biases) Metrics (showing training info such as loss convergence, GPU usage, etc.) at: WandB.ai
uv venv
source .venv/bin/activate
uv sync
huggingface-cli login
after creating an access token at huggingface.co.
wandb login
after you have a WandB account (go to settings for the API key).
uv run main.py
uv run main.py --track --env-id "HumanoidBulletEnv-v0"
uv run main.py --env-id "CartPole-v1"
to specify the CartPole-v1 environment instead.
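For readers wondering how the flags in the commands above fit together, here is a minimal sketch of a parser that accepts `--env-id` and `--track`. It is not taken from main.py: the function name `parse_args`, the help texts, and the default environment are assumptions made for illustration, and the real script may use a different parsing library or different defaults.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(description="PPO training (illustrative sketch)")
    # Assumption: the real default environment in main.py may differ.
    parser.add_argument(
        "--env-id",
        type=str,
        default="HumanoidBulletEnv-v0",
        help="ID of the environment to train on",
    )
    # Assumption: --track is a plain on/off switch for experiment tracking.
    parser.add_argument(
        "--track",
        action="store_true",
        help="log the run to an experiment tracker",
    )
    return parser.parse_args(argv)


# Equivalent to: uv run main.py --track --env-id "CartPole-v1"
args = parse_args(["--track", "--env-id", "CartPole-v1"])
print(args.env_id, args.track)
```

argparse turns `--env-id` into the attribute `env_id`, so the rest of a script would read `args.env_id` and `args.track`.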
Doom Environment
uv run main.py --env-id doom