
Proximal Policy Optimization RL

This repo is my implementation of Proximal Policy Optimization (PPO), based mainly on the HuggingFace RL Course.
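For reference, the heart of PPO is the clipped surrogate objective. Here is a minimal sketch of that loss (illustrative only, not lifted from main.py):

import torch

def ppo_clip_loss(ratio, advantages, clip_coef=0.2):
    # ratio = pi_theta(a|s) / pi_theta_old(a|s) for each sampled transition
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantages
    # Pessimistic (elementwise minimum) objective, negated for gradient descent
    return -torch.min(unclipped, clipped).mean()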

The main script uses PyTorch and will automatically make use of multiple GPUs if available.
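A minimal sketch of that pattern, assuming the agent is a standard nn.Module (the network below is a hypothetical stand-in, not the one in main.py):

import torch
import torch.nn as nn

# Hypothetical stand-in for the actor-critic network
agent = nn.Sequential(nn.Linear(26, 64), nn.Tanh(), nn.Linear(64, 6))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    agent = nn.DataParallel(agent)  # replicate the module across all visible GPUs
agent = agent.to(device)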

Environments and Metrics

The environments are available from PyBullet-Gym. Unfortunately, I could not get the MuJoCo envs to work with any combination of Gym, PyBullet, and MuJoCo I tried; this is a known issue, so PyBullet will have to suffice in the meantime, at least until I can test Genesis World Model, which can create massively parallel physics sims capable of setting up vectorized gym-like environments (see example).
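Creating one of these environments by hand looks roughly like this (a sketch assuming pybullet_envs and a classic Gym version are installed; importing pybullet_envs registers the *BulletEnv-v0 IDs):

import gym
import pybullet_envs  # noqa: F401 -- import side effect registers the Bullet envs

env = gym.make("HalfCheetahBulletEnv-v0")
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)  # classic 4-tuple Gym step API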

HuggingFace metrics (showing the results and a video of performance): huggingface.co
WandB (Weights & Biases) metrics (showing training info such as loss convergence, GPU usage, etc.): WandB.ai

Installation (UV)

uv venv
source .venv/bin/activate
uv sync

Login to Online Services:

huggingface

huggingface-cli login

after creating an access token at huggingface.co.
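If you prefer to stay in Python, huggingface_hub exposes an equivalent call (the token string below is a placeholder):

from huggingface_hub import login

login(token="hf_...")  # paste your access token here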

wandb

wandb login

after you have a WandB account (the API key is under your account settings).

Running (Training and Evaluating)

Pybullet Cheetah Environment (default)

uv run main.py

Bullet Humanoid Env with WandB tracking

uv run main.py --track --env-id "HumanoidBulletEnv-v0"
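Under the hood, --track presumably initializes a WandB run along these lines (a hedged sketch of the usual pattern; the project name, config values, and logged keys are illustrative, not taken from main.py):

import wandb

config = {"env_id": "HumanoidBulletEnv-v0", "learning_rate": 3e-4}  # illustrative
run = wandb.init(project="ppo-rl", config=config)  # hypothetical project name
# ...inside the training loop, log scalars against the global step:
wandb.log({"charts/episodic_return": 123.4, "losses/policy_loss": 0.02}, step=1)
run.finish()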

*** DEPRECATED (See Discrete Branch) ***

Only for discrete action spaces (see the MAIN branch prior to the PR):

uv run main.py --env-id "CartPole-v1"

to specify the CartPole-v1 environment instead.

Doom (todo)

Doom Environment

uv run main.py --env-id doom
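If/when this lands, it would presumably go through ViZDoom's Gym bindings, along these lines (an assumption on my part; recent vizdoom releases register Vizdoom*-v0 IDs when the gym_wrapper module is imported):

import gym
import vizdoom.gym_wrapper  # noqa: F401 -- assumed to register the Doom envs

env = gym.make("VizdoomBasic-v0")
obs = env.reset()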
