GitHub - chancecardona/ppo-rl: Proximal Policy Optimization RL Agent using a variety of Environments

Proximal Policy Optimization RL

This repo is my implementation of PPO, mainly from the HuggingFace RL Course.
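For readers new to PPO, here is a minimal sketch of the standard clipped surrogate policy loss. It is written in plain Python for readability and is not taken from ppo_policy.py; the function name `clipped_surrogate` and the default `clip_coef=0.2` are illustrative assumptions, and the repo's code presumably operates on PyTorch tensors instead.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_coef=0.2):
    """Mean PPO clipped policy loss (a value to minimize) over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    clip_coef: hypothetical default; a common choice, not the repo's setting.
    """
    losses = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policies.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef].
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # Take the pessimistic (smaller) objective, negated to form a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

The clipping removes the incentive to move the policy far from the one that collected the data: once the ratio leaves the clip range in the direction the advantage favors, the objective stops improving.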

The main script uses PyTorch and will automatically use multiple GPUs if they are available.
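One common way to get this behavior in PyTorch is sketched below. This is an assumption about the general approach, not a copy of what main.py does: the helper name `place_on_available_devices` is hypothetical, and the script may use a different mechanism (for example `DistributedDataParallel`).

```python
import torch
import torch.nn as nn


def place_on_available_devices(model: nn.Module) -> nn.Module:
    """Move a model to GPU if one exists and replicate it across several.

    Hypothetical helper for illustration; not part of this repo.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    # With more than one visible GPU, DataParallel splits each batch
    # across the devices and gathers the outputs on the first one.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model


# Stand-in network; the real policy would be defined in ppo_policy.py.
policy = place_on_available_devices(nn.Linear(4, 2))
```

On a CPU-only machine the helper simply returns the model unchanged on the CPU, so the same script runs everywhere.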

Environments and Metrics

The environments come from PyBullet-Gym. Unfortunately, I could not get the Mujoco envs to work with any combination of Gym and PyBullet / Mujoco versions I tried. This is a known issue, so in the meantime PyBullet will need to suffice, at least until I can test the Genesis World Model, which can create massively parallel physics sims and set up vectorized gym-like environments (see example).

HuggingFace Metrics (showing the results and a video of performance) at: huggingface.co
WandB (Weights & Biases) Metrics (showing training info such as the loss convergence, GPU use, etc.) at: WandB.ai

Installation (UV)

uv venv
source .venv/bin/activate
uv sync

Log in to Online Services:

huggingface

huggingface-cli login

after creating an access token at huggingface.co.

wandb

wandb login

after you have a WandB account (go to settings for the API key).

Running (Training and Evaluating)

PyBullet Cheetah Environment (default)

uv run main.py

Bullet Humanoid Env with WandB tracking

uv run main.py --track --env-id "HumanoidBulletEnv-v0"
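The two flags used above can be declared with `argparse` roughly as follows. This is a sketch of how such options are typically defined, not the contents of parse_args.py; in particular, the default environment ID `HalfCheetahBulletEnv-v0` is my assumption based on the "Cheetah (default)" note and may not match the repo.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options shown in this README (illustrative)."""
    parser = argparse.ArgumentParser(description="PPO training options")
    parser.add_argument(
        "--env-id",
        type=str,
        default="HalfCheetahBulletEnv-v0",  # assumed default, see note above
        help="ID of the Gym environment to train on",
    )
    parser.add_argument(
        "--track",
        action="store_true",
        help="log the run to Weights & Biases",
    )
    return parser.parse_args(argv)


# Equivalent to: uv run main.py --track --env-id "HumanoidBulletEnv-v0"
args = parse_args(["--track", "--env-id", "HumanoidBulletEnv-v0"])
```

Because `--track` is a `store_true` flag, tracking stays off unless the flag is passed, which matches the default command having no WandB logging.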

* DEPRECATED (See Discrete Branch) *

Only for discrete action spaces (see the MAIN branch prior to the PR)

uv run main.py --env-id "CartPole-v1"

to specify the CartPole-v1 environment instead.

Doom (todo)

Doom Environment

uv run main.py --env-id doom

