urlb

jsrimr · Oct 28, 2021 · commit 710c3eb

Showing 47 changed files with 6,351 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) Facebook, Inc. and its affiliates.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,93 @@
+# The Unsupervised Reinforcement Learning Benchmark (URLB)
+
+URLB provides implementations of leading unsupervised reinforcement learning algorithms, in which agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.
+
+## Requirements
+We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running:
+```sh
+conda env create -f conda_env.yml
+```
+After the installation finishes, activate the environment with:
+```sh
+conda activate urlb
+```
+
+## Implemented Agents
+| Agent | Command | Implementation Author(s) | Paper |
+|---|---|---|---|
+| ICM | `agent=icm` | Denis | [paper](https://arxiv.org/abs/1705.05363)|
+| ProtoRL | `agent=proto` | Denis | [paper](https://arxiv.org/abs/2102.11271)|
+| DIAYN | `agent=diayn` | Misha | [paper](https://arxiv.org/abs/1802.06070)|
+| APT(ICM) | `agent=icm_apt` | Hao, Kimin | [paper](https://arxiv.org/abs/2103.04551)|
+| APT(Ind) | `agent=ind_apt` | Hao, Kimin | [paper](https://arxiv.org/abs/2103.04551)|
+| APS | `agent=aps` | Hao, Kimin | [paper](http://proceedings.mlr.press/v139/liu21b.html)|
+| SMM | `agent=smm` | Albert | [paper](https://arxiv.org/abs/1906.05274) |
+| RND | `agent=rnd` | Kevin | [paper](https://arxiv.org/abs/1810.12894) |
+| Disagreement | `agent=disagreement` | Catherine | [paper](https://arxiv.org/abs/1906.04161) |
+
+## Available Domains
+We support the following domains.
+| Domain | Tasks |
+|---|---|
+| `walker` | `stand`, `walk`, `run`, `flip` |
+| `quadruped` | `walk`, `run`, `stand`, `jump` |
+| `jaco` | `reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right` |
+
+
+## Observation Modes
+Each domain supports two observation modes, states and pixels, selected with the `obs_type` argument:
+| Mode | Command |
+|---|---|
+| states | `obs_type=states` |
+| pixels | `obs_type=pixels` |
+
+
+## Instructions
+### Pre-training
+To run pre-training, use the `pretrain.py` script:
+```sh
+python pretrain.py agent=icm domain=walker
+```
+Or, to pre-train a skill-based agent such as DIAYN, run:
+```sh
+python pretrain.py agent=diayn domain=walker
+```
+The script saves agent snapshots after `100k`, `500k`, `1M`, and `2M` frames of training. The snapshots are stored under the following directory:
+```sh
+./pretrained_models/<obs_type>/<domain>/<agent>/
+```
+For example:
+```sh
+./pretrained_models/states/walker/icm/
+```
+
+### Fine-tuning
+Once you have pre-trained your method, you can use a saved snapshot to initialize the `DDPG` agent and fine-tune it on a downstream task. For example, if you have pre-trained `ICM`, you can fine-tune it on `walker_run` by running the following command:
+```sh
+python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states
+```
+This will load a snapshot stored in `./pretrained_models/states/walker/icm/snapshot_1000000.pt`, initialize `DDPG` with it (both the actor and critic), and start training on `walker_run` using the extrinsic reward of the task.
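Because snapshots follow a fixed layout, a run's checkpoints can be located programmatically, for example to confirm that pre-training finished before launching `finetune.py`. The sketch below is a hypothetical helper, not part of URLB: `snapshot_path`, `task_domain`, and `SNAPSHOT_FRAMES` are illustrative names. It assumes the layout `<root>/<obs_type>/<domain>/<agent>/snapshot_<snapshot_ts>.pt` described above, and that the domain is the part of the task name before the first underscore (e.g. `walker_run` gives `walker`).

```python
from pathlib import Path

# Frame counts at which pre-training is documented to save snapshots.
SNAPSHOT_FRAMES = (100_000, 500_000, 1_000_000, 2_000_000)


def task_domain(task: str) -> str:
    """Assumed convention: the domain is the task-name prefix, e.g. 'walker_run' -> 'walker'."""
    domain, _ = task.split("_", 1)
    return domain


def snapshot_path(obs_type: str, domain: str, agent: str, snapshot_ts: int,
                  root: str = "./pretrained_models") -> Path:
    """Hypothetical helper mirroring the documented layout:
    <root>/<obs_type>/<domain>/<agent>/snapshot_<snapshot_ts>.pt
    """
    return Path(root) / obs_type / domain / agent / f"snapshot_{snapshot_ts}.pt"


if __name__ == "__main__":
    # Mirrors: finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states
    path = snapshot_path("states", task_domain("walker_run"), "icm", 1_000_000)
    print(path.as_posix())  # pretrained_models/states/walker/icm/snapshot_1000000.pt

    # Report which of the documented snapshots exist for this (assumed) run.
    for ts in SNAPSHOT_FRAMES:
        p = snapshot_path("states", "walker", "icm", ts)
        print(p.as_posix(), "exists" if p.exists() else "missing")
```

Note that `pathlib` drops the leading `./`, so the printed path is relative to the current working directory.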
+
+For skill-based methods, also pass the `agent` argument and set `reward_free` to `false`:
+```sh
+python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
+```
+
+### Monitoring
+Logs are stored in the `exp_local` folder. To launch TensorBoard, run:
+```sh
+tensorboard --logdir exp_local
+```
+Training progress is also printed to the console in the following form:
+```
+| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
+```
+Each training entry decodes as follows:
+```
+F  : total number of environment frames
+S  : total number of agent steps
+E  : total number of episodes
+L  : episode length
+R  : episode return
+FPS: training throughput (frames per second)
+T  : total training time
+```
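Since each console entry is a `|`-separated list of `KEY: value` pairs, it can be parsed directly, for instance to collect episode returns from a saved console log without TensorBoard. The helper below is a minimal sketch, not part of URLB; `parse_console_line` and the descriptive names in `FIELDS` are our own. It assumes the mode (`train`) comes first, that `T` is formatted as `H:MM:SS`, and that `L` denotes the episode length.

```python
from datetime import timedelta

# Console keys mapped to descriptive names, following the decoding table above.
# Assumption: "L" is the episode length.
FIELDS = {
    "F": "frames",
    "S": "steps",
    "E": "episodes",
    "L": "episode_length",
    "R": "episode_return",
    "FPS": "fps",
    "T": "total_time",
}


def _to_number(text: str):
    """Parse as int when possible (e.g. '6000'), otherwise as float (e.g. '5.5177')."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_console_line(line: str) -> dict:
    """Parse an entry like '| train | F: 6000 | ... | T: 0:00:42' into a dict."""
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    entry = {"mode": parts[0]}
    for part in parts[1:]:
        key, value = (s.strip() for s in part.split(":", 1))
        name = FIELDS.get(key, key)  # keep unknown keys under their raw name
        if key == "T":
            # Assumed H:MM:SS format for the total training time.
            hours, minutes, seconds = value.split(":")
            entry[name] = timedelta(hours=int(hours), minutes=int(minutes),
                                    seconds=float(seconds))
        else:
            entry[name] = _to_number(value)
    return entry


if __name__ == "__main__":
    sample = "| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42"
    entry = parse_console_line(sample)
    print(entry["episode_return"], entry["total_time"])  # 5.5177 0:00:42
```

Splitting each pair on the first `:` only keeps the colons inside the `T` value intact.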