# Baselines

ManiSkill provides a number of baseline Reinforcement Learning (RL) and Learning from Demonstrations (LfD) / Imitation Learning (IL) algorithms that are easy to run and reproduce on ManiSkill tasks. Each baseline has its own standalone folder, so you can download and run the code on its own. The tables in the following sections list the implemented baselines, where to find them, and the results of running that code with tuned hyperparameters on relevant ManiSkill tasks.

<!-- TODO: Add pretrained models? -->

<!-- Acknowledgement: This neat categorization of algorithms is taken from https://github.com/tinkoff-ai/CORL -->

## Offline Only Methods
These algorithms are trained purely from demonstration data and do not use any online interaction with the environment.
<!-- Note that some of these algorithms can be trained offline and online and are marked with a \* and discussed in a [following section](#offline--online-methods) -->

| Baseline                                                 | Source                                                                                              | Results               |
| -------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | --------------------- |
| Behavior Cloning                                         | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/behavior-cloning)     | [results](#baselines) |
| [Decision Transformer](https://arxiv.org/abs/2106.01345) | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/decision-transformer) | [results](#baselines) |
| [Decision Diffusers](https://arxiv.org/abs/2211.15657)   | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/decision-diffusers)   | [results](#baselines) |

## Online Only Methods
These are online-only algorithms that do not learn from demonstrations and instead optimize based on feedback from interacting with the environment. These methods also benefit from GPU simulation, which can massively accelerate training.

| Baseline                                                               | Source                                                                              | Results               |
| ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------ | --------------------- |
| [Proximal Policy Optimization (PPO)](https://arxiv.org/abs/1707.06347) | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/ppo)  | [results](#baselines) |
| [Soft Actor Critic (SAC)](https://arxiv.org/abs/1801.01290)            | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/sac)  | [results](#baselines) |
| [REDQ](https://arxiv.org/abs/2101.05982)                               | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/redq) | [results](#baselines) |

## Offline + Online Methods
These are baselines that can train on offline demonstration data as well as use online data collected from interacting with the environment.

| Baseline                                                                                  | Source                                                                               | Results               |
| ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- | --------------------- |
| [Soft Actor Critic (SAC)](https://arxiv.org/abs/1801.01290) with demonstrations in buffer | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/sac)   | [results](#baselines) |
| [MoDem](https://arxiv.org/abs/2212.05698)                                                 | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/modem) | [results](#baselines) |
| [RLPD](https://arxiv.org/abs/2302.02948)                                                  | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/rlpd)  | [results](#baselines) |

# Datasets

ManiSkill has a wide variety of demonstrations from different sources, including RL, human teleoperation, and motion planning.

## Download

We provide a command line tool to download demonstrations, organized by environment ID, directly from our [Hugging Face 🤗 dataset page](https://huggingface.co/datasets/haosulab/ManiSkill2). The tool downloads the demonstration files to a folder along with a few videos visualizing what the demonstrations look like. See [Environments](../concepts/environments.md) for a list of all supported environments.

<!-- TODO: add a table here detailing the data info in detail -->
<!-- Please see our [notes](https://docs.google.com/document/d/1bBKmsR-R_7tR9LwaT1c3J26SjIWw27tWSLdHnfBR01c/edit?usp=sharing) about the details of the demonstrations. -->

```bash
# Download the full datasets
python -m mani_skill2.utils.download_demo all
# Download the demonstration dataset for a certain task
python -m mani_skill2.utils.download_demo ${ENV_ID}
# Download the demonstration datasets for all rigid-body tasks to "./demos"
python -m mani_skill2.utils.download_demo rigid_body -o ./demos
# Download the demonstration datasets for all soft-body tasks
python -m mani_skill2.utils.download_demo soft_body
```

## Format

All demonstrations for an environment are saved in the HDF5 format and can be opened with [h5py](https://github.com/h5py/h5py). Each HDF5 dataset is named `trajectory.{obs_mode}.{control_mode}.h5` and is associated with a JSON metadata file with the same base name. Unless otherwise specified, `trajectory.h5` is short for `trajectory.none.pd_joint_pos.h5`, which contains the original demonstrations generated by the `pd_joint_pos` controller with the `none` observation mode (empty observations). However, there may exist demonstrations generated by other controllers. **Thus, please check the associated JSON to verify which controller is used.**
<!--
:::{note}
For `PickSingleYCB-v0`, `TurnFaucet-v0`, the dataset is named `{model_id}.h5` for each asset. This is due to some legacy issues and might be changed in the future.
For `OpenCabinetDoor-v1`, `OpenCabinetDrawer-v1`, `PushChair-v1`, `MoveBucket-v1`, which are migrated from [ManiSkill1](https://github.com/haosulab/ManiSkill), trajectories are generated by RL and the `base_pd_joint_vel_arm_pd_joint_vel` controller.
::: -->

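For example, a quick way to check which controller generated a dataset is to read the metadata file (its fields are described in the next section). This is only a sketch, assuming the PickCube demonstrations have been downloaded to `./demos`:

```python
import json

# Hypothetical path; the JSON sits next to the corresponding .h5 file
with open("demos/rigid_body/PickCube-v0/trajectory.json") as f:
    meta = json.load(f)
print(meta["episodes"][0]["control_mode"])  # e.g., "pd_joint_pos"
```
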
### Meta Information (JSON)

Each JSON file contains:

- `env_info` (Dict): environment information, which can be used to initialize the environment
  - `env_id` (str): environment id
  - `max_episode_steps` (int)
  - `env_kwargs` (Dict): keyword arguments to initialize the environment. **Essential to recreate the environment.**
- `episodes` (List[Dict]): episode information

The episode information (an element of `episodes`) includes:

- `episode_id` (int): a unique id to index the episode
- `reset_kwargs` (Dict): keyword arguments to reset the environment. **Essential to reproduce the trajectory.**
- `control_mode` (str): control mode used for the episode
- `elapsed_steps` (int): trajectory length
- `info` (Dict): information at the end of the episode

With just the metadata, you can recreate the environment exactly as it was when the trajectories were collected. Assuming the JSON file has been loaded into `json_data` (e.g., with `json.load`) and `env_info = json_data["env_info"]`:

```python
# assumes ManiSkill environments are registered with gym (e.g., via `import mani_skill2.envs`)
env = gym.make(env_info["env_id"], **env_info["env_kwargs"])
episode = json_data["episodes"][0]  # `episodes` is stored alongside `env_info` in the JSON
env.reset(**episode["reset_kwargs"])
```

### Trajectory Data (HDF5)

Each HDF5 demonstration dataset consists of multiple trajectories. The key of each trajectory is `traj_{episode_id}`, e.g., `traj_0`.

Each trajectory is an `h5py.Group`, which contains:

- `actions`: [T, A], `np.float32`. `T` is the number of transitions and `A` is the dimension of the action space.
- `success`: [T], `np.bool_`. Indicates whether the task is successful at each time step.
- `env_states`: [T+1, D], `np.float32`. Environment states. They can be used to set the environment to a certain state, e.g., `env.set_state(env_states[i])`. However, they may not be enough to reproduce the trajectory.
- `env_init_state`: [D], `np.float32`. The initial environment state. It is used for soft-body environments, since their full state sequences (particle positions) can take up too much space.
- `obs` (optional): observations. If the observation is a `dict`, the values are stored under `obs/{key}`. This convention is applied recursively for nested dicts.

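As a minimal sketch (again assuming the PickCube demonstrations are downloaded to `./demos`), the trajectory data can be read with `h5py` like so:

```python
import h5py

with h5py.File("demos/rigid_body/PickCube-v0/trajectory.h5", "r") as f:
    traj = f["traj_0"]                   # one trajectory group per episode
    actions = traj["actions"][:]         # [T, A], np.float32
    env_states = traj["env_states"][:]   # [T+1, D], np.float32 (rigid-body tasks)
    success = traj["success"][:]         # [T], np.bool_
    # e.g., env.set_state(env_states[i]) can restore the environment to step i
    print(actions.shape, bool(success[-1]))
```
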
## Replaying/Converting Demonstration Data

To replay the demonstrations (without changing the observation mode or control mode):

```bash
# Replay and view trajectories through the sapien viewer
python -m mani_skill2.trajectory.replay_trajectory --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 --vis

# Save videos of trajectories (to the same directory as the trajectory)
python -m mani_skill2.trajectory.replay_trajectory --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 --save-video
```

:::{note}
The script requires `trajectory.h5` and `trajectory.json` to both be in the same directory.
:::

The raw demonstration files contain all the necessary information (e.g., initial states, actions, seeds) to reproduce a trajectory. Observations are not included since they can lead to large file sizes without postprocessing. In addition, actions in these files do not cover all control modes. Therefore, you need to convert the raw files into your desired observation and control modes. We provide a utility script that works as follows:

```bash
# Replay demonstrations with control_mode=pd_joint_delta_pos
python -m mani_skill2.trajectory.replay_trajectory \
  --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 \
  --save-traj --target-control-mode pd_joint_delta_pos --obs-mode none --num-procs 10
```

<details>

<summary><b>Click here</b> for important notes about the script arguments.</summary>

- `--save-traj`: save the replayed trajectory to the same folder as the original trajectory file.
- `--num-procs=10`: split trajectories across multiple processes (e.g., 10 processes) for acceleration.
- `--obs-mode=none`: specify the observation mode as `none`, i.e., do not save any observations.
- `--obs-mode=rgbd`: (not included in the script above) specify the observation mode as `rgbd` to replay the trajectory. With `--save-traj`, the saved trajectory will contain the RGBD observations. RGB images are saved as uint8 and depth images (multiplied by 1024) are saved as uint16.
- `--obs-mode=pointcloud`: (not included in the script above) specify the observation mode as `pointcloud`. We encourage you to further process the point cloud rather than using it directly.
- `--obs-mode=state`: (not included in the script above) specify the observation mode as `state`. Note that the `state` observation mode is not allowed for challenge submission.
- `--use-env-states`: for each time step $t$, after replaying the action at this time step and obtaining a new observation at $t+1$, set the environment state at time $t+1$ to the recorded environment state at time $t+1$. This is necessary to successfully replay trajectories for the tasks migrated from ManiSkill1.

</details>

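For example, a hypothetical invocation combining several of these flags to record RGBD observations while replaying by environment states might look like:

```bash
# Sketch: save a new trajectory file with RGBD observations, replayed via env states
python -m mani_skill2.trajectory.replay_trajectory \
  --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 \
  --save-traj --obs-mode rgbd --use-env-states --num-procs 10
```
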
<br>

:::{note}
For soft-body environments, please compile and generate caches (`python -m mani_skill2.utils.precompile_mpm`) before running the script with multiple processes (`--num-procs`).
:::

:::{caution}
The conversion between controllers (or action spaces) is not yet supported for mobile manipulators (e.g., those used in tasks migrated from ManiSkill1).
:::

:::{caution}
Since some demonstrations for challenging tasks (e.g., `TurnFaucet` and tasks migrated from ManiSkill1) are collected in a non-quasi-static way (objects are not fixed relative to the manipulator during manipulation), replaying actions alone can fail due to non-determinism in simulation. Thus, replaying trajectories by environment states is required (pass `--use-env-states`).
:::

---

We recommend using our script only for converting actions into different control modes, without recording any observation information (i.e., passing `--obs-mode=none`). The reasons are that (1) some observation modes, e.g., point cloud, can take up a lot of space without post-processing such as point cloud downsampling; the `state` mode for soft-body environments has a similar issue, since the states of those environments are particles; and (2) some algorithms (e.g., GAIL) require custom keys stored in the demonstration files, e.g., next-observation.

Thus, we recommend that after you convert actions into different control modes, you implement your own environment wrappers for observation processing, and then use another script to render and save the corresponding post-processed visual demonstrations. [ManiSkill2-Learn](https://github.com/haosulab/ManiSkill2-Learn) includes such observation processing wrappers and a demonstration conversion script (with multi-processing), so we recommend referring to that repo for more details.
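
For illustration only (this is not a ManiSkill API), a minimal sketch of such an observation-processing wrapper is shown below; the `pointcloud`/`xyz` keys and the sample size are assumptions you would adapt to your actual observation layout:

```python
import gymnasium as gym  # or `gym`, depending on your ManiSkill version
import numpy as np


class DownsamplePointCloudWrapper(gym.ObservationWrapper):
    """Illustrative example: randomly downsample point cloud observations."""

    def __init__(self, env, num_points: int = 1024):
        super().__init__(env)
        self.num_points = num_points

    def observation(self, obs):
        pcd = dict(obs["pointcloud"])  # assumed key; check your obs_mode's structure
        n = len(pcd["xyz"])            # assumed per-point array of shape [N, 3]
        idx = np.random.choice(n, self.num_points, replace=n < self.num_points)
        # apply the same indices to every per-point array in the point cloud dict
        for k, v in pcd.items():
            if isinstance(v, np.ndarray) and len(v) == n:
                pcd[k] = v[idx]
        return {**obs, "pointcloud": pcd}
```

You would wrap the environment with such a class before rendering and saving your post-processed demonstrations.
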
# Teleoperation

# Resources

Are you looking to teach a course on robot learning, simulated robotics, etc.? We have compiled a large list of resources, along with recommendations, to help you get started.

<!-- ## Courses using ManiSkill -->

# Overview

ManiSkill is a feature-rich, GPU-accelerated robotics benchmark built on top of [SAPIEN](https://github.com/haosulab/sapien), designed to provide accessible support for a wide array of applications, including robot learning, learning from demonstrations, sim2real/real2sim, and more.

Features:
- GPU-parallelized simulation enabling 200,000+ FPS on some tasks
- GPU-parallelized rendering enabling 10,000+ FPS on some tasks, massively outperforming other benchmarks
- Flexible API to build custom tasks of any complexity
- A variety of verified robotics tasks with diverse dynamics and visuals
- Reproducible baselines in Reinforcement Learning and Learning from Demonstrations, spanning tasks from dexterous manipulation to mobile manipulation

To install, see the [installation page](installation).

# Tutorials