# Baselines

ManiSkill provides a number of baseline Reinforcement Learning (RL) and Learning from Demonstrations (LfD) / Imitation Learning (IL) algorithms that are easy to run and reproduce on ManiSkill tasks. Each baseline has its own standalone folder, so you can download and run the code on its own. The tables in the following sections list the implemented baselines, where to find them, and the results of running that code with tuned hyperparameters on relevant ManiSkill tasks.

<!-- TODO: Add pretrained models? -->

<!-- Acknowledgement: This neat categorization of algorithms is taken from https://github.com/tinkoff-ai/CORL -->

## Offline Only Methods
These algorithms are trained purely from demonstration data and do not use any online interaction with the environment.
<!-- Note that some of these algorithms can be trained offline and online and are marked with a \* and discussed in a [following section](#offline--online-methods) -->

| Baseline                                                 | Source                                                                                              | Results               |
| -------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | --------------------- |
| Behavior Cloning                                         | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/behavior-cloning)     | [results](#baselines) |
| [Decision Transformer](https://arxiv.org/abs/2106.01345) | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/decision-transformer) | [results](#baselines) |
| [Decision Diffusers](https://arxiv.org/abs/2211.15657)   | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/decision-diffusers)   | [results](#baselines) |

## Online Only Methods
These are online-only algorithms that do not learn from demonstrations and instead optimize based on feedback from interacting with the environment. These methods also benefit from GPU simulation, which can massively accelerate training.

| Baseline                                                               | Source                                                                              | Results               |
| ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------ | --------------------- |
| [Proximal Policy Optimization (PPO)](https://arxiv.org/abs/1707.06347) | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/ppo)  | [results](#baselines) |
| [Soft Actor Critic (SAC)](https://arxiv.org/abs/1801.01290)            | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/sac)  | [results](#baselines) |
| [REDQ](https://arxiv.org/abs/2101.05982)                               | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/redq) | [results](#baselines) |

## Offline + Online Methods
These are baselines that can train on offline demonstration data as well as use online data collected from interacting with the environment.

| Baseline                                                                                  | Source                                                                               | Results               |
| ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- | --------------------- |
| [Soft Actor Critic (SAC)](https://arxiv.org/abs/1801.01290) with demonstrations in buffer | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/sac)   | [results](#baselines) |
| [MoDem](https://arxiv.org/abs/2212.05698)                                                 | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/modem) | [results](#baselines) |
| [RLPD](https://arxiv.org/abs/2302.02948)                                                  | [source](https://github.com/haosulab/ManiSkill2/tree/main/examples/baselines/rlpd)  | [results](#baselines) |

# Datasets

ManiSkill has a wide variety of demonstrations from different sources, including RL, human teleoperation, and motion planning.

## Download

We provide a command line tool to download demonstrations, organized by environment ID, directly from our [Hugging Face 🤗 dataset page](https://huggingface.co/datasets/haosulab/ManiSkill2). The tool downloads the demonstration files to a folder along with a few videos visualizing what the demonstrations look like. See [Environments](../concepts/environments.md) for a list of all supported environments.

<!-- TODO: add a table here detailing the data info in detail -->
<!-- Please see our [notes](https://docs.google.com/document/d/1bBKmsR-R_7tR9LwaT1c3J26SjIWw27tWSLdHnfBR01c/edit?usp=sharing) about the details of the demonstrations. -->

```bash
# Download the full datasets
python -m mani_skill2.utils.download_demo all
# Download the demonstration dataset for a certain task
python -m mani_skill2.utils.download_demo ${ENV_ID}
# Download the demonstration datasets for all rigid-body tasks to "./demos"
python -m mani_skill2.utils.download_demo rigid_body -o ./demos
# Download the demonstration datasets for all soft-body tasks
python -m mani_skill2.utils.download_demo soft_body
```

## Format

All demonstrations for an environment are saved in the HDF5 format and can be opened with [h5py](https://github.com/h5py/h5py). Each HDF5 dataset is named `trajectory.{obs_mode}.{control_mode}.h5` and is associated with a JSON metadata file with the same base name. Unless otherwise specified, `trajectory.h5` is short for `trajectory.none.pd_joint_pos.h5`, which contains the original demonstrations generated by the `pd_joint_pos` controller with the `none` observation mode (empty observations). However, there may exist demonstrations generated by other controllers. **Thus, please check the associated JSON to verify which controller is used.**
<!--
:::{note}
For `PickSingleYCB-v0`, `TurnFaucet-v0`, the dataset is named `{model_id}.h5` for each asset. This is due to some legacy issues and might be changed in the future.
For `OpenCabinetDoor-v1`, `OpenCabinetDrawer-v1`, `PushChair-v1`, `MoveBucket-v1`, which are migrated from [ManiSkill1](https://github.com/haosulab/ManiSkill), trajectories are generated by RL and the `base_pd_joint_vel_arm_pd_joint_vel` controller.
::: -->

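For example, a quick way to check which controller generated a dataset is to read the metadata file (its fields are described in the next section). This is only a sketch, assuming the PickCube demonstrations have been downloaded to `./demos`:

```python
import json

# Hypothetical path; the JSON sits next to the corresponding .h5 file
with open("demos/rigid_body/PickCube-v0/trajectory.json") as f:
    meta = json.load(f)
print(meta["episodes"][0]["control_mode"])  # e.g., "pd_joint_pos"
```
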
### Meta Information (JSON)

Each JSON file contains:

- `env_info` (Dict): environment information, which can be used to initialize the environment
  - `env_id` (str): environment id
  - `max_episode_steps` (int)
  - `env_kwargs` (Dict): keyword arguments to initialize the environment. **Essential to recreate the environment.**
- `episodes` (List[Dict]): episode information

The episode information (an element of `episodes`) includes:

- `episode_id` (int): a unique id to index the episode
- `reset_kwargs` (Dict): keyword arguments to reset the environment. **Essential to reproduce the trajectory.**
- `control_mode` (str): control mode used for the episode
- `elapsed_steps` (int): trajectory length
- `info` (Dict): information at the end of the episode

With just the metadata, you can recreate the environment exactly as it was when the trajectories were collected. Assuming the JSON file has been loaded into `json_data` (e.g., with `json.load`) and `env_info = json_data["env_info"]`:

```python
# assumes ManiSkill environments are registered with gym (e.g., via `import mani_skill2.envs`)
env = gym.make(env_info["env_id"], **env_info["env_kwargs"])
episode = json_data["episodes"][0]  # `episodes` is stored alongside `env_info` in the JSON
env.reset(**episode["reset_kwargs"])
```

### Trajectory Data (HDF5)

Each HDF5 demonstration dataset consists of multiple trajectories. The key of each trajectory is `traj_{episode_id}`, e.g., `traj_0`.

Each trajectory is an `h5py.Group`, which contains:

- `actions`: [T, A], `np.float32`. `T` is the number of transitions and `A` is the dimension of the action space.
- `success`: [T], `np.bool_`. Indicates whether the task is successful at each time step.
- `env_states`: [T+1, D], `np.float32`. Environment states. They can be used to set the environment to a certain state, e.g., `env.set_state(env_states[i])`. However, they may not be enough to reproduce the trajectory.
- `env_init_state`: [D], `np.float32`. The initial environment state. It is used for soft-body environments, since their full state sequences (particle positions) can take up too much space.
- `obs` (optional): observations. If the observation is a `dict`, the values are stored under `obs/{key}`. This convention is applied recursively for nested dicts.

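As a minimal sketch (again assuming the PickCube demonstrations are downloaded to `./demos`), the trajectory data can be read with `h5py` like so:

```python
import h5py

with h5py.File("demos/rigid_body/PickCube-v0/trajectory.h5", "r") as f:
    traj = f["traj_0"]                   # one trajectory group per episode
    actions = traj["actions"][:]         # [T, A], np.float32
    env_states = traj["env_states"][:]   # [T+1, D], np.float32 (rigid-body tasks)
    success = traj["success"][:]         # [T], np.bool_
    # e.g., env.set_state(env_states[i]) can restore the environment to step i
    print(actions.shape, bool(success[-1]))
```
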
## Replaying/Converting Demonstration Data

To replay the demonstrations (without changing the observation mode or control mode):

```bash
# Replay and view trajectories through the sapien viewer
python -m mani_skill2.trajectory.replay_trajectory --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 --vis

# Save videos of trajectories (to the same directory as the trajectory)
python -m mani_skill2.trajectory.replay_trajectory --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 --save-video
```

:::{note}
The script requires `trajectory.h5` and `trajectory.json` to both be in the same directory.
:::

The raw demonstration files contain all the necessary information (e.g., initial states, actions, seeds) to reproduce a trajectory. Observations are not included since they can lead to large file sizes without postprocessing. In addition, actions in these files do not cover all control modes. Therefore, you need to convert the raw files into your desired observation and control modes. We provide a utility script that works as follows:

```bash
# Replay demonstrations with control_mode=pd_joint_delta_pos
python -m mani_skill2.trajectory.replay_trajectory \
  --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 \
  --save-traj --target-control-mode pd_joint_delta_pos --obs-mode none --num-procs 10
```

<details>

<summary><b>Click here</b> for important notes about the script arguments.</summary>

- `--save-traj`: save the replayed trajectory to the same folder as the original trajectory file.
- `--num-procs=10`: split trajectories across multiple processes (e.g., 10 processes) for acceleration.
- `--obs-mode=none`: specify the observation mode as `none`, i.e., do not save any observations.
- `--obs-mode=rgbd`: (not included in the script above) specify the observation mode as `rgbd` to replay the trajectory. With `--save-traj`, the saved trajectory will contain the RGBD observations. RGB images are saved as uint8 and depth images (multiplied by 1024) are saved as uint16.
- `--obs-mode=pointcloud`: (not included in the script above) specify the observation mode as `pointcloud`. We encourage you to further process the point cloud rather than using it directly.
- `--obs-mode=state`: (not included in the script above) specify the observation mode as `state`. Note that the `state` observation mode is not allowed for challenge submission.
- `--use-env-states`: for each time step $t$, after replaying the action at this time step and obtaining a new observation at $t+1$, set the environment state at time $t+1$ to the recorded environment state at time $t+1$. This is necessary to successfully replay trajectories for the tasks migrated from ManiSkill1.

</details>

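For example, a hypothetical invocation combining several of these flags to record RGBD observations while replaying by environment states might look like:

```bash
# Sketch: save a new trajectory file with RGBD observations, replayed via env states
python -m mani_skill2.trajectory.replay_trajectory \
  --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 \
  --save-traj --obs-mode rgbd --use-env-states --num-procs 10
```
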
<br>

:::{note}
For soft-body environments, please compile and generate caches (`python -m mani_skill2.utils.precompile_mpm`) before running the script with multiple processes (`--num-procs`).
:::

:::{caution}
The conversion between controllers (or action spaces) is not yet supported for mobile manipulators (e.g., those used in tasks migrated from ManiSkill1).
:::

:::{caution}
Since some demonstrations for challenging tasks (e.g., `TurnFaucet` and tasks migrated from ManiSkill1) are collected in a non-quasi-static way (objects are not fixed relative to the manipulator during manipulation), replaying actions alone can fail due to non-determinism in simulation. Thus, replaying trajectories by environment states is required (pass `--use-env-states`).
:::

---

We recommend using our script only for converting actions into different control modes, without recording any observation information (i.e., passing `--obs-mode=none`). The reasons are that (1) some observation modes, e.g., point cloud, can take up a lot of space without post-processing such as point cloud downsampling; the `state` mode for soft-body environments has a similar issue, since the states of those environments are particles; and (2) some algorithms (e.g., GAIL) require custom keys stored in the demonstration files, e.g., next-observation.

Thus, we recommend that after you convert actions into different control modes, you implement your own environment wrappers for observation processing, and then use another script to render and save the corresponding post-processed visual demonstrations. [ManiSkill2-Learn](https://github.com/haosulab/ManiSkill2-Learn) includes such observation processing wrappers and a demonstration conversion script (with multi-processing), so we recommend referring to that repo for more details.
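
For illustration only (this is not a ManiSkill API), a minimal sketch of such an observation-processing wrapper is shown below; the `pointcloud`/`xyz` keys and the sample size are assumptions you would adapt to your actual observation layout:

```python
import gymnasium as gym  # or `gym`, depending on your ManiSkill version
import numpy as np


class DownsamplePointCloudWrapper(gym.ObservationWrapper):
    """Illustrative example: randomly downsample point cloud observations."""

    def __init__(self, env, num_points: int = 1024):
        super().__init__(env)
        self.num_points = num_points

    def observation(self, obs):
        pcd = dict(obs["pointcloud"])  # assumed key; check your obs_mode's structure
        n = len(pcd["xyz"])            # assumed per-point array of shape [N, 3]
        idx = np.random.choice(n, self.num_points, replace=n < self.num_points)
        # apply the same indices to every per-point array in the point cloud dict
        for k, v in pcd.items():
            if isinstance(v, np.ndarray) and len(v) == n:
                pcd[k] = v[idx]
        return {**obs, "pointcloud": pcd}
```

You would wrap the environment with such a class before rendering and saving your post-processed demonstrations.
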
# Teleoperation

# Resources

Are you looking to teach a course on robot learning, simulated robotics, etc.? We have compiled a large list of resources, along with recommendations, to help you get started.

<!-- ## Courses using ManiSkill -->

# Overview

ManiSkill is a feature-rich, GPU-accelerated robotics benchmark built on top of [SAPIEN](https://github.com/haosulab/sapien), designed to provide accessible support for a wide array of applications, including robot learning, learning from demonstrations, sim2real/real2sim, and more.

Features:
- GPU-parallelized simulation enabling 200,000+ FPS on some tasks
- GPU-parallelized rendering enabling 10,000+ FPS on some tasks, massively outperforming other benchmarks
- Flexible API to build custom tasks of any complexity
- A variety of verified robotics tasks with diverse dynamics and visuals
- Reproducible baselines in Reinforcement Learning and Learning from Demonstrations, spanning tasks from dexterous manipulation to mobile manipulation

To install, see the [installation page](installation).

# Tutorials