diff --git a/README.md b/README.md
index 34e77175a..e997375bd 100644
--- a/README.md
+++ b/README.md
@@ -17,8 +17,8 @@ ManiSkill is a powerful unified framework for robot simulation and training powe
 - Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
 - Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
 - Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
+- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))
 - For more details we encourage you to take a look at our [paper](https://arxiv.org/abs/2410.00425).
 
 There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
diff --git a/docs/source/index.md b/docs/source/index.md
index 3308e37e0..3dc1794fe 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -12,12 +12,12 @@
 ManiSkill is a powerful unified framework for robot simulation and training powered by [SAPIEN](https://sapien.ucsd.edu/), with a strong focus on manipulation skills. The entire tech stack is as open-source as possible and ManiSkill v3 is in beta release now. Among its features include:
 - GPU parallelized visual data collection system. On the high end you can collect RGBD + Segmentation data at 30,000+ FPS with a 4090 GPU, 10-1000x faster compared to most other simulators.
 - GPU parallelized simulation, enabling high throughput state-based synthetic data collection in simulation
-- GPU parallelized heteogeneous simuluation, where every parallel environment has a completely different scene/set of objects
+- GPU parallelized heterogeneous simulation, where every parallel environment has a completely different scene/set of objects
 - Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
 - Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
-- Real2sim environments for scalably evaluating real-world policies 60-100x faster via GPU simulation.
+- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
+- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))
 - For more details we encourage you to take a look at our [paper](https://arxiv.org/abs/2410.00425).
 
 There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
diff --git a/docs/source/user_guide/index.md b/docs/source/user_guide/index.md
index 0a9810525..3383c8d49 100644
--- a/docs/source/user_guide/index.md
+++ b/docs/source/user_guide/index.md
@@ -12,12 +12,12 @@
 ManiSkill is a powerful unified framework for robot simulation and training powered by [SAPIEN](https://sapien.ucsd.edu/), with a strong focus on manipulation skills. The entire tech stack is as open-source as possible and ManiSkill v3 is in beta release now. Among its features include:
 - GPU parallelized visual data collection system. On the high end you can collect RGBD + Segmentation data at 30,000+ FPS with a 4090 GPU, 10-1000x faster compared to most other simulators.
 - GPU parallelized simulation, enabling high throughput state-based synthetic data collection in simulation
-- GPU parallelized heteogeneous simuluation, where every parallel environment has a completely different scene/set of objects
+- GPU parallelized heterogeneous simulation, where every parallel environment has a completely different scene/set of objects
 - Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
 - Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
-- Real2sim environments for scalably evaluating real-world policies 60-100x faster via GPU simulation.
+- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
+- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))
 - For more details we encourage you to take a look at our [paper](https://github.com/haosulab/ManiSkill/blob/main/figures/maniskill3_paper.pdf).
 
 There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
@@ -46,6 +46,7 @@
 datasets/index
 data_collection/index
 reinforcement_learning/index
 learning_from_demos/index
+vision_language_action_models/index
 wrappers/index
 ```
diff --git a/docs/source/user_guide/vision_language_action_models/index.md b/docs/source/user_guide/vision_language_action_models/index.md
new file mode 100644
index 000000000..28f6ef0aa
--- /dev/null
+++ b/docs/source/user_guide/vision_language_action_models/index.md
@@ -0,0 +1,11 @@
+# Vision Language Action Models
+
+ManiSkill supports evaluating and pretraining vision language action (VLA) models. Currently the following VLAs have been tested via the ManiSkill framework:
+
+- [Octo](https://github.com/octo-models/octo)
+- [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer)
+- [RT-x](https://robotics-transformer-x.github.io/)
+
+RDT-1B uses some of the ManiSkill demonstrations as pretraining data and is evaluated by fine-tuning on demonstrations from various ManiSkill tasks; see their [README](https://github.com/thu-ml/RoboticsDiffusionTransformer?tab=readme-ov-file#simulation-benchmark) for more details.
+
+The Octo and RT series of models are evaluated through various real2sim environments as part of the SIMPLER project; see the [README](https://github.com/simpler-env/SimplerEnv/tree/maniskill3) for details on how to run the evaluation setup.
\ No newline at end of file
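The feature bullets above describe GPU-parallelized simulation and visual data collection only at a high level. As a rough illustration, here is a minimal sketch of what the batched API looks like in user code, assuming the `PickCube-v1` example task and the `rgbd` observation mode from ManiSkill's examples (exact task ids and mode strings may vary by version):

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (importing registers the ManiSkill environments)

# A single gym.make call creates a whole batch of environments simulated on the
# GPU; observations, rewards, and done flags come back as batched torch tensors.
env = gym.make(
    "PickCube-v1",                      # example task id, assumed from the docs
    num_envs=1024,                      # parallel environments on a single GPU
    obs_mode="rgbd",                    # RGB + depth observations; "state" also works
    control_mode="pd_joint_delta_pos",  # joint-space delta position control
)
obs, _ = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # random actions stand in for a policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```

Keeping the whole batch on the GPU is what the 30,000+ FPS data-collection and 100x real2sim evaluation figures in the feature list refer to.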
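The new vision_language_action_models/index.md page stops at links to the individual projects. For a sense of how a VLA is scored against a ManiSkill task, here is a schematic evaluation loop; the `policy.act(obs, instruction)` interface, the `success` key in `info`, and the task id argument are placeholders for illustration, not APIs of Octo, RDT-1B, or SIMPLER:

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers ManiSkill environments)


def evaluate_vla(policy, env_id: str, instruction: str, num_episodes: int = 10) -> float:
    """Roll out a language-conditioned policy and report its success rate.

    `policy.act(obs, instruction)` is a hypothetical interface; each real VLA
    (Octo, RDT-1B, RT-x) exposes its own inference API.
    """
    env = gym.make(env_id, num_envs=1, obs_mode="rgb")
    successes = 0
    for _ in range(num_episodes):
        obs, info = env.reset()
        done = False
        while not done:
            action = policy.act(obs, instruction)  # hypothetical VLA inference call
            obs, reward, terminated, truncated, info = env.step(action)
            done = bool(terminated) or bool(truncated)
        # ManiSkill tasks commonly report task success in `info`; the exact key
        # is treated here as an assumption, not a guaranteed part of the API.
        successes += int(bool(info.get("success", False)))
    env.close()
    return successes / num_episodes
```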