Recommended way to actively set system's state #498

yardenas · 2024-07-02T16:52:23Z


In many sim2real RL applications, it makes sense to leverage our ability to "teleport" between states without actually following the dynamics. For example, in the cartpole environment, instead of exploring and learning to swing up the pendulum before receiving any reward, we can simply sample the upright position directly (how we know that the upright position is interesting is a matter for a different discussion).

What's the recommended way to achieve that in Brax?
Going over previous tickets, there seem to be two approaches:

Overriding the reset function, e.g., Manually Set Initial Pose of Agent in the Reset Function #457
Directly overriding the system's state: Resetting to a particular state #249

What would be the recommended way?
This PR indicates that using reset can significantly slow down the simulation, and I'm not sure whether directly overriding the state would give a valid state for the simulator (contact forces, for instance).

Any ideas anyone?

Answered by erikfrey


erikfrey · 2024-07-02T19:28:14Z

Maintainer

You're welcome to rely on the Reset function to create a pool of states that are sampled from some distribution, or override the state yourself during Step - either will work and it depends on your use case.

Brax is designed so resetting has virtually no impact on simulation speed. My default suggestion is to just add your "start upright" states into reset and everything should just work:

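import jax
from jax import numpy as jp  # jp is the alias Brax code conventionally uses
from brax.envs.base import State

def reset(rng: jax.Array) -> State:
  # set the pendulum upright 20% of the time
  up = jax.random.uniform(rng) < 0.2
  pendulum_ypos = jp.where(up, 1.0, 0.0)
  ...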
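To make the idea above concrete, here is a minimal, self-contained sketch in plain JAX. It is not the actual Brax cartpole environment: `ToyState`, the `[cart_x, pole_angle]` layout, the angle convention, and the `p_upright` parameter are all assumptions made for illustration. It only shows that mixing "teleported" start states into reset is a branch-free `jp.where`, so it composes with `jit` and `vmap`.

```python
from typing import NamedTuple

import jax
from jax import numpy as jp


class ToyState(NamedTuple):
  """Hypothetical stand-in for a simulator state (not Brax's State)."""
  q: jax.Array   # generalized positions: [cart_x, pole_angle]
  qd: jax.Array  # generalized velocities


def reset(rng: jax.Array, p_upright: float = 0.2) -> ToyState:
  """Samples an initial state that is upright with probability p_upright."""
  rng_up, rng_noise = jax.random.split(rng)
  up = jax.random.uniform(rng_up) < p_upright
  # Assumed convention: angle 0 is upright, pi is hanging down.
  angle = jp.where(up, 0.0, jp.pi)
  noise = 0.01 * jax.random.normal(rng_noise, (2,))
  q = jp.array([0.0, angle]) + noise
  return ToyState(q=q, qd=jp.zeros(2))


# Batched resets: the choice is a jp.where, so there is no Python branching.
keys = jax.random.split(jax.random.PRNGKey(0), 1000)
states = jax.jit(jax.vmap(reset))(keys)
```

In a real environment the sampled positions and velocities would be passed on to whatever the environment uses to build its physics state, instead of being returned as-is.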


yardenas Jul 2, 2024
Author

Thank you very much @erikfrey! 🙌

"... or override the state yourself during Step - either will work and it depends on your use case."

I hope this isn't too trivial a question: what would be the correct way to do it via Step?
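For reference, one possible shape of a Step-side override, as a sketch under stated assumptions and not a confirmed Brax recipe: `toy_step` is a hypothetical placeholder for the real dynamics, the state is a plain dict of `q` and `qd`, and `p_teleport` is an invented parameter. The point is only that the swap can be done leaf by leaf with `jp.where`, which keeps the function compatible with `jit`.

```python
import jax
from jax import numpy as jp


def toy_step(state: dict, action: jax.Array, dt: float = 0.01) -> dict:
  """Hypothetical placeholder dynamics (double integrator), not Brax physics."""
  qd = state['qd'] + dt * action
  q = state['q'] + dt * qd
  return {'q': q, 'qd': qd}


def step_with_teleport(state, action, rng, target, p_teleport=0.05):
  """Steps the dynamics, then swaps in `target` with probability p_teleport."""
  next_state = toy_step(state, action)
  teleport = jax.random.uniform(rng) < p_teleport
  # Select leaf by leaf so the function stays jit- and vmap-friendly.
  return jax.tree_util.tree_map(
      lambda t, n: jp.where(teleport, t, n), target, next_state)


state = {'q': jp.array([0.0, jp.pi]), 'qd': jp.zeros(2)}
upright = {'q': jp.zeros(2), 'qd': jp.zeros(2)}
key = jax.random.PRNGKey(0)
step_fn = jax.jit(step_with_teleport)
always = step_fn(state, jp.ones(2), key, upright, 1.0)  # always teleports
never = step_fn(state, jp.ones(2), key, upright, 0.0)   # never teleports
```

In an actual Brax environment it would presumably be safer to rebuild the full physics state from the target `q` and `qd` (the way reset does) than to patch individual fields, so that derived quantities such as contacts stay consistent. That is an assumption on my part, not something confirmed in this thread.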

I'm also curious what the best way would be to make this distribution dynamic. For example, in an active learning setting, we'd want to sample only novel/unseen states.
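A dynamic start-state distribution could, for instance, be kept as a pool of candidate states with a novelty score per entry, and reset would sample from that pool. The sketch below is one hedged way to do this in plain JAX; the count-based score, the `temperature` parameter, and the names `sample_start`, `pool_q`, and `visit_counts` are all assumptions for illustration and nothing here comes from Brax itself. It prefers rarely visited entries softly and does not strictly exclude visited ones.

```python
import jax
from jax import numpy as jp


def sample_start(rng, pool_q, visit_counts, temperature=1.0):
  """Samples a pooled start state, favoring rarely visited entries."""
  # Hypothetical novelty score: fewer visits means a higher logit.
  logits = -jp.log1p(visit_counts) / temperature
  idx = jax.random.categorical(rng, logits)
  # Return the chosen state and the updated counts (functional update).
  return pool_q[idx], visit_counts.at[idx].add(1.0)


# Three candidate start states; only the middle one is still unvisited.
pool_q = jp.array([[0.0, jp.pi], [0.0, 0.0], [0.5, jp.pi]])
visit_counts = jp.array([1000.0, 0.0, 1000.0])

keys = jax.random.split(jax.random.PRNGKey(0), 200)
sampled_q, _ = jax.vmap(sample_start, in_axes=(0, None, None))(
    keys, pool_q, visit_counts)
q0, new_counts = jax.jit(sample_start)(keys[0], pool_q, visit_counts)
```

Because the pool and counts are ordinary arrays, they can be carried alongside the training state and updated between rollouts without recompiling.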
