In many sim2real RL applications, it makes sense to leverage our ability to "teleport" between states without actually following the dynamics. For example, in the cartpole environment, instead of exploring and learning to swing up the pendulum before seeing any reward, we can just sample the upright position directly (how we know that the upright position is interesting is a matter for a different discussion). What's the recommended way to achieve that in Brax?
What would be the recommended way? Any ideas, anyone?
Replies: 1 comment 1 reply
You're welcome to rely on the Reset function to create a pool of states that are sampled from some distribution, or override the state yourself during Step. Either will work, and the choice depends on your use case. Brax is designed so that resetting has virtually no impact on simulation speed. My default suggestion is to just add your "start upright" states into reset, and everything should just work:

    import jax
    from jax import numpy as jp
    from brax.envs.base import State

    def reset(rng: jax.Array) -> State:
      # set the pendulum upright 20% of the time
      up = jax.random.uniform(rng) < 0.2
      pendulum_ypos = jp.where(up, 1.0, 0.0)
      ...
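
To make the "mix upright starts into reset" idea a bit more concrete, here is a minimal pure-JAX sketch of just the sampling part. Everything in it is an assumption made for illustration: the function name `sample_initial_qpos`, the `[cart_position, pole_angle]` layout, the convention that angle 0 means upright and pi means hanging down, and the noise scales. How the result is wired into an actual Brax environment's reset (for example, by passing it to the pipeline initialisation) is not shown and depends on your environment.

```python
import jax
from jax import numpy as jp


def sample_initial_qpos(rng: jax.Array, p_upright: float = 0.2) -> jax.Array:
  """Samples a hypothetical [cart_position, pole_angle] start state."""
  rng_up, rng_cart, rng_noise = jax.random.split(rng, 3)

  # With probability p_upright, "teleport" the pole to the upright position.
  up = jax.random.uniform(rng_up) < p_upright

  # Small random cart offset and angle noise so the starts are not identical.
  cart_x = jax.random.uniform(rng_cart, minval=-0.1, maxval=0.1)
  noise = 0.05 * jax.random.normal(rng_noise)

  # Assumed convention: angle 0 is upright, pi is hanging down.
  angle = jp.where(up, 0.0, jp.pi) + noise
  return jp.array([cart_x, angle])


# Sample a batch of start states, one per key, as a vectorised reset would.
keys = jax.random.split(jax.random.PRNGKey(0), 1000)
qpos_batch = jax.vmap(sample_initial_qpos)(keys)

# Fraction of starts near upright; should be close to p_upright.
upright_fraction = jp.mean(jp.abs(qpos_batch[:, 1]) < 1.0)
```

Because the branch is expressed with `jp.where` rather than a Python `if`, the function stays compatible with `jax.jit` and `jax.vmap`, which is what keeps this kind of reset cheap when it runs across many parallel environments.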