First working use of rtgym for Deep-RL applied to Gran Turismo 1. #48
Replies: 5 comments 22 replies
-
Nice progress, thanks for the video :) Just a detail: at the moment there is no option in rtgym to apply the default action in case of timestep timeout. It is probably possible to implement if you need it, but the philosophy of rtgym is rather that timeouts should not happen (when they do, rtgym fires a warning and breaks the elasticity of the next timestep, so that the upcoming timesteps stick to their nominal duration instead of being overly compressed to compensate for the overflowing delay). Instead, the role of the default action has to do with the Markov structure of real-time environments around "reset" transitions. In real-time environments, time cannot be paused, so an action needs to be applied at all times (even if that is a "no action" action). So when you call reset(), especially the first time, reset() needs a default action to apply in your environment, because you have to account for the fact that the world is not paused while you are computing your first action. This is where the default action intervenes.
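For illustration, the default action is something you supply yourself when building your interface. A minimal sketch, assuming a racing-game control layout (the class name and action vector here are hypothetical; in a real project this method would live on your rtgym interface class):

```python
import numpy as np

# Hypothetical sketch: in a real project, MyGTInterface would be your
# rtgym interface class; only the default-action hook is shown here.
class MyGTInterface:
    def get_default_action(self):
        """Action applied while no agent action is available yet
        (e.g. right after reset()), since real time cannot be paused."""
        # "no action" for a racing game: zero steering, zero throttle, zero brake
        return np.array([0.0, 0.0, 0.0], dtype=np.float32)
```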
-
Thanks for the feedback and correcting that point. Will address it in the next issue.
…On Thu, 11 May 2023, 17:00 Yann Bouteiller, ***@***.***> wrote:
-
So after all the celebrations... I feel like I am back to square one. PPO only works smoothly with torch; in tf/tf2, things behave weirdly. I revisited sb3, because their beta version supports gymnasium, but they do not support observation spaces that are tuples, only dicts. My research has suggested SAC ought to be the best way... maybe I have no choice but to make a full implementation of tmrl?
-
Do you think it is possible to make a simple wrapper for rtgym that converts the observation tuple to a dict?
-
I just want to debug a few issues and test some others. My plan was to fully stack the observation. I'm not sure why actions ought to be part of the observation to make it more Markovian...
…On Thu, 8 Jun 2023, 18:49 Yann Bouteiller, ***@***.***> wrote:
You can remove the action buffer from rtgym by setting "act_in_obs" to False in the configuration dictionary. Are you sure you want to do that, though? It only makes sense if you are using an RNN or another way of handling real-time non-Markovness.
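For reference, a sketch of how that configuration might look, assuming rtgym's documented DEFAULT_CONFIG_DICT and the "act_in_obs" key named above (the interface class name is hypothetical):

```python
from rtgym import DEFAULT_CONFIG_DICT

my_config = DEFAULT_CONFIG_DICT.copy()
my_config["interface"] = MyGTInterface  # hypothetical rtgym interface subclass
my_config["act_in_obs"] = False         # drop the action buffer from observations
```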
-
First working use of rtgym for Deep-RL via the RLlib framework, applied to Gran Turismo on the PCSX-Redux emulator. Communicating via TCP sockets, with protobuf for serialisation.
Sharing my first working pipeline. :) Major mini-party.
https://youtu.be/zVrhbXNOHCc
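As an aside on the transport: since TCP is a byte stream with no message boundaries, a common way to carry protobuf messages over a socket is length-prefix framing. A minimal sketch (function names are mine; any fixed-width prefix works):

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Length-prefix framing: 4-byte big-endian size, then the serialized message
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP may deliver fewer bytes than requested; loop until we have n bytes
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> bytes:
    (size,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, size)
```

On the receiving side, the returned bytes would then be passed to your message's `ParseFromString`.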