First working use of rtgym for Deep-RL applied to Gran Turismo 1. #48
Replies: 5 comments 22 replies
-
Nice progress, thanks for the video :) Just a detail: at the moment there is no option in rtgym to apply the default action in case of timestep timeout. It is probably possible to implement if you need it, but the philosophy of rtgym is rather that timeouts should not happen (when they do, rtgym fires a warning and breaks the elasticity of the next timestep, so that the upcoming timesteps stick to their nominal duration instead of being overly compressed to compensate for the overflowing delay). Instead, the role of the default action has to do with the Markov structure of real-time environments around "reset" transitions. In real-time environments, time cannot be paused, so an action needs to be applied at all times (even if that is a "no action" action). So when you call reset(), especially the first time, reset() needs a default action to apply in your environment, because you have to account for the fact that the world is not paused while you are computing your first action. This is where the default action intervenes.
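For illustration, the default action is something you supply yourself when building your interface. A minimal sketch, assuming a racing-game control layout (the class name and action vector here are hypothetical; in a real project this method would live on your rtgym interface class):

```python
import numpy as np

# Hypothetical sketch: in a real project, MyGTInterface would be your
# rtgym interface class; only the default-action hook is shown here.
class MyGTInterface:
    def get_default_action(self):
        """Action applied while no agent action is available yet
        (e.g. right after reset()), since real time cannot be paused."""
        # "no action" for a racing game: zero steering, zero throttle, zero brake
        return np.array([0.0, 0.0, 0.0], dtype=np.float32)
```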
-
Thanks for the feedback and correcting that point. Will address it in the next issue.
…On Thu, 11 May 2023, 17:00 Yann Bouteiller, ***@***.***> wrote:
-
So after all the celebrations... I feel like I am back to square one. PPO only works smoothly with torch; in tf/tf2, things behave weirdly. I revisited sb3, because their beta version supports gymnasium, but they do not support observation spaces that are tuples, only dicts. My research has suggested SAC ought to be the best way... maybe I have no choice but to make a full implementation of tmrl?
-
Do you think it is possible to make a simple wrapper for rtgym that converts the observation tuple to a dict?
-
I just want to debug a few issues and test some others. My plan was to fully stack the observation. I'm not sure why actions ought to be part of the observation to make it more Markovian...
…On Thu, 8 Jun 2023, 18:49 Yann Bouteiller, ***@***.***> wrote:
You can remove the action buffer from rtgym by setting "act_in_obs" to False in the configuration dictionary. Are you sure you want to do that, though? It only makes sense if you are using an RNN or another way of handling real-time non-Markovness.
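For reference, a sketch of how that configuration might look, assuming rtgym's documented DEFAULT_CONFIG_DICT and the "act_in_obs" key named above (the interface class name is hypothetical):

```python
from rtgym import DEFAULT_CONFIG_DICT

my_config = DEFAULT_CONFIG_DICT.copy()
my_config["interface"] = MyGTInterface  # hypothetical rtgym interface subclass
my_config["act_in_obs"] = False         # drop the action buffer from observations
```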
-
First working use of rtgym for Deep-RL via the RLlib framework, applied to Gran Turismo on the PCSX-Redux emulator. Communicating via TCP sockets, with protobuf for serialisation.
Sharing my first working pipeline. :) Major mini-party.
https://youtu.be/zVrhbXNOHCc
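As an aside on the transport: since TCP is a byte stream with no message boundaries, a common way to carry protobuf messages over a socket is length-prefix framing. A minimal sketch (function names are mine; any fixed-width prefix works):

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Length-prefix framing: 4-byte big-endian size, then the serialized message
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP may deliver fewer bytes than requested; loop until we have n bytes
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> bytes:
    (size,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, size)
```

On the receiving side, the returned bytes would then be passed to your message's `ParseFromString`.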