Discussion about how rtgym works #35

NDR008 · 2023-02-13T14:25:22Z

NDR008
Feb 13, 2023

Hey, so I am getting close to developing a method of reading from/to the Gran Turismo emulated environment.
So now getting close to the time of implementing rtgym.
(Bare in mind, I have no gym / gymnasium experience other than running tmrl).

I read
"What we need to do in order to make the observation space Markovian in this setting is to augment the available observation with the 4 last sent actions"
and
"will automatically append a buffer of the 4 last actions, but the observation space you define here must not take this buffer into account."

I am little lost why you need the last 4 actions.

Here is what I understood, that if the next observation is too late, your method skips the pre-prepared action - right?
I guess in comparison to the work here: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf <-- here they skip k_frames and keep pumping the same action, so for O_0, a_0, a_k1, a_k would be sent, and then O_k, leads to a_k, a_k+1, a_k+2, ..a_2k
Your work is neater in that it tries to maximise the maximum observation rate that is possible and only skips when it just couldn't keep up.

But I am not sure why 4 last actions are needed or how they are used....

In addition, I was thinking of feeding my NN the last N observation frames (my guess was N = 3), to get a sense of motion as part of the observation without defining it.
I am wondering if this method would be incompatible with rtgym. My instinct tells me not, since the last N_observations are accepted to have different time-steps (so say 3 frames, 50ms apart share a different story from 2 frames 50ms and 1 frame 70ms apart).

yannbouteiller · 2023-02-13T16:26:31Z

yannbouteiller
Feb 13, 2023
Maintainer

Hi!

An important assumption of (deep) RL is that what you feed to your model (NN) is Markov.

This means that you feed your model all the information needed to predict the future. Equivalently, this means that no additional information from the past would be useful in any sense for your model to predict the future.

Real-time RL essentially consists of dealing with this fact in a clever fashion. In the rtgym tutorial, the task is very complex because it features a remotely controlled drone with random communication delays that can go up to 4 timesteps. In other words, it is possible that, at some point, you need to take into account the 4 last actions that you sent to the drone when computing a new action from a (delayed) obsdrvation, because it is possible that all 4 actions get applied by the drone by the time the action you are currently computing gets applied as well.

In tmrl TrackMania, the situation is much simpler because the delay is constantly of 1 timestep, which is the duration of a forward pass in the actor network (+ 1 additional timestep accounting for the fact that observation capture is not instantaneous). Thus, in TrackMania, we need only an history of 2 actions to account for real-time considerations.

For an in-depth exploration of this topic, you can google the paper "Reinforcement Learning with Random Delays".

PS: you may be confused by the belief that past actions influence the future in some magical way: they do not. If we were controlling TrackMania in the video-game sense rather than in the robotic sense, we would do something that is not possible in real life: we would leave the simulation paused between frames to perform all the computations, then we would apply our computed action, and we would unpause the simulation. If we did this, we would not need an action buffer at all, because the effect of past actions would be integrated in the frames that we observe (assuming they contain the physical state of the car's dynamics).

0 replies

yannbouteiller · 2023-02-13T17:06:21Z

yannbouteiller
Feb 13, 2023
Maintainer

Regarding your question about integrating the past frames in observation, it is of course possible in rtgym and this is what we do in Trackmania here (with an history of 4 screenshots, this is how we represent the car's dynamics)

(Note that we do it manually, there is no rtgym-specific option to do it automatically. rtgym is only concerned with clocking your environment, it is up to you to define the content of your observations, except for the action buffer that rtgym appends automatically)

0 replies

NDR008 · 2023-02-13T17:40:23Z

NDR008
Feb 13, 2023
Author

Regarding the 4-action buffer, makes sense now. Regarding the 4 frames. I found many solutions using such an approach. Is there a reference paper introducing this concept? Is 4-frames just a hyper parameter for capturing dynamics?

…

On Mon, 13 Feb 2023, 18:06 Yann Bouteiller, ***@***.***> wrote: Regarding your question about integrating the past frames in observation, it is of course possible in rtgym and this is what we do in Trackmania here <https://github.com/trackmania-rl/tmrl/blob/78e5468c5eecb963864a619f173b2189cac8dca1/tmrl/custom/custom_gym_interfaces.py#L207> (with an history of 4 screenshot, this is how we represent the car's dynamics) — Reply to this email directly, view it on GitHub <#35 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AIKWRWS5H3UDNCNJHOCH3PLWXJSZPANCNFSM6AAAAAAU2KR6Z4> . You are receiving this because you authored the thread.Message ID: ***@***.***>

2 replies

yannbouteiller Feb 13, 2023
Maintainer

Yes it is an hyperparameter.

It has some physical sense though.

2 frames is theoretically enough to compute an estimate of the velocity.

So 3 frames is 2 velocities estimates, i.e. an estimate of the acceleration.

4 frames gives you an estimate of dynamics of order 3, etc.

Of course, there is some delay in these theorical estimates, because of how spaced in time frames are.

But the higher your estimated dynamics' order, the better you can compensate for this delay I think. For instance with only 2 frames I know the avergage velocity between the two frames, but with 3 frames I can compute a better estimate of the velocity when the last frame is captured by looking at the acceleration.

yannbouteiller Feb 13, 2023
Maintainer

(And with rtgym it makes even more sense to have a longer buffer of frames when there is some jitter in timing, for instance because capturing screenshots takes a stochastic amount of time)

NDR008 · 2023-02-15T22:04:43Z

NDR008
Feb 15, 2023
Author

I'm having a technical issue. I ran the rtgym tutorial and it worked fine.
But now if I try to re-run that script....

ImportError: cannot import name 'RealTimeGymInterface' from partially initialized module 'rtgym' (most likely due to a circular import) (j:\git\bla\rtgym.py)

I have not changed a single line of code. Could a process be running in the background even after killing the terminal?

4 replies

yannbouteiller Feb 16, 2023
Maintainer

Circular imports are my nightmare when coding in Python.

It has nothing to do with a process running in the background, it means there is a circular chain of "import" statements somewhere in your program. What do you do to get this error exactly, can you provide a way to reproduce it please?

NDR008 Feb 18, 2023
Author

Nothing mind blowing...
trying to get this file to run: https://github.com/NDR008/TensorFlowPSX/blob/implement-rtgym/Py/rtgym.py
(I was able to run it once...)

NDR008 Feb 18, 2023
Author

Full error is:

J:\git\TensorFlowPSX>C:/Users/user/anaconda3/envs/GTAI2/python.exe j:/git/TensorFlowPSX/Py/rtgym.py
Traceback (most recent call last):
  File "j:\git\TensorFlowPSX\Py\rtgym.py", line 3, in <module>
    from rtgym import RealTimeGymInterface, DEFAULT_CONFIG_DICT, DummyRCDrone
  File "j:\git\TensorFlowPSX\Py\rtgym.py", line 3, in <module>
    from rtgym import RealTimeGymInterface, DEFAULT_CONFIG_DICT, DummyRCDrone
ImportError: cannot import name 'RealTimeGymInterface' from partially initialized module 'rtgym' (most likely due to a circular import) (j:\git\TensorFlowPSX\Py\rtgym.py)

yannbouteiller Feb 19, 2023
Maintainer

I have never had this error but this file might be the culprit, what happens if you clone the rtgym repo, erase the content of this file, install locally with pip install -e . and retry?

NDR008 · 2023-02-22T21:03:33Z

NDR008
Feb 22, 2023
Author

Cloning the git worked... I how do I overwrite/edit the rtgym that I installed into my environment using pip install rtgym

1 reply

yannbouteiller Feb 22, 2023
Maintainer

Did you edit the __init__.py file that I linked to make it work? Or did you do something else?

If you say cloning the repo has worked, I suppose you already overrode your local rtgym installation. To override your local installation, you just have to do pip install -e . where the setup.py file is.

To update the rtgym version that is on PyPI, you can submit a Pull Request on the rtgym repo with whatever change you did to make it work, I'll review it (since I am the maintainer of rtgym as well), and if it makes sense I'll accept the PR and upload a new version on PyPI such that it gets automatically installed when people do pip install rtgym

NDR008 · 2023-02-22T21:24:17Z

NDR008
Feb 22, 2023
Author

Ok, I understood what happened but not...

So, relying on the environment from pip install rtgym did not work.
Cloning the repo and then copying the rtgym folder into my own project, worked (even though I did not edit init.py).
pip uninstall rtgym and then running the pip install -e . from the root folder of the cloned rtgym, and trying to re-run the file in my own project, was not working again...
editing rtgym\envs_init_.py, pip uninstall rtgym, pip install -e . still did not work :(

10 replies

NDR008 Feb 22, 2023
Author

Hold on... me stupid :p (I think I know what is wrong, let me confirm it)
[But in any case, Py is 3.9.9]

NDR008 Feb 22, 2023
Author

So sorry for the stress but... naming my file rtgym.py was the circularity :P
renaming it to something else was good (and relying on the PyPI library)

yannbouteiller Feb 22, 2023
Maintainer

Haaaaa indeed, nice catch ^^ Yes, importing rtgym in a file named rtgym sounds like something Python would not enjoy haha

NDR008 Feb 22, 2023
Author

It still was bizarre, because it did not like it so long as it had to fetch the env level library, when having a local library Python was able to move on past the filename issue. Great, now I can get back to slowly and painfully understanding this, and figuring out how to implement it into my unique project.
(Also no real gym experience here).
Will try to give it a serious attempt this weekend.

yannbouteiller Feb 22, 2023
Maintainer

As a matter of fact few people have experience building gym environments, they often just use whatever is readily available. Building rtgym environments is even more niche because it tackles an issue that only the people who build real-world applications run into.

The tutorial should hopefully help you get on track regardless of your initial experience, but if you have any question don't hesitate :)

NDR008 · 2023-02-23T19:09:55Z

NDR008
Feb 23, 2023
Author

Would it be difficult to migrate from gym to gymnasium?

1 reply

yannbouteiller Feb 23, 2023
Maintainer

Depends, but why would you want to do that? gym is the official project, and it has the terminated/truncated signal that old versions lacked.

NDR008 · 2023-02-23T19:16:49Z

NDR008
Feb 23, 2023
Author

The team that has been maintaining Gym since 2021 has moved all future development to [Gymnasium](https://github.com/Farama-Foundation/Gymnasium), a drop in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates. Please switch over to Gymnasium as soon as you're able to do so.
So I thought it would be good to try, but my quick attempt failed.

6 replies

NDR008 Feb 24, 2023
Author

Yeah... It's not going to be that easy 🤣
That was my first try.

yannbouteiller Feb 24, 2023
Maintainer

What did you try?

Idk, these new Gym people have been doing all kinds of crazy things since they are in charge :$

yannbouteiller Mar 19, 2023
Maintainer

Well, on gym maintainers' request, tmrl has been upgraded to gymnasium. It was in fact rather straightforward :P

NDR008 Mar 19, 2023
Author

Oooo what did you do?

yannbouteiller Mar 19, 2023
Maintainer

I just replaced all instances of gym by gymnasium both in rtgym and tmrl, worked like a charm!

NDR008 · 2023-03-19T09:06:48Z

NDR008
Mar 19, 2023
Author

I thought that was what I had tried, but as long as you managed - I am happy. Does his mean I can use Baseline3 models rather than coding an algorithm from scratch? I have been sick for 1 month (more like a mini-burn out which resulted in Tinnitus). So I want to find the shortest path to get something bashed up - to get something in the bag, then work on trying different things against different assumptions/hypotheses and comparing results.

…

On Sun, 19 Mar 2023 at 09:18, Yann Bouteiller ***@***.***> wrote: I just replaced all instances of gym by gymnasium both in rtgym and tmrl, worked like a charm! — Reply to this email directly, view it on GitHub <#35 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AIKWRWWCCEZTZP3KTQYCRMTW426O7ANCNFSM6AAAAAAU2KR6Z4> . You are receiving this because you authored the thread.Message ID: ***@***.***>

1 reply

yannbouteiller Mar 19, 2023
Maintainer

You can use SB3, although I would recommend to copy paste whatever algo you are interested in from them and adapt it to the tmrl pipeline if you are going ton implement a real-time robot/video game environment. Not sure whether SB3 can handle this kind of environments at all.

Discussion about how rtgym works #35

NDR008 Feb 13, 2023

Replies: 9 comments · 25 replies

yannbouteiller Feb 13, 2023 Maintainer

yannbouteiller Feb 13, 2023 Maintainer

NDR008 Feb 13, 2023 Author

yannbouteiller Feb 13, 2023 Maintainer

yannbouteiller Feb 13, 2023 Maintainer

NDR008 Feb 15, 2023 Author

yannbouteiller Feb 16, 2023 Maintainer

NDR008 Feb 18, 2023 Author

NDR008 Feb 18, 2023 Author

yannbouteiller Feb 19, 2023 Maintainer

NDR008 Feb 22, 2023 Author

yannbouteiller Feb 22, 2023 Maintainer

NDR008 Feb 22, 2023 Author

NDR008 Feb 22, 2023 Author

NDR008 Feb 22, 2023 Author

yannbouteiller Feb 22, 2023 Maintainer

NDR008 Feb 22, 2023 Author

yannbouteiller Feb 22, 2023 Maintainer

NDR008 Feb 23, 2023 Author

yannbouteiller Feb 23, 2023 Maintainer

NDR008 Feb 23, 2023 Author

NDR008 Feb 24, 2023 Author

yannbouteiller Feb 24, 2023 Maintainer

yannbouteiller Mar 19, 2023 Maintainer

NDR008 Mar 19, 2023 Author

yannbouteiller Mar 19, 2023 Maintainer

NDR008 Mar 19, 2023 Author

yannbouteiller Mar 19, 2023 Maintainer

NDR008
Feb 13, 2023

Replies: 9 comments 25 replies

yannbouteiller
Feb 13, 2023
Maintainer

yannbouteiller
Feb 13, 2023
Maintainer

NDR008
Feb 13, 2023
Author

yannbouteiller Feb 13, 2023
Maintainer

yannbouteiller Feb 13, 2023
Maintainer

NDR008
Feb 15, 2023
Author

yannbouteiller Feb 16, 2023
Maintainer

NDR008 Feb 18, 2023
Author

NDR008 Feb 18, 2023
Author

yannbouteiller Feb 19, 2023
Maintainer

NDR008
Feb 22, 2023
Author

yannbouteiller Feb 22, 2023
Maintainer

NDR008
Feb 22, 2023
Author

NDR008 Feb 22, 2023
Author

NDR008 Feb 22, 2023
Author

yannbouteiller Feb 22, 2023
Maintainer

NDR008 Feb 22, 2023
Author

yannbouteiller Feb 22, 2023
Maintainer

NDR008
Feb 23, 2023
Author

yannbouteiller Feb 23, 2023
Maintainer

NDR008
Feb 23, 2023
Author

NDR008 Feb 24, 2023
Author

yannbouteiller Feb 24, 2023
Maintainer

yannbouteiller Mar 19, 2023
Maintainer

NDR008 Mar 19, 2023
Author

yannbouteiller Mar 19, 2023
Maintainer

NDR008
Mar 19, 2023
Author

yannbouteiller Mar 19, 2023
Maintainer