Replies: 9 comments 25 replies
-
Hi! An important assumption of (deep) RL is that what you feed to your model (NN) is Markov. This means that you feed your model all the information needed to predict the future. Equivalently, this means that no additional information from the past would be useful in any sense for your model to predict the future. Real-time RL essentially consists of dealing with this fact in a clever fashion. For an in-depth exploration of this topic, you can google the paper "Reinforcement Learning with Random Delays".

PS: you may be confused by the belief that past actions influence the future in some magical way: they do not. If we were controlling TrackMania in the video-game sense rather than in the robotic sense, we would do something that is not possible in real life: we would leave the simulation paused between frames to perform all the computations, then we would apply our computed action, and we would unpause the simulation. If we did this, we would not need an action buffer at all, because the effect of past actions would be integrated in the frames that we observe (assuming they contain the physical state of the car's dynamics).
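To make the action-buffer point concrete, here is a toy sketch (made-up names, not tmrl/rtgym code) of why the last sent action must be appended to the observation when actions take effect with a delay:

```python
import numpy as np

# Toy 1-D "car": the action computed from frame t only takes effect
# during frame t+1, like a real-time system with one step of delay.
class DelayedCar:
    def __init__(self):
        self.pos, self.vel = 0.0, 0.0
        self.pending = 0.0                      # action sent but not applied yet

    def step(self, action):
        self.vel += self.pending                # the PREVIOUS action acts now
        self.pos += self.vel
        self.pending = action                   # this action will act next step
        return np.array([self.pos, self.vel])   # full physical state of the car

# Even though (pos, vel) is the whole physical state, it is not Markov
# from the agent's perspective: the hidden `pending` action also shapes
# the next transition. Appending the last sent action restores Markovness:
env, last_action = DelayedCar(), 0.0
state = env.step(0.5)
markov_obs = np.append(state, last_action)      # (pos, vel, last sent action)
last_action = 0.5
```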
-
Regarding your question about integrating the past frames in observation, it is of course possible in rtgym and this is what we do in Trackmania here: https://github.com/trackmania-rl/tmrl/blob/78e5468c5eecb963864a619f173b2189cac8dca1/tmrl/custom/custom_gym_interfaces.py#L207 (with a history of 4 screenshots; this is how we represent the car's dynamics).

(Note that we do it manually, there is no rtgym-specific option to do it automatically. rtgym is only concerned with clocking your environment; it is up to you to define the content of your observations, except for the action buffer that rtgym appends automatically.)
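Schematically, the manual approach looks like this (a rough sketch with a made-up `grab_frame` placeholder, not the actual tmrl code linked above):

```python
from collections import deque
import numpy as np

HIST_LEN = 4                                  # history of 4 screenshots
frame_hist = deque(maxlen=HIST_LEN)

def build_observation(grab_frame):
    """Capture one frame and return the last 4 frames stacked together."""
    frame = grab_frame()                      # placeholder: your screen capture
    if not frame_hist:                        # first call: pre-fill the history
        frame_hist.extend([frame] * HIST_LEN)
    else:
        frame_hist.append(frame)              # deque drops the oldest frame
    return np.stack(frame_hist)               # e.g. shape (4, H, W) for grayscale
```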
-
Regarding the 4-action buffer, that makes sense now.
Regarding the 4 frames: I found many solutions using such an approach. Is there a reference paper introducing this concept? Or is 4 frames just a hyperparameter for capturing dynamics?
-
I'm having a technical issue. I ran the rtgym tutorial and it worked fine, but now it no longer runs even though I have not changed a single line of code. Could a process be running in the background even after killing the terminal?
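A generic way to check for such a leftover process (assuming the psutil package is available; this is a general diagnostic, not an rtgym feature) would be something like:

```python
import psutil

# List Python processes still alive, to spot a leftover tutorial worker.
for p in psutil.process_iter(["pid", "name", "cmdline"]):
    if p.info["name"] and "python" in p.info["name"].lower():
        print(p.info["pid"], " ".join(p.info["cmdline"] or []))
```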
-
Cloning the git worked... How do I overwrite/edit the rtgym that I installed into my environment?
-
Ok, I understood what happened, but not...
-
Would it be difficult to migrate from gym to gymnasium?
-
I just replaced all instances of gym by gymnasium both in rtgym and tmrl, worked like a charm!
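For reference, beyond the import rename, the mechanical API changes involved in such a migration are in the reset/step signatures (standard gymnasium API):

```python
import gymnasium as gym                      # was: import gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)               # reset() now returns (obs, info)

# step() now returns 5 values: `done` is split into terminated/truncated
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated
```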
-
I thought that was what I had tried, but as long as you managed, I am happy.
Does this mean I can use Stable-Baselines3 models rather than coding an algorithm from scratch?
I have been sick for 1 month (more like a mini burn-out which resulted in tinnitus). So I want to find the shortest path to getting something bashed up, to get something in the bag, then work on trying different things against different assumptions/hypotheses and comparing results.
-
Hey, so I am getting close to developing a method of reading from and writing to the emulated Gran Turismo environment.
So now I am getting close to implementing rtgym.
(Bear in mind, I have no gym/gymnasium experience other than running tmrl.)
I read
"What we need to do in order to make the observation space Markovian in this setting is to augment the available observation with the 4 last sent actions"
and
"will automatically append a buffer of the 4 last actions, but the observation space you define here must not take this buffer into account."
I am a little lost as to why you need the last 4 actions.
Here is what I understood: if the next observation is too late, your method skips the pre-prepared action, right?
I guess this is in comparison to the work here: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf <-- there they skip k frames and keep applying the same action, so O_0 leads to a_0 being repeated for frames 0 to k-1, then O_k leads to a_k being repeated for frames k to 2k-1, and so on.
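In other words, something like this wrapper-style loop, if I read the paper right (a rough sketch using the gymnasium-style API, not rtgym code):

```python
def skip_step(env, action, k=4):
    """DQN-style frame skip: repeat one action for k frames, sum the rewards."""
    total_reward = 0.0
    for _ in range(k):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:           # stop early if the episode ends
            break
    return obs, total_reward, terminated, truncated, info
```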
Your work is neater in that it tries to achieve the maximum observation rate possible and only skips when it just couldn't keep up.
But I am not sure why the last 4 actions are needed or how they are used...
In addition, I was thinking of feeding my NN the last N observation frames (my guess is N = 3), to get a sense of motion as part of the observation without defining it explicitly.
I am wondering if this method would be incompatible with rtgym. My instinct tells me it is not, since the last N observations are allowed to have different time gaps (so, say, 3 frames 50 ms apart tell a different story from 2 frames 50 ms apart plus 1 frame 70 ms apart).
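Concretely, I am imagining something like this (a rough sketch, all names made up), where the inter-frame time deltas are exposed so that unevenly spaced frames look different to the NN:

```python
import time
from collections import deque
import numpy as np

N = 3                                         # my guessed history length
hist = deque(maxlen=N)                        # (frame, timestamp) pairs

def observe(frame):
    """Return the last N frames plus the time gaps between them."""
    hist.append((frame, time.monotonic()))
    frames = np.stack([f for f, _ in hist])
    stamps = np.array([t for _, t in hist])
    deltas = np.diff(stamps, prepend=stamps[0])   # 0.0 for the oldest frame
    return frames, deltas                     # feed both to the NN
```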