Questions about the reward function #45

Answered by yannbouteiller
veczieds asked this question in Q&A
Hi!

So, in the TrackMania pipeline, the only reward is indeed the number of points passed from the demo trajectory during the previous timestep. This reward encompasses everything in theory: a trajectory that is better than the demo trajectory yields a higher reward, banging into walls yields a lower reward, etc.
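As a rough illustration of the reward described above, here is a minimal sketch, not the pipeline's actual code. It assumes the demo trajectory is stored as an array of positions and that the index of the last point reached is tracked between timesteps. The names `progress_reward`, `demo_points`, and `lookahead` are hypothetical.

```python
import numpy as np

def progress_reward(position, demo_points, current_index, lookahead=10):
    """Return (reward, new_index) for one timestep.

    Assumption: the reward is the number of demo points passed since the
    previous timestep, found by searching a short window ahead of the last
    reached index for the demo point closest to the car.
    """
    end = min(current_index + lookahead + 1, len(demo_points))
    window = demo_points[current_index:end]
    # Distance from the car to each candidate demo point in the window.
    distances = np.linalg.norm(window - np.asarray(position), axis=1)
    new_index = current_index + int(np.argmin(distances))
    reward = float(new_index - current_index)
    return reward, new_index

# Toy demo trajectory: 20 points along a straight line.
demo = np.array([[float(i), 0.0] for i in range(20)])
r1, idx = progress_reward([3.2, 0.5], demo, current_index=0)
# The car is stuck (e.g. against a wall): no new points are passed.
r2, idx2 = progress_reward([3.1, 0.0], demo, current_index=idx)
```

Under these assumptions, a car that stops making progress along the demo trajectory simply collects zero reward for that timestep, which is how wall contact ends up being penalized implicitly.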

It is true that SAC struggles to learn that ramming into walls is a bad idea. Several other works have noticed this, and they usually add an artificial punishment for collisions to avoid the issue altogether. In TrackMania, however, this issue can be alleviated with hyperparameter tuning, and I believe the residual difficulty comes from non-Markovness essenti…
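For comparison, the explicit collision punishment used in those other works could look like the sketch below. This is an assumption for illustration only and is not part of the TrackMania pipeline; `shaped_reward`, `collided`, and `penalty` are hypothetical names, and the penalty value is arbitrary.

```python
def shaped_reward(progress, collided, penalty=1.0):
    """Progress reward with an artificial collision punishment.

    Assumption: `collided` is a boolean collision signal supplied by the
    environment; the TrackMania pipeline described here does not use one.
    """
    return progress - penalty if collided else progress

no_hit = shaped_reward(3.0, collided=False)
hit = shaped_reward(3.0, collided=True, penalty=5.0)
```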

Replies: 4 comments 15 replies

Answer selected by veczieds
3 participants