DQN not learning on stacked frame inputs #7
Comments
The hyperparameters here were tuned to allow the agent to learn Pong quickly, as opposed to all Atari games offered by OpenAI Gym. Pong is a relatively simple Atari game, so frame stacking is unnecessary. If you want to enable frame stacking, you'll likely need to tune several of the other hyperparameters. A good place to start would be the hyperparameter values reported in the Nature DQN paper; however, I suspect you could tune those to enable quicker learning if you're only interested in Pong.
@qfettes Got it. Yes, I am looking to train models for multiple Atari games, so I will tinker with the hyperparameters for a while. Any idea which parameters in particular would be sensitive to this kind of change? I don't have much experience in RL training. Thanks again.
All of them will be important. The epsilon decay schedule, target network update frequency, experience replay size, and learning rate are probably the furthest "off" for working on general Atari games. Also note that the original paper only performs an update every 4 timesteps, whereas this agent updates every timestep. Finally, this code uses Huber loss rather than MSE.
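For concreteness, here is a sketch that collects the Table 1 values from the Nature paper in one place and shows the Huber-vs-MSE choice; the dictionary keys and function name are illustrative, not the repo's variable names:

```python
import torch.nn.functional as F

# Hyperparameters reported in "Human-level control through deep reinforcement
# learning" (Nature, 2015), Table 1. These are a starting point for general
# Atari games, not the Pong-tuned values in this repo.
NATURE_DQN_HYPERPARAMS = {
    "minibatch_size": 32,
    "replay_memory_size": 1_000_000,
    "agent_history_length": 4,                # frame stack depth
    "target_network_update_frequency": 10_000, # the paper's C; see Table 1 for its exact units
    "discount_factor": 0.99,
    "action_repeat": 4,                        # frame skip
    "update_frequency": 4,                     # one gradient update every 4 agent steps
    "learning_rate": 0.000_25,
    "initial_exploration": 1.0,
    "final_exploration": 0.1,
    "final_exploration_frame": 1_000_000,      # epsilon anneal horizon
    "replay_start_size": 50_000,               # random-policy frames before learning starts
}

def td_loss(q_values, td_targets, use_huber=True):
    """Huber (smooth L1) loss as used in this repo, vs. plain MSE."""
    if use_huber:
        return F.smooth_l1_loss(q_values, td_targets)
    return F.mse_loss(q_values, td_targets)
```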
@qfettes Thanks for the tips. Sorry for the barrage of questions, but doesn't your code also give an update every 4 time steps, due to the fact that we are using the "make_atari" function from OpenAI Baselines to apply the frame-skip wrapper?

Also, I cannot get the 01.DQN code to train, even without any modifications. The only thing I have done is copy and paste the code into a .py file and replace the plotting code. Any ideas? Thanks again for your time, I really appreciate it.
Correct. Have a look at Table 1 in "Human-level control through deep reinforcement learning." You'll notice they both skip 4 frames (and repeat the selected action) and update once every 4th action. So 1 update every 16 frames is what was originally published. I'm unsure what is causing your issue. Running as-is in the IPython notebook should work correctly. Is it possible you changed something by mistake while copying the code?
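To make the frame accounting concrete, here is a minimal sketch of how the two factors compose (the names are illustrative, not the notebook's variable names):

```python
# Frame accounting under the settings discussed above.
# ACTION_REPEAT comes from the Atari frame-skip wrapper; UPDATE_FREQUENCY is
# the "update every 4th action" rule from the Nature paper.
ACTION_REPEAT = 4      # each selected action is repeated for 4 emulator frames
UPDATE_FREQUENCY = 4   # one gradient update every 4 agent steps (actions)

frames_per_update_paper = ACTION_REPEAT * UPDATE_FREQUENCY
print(frames_per_update_paper)  # 16 emulator frames per update, as originally published

# An agent that updates on every agent step (as this repo's DQN did at the
# time of this thread) instead performs one update per ACTION_REPEAT = 4
# emulator frames, i.e. 4x as many updates per frame of experience.
```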
@MatthewInkawhich I wonder if this might help provide some clarity on frame skipping?
Hi @qfettes, many thanks for this repo. I appreciate the readability of your code, and the implementation of several methods on the same environment (Pong).
I started by running 01.DQN in its original form, and it does not seem to have made progress during training. As suggested in the Readme, I used the relevant code from OpenAI Baselines for the env wrappers. The notebook 01.DQN.ipynb is running straight off my PC. I know the as-is hyperparameters are sensible for Pong, since similar values without frame stacking led to human-level results when running OpenAI Baselines and higgsfield's RL-Adventure on my computer. Sorry, I haven't yet been able to identify possible reasons for the lack of learning; perhaps I'll put it into a .py file for debugging. But for now, I thought it might be useful to just put the issue out there.
…tting code. Added MSE as an option for the loss function (now default for DQN). New results for 01.DQN.ipynb. Retesting other notebooks coming soon.
Thank you @MatthewInkawhich for bringing this to my attention, and @algebraic-mouse for confirming the issue. After some testing, you were both correct in your assessment: I introduced a bug in the training loop in one of the most recent commits. The bug has been fixed in the latest commit, and a few other QoL changes were made. The other notebooks will be receiving a similar check/update soon!
The rewards aren't increasing. |
@BlueDi Double-check that you have pulled the most recent version; it has recently been verified to be working correctly.
Hello! I am trying to train the DQN model (01.DQN) on the Pong task. I changed the frame_stack arg in the wrap_deepmind function to True; however, the model does not learn anything. I was curious if you had any advice for this. Also, I was wondering why your default script uses frame_stack = False? All of the papers appear to recommend feeding 4x84x84 inputs to infer temporal components of the environment such as ball velocity.

Thanks for the nice readable repo!
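For anyone reproducing this, here is a minimal sketch of enabling frame stacking through the OpenAI Baselines wrappers the notebook builds on (it assumes the baselines package is installed; the env id and flag values are the usual ones, not necessarily exactly what the notebook passes):

```python
import numpy as np
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

# Sketch of enabling frame stacking with the OpenAI Baselines Atari wrappers.
# With frame_stack=True the observation becomes 4 stacked 84x84 frames
# instead of a single frame, so the network's input layer must accept 4 channels.
env = make_atari("PongNoFrameskip-v4")        # no-op resets + frame skip of 4
env = wrap_deepmind(env, frame_stack=True)    # 84x84 grayscale, clipped rewards, 4-frame stack

obs = env.reset()
print(np.array(obs).shape)  # expected (84, 84, 4) with frame stacking enabled
```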