
DQN not learning on stacked frame inputs #7

Closed
MatthewInkawhich opened this issue Jan 28, 2019 · 10 comments

Hello! I am trying to train the DQN model (01.DQN) on the Pong task. I changed the frame_stack arg in the wrap_deepmind function to True; however, the model does not learn anything. I was curious whether you had any advice for this. Also, I was wondering why your default script uses frame_stack = False? All of the papers appear to recommend feeding 4x84x84 inputs so the network can infer temporal components of the environment, such as ball velocity.

Thanks for the nice readable repo!
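(For readers following along: enabling frame stacking with the OpenAI Baselines wrappers discussed here looks roughly like the sketch below. The env id and the shape check are illustrative, not taken from the notebook.)

```python
import numpy as np
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

# make_atari wraps a NoFrameskip env with NoopResetEnv and MaxAndSkipEnv(skip=4);
# wrap_deepmind adds grayscale, 84x84 resizing, episodic life, reward clipping,
# and (with frame_stack=True) a 4-frame observation stack returned as LazyFrames.
env = make_atari("PongNoFrameskip-v4")
env = wrap_deepmind(env, frame_stack=True)

obs = env.reset()
print(np.array(obs).shape)  # (84, 84, 4); transpose to (4, 84, 84) before feeding a PyTorch conv net
```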


qfettes commented Jan 28, 2019

The hyperparameters here were tuned to allow the agent to learn Pong quickly, as opposed to all of the Atari games offered by OpenAI Gym. Pong is a relatively simple Atari game, so frame stacking is unnecessary.

If you want to enable framestacking, you'll likely need to tune several of the other hyperparameters. A good place to start would be the hyperparameter values reported in the Nature DQN paper; however, I suspect you could tune those to enable quicker learning if you're only interested in Pong.
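(For reference, the hyperparameter table in the Nature paper lists roughly the following values; the dict below is only an illustrative summary, not this repo's configuration.)

```python
# Values from the hyperparameter table of "Human-level control through deep
# reinforcement learning" (Nature DQN); the dict layout here is illustrative only.
nature_dqn_hyperparams = {
    "minibatch_size": 32,
    "replay_memory_size": 1_000_000,
    "agent_history_length": 4,             # 4 stacked frames
    "target_network_update_frequency": 10_000,
    "discount_factor": 0.99,
    "action_repeat": 4,                    # frame skip
    "update_frequency": 4,                 # one gradient step per 4 agent actions
    "learning_rate": 0.00025,              # RMSProp
    "initial_exploration": 1.0,
    "final_exploration": 0.1,
    "final_exploration_frame": 1_000_000,
    "replay_start_size": 50_000,
    "no_op_max": 30,
}
```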

@MatthewInkawhich (Author)

@qfettes Got it. Yes, I am looking to train models for multiple Atari games, so I will tinker with the hyperparameters for a while. Any idea which hyperparameters would be particularly sensitive to this kind of change? I don't have much experience with RL training.

Thanks again.


qfettes commented Jan 29, 2019

All will be important.

The epsilon decay schedule, target net update frequency, experience replay size, and learning rate are probably the furthest "off" for working on general Atari games.

Also note that the original paper only performs an update every 4 timesteps, while this agent updates every timestep. Finally, this code uses Huber loss rather than MSE.
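(To make the last two points concrete: the update-frequency change just amounts to gating the gradient step with something like "if timestep % 4 == 0" in the training loop, and the two loss choices differ only in a single call in PyTorch. The tensors below are made-up numbers for illustration.)

```python
import torch
import torch.nn.functional as F

# Huber (smooth L1) loss behaves like MSE for small TD errors but grows only
# linearly for large ones, which damps the effect of outlier targets.
q_values   = torch.tensor([1.0, 2.0, 10.0])
td_targets = torch.tensor([1.5, 2.0,  0.0])

huber = F.smooth_l1_loss(q_values, td_targets)  # what this repo uses
mse   = F.mse_loss(q_values, td_targets)        # the alternative discussed above
print(huber.item(), mse.item())
```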


MatthewInkawhich commented Jan 29, 2019

@qfettes Thanks for the tips. Sorry for the barrage of questions, but doesn't your code also update every 4 time steps, since the "make_atari" function from the OpenAI Baselines applies the env = MaxAndSkipEnv(env, skip=4) wrapper? That wrapper should handle the skipping (and action repeating) of 4 time steps for us, no? Or do we have to update every 4 "observed" time steps on top of this wrapper (equivalent to every 16 "real" time steps with the frame skipping)?

Also, I cannot get the 01.DQN code to train, even without any modifications. The only things I have done are copy and paste the code into a .py file, replace the plot function calls with a print of the average loss and reward over the last 10000 steps, and run it on a CUDA server. I am observing no reward gain at all.

Any ideas? Thanks again for your time, I really appreciate it.
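(A plotting-free logging stand-in of the kind described above might look like the following sketch; the deque-based helper and its names are illustrative, not code from the notebook.)

```python
from collections import deque

# Keep rolling windows of recent losses and episode rewards and print their means
# every log_every steps, instead of calling the notebook's plotting helper.
recent_losses = deque(maxlen=10_000)
recent_rewards = deque(maxlen=10_000)

def log_progress(step, log_every=10_000):
    if step % log_every == 0 and recent_losses:
        avg_loss = sum(recent_losses) / len(recent_losses)
        avg_reward = sum(recent_rewards) / max(len(recent_rewards), 1)
        print(f"step {step}: avg loss {avg_loss:.4f}, avg reward {avg_reward:.2f}")
```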


qfettes commented Jan 30, 2019

Correct. Have a look at Table 1 in "Human-level control through deep reinforcement learning." You'll notice they both skip 4 frames (and repeat the selected action) and update once every 4th action. So 1 update every 16 frames is what was originally published.

I'm unsure what is causing your issue. Running as-is in the IPython notebook should work correctly. Is it possible you changed something by mistake while copying the code?
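(To make the frame counting concrete: with MaxAndSkipEnv(skip=4) handling the action repeat inside the wrapper, and the training loop updating every 4th agent step, one gradient update corresponds to 16 raw emulator frames. A toy check, with illustrative constants:)

```python
FRAME_SKIP = 4    # handled inside MaxAndSkipEnv: each chosen action is repeated for 4 emulator frames
UPDATE_FREQ = 4   # Nature DQN: one gradient step per 4 agent actions (this repo updates every agent step)

agent_steps = 1_000
emulator_frames = agent_steps * FRAME_SKIP
gradient_updates = agent_steps // UPDATE_FREQ
print(emulator_frames // gradient_updates)  # 16 emulator frames per update
```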


joleeson commented Feb 8, 2019

@MatthewInkawhich I wonder if this might help provide some clarity on frame skipping?


joleeson commented Feb 9, 2019

Hi @qfettes, many thanks for this repo. I appreciate the readability of your code, and the implementation of several methods on the same environment (Pong).

> Also, I cannot get the 01.DQN code to train, even without any modifications.

I started by running 01.DQN in its original form, and it does not seem to have made progress during training. As suggested in the Readme, I used the relevant code from OpenAI Baselines for the env wrappers. The notebook 01.DQN.ipynb is running straight off my PC. I know the as-is hyperparameters are sensible for Pong, since similar values without frame stacking led to human-level results when I ran OpenAI Baselines and higgsfield's RL-Adventure on my computer.

Sorry, I haven't yet been able to identify possible reasons for the lack of learning. Perhaps I'll put it into a .py for debugging. But for now, I thought it might be useful to just put the issue out there.

[Attached image: 01 dqn-pong-results]

qfettes added a commit that referenced this issue Feb 11, 2019: "…tting code. Added MSE as an option for the loss function (now default for DQN). New results for 01.DQN.ipynb. Retesting other notebooks coming soon"

qfettes commented Feb 11, 2019

Thank you @MatthewInkawhich for bringing this to my attention and @algebraic-mouse for confirming the issue. After some testing, you were both correct in your assessment. I introduced a bug in the training loop in one of the most recent commits; it has been fixed in the latest commit, and a few other QoL changes were made. The other notebooks will receive a similar check/update soon!

qfettes closed this as completed Feb 11, 2019

BlueDi commented Feb 24, 2019

The rewards aren't increasing.


qfettes commented Feb 25, 2019

@BlueDi Double check to make sure you have pulled the most recent version. It has been recently verified to be working correctly.
