Has anyone tried using image input? #11
Are you sure you are getting the correct inputs? Check #4.
@saiprabhakar Yes, it is correct. Before I made some modifications to the code, it output a gray image. I have also tried using a pre-trained actor (trained on low-dimensional states) as a supervisor to train the actor that uses image input, which worked pretty well. The critic could also generate plausible Q-values. But when I turn off the supervised learning and use RL, the car just keeps turning left. This is how I build my critic model:
Are you saying that you used two actors, one using the low-dimensional states which is already trained and another using images, and used the first net to train the second? So you trained the actor and critic separately? For me, training with both images and low-dim states together sometimes takes ~300 episodes. I haven't used just images to train the network. I don't think a single-frame image will work very well, since it doesn't carry enough information. But since you say it keeps turning left, I think training more would help.
@saiprabhakar Thanks for your reply. I am actually using three networks: a pre-trained actor (I call it the guide), an actor, and a critic. For the supervised stage, the critic is trained the same way as in DDPG. For the actor, instead of using the derivative from the critic, I use the action generated by the guide as the label and minimize the mean squared error between the guide and the actor, as sketched below. I have also tried concatenating the image (convolved, then flattened) with the low-dimensional sensor data, but it is still not working. I was wondering how you designed your convolution layers for processing the image, and whether you changed any other parts of the code.
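For concreteness, here is a minimal sketch of that supervised "guide" step. The names (`guide_actor`, `image_actor`) and the batching are hypothetical, not from this repo; it only assumes two Keras models, where the image actor has been compiled with an MSE loss.

```python
# Hypothetical sketch: distill the pre-trained low-dim "guide" actor into the
# image-based actor by regressing on its actions with mean squared error.
# Assumes image_actor was compiled beforehand, e.g.
#   image_actor.compile(optimizer=Adam(lr=1e-4), loss='mse')

def guide_pretrain_step(image_actor, guide_actor, image_batch, lowdim_batch):
    """One supervised update: use the frozen guide's actions as labels."""
    target_actions = guide_actor.predict(lowdim_batch)   # labels from the guide
    return image_actor.train_on_batch(image_batch, target_actions)
```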
I didn't do anything special for the image; like you mentioned, I convolved and flattened it. How many training episodes did you run? Which action input are you giving to the critic, the guide's or the other actor's? Just to be clear, when you said you turned off the supervised training, did you mean you started training from scratch without using the guide actor, or that you just turned off the guide and used the critic's gradients? If it's the second case, there may be some destabilization going on (I am not familiar with the pre-trained-actor approach). The guide actor is an interesting idea; is there any literature on it?
Thank you. That is weird for our network. We feed 4 gray frames as 4 channels, as in the sketch below. We trained for 2000 episodes and it still acts weird, especially the steering (it stays at 1 forever). Did you also set up the image that way? I would appreciate it if you could share the network configuration parameters. I mean that I start training with supervised learning, then turn off the guide and use the critic's gradients. I give all three actions (brake, steer, throttle) from the second actor (not the guide) to the critic. The idea is from Ruslan's paper on knowledge transfer: https://arxiv.org/pdf/1511.06342v4.pdf
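For reference, this is roughly what the 4-frame grayscale stacking described above could look like. The class name and the normalization are illustrative; it assumes gym_torcs returns a 64x64 RGB frame.

```python
import numpy as np
from collections import deque

class FrameStack:
    """Keep the last k grayscale frames and expose them as k channels."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)
        self.k = k

    def reset(self, rgb_frame):
        gray = rgb_frame.mean(axis=-1) / 255.0        # 64x64 RGB -> grayscale in [0, 1]
        for _ in range(self.k):
            self.frames.append(gray)
        return self.observation()

    def step(self, rgb_frame):
        self.frames.append(rgb_frame.mean(axis=-1) / 255.0)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=-1)         # shape (64, 64, 4), channels-last
```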
Did you change TORCS to a 64x64 image size? gym_torcs only supports a 64x64 pixel mode.
Yes, I have extracted and visualized the image in Python, and it is a 64 by 64 image. I found a comment online saying: "DDPG is extremely sensitive to bad hyperparameters. I haven't tried it on TORCS, but for other control problems I have found reasonably good results by precisely following the recipe in the appendix of the DDPG paper. In my experience even small details (like initializing weights with a uniform distribution rather than a Gaussian of the same scale) make a difference." I am guessing the hyperparameter tuning is what matters here.
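For anyone following that recipe, here is a sketch of the initialization detail quoted above (final layers drawn from uniform(-3e-3, 3e-3), hidden layers from a fan-in-scaled uniform). It uses Keras 2-style initializer names, which differ from the Keras 1.x used elsewhere in this thread.

```python
from keras.layers import Dense
from keras.initializers import RandomUniform, VarianceScaling

# Hidden layers: uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)), as in the DDPG appendix.
fan_in_uniform = VarianceScaling(scale=1.0 / 3.0, mode='fan_in', distribution='uniform')
# Output layers: small uniform range so initial actions/Q-values stay near zero.
final_uniform = RandomUniform(minval=-3e-3, maxval=3e-3)

hidden = Dense(300, activation='relu', kernel_initializer=fan_in_uniform)
steer_out = Dense(1, activation='tanh', kernel_initializer=final_uniform)
```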
@sufengniu I have the same problem as you. |
@stepjam @sufengniu Hi, I was able to get the DDPG agent to work using only 4 sequential images as its observation; the performance is comparable with @yanpanlau's original implementation using laser + physical states. Some implementation details for your reference:
@ghliu Thank you very much for your info. I will try it and will post the results later if I get any news.
@ghliu Thank you also. Was this using Batch Normalization? |
@stepjam Yeah, you are right. I forgot to mention :P Just updated the previous comments. |
@ghliu Thank you for your suggestions! I am also trying to make this work using images as input. I have a few doubts regarding your work. @sufengniu Hi! Have you had any success in training a vision based DDPG model? |
@pavitrakumar78 -- I have tried many variations, and my thinking is that the hyper-parameters are super sensitive. Unless you have them within this sweet spot, you just end up with garbage. Edit: Just to clarify, I'm talking about the algorithm itself, not the implementation. |
@stepjam Hi! Thanks for your input! Yes, it seems that way in the tests I have been doing, though what I've been working on is not exactly DDPG, just something close to it. I am still trying to figure out what works best! :)
@pavitrakumar78 I agree with @stepjam; I think both DDPG and actor-critic algorithms are very sensitive to hyper-parameter settings. I have had the same experience with DDPG in other environments. It might require exploring how to set good initialization and hyperparameters for TORCS. By the way, mine is also not working.
@pavitrakumar78, to answer your questions:
Most of the hyper-parameters remain the same; only the minibatch size is reduced to 16 instead of 32, as suggested by the original DDPG paper. And I am using Keras 1.1.0. I noticed there are some differences in the BN layer implementation between Keras versions; at least the parameters printed by model.summary() differ from Keras 1.2.0 with the same code. In my experience both training and testing work fine here, and I just ran more experiments recently to verify it again. (Actually the same vision input also works with other algorithms such as CDQN (NAF).) The BN implementation is currently my only suspicion; other than that I have no clues yet. I am planning to release the code, network weights, and probably some videos later.
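As a rough illustration of where the BN layers could sit, here is a small conv actor for the 4-frame 64x64 input. This is not @ghliu's exact architecture; the filter counts and layer sizes are assumptions, written with Keras 2-style layer names.

```python
from keras.layers import Input, Conv2D, BatchNormalization, Activation, Flatten, Dense
from keras.models import Model

obs = Input(shape=(64, 64, 4))                        # 4 stacked grayscale frames
x = Conv2D(32, (8, 8), strides=4)(obs)
x = Activation('relu')(BatchNormalization()(x))
x = Conv2D(32, (4, 4), strides=2)(x)
x = Activation('relu')(BatchNormalization()(x))
x = Flatten()(x)
x = Dense(200)(x)
x = Activation('relu')(BatchNormalization()(x))
steer = Dense(1, activation='tanh')(x)                # single-output (steering) variant
actor = Model(inputs=obs, outputs=steer)
# Updates would then use minibatches of 16, as described in the comment above.
```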
@ghliu Hi! Thanks for your answers!
I am trying 2 models, one with only steering as output and another with all 3 outputs (steering, brake, accel.). So far, in a sample run with all 3, I observed that the steering was largely positive and very close to zero (10^-5), and accel and brake seemed even closer to zero, so the car just stays in one spot. @ghliu In your tests, did you use the same gym_torcs.py file as this repo? The code in this repo has some changes regarding what is considered a terminal state and some minor changes to how the reward is calculated.
@pavitrakumar78 I am using the same gym_torcs.py as this repo, but modified quite a lot for other uses. The reward function remained the same, which is slightly different from the original paper. I terminate the episode if the car runs backward (and give it a large negative reward), or if the car is out of the track with no progress and negative speedX, roughly as sketched below.
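Something like the following captures those termination rules. It is a guess at the logic, not @ghliu's actual code: the sensor names (angle, trackPos, speedX) follow gym_torcs' observation fields, and the reward magnitude is made up.

```python
import numpy as np

def is_terminal(obs, progress):
    """Return (done, terminal_reward) for the rules described above."""
    running_backward = np.cos(obs['angle']) < 0             # car faces the wrong way
    off_track = abs(obs['trackPos']) > 1.0                  # outside the track edges
    stalled = progress <= 0.0 and obs['speedX'] < 0.0       # no progress, reversing
    if running_backward:
        return True, -200.0                                 # large negative reward
    if off_track and stalled:
        return True, 0.0
    return False, None
```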
@ghliu Thanks for your reply! Yes, I noticed that behaviour too: the agent seems to stay inside the road, so it seems to have somehow figured out steering, but the acceleration decreases and the brake gradually increases, so it actually doesn't move.
@pavitrakumar78 Hi, any good news with your vision experiment (1 output or 3 outputs)? I've tried many hyper-parameter settings, both for 1 output (steer) and for 3 outputs (steer, accel., brake). When the output is just steer and the network starts to train, the output quickly reaches its maximum/minimum value, and after training for 1000 epochs it is still stuck at the maximum/minimum. As you say above, the agent seems to have figured out steering; did you run into this situation during training? Many thanks.
@dongleecsu Hi, yes, I observed the scenario you describe while training. But as I said, even though the agent managed to stay inside the road, the acceleration and brake outputs were not learned properly. Unfortunately, I haven't had time to use only steering as the output and run training again, because I am working on another topic now. If the output quickly reaches the max value, i.e. always goes left or always goes right, then it is actually not learning. You might have to try playing around with network configurations or try modelling the rewards in a different way.
Hi, I think the original code in this repo doesn't take the past frames into account. Have you already implemented that in your repo? Thanks
@ghliu Hi, I tried to train my model using only images, but I found that the runtime increased from 0.2s to 0.4s per cycle (from choosing one action to choosing the next). Is this reasonable? How long was a cycle when you trained your model?
@Sophistt Did the model train and produce reasonable output? |
@ghliu Did you train your model with images as input from scratch? I failed to do so; the agent ends up turning sharply to the right.
Hi everyone, my implementation of DDPG is largely based on the one from OpenAI, and I modified this original repo so that the class TorcsEnv inherits from gym.Env to make it compatible with the OpenAI code (see the sketch below). Thank you very much for any advice about how to make it work! :)
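A minimal sketch of that wrapper idea, assuming the TorcsEnv from this repo with vision enabled. The action/observation spaces and the `ob.img` field are assumptions and may need adjusting to the actual observation layout.

```python
import gym
import numpy as np
from gym import spaces
from gym_torcs import TorcsEnv

class GymTorcs(gym.Env):
    """Expose TorcsEnv through the standard gym.Env interface."""
    def __init__(self, vision=True, throttle=True):
        self.env = TorcsEnv(vision=vision, throttle=throttle)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(64, 64, 3), dtype=np.float32)

    def reset(self, relaunch=False):
        ob = self.env.reset(relaunch=relaunch)
        return ob.img                                  # vision part of the observation

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        return ob.img, reward, done, info
```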
I had the same problem (but I fixed it, scroll down for the solution): the actor was almost always predicting the max/min value after a small number of iterations and then staying that way. When this was happening, my actor and critic were 2 layers (64 neurons, 64 neurons) with ReLU hidden activations and tanh and linear output activations, respectively. The actor used a learning rate of 1e-4 while the critic used a learning rate of 1e-3; the critic also used a weight decay of 1e-2. The actor's output would reach the min/max values and then stay there because the gradients around that area are near zero. The critic's gradient scale for the actor (for the DDPG update rule) was also really small.

I tried a million things to stop the actor from falling into this rut. I tried 20 different hidden activation functions for both the actor and critic, I tried adding neurons to both the actor and critic, I tried different architectures, and I tried all the optimizers available to me (including Adam, Adamax, Adagrad, Adadelta, RMSProp, Rprop, and SGD).

SOLUTION: The thing that did it for me, and resulted in CONSISTENT mid-range actor output and return improvement, was simply increasing the number of neurons in the critic (and not the actor). My final actor had 2 layers (64 neurons, 64 neurons) and my final critic had 2 layers (200 neurons, 200 neurons). Perhaps someone could elaborate on why this ended up working for me.

EDIT: I should also note that my actor had 30 inputs while my critic had about 100 (using extra state information, similar to MADDPG).
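A small sketch of those final sizes (64-64 actor, 200-200 critic, learning rates 1e-4/1e-3, critic weight decay 1e-2). The action dimension below is a placeholder, the input dimensions (30 for the actor, ~100 for the critic) are the commenter's figures, and this is written in Keras 2 style even though the original may have used another framework.

```python
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.regularizers import l2

n_actions = 4                                   # placeholder; not stated in the comment

# Actor: 30 state inputs -> 64 -> 64 -> tanh actions (updated via critic gradients, lr 1e-4)
s = Input(shape=(30,))
h = Dense(64, activation='relu')(s)
h = Dense(64, activation='relu')(h)
a = Dense(n_actions, activation='tanh')(h)
actor = Model(s, a)
actor_optimizer = Adam(lr=1e-4)

# Critic: ~100 inputs (state + extra info + actions) -> 200 -> 200 -> linear Q,
# with 1e-2 L2 weight decay on its layers.
sc = Input(shape=(100,))
q = Dense(200, activation='relu', kernel_regularizer=l2(1e-2))(sc)
q = Dense(200, activation='relu', kernel_regularizer=l2(1e-2))(q)
q = Dense(1, activation='linear', kernel_regularizer=l2(1e-2))(q)
critic = Model(sc, q)
critic.compile(optimizer=Adam(lr=1e-3), loss='mse')
```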
Hi @theOGognf, thank you for your suggestions! I haven't worked on this project for a while, but I am still curious how it works. To clarify your method: what you did is only add more neurons to the critic (200-200) while keeping the actor at 64-64? Also, you mentioned your actor has 30 inputs and your critic has about 100; can you provide more details here? I thought the actor and critic used image-based input and shared the initial convolution layers. Thank you!
Correct - I just added more neurons to the critic. My critic makes use of extra environment information (such as other agents' actions) because I was working on a multi-agent problem. This is why my critic has more inputs than my actor. Also, I apologize for not clarifying, but I didn't have any convolutional layers because I just used MLPs. |
Hi @theOGognf, this paper explores the idea of having a critic network with more capacity and information, and it explains why it works better: Asymmetric Actor Critic for Image-Based Robot Learning.
Hi, do you have the changed code in your repository? Can you share the changed code that uses images as input? Thanks!
@pavitrakumar78 Do you have the changed code? I need help with this at the moment. Thanks!
@sufengniu Hi, have you solved this problem? I also need help with this at the moment. Do you have working changed code? Thanks!
@BCWang93 No, sorry! Please see my comment from March 1, 2017, above. The code might be alright, but you need to have very specific parameters to create a good agent. As I mentioned in my previous comment, I had to drop this idea due to insufficient time and resources to test various models. |
@saiprabhakar Hi, I see the code in your repositories, and when I run your code I have some problems, like this:
Hello, I want to know whether you succeeded in using image input directly. I am trying to do so, but I do not know how. If you have done this, may I learn from you please?
@sufengniu
Hello @chouer19, I haven't worked on this problem for a long time. I haven't really seen anyone succeed in a public repo using pure image-based input (I don't count anyone who claims success but didn't release code or proof). All the success cases are based on low-dimensional sensor features. I tried using supervised learning to train the agent, but once I switched to DDPG or actor-critic, the algorithm would diverge. I think if you pre-train the image representation with an image auto-encoder (e.g. U-Net) and then reuse it for the agent, it might help, roughly as in the sketch below. That is all I know.
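To make that suggestion concrete, here is one possible shape of it: pre-train a small convolutional auto-encoder on collected TORCS frames, then reuse the encoder (optionally frozen) as the agent's image feature extractor. The layer sizes are assumptions, and this is a plain conv auto-encoder rather than a full U-Net.

```python
from keras.layers import Input, Conv2D, Conv2DTranspose, Flatten, Dense
from keras.models import Model

frame = Input(shape=(64, 64, 4))                                   # stacked grayscale frames
z = Conv2D(32, (4, 4), strides=2, activation='relu', padding='same')(frame)
z = Conv2D(64, (4, 4), strides=2, activation='relu', padding='same')(z)   # 16x16x64 code
encoder = Model(frame, z)

d = Conv2DTranspose(32, (4, 4), strides=2, activation='relu', padding='same')(z)
recon = Conv2DTranspose(4, (4, 4), strides=2, activation='sigmoid', padding='same')(d)
autoencoder = Model(frame, recon)
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(frames, frames, ...)   # unsupervised pre-training on collected frames

# Reuse the pre-trained encoder inside the actor/critic:
agent_obs = Input(shape=(64, 64, 4))
features = Dense(200, activation='relu')(Flatten()(encoder(agent_obs)))
```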
Hello,
Has anyone tried using images as input to train the network? I have worked on this for a couple of days, using a 3-layer conv net to process the image as a substitute for the original low-dimensional states, but it doesn't work properly.