Hi,
I am training the humanoid agent to walk using the DeepMimic environment.
While the policy trains, the terminal prints a total_reward for each episode, i.e. the sum of the rewards over all time steps of that episode.
Then, after 40 episodes or so (one iteration), the terminal prints Train_Return and Test_Return values.
How do these values relate to the episodes' total_reward? I tried manually computing both the mean and a discounted sum with a lambda value of 0.95, but neither result comes close to the 34.6 and 40 reported as the train and test returns.
Iteration example:
total_reward= 304.7070782231422
total_reward= 330.99031506948995
total_reward= 280.0899647972968
total_reward= 334.5682120720093
total_reward= 290.48607507379035
total_reward= 296.15922621557917
total_reward= 284.28576137796716
total_reward= 318.3853249960263
total_reward= 281.1954503632689
total_reward= 291.4806676815156
total_reward= 267.16685971352155
total_reward= 296.23791982481396
total_reward= 276.2614277167039
total_reward= 347.34052600783497
total_reward= 307.8560193518319
total_reward= 318.5019787110523
total_reward= 283.2503021802854
total_reward= 302.85406186996715
total_reward= 292.4121275202293
total_reward= 302.8634105168602
total_reward= 295.9168667624474
total_reward= 334.2352692753193
total_reward= 294.5151168261536
total_reward= 290.95920850744614
total_reward= 306.1276442673896
total_reward= 308.0391413994197
total_reward= 285.98186639238116
total_reward= 308.9052466366138
total_reward= 291.3991421620044
total_reward= 286.29836297186966
total_reward= 314.10028170590556
total_reward= 254.62273445146707
total_reward= 290.8562960379172
total_reward= 272.62704895129536
total_reward= 325.0583622036573
total_reward= 273.9253170482888
Model saved to: /home/.local/lib/python3.6/site-packages/pybullet_data/data/policies/humanoid3d/agent0_model.ckpt
Agent 0
| Iteration | 71830 |
| Wall_Time | 134 |
| Samples | 313248246 |
| Train_Return | 34.6 |
| Test_Return | 40 |
| State_Mean | 0.107 |
| State_Std | 2.63 |
| Goal_Mean | 0 |
| Goal_Std | 0 |
| Exp_Rate | 0.2 |
| Exp_Noise | 0.05 |
| Exp_Temp | 0.001 |
| Critic_Loss | 0.00183 |
| Critic_Stepsize | 0.01 |
| Actor_Loss | 0.327 |
| Actor_Stepsize | 2.5e-06 |
| Clip_Frac | 0.251 |
| Adv_Mean | -0.19361387 |
| Adv_Std | 0.77200764 |
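To make the question concrete, here is the back-of-the-envelope calculation I tried. Since the terminal only prints per-episode totals (not per-step rewards), and I don't know the exact episode length, the episode length T below is a guess on my part and the discounted figure is only a rough estimate:

```python
import numpy as np

# total_reward values from the iteration above (first few shown;
# I used all ~35 of them, which average out to roughly 300)
total_rewards = np.array([304.71, 330.99, 280.09, 334.57, 290.49])

# Mean episode reward -- about 300, nowhere near Train_Return = 34.6
print("mean total_reward:", total_rewards.mean())

# Rough discounted-return estimate, assuming an approximately constant
# per-step reward over an episode of T steps. T is hypothetical; I don't
# know the actual episode length in control steps.
T = 500            # assumed episode length (my guess)
lam = 0.95         # the lambda / discount value I tried
r_step = total_rewards.mean() / T                  # average per-step reward
disc_return = r_step * (1 - lam ** T) / (1 - lam)  # geometric-series sum
print("discounted estimate:", disc_return)         # ~12, also far from 34.6 or 40
```

Neither number lands anywhere near 34.6 or 40, so I'm clearly misunderstanding what quantity Train_Return and Test_Return actually measure.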
Any help would be much appreciated. I want to understand the plots produced by the plot_return.py script, but they don't make much sense to me yet.
Thanks!