GCP Tutorial may not be using the GPU #52

Sohojoe · 2019-02-14T22:10:57Z

When following the GCP tutorial, I see a TensorFlow warning that the installed version of TensorFlow is not optimized for the CPU.

Given that the cloud instance does include the optimized version of TensorFlow, I wonder if installing obstacle-tower-env overrides the optimized version. If so, it may have installed the unoptimized, CPU-only TensorFlow, since ml-agents has the requirement 'tensorflow>=1.7,<1.8'.

The training speed seems slow: 56 steps per second, compared with 130 steps per second on my home PC.

awjuliani · 2019-02-14T22:12:39Z

@ervteng Can you speak to this? We had validated internally that we were using GCP/GPU.

ervteng · 2019-02-19T22:12:53Z

Hi @Sohojoe, the obstacle-tower-env doesn't have a TensorFlow requirement as it doesn't install ml-agents. You can check the GPU usage with nvidia-smi. What type of GPU are you running locally?
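For anyone who wants to log GPU usage during a training run rather than eyeball it, here is a minimal sketch that wraps nvidia-smi's CSV query mode. The helper names `parse_gpu_rows` and `query_gpus` are made up for illustration, and the sketch assumes nvidia-smi is on the PATH and that the driver reports numeric values for these fields.

```python
import subprocess

# nvidia-smi's CSV query mode; these fields and format flags are standard,
# but some GPUs report "[Not Supported]" for a field, which this sketch
# does not handle.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Parse 'name, util, mem' lines into a list of dicts (hypothetical helper)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, util, mem = [field.strip() for field in line.rsplit(",", 2)]
        rows.append(
            {"name": name, "util_percent": int(util), "memory_used_mib": int(mem)}
        )
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)
```

Calling `query_gpus()` every few seconds while training runs gives a rough picture of whether the GPU is busy or mostly idle.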

Sohojoe · 2019-02-20T05:47:20Z

@ervteng - I have a GTX 1080 locally. How many training steps per second do you see?

Running nvidia-smi shows that it is using the GPU, so I was wrong.

I guess the default TensorFlow build does not include CPU optimizations, and that is why it shows the warning.

kwea123 · 2019-02-20T06:39:20Z

@ervteng what do you mean by "the obstacle-tower-env doesn't have a TensorFlow requirement as it doesn't install ml-agents"? Then what does this mean in the README?

Requirements
The Obstacle Tower environment runs on Mac OS X, Windows, or Linux.

Python dependencies (also in setup.py):

Unity ML-Agents v0.6
OpenAI Gym
Pillow

Also, I remember that my TensorFlow version was overwritten with 1.7.1 when running pip install -e . from this repo. I then re-installed 1.9.0 and had no problem running the Obstacle Tower environment...
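A quick way to see whether an install step has swapped out a package is to compare the installed version against the pin in question, such as the 'tensorflow>=1.7,<1.8' requirement mentioned earlier in the thread. The sketch below is only an illustration: `parse_version`, `in_range`, and `installed_version` are hypothetical helpers, the version parsing is deliberately simplistic (it ignores pre-release suffixes), and `importlib.metadata` assumes Python 3.8 or newer.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.7.1' or '1.13.0rc1' into a (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_range(version, lower, upper):
    """True if lower <= version < upper, e.g. the '>=1.7,<1.8' pin."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Running `installed_version("tensorflow")` before and after `pip install -e .` would show whether the install changed the version, and `in_range(v, "1.7", "1.8")` tells you whether the result falls inside the pinned range.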

Sohojoe · 2019-02-20T07:32:16Z

@kwea123 obstacle tower installs a special version of ml-agents that doesn't specify tensorflow in its install requirements file.

obstacle tower does need tensorflow to run.

The normal ml-agents specifies tensorflow 1.7.x, as this is required for running the trained models from within Unity. obstacle tower doesn't need this.

kwea123 · 2019-02-20T07:35:49Z

@Sohojoe Oh, I see. Sorry for the misunderstanding @ervteng

ervteng · 2019-02-20T19:53:24Z

@Sohojoe you are correct, the README is wrong (and we'll fix it). The newest versions of OTC no longer use ML-Agents in its entirety, and don't require TensorFlow. Dopamine does require TensorFlow, but as far as I know it will work with most recent versions.

I'm getting about 45.61 steps per second on a T4 on GCP, but it's using only about 10% of the GPU. In our past testing, we found that the OTC environment tends to be CPU-bound. What CPU do you have on your desktop machine? I'm curious to see how we can get the environments training faster.
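To compare machines on the same footing, it can help to time the environment on its own, without any training code in the loop. The sketch below is a generic timing helper, not something from the OTC codebase: `steps_per_second` is a hypothetical name, and `step_fn` stands in for whatever callable advances the environment by one step.

```python
import time


def steps_per_second(step_fn, num_steps=1000, clock=time.perf_counter):
    """Call step_fn num_steps times and return the measured steps per second.

    step_fn is any zero-argument callable that advances the environment once;
    clock is injectable so the helper can be checked without real timing.
    """
    start = clock()
    for _ in range(num_steps):
        step_fn()
    elapsed = clock() - start
    return num_steps / elapsed if elapsed > 0 else float("inf")
```

With a Gym-style environment already created as `env` (an assumption, not shown here), usage would look like `steps_per_second(lambda: env.step(env.action_space.sample()))`. If that number is close to the training throughput while the GPU sits mostly idle, it would be consistent with the environment being CPU-bound.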

Sohojoe · 2019-02-21T17:36:53Z

I have an i7-8700K @ 3.7GHz, which has 6 cores / 12 threads.

A big help to performance would be to support multiple instances of the environment within the Unity level. I regularly train with 128 concurrent agents and I'm reading some papers where they go up to 2048. I made a modification to ml-agents in my dev branch of marathon-envs which enables one to set --num-agents=128 in the command line to specify the number of agents. I would be happy to work on a PR. But, it does require the environment to work relative to its spawn position.

I have also been working on adapting large-scale-curiosity to work with obstacle tower, as it supports instancing via MPI. I have been able to get it training on Windows at 400-500 fps, but it is not learning yet. Also, MPI on Windows is not very stable and I've only been able to get 16-24 instances running (but this should not be a problem on Linux servers). My code is here

awjuliani assigned ervteng Feb 14, 2019
