Initial integration with AutoDist #72

Open
wants to merge 9 commits into
base: master
Conversation

DachengLi1
Contributor

No description provided.

adaptdl/adaptdl/torch/__init__.py (outdated, resolved)
examples/integration/Dockerfile (resolved)
sched/adaptdl_sched/supervisor.py (outdated, resolved)
examples/integration/Dockerfile (resolved)
WORKDIR /root
COPY bert_config.json bert_config.json
COPY tf_examples.tfrecord tf_examples.tfrecord
COPY autodist autodist
Contributor

Some of these COPY commands won't work in a fresh clone of the AdaptDL repo. Can you make sure this example works in that setting? Maybe git clone autodist instead of assuming it exists locally.
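
One way to check the fresh-clone concern before building is to list the COPY sources that are absent from the build context. The sketch below is illustrative only: `missing_copy_sources` is a hypothetical helper, not part of AdaptDL or Docker, and it deliberately handles only the simple shell form `COPY src... dest`.

```python
import shlex
from pathlib import Path


def missing_copy_sources(dockerfile_text, context_dir):
    """Return COPY sources that do not exist under context_dir.

    Hypothetical helper, simplified on purpose: it skips the JSON form
    (COPY ["a", "b"]), multi-stage copies (--from=...), and wildcards.
    """
    context = Path(context_dir)
    missing = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if not stripped.upper().startswith("COPY "):
            continue  # not a COPY instruction
        parts = shlex.split(stripped)[1:]
        if any(p.startswith("--from=") for p in parts):
            continue  # copied from another build stage, not the context
        args = [p for p in parts if not p.startswith("--")]
        if len(args) < 2 or args[0].startswith("["):
            continue  # malformed line or JSON form, out of scope here
        for src in args[:-1]:  # the last argument is the destination
            if any(ch in src for ch in "*?["):
                continue  # wildcard sources are not resolved
            if not (context / src).exists():
                missing.append(src)
    return missing
```

Run against the Dockerfile lines quoted above in a directory that contains only bert_config.json, this would report tf_examples.tfrecord and autodist as missing, which are the kind of paths the comment is asking about.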

codecov-io commented Dec 7, 2020

Codecov Report

Merging #72 (c012cde) into master (780db06) will increase coverage by 0.01%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
+ Coverage   61.16%   61.18%   +0.01%     
==========================================
  Files          30       30              
  Lines        2243     2334      +91     
  Branches      338      357      +19     
==========================================
+ Hits         1372     1428      +56     
- Misses        806      834      +28     
- Partials       65       72       +7     
Impacted Files                        Coverage Δ
sched/adaptdl_sched/supervisor.py     0.00% <0.00%> (ø)
adaptdl/adaptdl/checkpoint.py         84.50% <0.00%> (-7.60%) ⬇️
adaptdl/adaptdl/torch/adascale.py     89.41% <0.00%> (-4.75%) ⬇️
adaptdl/adaptdl/goodput.py            97.54% <0.00%> (+0.08%) ⬆️
adaptdl/adaptdl/torch/_metrics.py     80.80% <0.00%> (+0.31%) ⬆️
adaptdl/adaptdl/torch/parallel.py     64.21% <0.00%> (+0.76%) ⬆️
adaptdl/adaptdl/torch/data.py         77.41% <0.00%> (+0.89%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 780db06...c012cde.

aurickq changed the title from "integration" to "Initial integration with AutoDist" on Dec 7, 2020
return_list = [(pod_ip_list[i], pod_gpu_list[i])
               for i in range(len(pod_ip_list))]
LOG.info(return_list)
return web.json_response(return_list)
Contributor

Shall we unify the return value here with the one at L87?
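
What L87 returns is not shown in this thread, so the following is only a sketch of one way to unify: route every return path through a single helper so the payload shape is identical everywhere. `build_pod_payload` is a hypothetical name, and the sample IPs and GPU counts are made up for illustration.

```python
import json


def build_pod_payload(pod_ip_list, pod_gpu_list):
    """Pair each pod IP with its GPU count in one JSON-friendly shape."""
    if len(pod_ip_list) != len(pod_gpu_list):
        raise ValueError("pod_ip_list and pod_gpu_list differ in length")
    # Lists rather than tuples: JSON has no tuple type, so a client
    # decodes the pairs as lists either way.
    return [[ip, gpus] for ip, gpus in zip(pod_ip_list, pod_gpu_list)]


# Each return path in the handler would then build its response from the
# same helper, e.g. web.json_response(build_pod_payload(ips, gpus)).
payload = build_pod_payload(["10.0.0.1", "10.0.0.2"], [4, 2])
assert json.loads(json.dumps(payload)) == payload  # survives a JSON round trip
```

Using zip also removes the index arithmetic, and the length check turns a mismatch between the two lists into an explicit error instead of an IndexError or silently dropped entries.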

4 participants