While executing the above script, the speed correlates positively with n_cpu and negatively with n_worker.
Is this the expected result, or did I do something wrong?
If it is expected, does that mean I should always choose one worker with multiple CPUs instead of multiple workers with one CPU each?
And what should I do if I can run on multiple machines, but each machine has only a few CPU cores?
Samples: execution time in seconds.
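| Worker\CPU | 1 | 2 | 4 | 8 | 16 |
|---|---|---|---|---|---|
| 1 | 1100 | 546 | 291 | 184 | 124 |
| 2 | \ | 857 | 360 | 211 | 150 |
| 4 | \ | \ | 836 | 383 | 234 |
| 8 | \ | \ | \ | 417 | 310 |
| 16 | \ | \ | \ | \ | 555 |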
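To make the trend in the timings easier to read, here is a small sketch that converts them into ratios. It only reuses the numbers reported above; the helper names `speedup` and `slowdown_vs_single_worker` are made up for illustration and are not part of Mars.

```python
# Timings in seconds, copied from the samples above: times[n_worker][n_cpu].
# Combinations that were not measured are simply left out.
times = {
    1: {1: 1100, 2: 546, 4: 291, 8: 184, 16: 124},
    2: {2: 857, 4: 360, 8: 211, 16: 150},
    4: {4: 836, 8: 383, 16: 234},
    8: {8: 417, 16: 310},
    16: {16: 555},
}

baseline = times[1][1]  # 1 worker, 1 CPU


def speedup(n_worker, n_cpu):
    """Speedup relative to the 1 worker / 1 CPU run (hypothetical helper)."""
    return baseline / times[n_worker][n_cpu]


def slowdown_vs_single_worker(n_worker, n_cpu):
    """How much slower a run is than 1 worker with the same CPU count."""
    return times[n_worker][n_cpu] / times[1][n_cpu]


for n_worker, row in times.items():
    for n_cpu, seconds in row.items():
        print(
            f"workers={n_worker:2d} cpus={n_cpu:2d} time={seconds:4d}s "
            f"speedup={speedup(n_worker, n_cpu):5.2f}x "
            f"vs_1_worker={slowdown_vs_single_worker(n_worker, n_cpu):4.2f}x"
        )
```

Reading the output column by column shows how much each additional split into workers costs for a fixed CPU count.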
cmsxbc changed the title from "What" to "What's best practice for setup cluster?" on Jul 5, 2022
@qinxuye I also found that the network traffic is very heavy. When running mt.linalg.inv on a 20k × 20k square matrix (about 2.9 GiB in size), roughly 40 GiB to 50 GiB of data is transmitted between nodes, for clusters ranging from 2 workers to 16 workers.
I deployed the cluster on the Ray backend. As far as I know, Ray's own background network throughput is only about 10-20 KiB/s.
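As a rough sanity check on those figures, the sketch below compares the reported traffic with the size of the input matrix. It assumes the matrix holds float64 values (8 bytes per element), which the report does not state explicitly but which is consistent with the quoted size of about 2.9 GiB.

```python
GIB = 2**30

n = 20_000             # matrix is n x n, as reported
bytes_per_element = 8  # assumption: float64 elements

# Size of the input matrix in GiB.
matrix_gib = n * n * bytes_per_element / GIB
print(f"matrix size: {matrix_gib:.2f} GiB")

# Reported inter-node traffic range, expressed as multiples of the matrix size.
reported_traffic_gib = (40, 50)
ratios = [traffic / matrix_gib for traffic in reported_traffic_gib]
for traffic, ratio in zip(reported_traffic_gib, ratios):
    print(f"{traffic} GiB transferred is about {ratio:.1f}x the matrix size")
```

Under that assumption, the transferred volume is more than ten times the size of the input itself.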
ENV: Python 3.7.11, Mars 0.9.0