What's best practice for setup cluster? #3180

cmsxbc · 2022-07-05T07:52:06Z

ENV: Python 3.7.11, Mars 0.9.0

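import mars
import mars.tensor as mt

n = 20000                  # side length of the square matrix
n_worker = 1               # number of workers, varied per run
n_cpu = 1                  # number of CPUs, varied per run
mem_bytes = 20 * 2 ** 30   # 20 GiB

# Start a local session with the chosen worker, CPU, and memory settings.
mars.new_session(init_local=True, n_worker=n_worker, n_cpu=n_cpu, mem_bytes=mem_bytes)

# Build a random n x n matrix with a fixed seed and invert it.
X = mt.random.RandomState(0).rand(n, n)
invX = mt.linalg.inv(X).execute()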

When running the script above, speed is positively correlated with n_cpu and negatively correlated with n_worker.
Is this the expected result, or did I do something wrong?
If it is expected, does that mean I should always choose one worker with multiple CPUs instead of multiple workers with one CPU each?
And what should I do if I can run on multiple machines, but each machine has only a few CPU cores?

Samples: execution time in seconds.

Worker\CPU	1	2	4	8	16
1	1100	546	291	184	124
2	\	857	360	211	150
4	\	\	836	383	234
8	\	\	\	417	310
16	\	\	\	\	555
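
To make the trend in the table easier to read, here is a minimal sketch that turns the measured times into speedup and parallel efficiency relative to the 1 worker / 1 CPU run. The numbers are copied from the table; the helper names `speedup` and `efficiency` are hypothetical, and the efficiency figure assumes that n_cpu is the total number of CPUs in the session rather than a per-worker count.

```python
# Execution times in seconds, copied from the table above.
# Keys are (n_worker, n_cpu); combinations marked "\" were not measured.
times = {
    (1, 1): 1100, (1, 2): 546, (1, 4): 291, (1, 8): 184, (1, 16): 124,
    (2, 2): 857, (2, 4): 360, (2, 8): 211, (2, 16): 150,
    (4, 4): 836, (4, 8): 383, (4, 16): 234,
    (8, 8): 417, (8, 16): 310,
    (16, 16): 555,
}

baseline = times[(1, 1)]  # 1 worker, 1 CPU


def speedup(n_worker, n_cpu):
    """Speedup over the 1 worker / 1 CPU run (hypothetical helper)."""
    return baseline / times[(n_worker, n_cpu)]


def efficiency(n_worker, n_cpu):
    """Speedup per CPU, assuming n_cpu is the total CPU count."""
    return speedup(n_worker, n_cpu) / n_cpu


for (n_worker, n_cpu) in sorted(times):
    print(
        f"workers={n_worker:2d} cpus={n_cpu:2d} "
        f"speedup={speedup(n_worker, n_cpu):5.2f} "
        f"efficiency={efficiency(n_worker, n_cpu):4.2f}"
    )
```

Comparing rows at a fixed n_cpu (for example all entries with 16 CPUs) isolates the cost of adding workers from the benefit of adding CPUs.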


qinxuye · 2022-07-06T02:39:58Z

Nice question. Performance may depend on many factors; it seems the implementation of mt.linalg.inv does not scale well as the number of workers grows.

We need to investigate to see what is happening.

cmsxbc · 2022-07-07T02:39:50Z

@qinxuye I also found that network traffic was very heavy. When running mt.linalg.inv on a 20k square matrix (about 2.9 GiB in size), about 40-50 GiB of data is transmitted between nodes, across cluster sizes from 2 to 16 workers.
I deployed the cluster on the Ray backend. As far as I know, Ray's background network throughput is only about 10-20 KiB/s.
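
As a rough check on these figures, the sketch below computes the in-memory size of the matrix and how many times that size the reported traffic represents. It assumes the tensor holds 8-byte float64 elements, which the thread does not state explicitly; `traffic_ratio` is a hypothetical helper name.

```python
n = 20000
bytes_per_element = 8  # assumption: float64 elements

# Size of one n x n matrix in GiB.
matrix_gib = n * n * bytes_per_element / 2 ** 30


def traffic_ratio(traffic_gib):
    """How many copies of the matrix the given traffic corresponds to."""
    return traffic_gib / matrix_gib


print(f"matrix size: {matrix_gib:.2f} GiB")
for traffic_gib in (40, 50):  # reported range of inter-node traffic
    print(f"{traffic_gib} GiB of traffic = {traffic_ratio(traffic_gib):.1f}x the matrix size")
```

Under that assumption the matrix is about 2.98 GiB, so the reported traffic corresponds to moving the whole matrix over the network more than ten times.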

qinxuye · 2022-07-07T04:31:55Z

@cmsxbc Would you mind joining our Slack? https://join.slack.com/t/mars-computing/shared_invite/zt-1c39tdh83-K1AT9FmtKkUgOzmM6~Nwbg

That way we can learn more about the problem you are solving.

cmsxbc changed the title ~~What~~ What's best practice for setup cluster? Jul 5, 2022
