Remote Optimization Example
Performing optimizations with remote execution lets you farm out the work
of running simulations to many machines in parallel. The pswarmdriver tool
supports sending its objective evaluations (i.e. Cyclus simulations) to a
running cloudlus server. This guide assumes you are familiar with running
optimizations locally (see Simple-Optimization-Example) and that you have
all the cloudlus tools installed with their binaries on your $PATH.
First, you need to set up a cloudlus server running in a location where your workers can reach it over the network (e.g. a cloud virtual server). Copy the cloudlus binary to the server and start it on the port you want:
$ scp $(which cloudlus) your-server.com:./
$ ssh your-server.com
$ ./cloudlus -addr=0.0.0.0:4242 serve -dblimit 2000 &> server.log &
$ exit
The 0.0.0.0 address tells the server to accept incoming requests from any IP
address, and the (arbitrarily) chosen port is 4242. Note that you can use your
local machine as the server with external workers, but you need to be sure all
your workers have network access to the server (e.g. by setting up port
mapping).
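Before starting workers, it can help to confirm that a worker machine can actually open a TCP connection to the server. The following is a minimal sketch and not part of cloudlus: the function name can_reach is hypothetical, and it assumes bash and the coreutils timeout command are available on the worker.

```shell
#!/bin/bash
# can_reach HOST PORT
# Hypothetical helper (not a cloudlus command): returns 0 if a TCP connection
# to HOST:PORT can be opened within 3 seconds, non-zero otherwise.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
can_reach() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (run on a worker machine):
#   can_reach your-server.com 4242 && echo "server reachable" || echo "cannot reach server"
```

A failure here usually points at a firewall or port-mapping problem rather than at cloudlus itself, so it is worth checking before queuing up many workers.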
Then you need to start up some workers. This step depends heavily on what type
of computing infrastructure you are using to set up your workers. Each worker
needs the cyclus and cycobj commands installed and available on its $PATH.
Options include:
- If your cluster has a shared file-system, you can install the Cyclus and cycobj commands to a location there. You will then need to add their install location to the $PATH of the cloudlus workers.
- You can use something like cde (http://www.pgbovine.net/cde.html) to package up a cyclus/cycobj environment from your local machine and copy it to each worker.
For a high-throughput HTCondor environment with a shared file-system, you might create a condor submit file like this:
universe = vanilla
executable = runfile.sh
transfer_input_files = init.sh
should_transfer_files = yes
when_to_transfer_output = ON_EXIT_OR_EVICT
output = worker.$(PROCESS).output
error = worker.$(PROCESS).error
log = workers.log
Disk = 1048576
request_cpus = 1
request_memory = 1024
Rank = KFlops
+is_resumable = true
requirements = OpSys == "LINUX" && Arch == "x86_64" && (OpSysAndVer =?= "SL6") && (IsDedicated == true) && KFlops >= 300000
queue 300
where init.sh is:
#!/bin/bash
env PATH=$PATH:/path/to/shared/dir/with/cyclus/and/cycobj/ /path/to/shared/dir/with/cloudlus -addr=your-server.com:4242 work -whitelist=cyclus,cycobj
and cloudlus is just the cloudlus binary. Then you would run condor_submit [your-condor-submit-file] to queue up 300 workers.
Then you can start an optimization by running something like this:
$ pswarmdriver -addr=your-server.com:4242 -scen=my-scen-file.json &> optim.log
You can follow summary stats/progress of the running jobs/simulations on the server dashboard by visiting http://your-server.com:4242 in a web browser.
If you don't have access to external computational resources and just want to test the remote execution setup out locally, you are in luck. All you have to do (assuming Cyclus and cloudlus are installed and on your $PATH) is run:
$ cloudlus serve &> server.log &
$ cloudlus work &> worker1.log &
$ pswarmdriver -addr=127.0.0.1:9875 -scen=my-scen-file.json &> optim.log
Note that 127.0.0.1 means "the local machine" in networking lingo and port
9875 is the port cloudlus defaults to if you don't specify one manually. You
can then watch your worker1.log file fill up with output from the cyclus
simulations being run by the pswarm optimizer. You can start up as many
workers as you like:
$ cloudlus work &> worker2.log &
$ cloudlus work &> worker3.log &
$ cloudlus work &> worker4.log &
$ cloudlus work &> worker5.log &
$ cloudlus work &> worker6.log &
...
and the optimizer and cloudlus server will make use of the extra parallelism automatically.
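Rather than typing one line per worker, the same pattern can be wrapped in a loop. This is a minimal sketch: start_workers and the WORKER_CMD variable are hypothetical names introduced here for illustration and are not part of cloudlus. By default the function runs the same cloudlus work command shown above.

```shell
#!/bin/bash
# start_workers N
# Launches N background workers, each logging to workerK.log (K = 1..N).
# WORKER_CMD is a hypothetical override, useful for dry runs; when it is
# unset the function runs "cloudlus work" as in the commands above.
start_workers() {
    local n="$1" i
    for ((i = 1; i <= n; i++)); do
        ${WORKER_CMD:-cloudlus work} &> "worker${i}.log" &
    done
}

# Example: start six local workers.
#   start_workers 6
```

Each worker runs as a background job of the calling shell, so the shell's jobs and kill builtins can be used to inspect or stop them.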