Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Documenter.jl committed Dec 15, 2023
1 parent 7022f76, commit cc9a0a5
Showing 10 changed files with 77 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
<!DOCTYPE html> | ||
<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>Cluster Support · Experimenter.jl</title><script data-outdated-warner src="../assets/warner.js"></script><link rel="canonical" href="https://JamieMair.github.io/Experimenter.jl/clusters/"/><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.045/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.13.24/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script src="../assets/themeswap.js"></script></head><body><div id="documenter"><nav class="docs-sidebar"><div class="docs-package-name"><span class="docs-autofit"><a href="../">Experimenter.jl</a></span></div><form class="docs-search" action="../search/"><input class="docs-search-query" id="documenter-search-query" name="q" type="text" placeholder="Search 
docs"/></form><ul class="docs-menu"><li><a class="tocitem" href="../">Home</a></li><li><a class="tocitem" href="../getting_started/">Getting Started</a></li><li><a class="tocitem" href="../execution/">Running your Experiments</a></li><li><a class="tocitem" href="../distributed/">Distributed Execution</a></li><li><a class="tocitem" href="../store/">Data Store</a></li><li><a class="tocitem" href="../snapshots/">Custom Snapshots</a></li><li class="is-active"><a class="tocitem" href>Cluster Support</a><ul class="internal"><li><a class="tocitem" href="#SLURM"><span>SLURM</span></a></li></ul></li><li><a class="tocitem" href="../api/">Public API</a></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Cluster Support</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Cluster Support</a></li></ul></nav><div class="docs-right"><a class="docs-edit-link" href="https://github.com/JamieMair/Experimenter.jl/blob/main/docs/src/clusters.md" title="Edit on GitHub"><span class="docs-icon fab"></span><span class="docs-label is-hidden-touch">Edit on GitHub</span></a><a class="docs-settings-button fas fa-cog" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-sidebar-button fa fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a></div></header><article class="content" id="documenter-page"><h1 id="Clusters"><a class="docs-heading-anchor" href="#Clusters">Clusters</a><a id="Clusters-1"></a><a class="docs-heading-anchor-permalink" href="#Clusters" title="Permalink"></a></h1><p>This package provides some basic support for running an 
experiment on an HPC cluster. It uses <code>ClusterManagers.jl</code> under the hood.</p><p>At the moment, we only support running on a SLURM cluster, but any PRs to support other clusters are welcome.</p><h2 id="SLURM"><a class="docs-heading-anchor" href="#SLURM">SLURM</a><a id="SLURM-1"></a><a class="docs-heading-anchor-permalink" href="#SLURM" title="Permalink"></a></h2><p>Normally when running on SLURM, one creates a bash script to tell the scheduler about the resource requirements for a job. The following is an example:</p><pre><code class="language-bash hljs">#!/bin/bash | ||
|
||
#SBATCH --nodes=2 | ||
#SBATCH --ntasks=2 | ||
#SBATCH --cpus-per-task=2 | ||
#SBATCH --mem-per-cpu=1024 | ||
#SBATCH --time=00:30:00 | ||
#SBATCH -o hpc/output/test_job_%j.out</code></pre><p>The function <a href="@ref"><code>Experimenter.Cluster.create_slurm_template</code></a> provides an easy way to create one of these bash scripts with everything you need to run.</p><h3 id="Example"><a class="docs-heading-anchor" href="#Example">Example</a><a id="Example-1"></a><a class="docs-heading-anchor-permalink" href="#Example" title="Permalink"></a></h3><p>Let us take the following end-to-end example. Say that we have an experiment script at <code>my_experiment.jl</code> (contents below), which now initialises the cluster:</p><pre><code class="language-julia hljs">using Experimenter | ||
|
||
config = Dict{Symbol,Any}( | ||
:N => IterableVariable([Int(1e6), Int(2e6), Int(3e6)]), | ||
:seed => IterableVariable([1234, 4321, 3467, 134234, 121]), | ||
:sigma => 0.0001) | ||
experiment = Experiment( | ||
name="Test Experiment", | ||
include_file="run.jl", | ||
function_name="run_trial", | ||
configuration=deepcopy(config) | ||
) | ||
|
||
db = open_db("experiments.db") | ||
|
||
# Init the cluster | ||
Experimenter.Cluster.init() | ||
|
||
@execute experiment db DistributedMode</code></pre><p>Additionally, we have the file <code>run.jl</code> containing:</p><pre><code class="language-julia hljs">using Random | ||
using Distributed | ||
function run_trial(config::Dict{Symbol,Any}, trial_id) | ||
results = Dict{Symbol, Any}() | ||
sigma = config[:sigma] | ||
N = config[:N] | ||
seed = config[:seed] | ||
rng = Random.Xoshiro(seed) | ||
# Perform some calculation | ||
results[:distance] = sum(rand(rng) * sigma for _ in 1:N) | ||
results[:num_threads] = Threads.nthreads() | ||
results[:hostname] = gethostname() | ||
results[:pid] = Distributed.myid() | ||
# Must return a Dict{Symbol, Any}, with the data we want to save | ||
return results | ||
end</code></pre><p>We can now create a bash script to run our experiment. We create a template by running the following in the terminal (or by calling the same function from the REPL):</p><pre><code class="language-bash hljs">julia --project -e 'using Experimenter; Experimenter.Cluster.create_slurm_template("myrun.sh")'</code></pre><p>We then modify the created <code>myrun.sh</code> file to read as follows:</p><pre><code class="language-bash hljs">#!/bin/bash | ||
|
||
#SBATCH --ntasks=4 | ||
#SBATCH --cpus-per-task=2 | ||
#SBATCH --mem-per-cpu=1024 | ||
#SBATCH --time=00:30:00 | ||
#SBATCH -o hpc/logs/job_%j.out | ||
|
||
julia --project --threads=1 my_experiment.jl | ||
|
||
# Optional: Remove the files created by ClusterManagers.jl | ||
rm -fr julia-*.out | ||
</code></pre><p>Once the script is written, we submit it to the cluster with:</p><pre><code class="language-bash hljs">sbatch myrun.sh</code></pre><p>Once the job has finished, we can open a Julia REPL to inspect the results:</p><pre><code class="language-julia hljs">using Experimenter | ||
db = open_db("experiments.db") | ||
trials = get_trials_by_name(db, "Test Experiment") | ||
|
||
for (i, t) in enumerate(trials) | ||
hostname = t.results[:hostname] | ||
id = t.results[:pid] | ||
println("Trial $i ran on $hostname on worker $id") | ||
end</code></pre><p>Support for running on SLURM is based on <a href="https://gist.github.com/JamieMair/0b1ffbd4ee424c173e6b42fe756e877a">this gist</a> available on GitHub. This gist also provides information on how to adjust the SLURM script to allow for one GPU to be allocated to each worker.</p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../snapshots/">« Custom Snapshots</a><a class="docs-footer-nextpage" href="../api/">Public API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Friday 15 December 2023 15:18">Friday 15 December 2023</span>. Using Julia version 1.9.4.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html> |