*Lltop*
Lltop[0] is a command line utility which gathers I/O statistics from
Lustre[1] filesystem servers, along with job assignment data from
cluster batch schedulers, to give a job-by-job accounting of
filesystem load. Under typical usage, lltop is invoked with the name
of a filesystem, runs for a configurable interval (say, 10 seconds),
and outputs a table summarizing I/O and RPC loads indexed by job
identifier; for example:
$ lltop work
JOB    WR_MB  RD_MB    REQS  OWNER     WORKDIR
12101  15925  67630  133694  jfourier  /work/jfourier/fftw_run
10322   2254   1027    2504  claude    /work/claude/viscous-flow-08
13007    756  21024   10007  ludwig    /work/ludwig/boltzeq.mvapich2
...
Normally, lltop is run in response to observations of excessive load
on file servers or degraded filesystem performance, and is used to
assist system administrators in identifying jobs (and users) with
problematic I/O patterns. A potential secondary use is to determine
the I/O profiles of applications running at scale. lltop is designed
to run as a point-and-shoot diagnostic utility, and is not a
replacement for continuous monitoring tools such as LMT[2] or
Collectl[3].
*Overview*
Lltop has two executable components: lltop itself and lltop-serv.
lltop is usually run directly and given the name of a filesystem to
query. From the filesystem name, it derives a list of servers (MDSs
and OSSs), and for each it forks and execs ssh to run a copy of
lltop-serv on the server.
On the server, lltop-serv scrapes the per-client stats files
/proc/fs/lustre/{mds,obdfilter}/<target>/exports/<client>/stats
to determine each client's load in terms of bytes written, bytes read,
and requests processed. It actually makes two passes through the
stats files[4], sleeping for a configurable interval between passes, and
returns the differences. The output of lltop-serv consists of lines
[5] of the form
<ipv4-addr>@<lnet-net-name> <wr_B> <rd_B> <reqs>
where
<ipv4-addr>@<lnet-net-name> is the client address according to Lustre,
for example 192.0.32.10@tcp,
<wr_B> and <rd_B> are the number of bytes written and read,
<reqs> is the number of requests other than pings[6].
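The two-pass scheme amounts to subtracting one snapshot of each client's counters from the next. The C sketch below shows that step and the output line format above. The struct and function names are hypothetical, not taken from the lltop-serv source, and treating a counter that went backwards (for example after the stats were cleared) as having restarted from zero is an assumption, not necessarily the heuristic lltop-serv uses. Idle clients are skipped, as lltop-serv does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of one client's counters from its stats file. */
struct client_sample {
    unsigned long long wr_bytes;  /* bytes written */
    unsigned long long rd_bytes;  /* bytes read */
    unsigned long long reqs;      /* requests other than pings */
};

/* Growth of a counter between two passes. Assumption: if the counter went
 * backwards (stats cleared, client evicted), count the new value as the
 * load since the reset. */
static unsigned long long counter_delta(unsigned long long before,
                                        unsigned long long after)
{
    return after >= before ? after - before : after;
}

/* Write "<ipv4-addr>@<lnet-net-name> <wr_B> <rd_B> <reqs>" into buf.
 * Returns -1 if nid has no "@<lnet-net-name>" part, 0 if the client was
 * idle (nothing written), otherwise the snprintf() result. */
static int format_client_line(char *buf, size_t len, const char *nid,
                              const struct client_sample *first,
                              const struct client_sample *second)
{
    unsigned long long wr, rd, rq;

    assert(buf != NULL && nid != NULL && first != NULL && second != NULL);
    if (strchr(nid, '@') == NULL)
        return -1;

    wr = counter_delta(first->wr_bytes, second->wr_bytes);
    rd = counter_delta(first->rd_bytes, second->rd_bytes);
    rq = counter_delta(first->reqs, second->reqs);
    if (wr == 0 && rd == 0 && rq == 0)
        return 0;  /* idle clients are omitted from the output */

    return snprintf(buf, len, "%s %llu %llu %llu", nid, wr, rd, rq);
}
```

For example, snapshots {1000, 500, 10} and {5096, 500, 17} for client 192.0.32.10@tcp produce the line "192.0.32.10@tcp 4096 0 7".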
Lltop reads this output and translates client addresses to hostnames,
and hostnames to jobids[7, 8], to account for each client's load against
its current job. If lltop cannot find a job assignment for a given
client, then it considers the client to be the sole member of a job
whose jobid is the client's hostname. Similarly, if lltop cannot find a
hostname for a given client IP address, it uses the address as the
client's name and current jobid. This allows lltop to account for load
generated by login or admin nodes in the same way.
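The attribution rules in the preceding paragraph can be sketched in C: parse one line of lltop-serv output, strip the @<lnet-net-name> suffix, and fall back from jobid to hostname to address. Passing the lookups in as function pointers that return NULL on failure is an assumption made for illustration, not the interface of hooks.c, and the demo stubs use made-up hostnames and jobids.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of lltop-serv output. */
struct serv_record {
    char addr[64];  /* client address, "@<lnet-net-name>" stripped */
    unsigned long long wr_b, rd_b, reqs;
};

/* Parse "<ipv4-addr>@<lnet-net-name> <wr_B> <rd_B> <reqs>".
 * Returns 0 on success, -1 if the line is malformed. */
static int parse_serv_line(const char *line, struct serv_record *rec)
{
    char nid[128];
    char *at;

    assert(line != NULL && rec != NULL);
    if (sscanf(line, "%127s %llu %llu %llu",
               nid, &rec->wr_b, &rec->rd_b, &rec->reqs) != 4)
        return -1;

    at = strchr(nid, '@');
    if (at == NULL || (size_t)(at - nid) >= sizeof rec->addr)
        return -1;
    *at = '\0';  /* keep only the address part */
    strcpy(rec->addr, nid);
    return 0;
}

/* Hypothetical lookup hook: returns NULL when there is no answer. */
typedef const char *(*lookup_fn)(const char *key);

/* Fallbacks described above: no hostname -> the address is both the
 * client's name and its jobid; no job -> the client is the sole member
 * of a job named after its hostname. */
static const char *resolve_job(const char *addr, lookup_fn get_host,
                               lookup_fn get_job)
{
    const char *host = get_host(addr);
    const char *job;

    if (host == NULL)
        return addr;
    job = get_job(host);
    return job != NULL ? job : host;
}

/* Demo stubs with made-up data, standing in for real lookups. */
static const char *demo_get_host(const char *addr)
{
    if (strcmp(addr, "192.0.32.10") == 0)
        return "c101-101";
    if (strcmp(addr, "192.0.32.11") == 0)
        return "login1";
    return NULL;
}

static const char *demo_get_job(const char *host)
{
    return strcmp(host, "c101-101") == 0 ? "12101" : NULL;
}
```

With these stubs, 192.0.32.10 is charged to job 12101, 192.0.32.11 (a host with no job) to a job named login1, and an unknown address such as 10.0.0.1 to a job named after the address itself.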
*Configuring lltop*
To get lltop to work on your site you probably need to override some
of the default configuration. Most of this can be accomplished
through command line options, but the source is organized so that the
same effects (and more) can be achieved by modifying the functions in
hooks.c. Here are the main things you may need to do, along with some
suggestions.
1. Tell lltop on which servers it should run lltop-serv. You have
three options:
a. Modify the function get_serv_list() in hooks.c, so that lltop may
be invoked with the filesystem name as an argument.
b. Use the -l (--server-list) option to specify a list of servers
directly:
lltop -l mds1.example.com oss{01..27}.example.com
c. Provided that FILESYSTEM is mounted on the current host, use some
crazy pipeline, like:
sed 's/@.*$//' /proc/fs/lustre/{mdc,osc}/FILESYSTEM-*/*_conn_uuid | sort | uniq | xargs lltop -l
2. Tell lltop how to translate Lustre client addresses (usually dotted
quads with the @<lnet-net-name> stripped off) to hostnames. How well
does reverse DNS work at your site? If the answer is "Uhhh, not real
well.", or if you have some weird LNET with a weird address format
like qswlnd, whatever that is, then keep reading; otherwise, skip to 3.
The default address to host lookup uses getnameinfo(), which should
work fine given a correct site config. If not, here are three
possibilities:
a. Using getnameinfo_get_host() as a template, add the function
my_site_get_host() to hooks.c and tell lltop to use it.
b. Use the -g (--get-host) option to specify an external command
which should take the address as its only argument and print a
hostname. If it succeeds, your external command should return 0,
otherwise lltop will treat the dotted quad as if it is the client's
hostname.
c. Fix /etc/hosts, /etc/nsswitch.conf, /etc/resolv.conf,..., so
that getnameinfo() works on the host where you run lltop.
3. Tell lltop how to look up the current job for a host. Lltop was
originally written for TACC Ranger, which uses SGE for batch
scheduling. Under that setup the JOBID of the current job on HOST is
determined from the existence of a file
/share/sge6.2/execd_spool/HOST/active_jobs/JOBID.*
This is the default method in lltop. Otherwise:
a. If you run SGE but you need to override the execd_spool path then
do so by modifying hooks.c or passing --execd-spool=PATH.
b. Using execd_spool_get_job() as a template, add the function
my_site_get_job() to hooks.c and tell lltop to use it.
c. Use the -j (--get-job) option to specify an external command to
do job lookup. It should function like the external host lookup
command described above.
d. Use the -m (--job-map) option to specify an external command
which produces a "job map." This is useful if you use something
like qhost for job lookup, since using 'qhost -j -h <host>' to get
the current job of a single host takes about the same time as calling
'qhost -j' to get the current job of all nodes at once. See the
attached script qhost_job_map.
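Options 2b and 3c share one contract: the external command takes a single argument, prints the answer, and exits 0 on success. A minimal POSIX C sketch of invoking such a command with popen() follows. The function name run_lookup_command() and the 1024-byte command-line limit are hypothetical, and because the argument is passed through /bin/sh unquoted, the sketch assumes it is a plain address or hostname.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Run "COMMAND ARG" through the shell and copy the first line of its output
 * (newline stripped) into out. Returns 0 if the command exited with status 0
 * and printed something, -1 otherwise, so the caller can fall back. */
static int run_lookup_command(const char *command, const char *arg,
                              char *out, size_t out_len)
{
    char cmdline[1024];
    char rest[256];
    FILE *p;
    int n, status;

    assert(command != NULL && arg != NULL && out != NULL && out_len > 0);

    /* Assumption: arg is a plain address or hostname; it is not quoted. */
    n = snprintf(cmdline, sizeof cmdline, "%s %s", command, arg);
    if (n < 0 || (size_t)n >= sizeof cmdline)
        return -1;

    p = popen(cmdline, "r");
    if (p == NULL)
        return -1;

    if (fgets(out, (int)out_len, p) == NULL)
        out[0] = '\0';
    /* Drain remaining output so the child never blocks on a full pipe. */
    while (fgets(rest, sizeof rest, p) != NULL)
        ;
    status = pclose(p);

    out[strcspn(out, "\n")] = '\0';
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0 ||
        out[0] == '\0')
        return -1;
    return 0;
}
```

On a -1 return the caller would apply the fallbacks described earlier, for example treating the dotted quad as the client's hostname.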
*Installing lltop*
Run make, put lltop somewhere in your path on an admin node, put
lltop-serv somewhere in your path on the Lustre servers. Also see the
included script tacc_lltop which we use to add job owner and workdir
to the output of lltop.
*Getting Help*
$ lltop --help
Usage: lltop [OPTION]... FILESYSTEM
or: lltop [OPTION]... -l SERVER...
Report load by job for Lustre FILESYSTEM or SERVER(s).
Mandatory arguments to long options are mandatory for short options too.
-f, --fqdn use fully qualified domain names for clients
-g, --get-host=COMMAND use COMMAND for reverse DNS lookups
-h, --help display this help and exit
-i, --interval=NUMBER report load over NUMBER seconds
-j, --get-job=COMMAND use COMMAND for job lookup
-l, --server-list report load on servers given as arguments
-m, --job-map=COMMAND use COMMAND to get job map
-n, --limit=NUMBER limit output to NUMBER jobs
--no-header do not display header
--lltop-serv=PATH use lltop-serv at PATH on servers
--remote-shell=PATH use remote shell at PATH to execute lltop-serv
--execd-spool=PATH use execd_spool directory PATH for job lookup
lltop GitHub repository: <https://github.com/jhammond/lltop>
Otherwise, please send me any comments, questions, improvements. I am
especially interested in receiving/including any code/scripts to do
job lookup for batch schedulers other than SGE. Please, put lltop in
the subject line.
John L. Hammond
TACC, The University of Texas at Austin
--
0. lltop is a recursive anagram of lltop.
1. According to the headers, Lustre is a trademark of Sun
Microsystems.
2. Lustre Monitoring Tool: http://code.google.com/p/lmt/
3. Collectl: http://collectl.sourceforge.net/
4. Note that lltop-serv does not clear the stats files. In fact
clearing stats files while lltop-serv is running may cause it to
misreport or underreport usage. Client evictions can also affect the
accuracy of the data returned, but lltop-serv does use some simple
heuristics to mitigate their effects. However, remember that lltop is
not an exact tool and should be used with judgement.
5. As an optimization, if a client fails to generate any load during
the interval, then lltop-serv omits that client from its output.
6. Lltop-serv does not count pings because doing so tends to distort
the statistics for large jobs.
7. Lltop keeps a cache of address to jobid mappings so that the
hostname and jobid lookups are done at most once per client.
8. If your site runs multiple concurrent jobs on single hosts then it
may be hard to adapt lltop. I welcome suggestions on how to handle
this case.
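Footnote 7's per-client cache can be as simple as an open-addressing hash table keyed by client address. The C sketch below is one way to do it, not lltop's actual cache: the table size, the FNV-1a hash, the function cached_job(), and the demo resolver are all assumptions. Entries are never evicted, which suits a short point-and-shoot run.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define JOB_CACHE_SIZE 1024  /* hypothetical capacity; a power of two */

struct job_cache_entry {
    char *addr;   /* NULL marks an empty slot */
    char *jobid;
};

static struct job_cache_entry job_cache[JOB_CACHE_SIZE];

/* FNV-1a string hash. */
static size_t hash_str(const char *s)
{
    size_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Return the jobid for addr, calling resolve() only on the first request
 * for that address. resolve() must apply its own fallbacks and never return
 * NULL. Returns NULL if the table is full or memory runs out. */
static const char *cached_job(const char *addr,
                              const char *(*resolve)(const char *addr))
{
    size_t i = hash_str(addr) & (JOB_CACHE_SIZE - 1);
    size_t probes;

    assert(addr != NULL && resolve != NULL);
    for (probes = 0; probes < JOB_CACHE_SIZE; probes++) {
        struct job_cache_entry *e = &job_cache[i];
        if (e->addr == NULL) {
            /* Miss: resolve once and remember the answer. */
            const char *job = resolve(addr);
            assert(job != NULL);
            e->addr = dup_str(addr);
            e->jobid = dup_str(job);
            if (e->addr == NULL || e->jobid == NULL) {
                free(e->addr);
                free(e->jobid);
                e->addr = e->jobid = NULL;
                return NULL;
            }
            return e->jobid;
        }
        if (strcmp(e->addr, addr) == 0)
            return e->jobid;  /* hit: no further lookups */
        i = (i + 1) & (JOB_CACHE_SIZE - 1);  /* linear probing */
    }
    return NULL;  /* table full */
}

/* Demo resolver with made-up data; counts how often it is called. */
static int demo_resolve_calls;

static const char *demo_resolve(const char *addr)
{
    demo_resolve_calls++;
    return strcmp(addr, "192.0.32.10") == 0 ? "12101" : addr;
}
```

Asking for 192.0.32.10 twice calls demo_resolve() only once; both calls return "12101".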