Timeline with engine instrumentation #973

lukego · 2016-07-22T10:04:43Z

This is the implementation of Timeline (#916):

Adds the core.timeline module for efficient logging to binary ring buffers.
Adds log messages to core.engine to keep track of engine events.

Controversy?

This PR is a relatively large change. It introduces a new mechanism for defining detailed operational events for logging purposes and this is relatively verbose. It also introduces the practice of calling logging hooks even in relatively performance-sensitive code paths which could be hazardous (although I have already taken considerable pains to make this efficient). Reasonable people may have objections.

Implementation notes

The log is a 64MB (1M x 64B) shared memory object. This can be accessed as /var/run/snabb/$pid/engine/timeline while the Snabb is running and will be automatically removed on shutdown (thanks to #930).

The log compresses well and in the near future we should follow up with a way for the Hydra CI to archive timeline logs for all tests (e.g. via a hook in the Snabb shutdown/cleanup process that creates a compressed tarball somewhere).

This implementation automatically randomizes the log level on each breath. Most breaths (99%) are not logged at all; some breaths (1%) have their start and end logged, and a few breaths (0.1%) are logged in more detail ("app level"), while extremely few (0.01%) are logged in great detail ("packet level"). The intention is to get a representative sample of log messages and for each sample to be a whole breath that is logged at a consistent level.

Example log excerpt

The complete timeline log is up to a million lines long. Here is a tiny snippet that shows ~10 breaths. One of these is logged in detail ("app level") while the others with low detail (start and end only). The events each include parameters that could be used to calculate interesting metrics (cycles per packet, etc).

numa core       cycles category         message
---- ---- ------------ ---------------- -------
0    3        15449112 engine           start breath(1418481): total packets(64180039) bytes(100406899796) etherbits(807876161304)
0    3          341564 engine           end breath(1418481): processed packets(242) bytesPerPacket(644)
0    3         7143964 engine           start breath(1418509): total packets(64187365) bytes(100411625144) etherbits(807914491560)
0    3          249408 engine           end breath(1418509): processed packets(242) bytesPerPacket(644)
0    3        27673240 engine           start breath(1418614): total packets(64214647) bytes(100429219436) etherbits(808057210200)
0    3          244564 engine           end breath(1418614): processed packets(242) bytesPerPacket(644)
0    3         9542868 engine           start breath(1418651): total packets(64224261) bytes(100435419968) etherbits(808107506664)
0    3          245384 engine           end breath(1418651): processed packets(242) bytesPerPacket(644)
0    3        76336068 engine           ran timers
0    3            1800 engine           start breath(1418940): total packets(64294376) bytes(100480636610) etherbits(808474288080)
0    3             264 engine           initialized breath
0    3            4156 engine           called app A_NIC (type: apps.intel.intel_app)
0    3           14404 engine           called app B_NIC (type: apps.intel.intel_app)
0    3           39008 engine           called app B_Virtio (type: apps.vhost.vhost_user)
0    3           90292 engine           called app A_Virtio (type: apps.vhost.vhost_user)
0    3             220 engine           pulled input packets
0    3            4504 engine           called app B_NIC (type: apps.intel.intel_app)
0    3          105588 engine           called app B_Virtio (type: apps.vhost.vhost_user)
0    3           13732 engine           called app A_NIC (type: apps.intel.intel_app)
0    3            7864 engine           called app A_Virtio (type: apps.vhost.vhost_user)
0    3           27460 engine           pushed output packets
0    3             840 engine           pushed output packets
0    3             220 engine           end breath(1418940): processed packets(242) bytesPerPacket(644)
0    3        40522220 engine           start breath(1419083): total packets(64331528) bytes(100504588274) etherbits(808668576336)
0    3          333708 engine           end breath(1419083): processed packets(317) bytesPerPacket(646)
0    1        41323384 engine           start breath(1419233): total packets(64368118) bytes(100528182974) etherbits(808859968416)
0    1          357952 engine           end breath(1419233): processed packets(242) bytesPerPacket(644)
0    1        65394108 engine           start breath(1419463): total packets(64428194) bytes(100566922310) etherbits(809174208576)
0    1          247936 engine           end breath(1419463): processed packets(244) bytesPerPacket(639)
0    1        18768960 engine           start breath(1419530): total packets(64445616) bytes(100578159890) etherbits(809265363600)
0    1          277872 engine           end breath(1419530): processed packets(242) bytesPerPacket(644)

Note that the first breath is #1418481 and the last breath is #1419530 and so during this interval there were 1049 breaths executed but only these ones were sampled for the log. (The more breaths we include in the sample the faster the log wraps around.)

How to use it

This PR introduces a discipline of registering detailed log messages ahead of time. This is to make the storage efficient (reference predefined strings) and the logging efficient (reuse a dedicated closure for each log message).

Here is an example of how log messages for sleeping and waking up can be defined:

local event_sleep_idle = define_event('trace', [[sleeping $usec on idle                                                                                                                       
The engine requests that the kernel suspend this process for a period                                                                                                                         
of microseconds in order to reduce CPU utilization because idleness                                                                                                                           
has been detected (a breath in which no packets were processed.)]])

local event_wakeup = define_event('trace', [[wakeup from sleep                                                                                                                                
The engine resumes operation after sleeping voluntarily.]])

and here is how those example messages are logged:

event_sleep_idle(sleep)
C.usleep(sleep)
event_wakeup()

and then the log will contain messages like this:

numa core       cycles category         message
---- ---- ------------ ---------------- -------
0    3             404 engine           sleeping usec(100) on idle
0    3          535156 engine           wakeup from sleep
0    0             272 engine           sleeping usec(100) on idle
0    0          533624 engine           wakeup from sleep
0    0             360 engine           sleeping usec(100) on idle
0    0          533640 engine           wakeup from sleep
0    0             372 engine           sleeping usec(100) on idle
0    0          534732 engine           wakeup from sleep

which are interesting for a couple of reasons. First we can look at the cycles (delta) column to see how long we really sleep for and compare this with the time that we requested. Second we can look at the core column to see whether we have woken up on the same CPU core as we slept on.

The idea is that the log would include many such events of different kinds and a developer can sift through to answer questions. The log format is also relatively easy to process e.g. if one would want to plot the distribution of actual cycles spent sleeping vs requested sleeping time to estimate the reliability of usleep.

Formatting timeline logs

The log excerpts above were created with a command like:

sudo ./snabb snsh -e 'require("core.timeline").dump("./timeline")'

and to remove the verbosity (complete log message text) I added `| grep '^[0-9]'.

This could be greatly improved in the future e.g. with a snabb log program or an extension of snabb top to access timelines. There is also a graphical timeline log analyzer in the works.

I spotted two typos and fixed them (entires -> entries). And a question: In the example, shouldn't the string `example` appear somewhere in the resulting log ("The log now contains there entries")?

Typo fixes, and a question

@sleinen

Added category to example log output (thanks for feedback @sleinen). Added functions: priority(), save(), dump() Fixed formatting (more word wrap)

Updated the timeline implementation to match the documented API. This is still a work in progress: The messages printed by the selftest all have zeros for arguments for unknown reasons; The overhead of logging has doubled for unknown reasons, possibly due to allocating with core.shm and putting a performance-critical cacheline on a 4096-aligned address that may trigger associativity problems (wild speculation, have to dig in).

I realized that when we take the timestamp with RDTSCP this is also loading the current CPU core number and NUMA node into ECX. This patch includes that register in the log dump. This seems useful so that we can account for scheduling issues from the log. Strictly speaking the value in ECX is the contents of the TSC_AUX register but it seems that we can rely on Linux to load the core ID + NUMA node there. Here is where I got the idea: http://stackoverflow.com/questions/22310028/is-there-an-x86-instruction-to-tell-which-core-the-instruction-is-being-run-on

The string pattern matching for detecting arguments ($arg) was wrong and this caused all args to be logged as 0.

Show core number and NUMA node. Write slightly better log messages for selftest. Simplify the logging loop in the selftest to simply log three messages per iteration. This way the timestamps will indicate the logging overhead. (Previous version had some funny branches that could add a few dozen cycles to the timestamps which is a bit distracting.)

Fixed an off-by-one error (braino) in choosing which timestamps to calculate the delta from. Make the cycle column wider to better accommodate large values.

The selftest function now executes a few loops to see how long it takes to log a million events that are disabled by priority. Log excerpt: numa core -cycles category message 0 3 5014856 selftest invoked many events(1000000) 0 3 5025592 selftest invoked many events(1000000) 0 3 5176004 selftest invoked many events(1000000) So looks like five cycles per event is the ballpark cost (including the loop that makes the individual FFI calls).

The events are currently always enabled. This needs to be optimized. Goal for now is simply to have interesting information in the log.

Use shm.create() instead of shm.map(). Required after update in 9fe397e.

Sample different priorities (log levels) on different breaths. Try to always include interesting details in the log but without wrapping too quickly.

end breath now reports number of packets and bits freed.

The timeline text dump now formats the log starting from the first message and continuing towards the last. The previous behavior was to print the log backwards but this was confusing and ill-considered.

Merged conflict in core/app.lua

Requires a math.max() on an argument that is otherwise FFI cdata.

eugeneia · 2016-07-25T13:10:05Z

🎆 Looking great! Can't wait to look at some cool plots. :-)

Depending on whether #972 gets accepted, I would adapt the API of core.timeline to provide create and open (instead of new) and register it as the type timeline. Makes sense? It would probably also make sense to add a snabb top -t option to pretty-print regions of timeline objects.

Super minor nitpick, question: When formatting optional parameters in Markdown we used *[param]*, but I prefer to use [*param*] (and that is what I adopted for #972). E.g. italic marks the symbol, braces are semantic hints. Do you mind if I change that to the latter style when merging this?

Edit: I am now realizing why we used *[param]*, the alternative renders to a link... hmm.
Edit2: Apparently only in GitHub comments.

lukego · 2016-07-26T09:57:47Z

@eugeneia That's okay for me.

Hey there are a few outstanding issues here that I would like to resolve for this release but maybe we can take on next. The main thing is that I want Hydra to snapshot the shm folder for relevant process(es) and archive this for reference. This way we would have the timelines recorded for later inspection. I wonder what is the least-hacky way to do this? Basically we need to either run a hook before we delete the shm dir (which could be used to copy a snapshot somewhere) or we need to create the shm directory in a special location and then not delete it on termination.

This branch actually contains a couple of hacks in this direction that are not very effective and likely should be rolled back. One is adding the env var SNABB_SHM_ROOT that can force a new location for the shm directory. The other is in snabbnfv.traffic to explicitly save the timeline in $pwd at the end of test. I actually pushed reverts for these but cheekly reset that away quickly because I am not sure if they are part of the solution or not.

Like I say these issues could be resolved on next so needn't block merge of this branch. I do really want to pick a solution for preserving the shm directory for Hydra though.

eugeneia · 2016-07-26T12:21:00Z

I think adding SNABB_SHM_ROOT is an appropriate solution, combined with SNABB_DEBUG=yes (which causes the runtime directory to persist after exit) it should do what we need.

lukego · 2016-07-27T10:07:21Z

I notice that we already have _G.developer_debug inhibiting removal of the shm directory. That is probably sufficient for getting this hooked into Hydra. Good enough for now.

Generally I am uncomfortable with debug modes like _G.developer_debug and SNABB_DEBUG because I don't understand their consequences. Do they reduce performance? Leave files on the file system? Create verbose output? Likely I willl want some of these but not others, e.g. if I am benchmarking then I want the state kept on the filesystem for examination but I don't want to enable debug features that will impact performance. So a global debug=yes/no flag seems unlikely to do what I mean.

Have to make a snabb.conf with a well-defined format e.g. YANG schema. One step at a time...

Multiqueue fixes

lukego and others added 29 commits March 30, 2016 07:19

dasm_x86: Added RDTSCP instruction to Lua-mode dynasm

82b68bc

core.timeline: High-resolution event log [wip]

5c4dabd

core.timeline: Wrote draft documentation in core/timeline.md

0acf545

core/timeline.md: Fixed markdown for diagram

83d94d8

Typo fixes, and a question

051b23a

I spotted two typos and fixed them (entires -> entries). And a question: In the example, shouldn't the string `example` appear somewhere in the resulting log ("The log now contains there entries")?

Merge pull request #7 from sleinen/patch-1

8fb1020

Typo fixes, and a question

core/timeline.md: Improved timeline readme

3568683

Added category to example log output (thanks for feedback @sleinen). Added functions: priority(), save(), dump() Fixed formatting (more word wrap)

timeline.md: Documented defaults for new()

0629dcf

core.timeline: Fix bug in arg parsing

2928818

The string pattern matching for detecting arguments ($arg) was wrong and this caused all args to be logged as 0.

core.timeline: Implemented updated priority() API

3047bd6

core.timeline: Fixed and improved dump()

003f1cb

Fixed an off-by-one error (braino) in choosing which timestamps to calculate the delta from. Make the cycle column wider to better accommodate large values.

core/timeline.md: Added example of dump() output

3214339

core/timeline.md: Add 5 cycle estimate w/ disabled

111b1bf

engine: Add timeline events for the breathe cycle

b81cbb5

The events are currently always enabled. This needs to be optimized. Goal for now is simply to have interesting information in the log.

engine: More timeline events for breathe cycle

6898d6a

nfv: Save timeline at end of benchmark-mode

ed63a79

Merge branch 'master' into timeline-engine

388bbfe

core.timeline: Update for new core.shm API

7629b0f

Use shm.create() instead of shm.map(). Required after update in 9fe397e.

Merge branch 'lukego/timeline-engine' into optimize

9697486

core.shm: $SNABB_SHM_ROOT overrides /var/run/snabb

9746f24

engine: Randomize timeline priority

cb66e34

Sample different priorities (log levels) on different breaths. Try to always include interesting details in the log but without wrapping too quickly.

engine: Revise start_breath / end_breath events

56d0d08

end breath now reports number of packets and bits freed.

core/timeline: dump() now prints first-to-last

9501c60

The timeline text dump now formats the log starting from the first message and continuing towards the last. The previous behavior was to print the log backwards but this was confusing and ill-considered.

Merge tag 'v2016.07' into 'timeline-detail'

8c461c8

Merged conflict in core/app.lua

core/app.lua: Fix end_breath timeline event arg

56eda3d

Requires a math.max() on an argument that is otherwise FFI cdata.

eugeneia self-assigned this Jul 25, 2016

lukego force-pushed the timeline-detail branch from be86d8f to 56eda3d Compare July 26, 2016 09:24

lukego added a commit to lukego/snabb that referenced this pull request Jul 27, 2016

Merge snabbco#973 branch 'lukego/timeline-detail' into hydra-debug

ad22827

takikawa pushed a commit to takikawa/snabb that referenced this pull request Sep 27, 2017

Merge pull request snabbco#973 from Igalia/multiqueue-fixes

edfb477

Multiqueue fixes

eugeneia mentioned this pull request Dec 7, 2018

Merge eugeneia/snabb:timeline-raptorjit into Vita inters/vita#65

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timeline with engine instrumentation #973

Timeline with engine instrumentation #973

lukego commented Jul 22, 2016

eugeneia commented Jul 25, 2016 •

edited

Loading

lukego commented Jul 26, 2016

eugeneia commented Jul 26, 2016

lukego commented Jul 27, 2016

Timeline with engine instrumentation #973

Are you sure you want to change the base?

Timeline with engine instrumentation #973

Conversation

lukego commented Jul 22, 2016

Controversy?

Implementation notes

Example log excerpt

How to use it

Formatting timeline logs

eugeneia commented Jul 25, 2016 • edited Loading

lukego commented Jul 26, 2016

eugeneia commented Jul 26, 2016

lukego commented Jul 27, 2016

eugeneia commented Jul 25, 2016 •

edited

Loading