\section{Conclusions}
\label{s:conclusion}
%%%%%%%%%%%%%%%%%%%%%
The Holistic Measurement Driven Resilience (HMDR) project~\cite{HMDRweb}
seeks to characterize faults in modern large-scale systems
in terms of root and/or most probable cause, likelihood
of detection, frequency of occurrence, timescales for
resultant system impact, and efficiency of error recovery.
In addition, we seek to determine instrumentation that can be used for
fault detection, characterization, and triggering
of response mechanisms.
In this work, we developed a basis (in the form of a machine-readable
vocabulary and an annotation schema) for cataloging and discovering
collections of log-like data and for annotating them to expose
a tractable view of significant events, expert commentary and
contextual notes. We also developed tools that use this basis
to find and filter log data in support of bringing failure analysis to
a tractable scope.
Our annotations of key events enable
more efficient search and identification of
events, locations, and timescales of interest.
Further, the annotations enable identification
of external events, such as fault injections
and component replacements, that will enable
more accurate characterization of fault occurrences
and impact.
In addition, HMDR releases datasets for resilience research.
We will be releasing the annotations to augment the dataset~\cite{Mutrino3mo} used in this
work, which is currently available.
We will also be releasing an annotated dataset from controlled, complex
single- and multi-fault injection tests~\cite{CieloFICUG2017} on the 9000-node Cielo Cray XE system.
The annotations will facilitate understanding
of the datasets.
Next steps include \RED{XXXX}