\chapter{Conclusion}
\label{chapter:conclusion}
This dissertation focuses on using an empirical approach to study
problems surrounding threat intelligence.
In Chapter~\ref{chapter:data_character}, I proposed
a set of simple but fundamental metrics, measured a broad
set of threat intelligence sources, and reported the
characteristics and limitations of \ti\ data. In
Chapter~\ref{chapter:data_usage}, I designed a method and conducted
a large-scale measurement of how online hosts use threat
intelligence data. To conclude, I will first highlight the lessons
learned from these studies, and then discuss their takeaways for
the community. The high-level lessons from my threat intelligence
study are:
\begin{itemize}
\item Threat intelligence feeds, far from containing
homogeneous samples of some underlying truth, vary tremendously in the
kinds of data they capture, depending on the particularities of their
collection approach. Unfortunately, few \ti\ vendors explain the
mechanism and methodology by which their data are collected, and thus
\ti\ consumers must make do with simple labels such as
``scan'' or ``botnet'', coupled with inferences about the
likely mode of collection. Worse, a significant amount of data has no
clearly defined category at all and is labeled only as
``malicious'' or ``suspicious'', leaving consumers to resolve the
ambiguity and decide what action to take based on the data.
\item There is little evidence
that larger feeds contain better data, or even that there are
crisp quality distinctions between feeds across different categories
or metrics (i.e., that a \ti\ provider whose feed performs well on one
metric will perform well on another, or that these rankings will hold
across threat categories). How data is collected also does not
necessarily determine a feed's attributes: for example, crowdsourced
feeds (\eg\ the Badips feeds) are not always slower in reporting data
than self-collected feeds (such as \feedetiprep).
\item Most IP-based \ti\ data sources are collections of
singletons (i.e., each IP address appears in at most one source),
and even the higher-correlating data sources frequently have
intersection rates of only 10\%. Moreover, when compared against broad
sensor data in known categories with broad effect (e.g., random
scanning), fewer than 2\% of observed scanner addresses appear in most of
the data sources I analyzed; indeed, even when focusing on the largest
and most prolific scanners, coverage is still limited to 10\%. File
hash-based sources show similar results, with little overlap among
them. (A sketch of how such intersection and coverage rates can be
computed follows this list.)
\item Security-related network blocking is relatively prevalent on the
Internet. My measurement showed that many online hosts, even within
our selection biased toward hosts with low security hygiene, block
network traffic based on threat intelligence data, or on subsets of it.
When studying network connectivity disruption, researchers should not
ignore the effect of such security-related blocking; instead, they
should model this behavior when conducting measurements and analyzing
the results.
\item Network behaviors within a subnet are not always uniform:
different hosts in the same network can behave differently, and such
cases are not negligible. Researchers should not assume homogeneity
when measuring the behavior of a network, and should be careful about
drawing conclusions from only a few vantage points within it.
\end{itemize}
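
To make the intersection and coverage measurements behind the third
lesson concrete, the sketch below computes pairwise intersection rates
between IP-based feeds and their coverage of an independent sensor
snapshot. The file names (\texttt{feed\_a.txt} and so on) are
hypothetical placeholders, not the actual data sets from
Chapter~\ref{chapter:data_character}.
\begin{verbatim}
# Minimal sketch: intersection and coverage of IP-based feeds.
# All file names are hypothetical placeholders.
from itertools import combinations

def load_feed(path):
    """Read one indicator (e.g., an IP address) per line."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

feeds = {name: load_feed(name + ".txt")
         for name in ("feed_a", "feed_b", "feed_c")}

# Intersection rate relative to the smaller feed, so a feed fully
# contained in a larger one scores 100%.
for a, b in combinations(feeds, 2):
    shared = len(feeds[a] & feeds[b])
    rate = shared / min(len(feeds[a]), len(feeds[b]))
    print(f"{a} vs {b}: {rate:.1%} intersection")

# Coverage: what fraction of scanner addresses observed by an
# independent sensor also appear in each feed?
sensor = load_feed("sensor_scanners.txt")
for name, feed in feeds.items():
    print(f"{name}: {len(feed & sensor) / len(sensor):.1%} of sensor")
\end{verbatim}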

The low intersection and coverage of \ti\ feeds could be the result of
several non-exclusive possibilities. First, the underlying space
of indicators (both IP addresses and malicious file hashes) is large,
and each individual data source can at best sample a small fraction
thereof. It is almost certain that this is true to some extent. Second,
different collection methodologies---even for the same threat
category---will select for different sub-distributions of the underlying
ground-truth data. Third, this last effect is likely exacerbated by the
fact that not all threats are experienced uniformly across the Internet
and, thus, different methodologies will skew to either favor or disfavor
targeted attacks.

There are many ways people can use threat intelligence data.
It can be used to \emph{enrich} other information
(e.g., for investigating potential explanations of a security
incident), as a probabilistic canary (i.e., identifying an overall
site vulnerability via a single matching indicator may have value even
if other attacks of the same kind are not detected), or as a
useful source of ground-truth data for supervised machine learning
systems. However, even given such diverse purposes, organizations still
need some way to prioritize which threat intelligence products to invest
in. The metrics I proposed in Chapter~\ref{chapter:data_character}
provide some direction for such choices. For example, an analyst who
expects to use \ti\ interactively during incident response would be better
served by feeds with higher coverage, but can accommodate poorer accuracy,
while an organization trying to automatically label malicious instances
for training purposes (e.g., brute-force attacks) would be better served
by the converse.
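
As a minimal illustration of such metric-driven prioritization, the
sketch below ranks feeds by whichever metric matters most for a given
use case. The feed names and metric values are hypothetical, not
measurements from my study.
\begin{verbatim}
# Minimal sketch: choosing feeds by use case. The metric values
# below are hypothetical illustrations, not measured results.
feeds = {
    "feed_a": {"coverage": 0.10, "accuracy": 0.60},
    "feed_b": {"coverage": 0.02, "accuracy": 0.95},
    "feed_c": {"coverage": 0.05, "accuracy": 0.80},
}

def rank(metric):
    """Order feeds from best to worst on a single metric."""
    return sorted(feeds, key=lambda f: feeds[f][metric], reverse=True)

# Incident-response enrichment favors coverage over accuracy ...
print("for enrichment:", rank("coverage"))
# ... while labeling ML training data favors accuracy over coverage.
print("for ML labels: ", rank("accuracy"))
\end{verbatim}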

Based on my experience analyzing \ti\ data and how people use it,
I offer several recommendations for the security community on this
topic moving forward:
\begin{itemize}
\item The threat intelligence community should standardize data labeling,
with clear definitions of what the data means and how it is
collected. Security experts can then assess whether the data fits their
needs and what type of action should be taken on it.
\item There are few rules of thumb for selecting among \ti\ feeds,
as there is no clear correlation between different feed properties.
Consumers need empirical metrics, such as those I describe, to
meaningfully differentiate data sources and to prioritize certain
metrics based on their specific needs.
\item Blindly using \ti\ data---even if one could afford to acquire
many such sources---is unlikely to provide better coverage and is
also prone to collateral damage caused by false positives. Consumers
must always be aware of these issues when deciding what action to take
on the data.
\item Beyond focusing on the \ti\ data itself, future work should
investigate the operational uses of threat intelligence in industry,
as the true value of \ti\ data can only be understood in operational
scenarios. Moreover, the community should explore more potential ways
of using the data, which will extend people's understanding of threat
intelligence and also influence how vendors curate the data and
provide their services.
\item Following up on the previous point, it is also important for the
community to study the potential impact of using threat
intelligence. Since this is security-related data, certain use cases,
such as blocking network traffic (whether by IP or by DNS), could have
a broader impact on the Internet. As threat intelligence becomes more
and more popular, researchers should closely follow the impact of these
products, so that we can spot problems early and make changes before
heavy damage is done.
\end{itemize}