%!TEX root = InfoSec.tex
% Lecture 16: 10 November 2014
\sektion{16}{Spam}
Scope of problem:
\begin{itemize}
\item Vast majority of email is spam (99+\%)
\item Lots is fraudulent (or inappropriate)
\item 5\% of US users have bought something from a spammer
The anonymity makes this attractive for certain kinds of products
\item Spamming often pays (low cost to send, so need little success to
profit)
\end{itemize}
\sidenote{{\bf How email works}
\begin{itemize}
\item Messages written in standard format
\begin{itemize}
\item Headers: To, From, Date, ...
\item Body: can encode different media types
\end{itemize}
\item Traditionally:
\framebox{\parbox[c]{2cm}{sender's computer}} $\xrightarrow{\text{SMTP}}$
\framebox{\parbox[c]{2cm}{sender's MTA}} $\xrightarrow{\text{SMTP}}$
\framebox{\parbox[c]{2cm}{recipient's MTA}} $\xrightarrow{\text{IMAP}}$
\framebox{\parbox[c]{2cm}{recipient's computer}}
(MTA: Mail Transfer Agent)
\item Webmail model:
\framebox{\parbox[c]{2cm}{sender's computer}} $\xrightarrow{\text{HTTP(S)}}$
\framebox{\parbox[c]{2cm}{sender's mail\\service}} $\xrightarrow{\text{SMTP}}$
\framebox{\parbox[c]{2cm}{recipient's mail\\service}} $\xrightarrow{\text{HTTP(S)}}$
\framebox{\parbox[c]{2cm}{recipient's computer}}
\item More complexities:
\begin{itemize}
\item Forwarding
\item Mailing lists
\item Autoresponders
\end{itemize}
\end{itemize}
}
\subsektion{Spam as a market failure}
It is very cheap to send email.
Most of the cost falls on the recipient:
\begin{itemize}
\item Needs to store message
\item Human takes the time to actually read the message
\end{itemize}
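
A rough way to see the imbalance, using illustrative symbols that are not from the lecture: let $c_s$ be the sender's cost per message, $c_r$ the cost each recipient bears, $p$ the fraction of recipients who buy, and $g$ the spammer's gain per sale. Sending $N$ messages then yields
\[
\text{profit} = N\,(p\,g - c_s), \qquad \text{cost imposed on recipients} = N\,c_r .
\]
The spammer profits whenever $p > c_s/g$ and never pays the $N c_r$ term. With made-up numbers of \$0.00001 per message sent and \$10 gained per sale, the break-even response rate is $p = 10^{-6}$, i.e.\ one sale per million messages.
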
\subsektion{Anti-spam strategies}
Laws, technology, or a combination of the two.
Note that most spam is already illegal (a fraudulent offer, false advertising, or an offer of an illegal product).
\begin{definition}{Spam}
\begin{enumerate}
\item Email the recipient doesn't want to receive
Problems:
\begin{itemize}
\item Defined after the fact
\item Legally problematic (anyone can say they didn't want some message)
\item Not what you want (just not wanting it doesn't make it spam)
\end{itemize}
\item Unsolicited email
Problems:
\begin{itemize}
\item What does this mean? (May not explicitly have asked for a given email)
\item Lots of unsolicited email is wanted
\end{itemize}
\item Unsolicited commercial email
Problems: fewer than with definition 2, but the same issues remain
\end{enumerate}
\end{definition}
{\bf Free speech:} (legal constraint, principle we'd like to honor)
\begin{itemize}
\item Minimum: don't stop a communication if both parties want it
\item Legally, there's no absolute right not to hear undesired speech.
\item Commercial speech gets less protection than political speech.
\end{itemize}
\begin{definition}{Spam (CAN-SPAM Act)}
Any commercial, non-political email is spam unless:
\begin{enumerate}[(a)]
\item recipient has explicitly consented, or
\item sender has a continuing business relationship with recipient, or
\item email relates to an ongoing commercial transaction between the sender and receiver
\end{enumerate}
\end{definition}
None of these conditions can really be checked or enforced by software.
Wire fraud, false medical claims, etc. are vigorously prosecuted because it is relatively easy to follow
the money.
The law against forging the From address is surprisingly effective.\\
Disadvantages for a spammer who complies:
\begin{itemize}
\item Forced to identify themselves
\item Easy to filter
\end{itemize}
Private lawsuits
\begin{itemize}
\item ISPs sue spammers to recover the cost of their server resources, etc.
(AOL: sue spammers and give their stuff to a random user!)
\item Serves as a deterrent
\end{itemize}
Anti-spam technologies:
\begin{itemize}
\item Blacklist:
List of ``known spammers''; refuse to accept mail from them (usually by silently discarding it)
\begin{itemize}
\item If list holds From addresses: Spammer will spoof From address (though then spammer is also breaking the law)
\item If list holds IP addresses: Spammers move around, or compromise innocent users' machines and send spam from there (very common);
also note that an outgoing email IP address is often shared by many users
\end{itemize}
How aggressive should you be about putting people on the blacklist?
\begin{itemize}
\item Too slow: spammers can get away with spamming
\item Too quick: false blocking
\item Also need to worry about a DoS attack: an adversary forges spam ``from'' the victim to get the victim blacklisted
\end{itemize}
\item Whitelist:
List of people you know, reject email from everybody else
\begin{itemize}
\item Blocks too much if you forget to add people to your ``allow'' list
\item But: can combine with other countermeasures (exempt certain people from another countermeasure)
\item Better: allow from a trusted list and be more skeptical of others
\end{itemize}
\item Require payment:
\begin{itemize}
\item Pay in money:
Sender pays receiver\\
OR sender pays receiver IF receiver reports message as spam
(generates an incentive problem for the receiver)\\
OR sender pays charity if receiver reports message as spam
Problem: really expensive for large mailing lists
\item Pay in wasted computing time:
Sender must solve some difficult computational puzzle
Works internationally, but is a big problem for large mailing lists
and wastes computing time
\item Pay in human attention:
CAPTCHA %(Completely Automated Public Turing test to tell
%Computers and Humans Apart)
Spammers can outsource CAPTCHA solving in various ways (sweatshops, making people solve them to see porn, ...)
\end{itemize}
General problems: often raises the cost of legitimate mail, is hard on legitimate mailing lists, often wastes resources rather than transferring them, and spammers are sometimes willing to pay anyway because it often costs them less
\item Sender authentication
Address-based: princeton.edu publishes which IP addresses may send mail with an @princeton.edu ``From'' address.
\item Content-based filtering:
Recipient applies filter algorithm to content of incoming email
\begin{itemize}
\item Early days: keyword-based
Lots of false positives, ways to work around these
\item Now: word-based machine learning, often also personalized
(relying on user to label as spam or not)
Spammer can test the non-personalized part by making an account
and seeing what gets marked as spam -- until filters started
looking for the word-salad test messages
\item Collaborative filtering: use other users' reports as indicator of spam
\end{itemize}
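
A minimal sketch of the word-based learning idea, assuming a naive Bayes model (the notes do not say which learning method is used, and the symbols here are illustrative). Let $S$ denote spam, $H$ legitimate mail, and $w_1, \dots, w_n$ the words in a message. Treating the words as independent given the class,
\[
\frac{P(S \mid w_1, \dots, w_n)}{P(H \mid w_1, \dots, w_n)}
= \frac{P(S)}{P(H)} \prod_{i=1}^{n} \frac{P(w_i \mid S)}{P(w_i \mid H)},
\]
and the message is flagged when this ratio exceeds a chosen threshold. The per-word probabilities are estimated from how often each word appears in mail the user has labeled, which is where personalization enters. With made-up numbers: starting from even prior odds, a word seen in 20\% of spam and 0.1\% of legitimate mail contributes a factor of 200, a word seen in 1\% of spam and 5\% of legitimate mail contributes a factor of 0.2, and together they give odds of 40 to 1 in favor of spam.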
\end{itemize}
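
The cost of the computational-puzzle option above can be estimated under one assumed construction (hashcash-style; the notes do not name a specific scheme): the sender must find a nonce such that a hash of the message and nonce begins with $k$ zero bits. Each attempt succeeds independently with probability $2^{-k}$, so the number of attempts is geometrically distributed and
\[
E[\text{attempts}] = \sum_{j \ge 1} j \, (1 - 2^{-k})^{j-1} \, 2^{-k} = 2^{k},
\]
while the receiver verifies with a single hash. If one hash takes time $t$, sending $N$ messages costs about $N \cdot 2^{k} \cdot t$. With illustrative values $k = 20$ and $t = 1\,\mu\mathrm{s}$, one message costs about $1.05$ seconds, and a mailing list with $N = 10^{5}$ recipients costs about $1.05 \times 10^{5}$ seconds, roughly 29 hours. That is negligible for an individual sender and painful for a legitimate list.
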
Robocalls are now a huge problem: the FTC issued a public challenge to solve it. One fun defense: pretend to beep and be an answering machine.