-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathsemantic-sieve.html
448 lines (409 loc) · 21.7 KB
/
semantic-sieve.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
<!DOCTYPE html>
<html>
<head>
<link rel="canonical" href="https://hardmath123.github.io/semantic-sieve.html"/>
<link rel="stylesheet" type="text/css" href="/static/base.css"/>
<title>All about that basis - Comfortably Numbered</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<link rel="alternate" type="application/rss+xml" title="Comfortably Numbered" href="/feed.xml" />
<!--
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script>
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$']]}
});
</script>
-->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Um5gpz1odJg5Z4HAmzPtgZKdTBHZdw8S29IecapCSB31ligYPhHQZMIlWLYQGVoc" crossorigin="anonymous">
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-YNHdsYkH6gMx9y3mRkmcJ2mFUjTd0qNQQvY9VYZgQd7DcN7env35GzlmFaZ23JGp" crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-vZTG03m+2yp6N6BNi5iM4rW4oIwk5DfcNdFfxkk9ZWpDriOkXX8voJBFrAO7MpVl" crossorigin="anonymous"></script>
<script>
document.addEventListener("DOMContentLoaded", function() {
renderMathInElement(document.body, {
// customised options
// • auto-render specific keys, e.g.:
delimiters: [
{left: '$$', right: '$$', display: true},
{left: '$', right: '$', display: false},
{left: '\\begin{align}', right: '\\end{align}', display: true},
{left: '\\(', right: '\\)', display: false},
{left: '\\[', right: '\\]', display: true}
],
// • rendering keys, e.g.:
throwOnError : false
});
});
</script>
</head>
<body>
<header id="header">
<script src="static/main.js"></script>
<div>
<a href="/"><span class="left-word">Comfortably</span> <span class="right-word">Numbered</span></a>
</div>
</header>
<article id="postcontent" class="centered">
<section>
<h1>All about that basis</h1>
<center><em><p>Three ways to compute semantic primes</p>
</em></center>
<h4>Sunday, November 13, 2016 · 9 min read</h4>
<blockquote>
<p>I recently had the privilege of attending a talk by Guy Steele: and that
reminded me of one of his most famous lectures, <em>Growing a Language</em>. I don’t
want to spoil the talk for you, so I encourage you to watch it <a href="https://www.youtube.com/watch?v=_ahvzDzKdB0">on
YouTube</a>, or at least to read a
<a href="https://www.cs.virginia.edu/~evans/cs655/readings/steele.pdf">transcript</a>. I
promise that it’s relevant.</p>
</blockquote>
<p>In the ring of integers, prime numbers can be enumerated using the “Sieve of
Eratosthenes”. In <em>linguistics</em>, we define a <a href="https://en.wikipedia.org/wiki/Semantic_primes">semantic
prime</a> as an idea that can’t be
expressed in simpler terms (for example, “there exists” or “because”). The
question I ask in this blog post is, how do I enumerate semantic primes? In
other words, <em>does there exist an efficient semantic sieve?</em></p>
<p>The motivation here isn’t to build some sort of <a href="http://ogden.basic-english.org">Basic
English</a>—or Newspeak—but rather to better
understand the building blocks of our communication. However, this problem has
more than just linguistic appeal. Consider the popular computer game Alchemy,
which lets you combine elements like “fire” and “water” to produce “steam”.
Given all the elements of alchemy, as well as the possible recipes, what are
the possible starting elements? Similarly: approximately twice a year like
clockwork, someone in the Scratch community notices that <a href="https://wiki.scratch.mit.edu/wiki/List_of_Block_Workarounds">some Scratch blocks
can be defined in terms of
others</a>, and then
proceeds to ask what the minimal set of Scratch blocks is (you can see, for
example, the <a href="https://scratch.mit.edu/discuss/topic/223486/">October</a> and
<a href="https://scratch.mit.edu/discuss/topic/195968/">April</a> editions of this
conversation). Clearly, any block that has no “workaround” (and, analogously,
any word without a definition) must remain in a set of semantic primes, which I
will call the “basis set” from now on.</p>
<p>Beyond that, though, it is not immediately obvious which blocks or words must
be kept in order to minimize the basis. If you define a “puppy” as a “small
dog” and a “dog” as a “large puppy”, then which should you keep in your basis
set? Perhaps you already have “small”, in which case it makes sense to keep
“dog” to get “puppy” for free. Clearly, there are many possible basis sets
(trivially, all words together form a basis). We’re just searching for the
<em>smallest</em> such set (which may not even be unique!).</p>
<p>All this is to say, this is not an easy problem.</p>
<p>If you’re impatient for a solution (or just want to get the answer from a more
reliable source), David P. Dailey has the answer in his paper, <a href="http://www.aclweb.org/anthology/J86-4003">The extraction
of a minimum set of semantic primitives from a monolingual dictionary is
NP-Complete</a>, published in 1986.</p>
<p>Here, I want to present three subtly different ways to solve this problem.
While the solutions I will present are perhaps evident to you already (and
honestly not that profound), I would like to take this as an opportunity to
explore the delightful subtlety that connects various areas of math—something
that is getting dangerously close to becoming a theme of this blog.</p>
<h3 id="act-i-in-which-we-dissolve-our-problems-to-get-a-solution">Act I: In which we dissolve our problems to get a solution</h3>
<p>A dictionary is something that takes knowledge and gives you more knowledge.
This is immediately reminiscent of formal logic, where you can take two
statements and derive a new statement using an inference rule (we explored much
of this theory in my 4-part series on formal logic and automated
theorem-proving, <a href="meet-the-robinson.html">Meet the
Robinson</a>).</p>
<p>Let each word be associated with a proposition. We can now reason about the
provability of propositions equivalently with the definability of the
associated words. In particular, a definition corresponds to an implication,
where the conjunction of the definition’s propositions together imply the
defined word’s proposition. Thus, “a bird is a winged animal” turns into
<code>winged</code> ∧ <code>animal</code> → <code>bird</code>.</p>
<p>Now, to reason about provability, I used what I knew about
resolution-refutation. We want to prove the conjunction of the propositions of
all the words in the dictionary by adding as few fresh axioms as possible,
where each added axiom is just a lone proposition. One way to do this is to
resolution-refute against the negation of this conjunction, searching for
clauses that <em>only</em> have terms in negative polarity. The smallest such clause
represents the minimal set of axioms needed to make the negation disprovable,
and thus the desired conjunction provable.</p>
<p>Notice, however, that we can’t use the full power of resolution-refutation
here, because we are limited to only one incomplete inference rule: <em>modus
ponens</em>. For instance, if <code>P</code> → (<code>Q</code> ∨ <code>R</code>) and <code>Q</code> → <code>Z</code> and <code>R</code>
→ <code>Z</code>, then we cannot prove <code>Z</code> via modus ponens even though it is
“obviously” true.</p>
<p>So, rather than “resolving” two clauses, we need to do some other kind of
operation, which I’m going to call “dissolve” for lack of a better name.
Dissolving is kind of the opposite of modus ponens. To solve the problem at
hand, you compute the closure of the set of axioms (along with the negation you
want to disprove) under dissolution, and filter out clauses that only have
negative terms. Once you’ve saturated the set, the smallest such clause gives
you the basis.</p>
<pre><code class="lang-haskell">
import Data.List(nub, union, (\\))
import qualified Data.Set as Set
import Control.Exception.Base(assert)
type Clause = (Set.Set String, Set.Set String)
-- Polarity = Positive Negative
isOnlyNegative :: Clause -> Bool
isOnlyNegative (a, _) = Set.null a
-- Only applies when dissolving (a & b & ...) => e against !x | !y | ...
dissolve :: Clause -> Clause -> Clause
dissolve (a, x) (b, y) =
let u = Set.intersection x b in -- u is probably a singleton set
assert (Set.size u <= 1) -- or so we hope!
(Set.union a (Set.difference b u),
Set.union (Set.difference x u) y)
dissolveAll :: [Clause] -> [Clause] -> [Clause]
dissolveAll x y =
let apex =
filter isOnlyNegative $ nub [dissolve b a | a <- x, b <- y] \\ x in
if null apex then
[]
else
apex ++ (dissolveAll (x++apex) apex)
dissolveStream :: [Clause] -> [Clause]
dissolveStream cs =
let bad = [(Set.empty, Set.unions $ map fst cs)] in
dissolveAll (cs++bad) bad
solutions cs =
map snd $ filter isOnlyNegative (dissolveStream cs)
-- You gotta know when to fold 'em.
bestSolutions [] n = []
bestSolutions (s:ss) n =
if n<0 || Set.size s < n then
s:(bestSolutions ss (Set.size s))
else
bestSolutions ss n
solve d = bestSolutions (solutions d) (-1)
mkdfn (a:as) = (Set.singleton a, Set.fromList as)
w = map mkdfn [
"water":["hydrogen", "oxygen", "fire"],
"oxygen":["plant", "water"],
"fire":["oxygen", "heat", "hydrogen"],
"wood":["plant", "soil"],
"plant":["water", "time", "soil"],
"soil":["nitrogen", "time"],
"nitrogen":["hydrogen", "heat"],
"heat":["time"]
]
main = putStrLn $ show $ solve w
</code></pre>
<p>Against all odds, this seems to work, and with a little thought you can
probably convince yourself that it will always work.</p>
<h3 id="act-ii-what-is-an-animal-but-a-bird-without-wings-">Act II: What is an animal, but a bird without wings?</h3>
<p>What else lets us take things and make new things? We talked about statements
in logic, so how about the other kind of mathematical statement: the equation?
We can certainly combine equations to derive more knowledge, so it doesn’t
sound far-fetched to model words with variables and definitions with equations,
and then reason about the <em>solvability</em> of the equations (rather than the
<em>provability</em> of the terms).</p>
<p>More formally: assign each word a unique positive integer and a variable.
Construct linear equations using the sum of the definition-words on the
left-hand-side, and the word-to-be-defined with the correct coefficient on the
right-hand-side. So, if a bird=4 is a winged=2 animal=3, then construct
the equation “winged + animal = (5/4) bird”, which is clearly true.</p>
<p>How does this help?</p>
<p>Well, solvability of linear equations is well-studied stuff. If you encode the
equations in a matrix, and convert it to reduced-row echelon form using
Gauss-Jordan elimination, the result is a matrix where</p>
<ol>
<li>The first few rows show you how to get some variables in terms of other
variables – indeed, the maximal linearly-independent set</li>
<li>The remaining rows say that those variables can’t be solved for in terms of
the rest</li>
</ol>
<p>Thus, if we supply “values” of all variables in the latter rows, then we can
deduce values of the variables in the former. Since the Gauss-Jordan
elimination procedure minimizes this set, we’re already done.</p>
<pre><code class="lang-python">
from __future__ import division
import sympy
n_cache = []
def numify(g):
def n(x):
try:
return n_cache.index(x)
except ValueError:
n_cache.append(x)
return len(n_cache)-1
return [map(n, r) for r in g]
def mktrix(g):
count = len(g)
m = []
for d in g:
row = [0]*(len(n_cache)+1)
for k in d[1:]:
row[k] = 1
row[d[0]] = -(sum(d[1:])+len(d[1:]))/(d[0]+1)
m.append(row)
return m
def solve(m):
return sympy.Matrix(m).rref()[1]
def basis(g):
return set(n_cache) - {n_cache[x] for x in solve(mktrix(numify(g)))}
test = [
["water", "hydrogen", "oxygen", "fire"],
["oxygen", "plant", "water"],
["fire", "oxygen", "heat", "hydrogen"],
["wood", "plant", "soil"],
["plant", "water", "soil"],
["soil", "nitrogen", "time"],
["nitrogen", "hydrogen", "heat"],
["heat", "time"]
]
print basis(test)
</code></pre>
<p>But: Gauss-Jordan elimination is O(n<sup>3</sup>) and Dailey claims this
problem is NP-complete. Clearly, we haven’t just proven that P=NP. So what
gives?</p>
<p>It turns out that we made a subtle extra assumption when encoding a
“provability” problem in terms of a “solvability” problem. Logical implications
are one-way: if P implies Q, then Q does not imply P. However, the linear
equations we’re working with are <em>not</em> one-way! If you can solve for x in terms
of y, then you can usually also solve for y in terms of x.</p>
<p>So, for example, given definitions of “bird” and “winged”, and the relationship
“a bird is a winged animal”, the linear equation model assumes we can deduce
what an “animal” is. And in many cases, this is a reasonable assumption:
indeed, it’s kind of how the game <em>Jeopardy!</em> works: “if you add wings to
<em>this</em>, you get a bird” is not the worst way to hint at the word “animal”. It
relies, in the end, on what you mean by “define”.</p>
<p>That being said, I do want to note that this kind of reasoning can lead you
straight to deriving Buckingham’s Π Theorem in dimensional analysis (we
discussed this theorem in detail <a href="dimensional-analysis.html">in a recent
article</a>). This fact shouldn’t come as a surprise to
anyone: dimensional analysis, too, deals with getting more knowledge from
existing knowledge. The reasoning shown here can help identify redundant units,
and thus discover the “fundamental” kinds of quantities in the physical world.
Do we necessarily need units for frequency <em>and</em> for time?</p>
<h3 id="act-iii-come-on-we-ve-got-a-diem-to-karp-e-">Act III: Come on! We’ve got a diem to Karp(e)!</h3>
<p>In our hearts, we always knew this was a graph theory problem, and so naturally
I feel compelled to present a graph-theoretic solution. This is the one Dailey
presents in his 1986 paper cited above.</p>
<p>Consider a graph where each word corresponds a vertex, and a directed edge
travels from a word to another if the former is used in the definition of the
latter. Naively, we may imagine that all we need to do is find all the leaves
of this graph. However, this doesn’t work in the presence of cycles (if “GNU”
is defined as “GNU’s Not UNIX”, then you definitely want “GNU” in your basis
set).</p>
<p>What we really seek is the smallest possible set of vertices such that:</p>
<ol>
<li>All leaves of the graph are in the set</li>
<li>If each vertex in the set is removed from the graph, the result is
cycle-free</li>
</ol>
<p>In other words, what we want is the minimum <a href="https://en.wikipedia.org/wiki/Feedback_vertex_set">feedback vertex
set</a> of the graph, which is
one of Karp’s 21 NP-Complete problems. This is unfortunate, but on the bright
side, computers are fast and so we can at least hope to find solutions.</p>
<p>There exist a bunch of algorithms to compute the FVS, but I chose to stick with
the theme of logic by using an SMT solver (inspired by
<a href="https://github.com/firedrakeproject/glpk/blob/master/examples/mfvsp.mod">this</a>).
That is, I encoded each vertex with (1) a “guard” bit that says “am I part of
the basis”, and (2) an integer that corresponds to an index into a topological
sort. Then, for each pair of vertices, I check that if neither vertex is part
of the basis, then the topological indices obey any edges between the vertices.
This gives me a cheap cyclic-ness check that is easy to execute symbolically
because it is nonrecursive. Finally, using
<a href="http://emina.github.io/rosette">Rosette</a> to run the algorithm symbolically, I
can solve for guard bits and indices that make the graph cycle-free. All that’s
left is to minimize the number of guard bits that are “true”, which is
supported by Z3 via a “minimize” call.</p>
<pre><code class="lang-scheme">#lang rosette
(define graph
'((water hydrogen oxygen fire)
(oxygen plant water)
(fire oxygen heat hydrogen)
(wood plant soil)
(plant water time soil)
(soil nitrogen time)
(nitrogen hydrogen heat)
(heat time)
(hydrogen)
(time)
))
(define (cycle-free? graph)
(andmap
(λ (vertex)
(define v (car vertex))
(define children (cdr vertex))
(define topology (cdr (assoc v s-cache)))
(andmap
(λ (c)
(define topology+ (cdr (assoc c s-cache)))
(if (or (car topology) (car topology+))
#t
(> (cadr topology) (cadr topology+))))
children))
graph))
(displayln "Initializing...")
(define s-cache
(map (λ (v)
(define name (car v))
(define-symbolic* guard boolean?) ; am I part of the initial spawn
(define-symbolic* index integer?)
(list name guard index))
graph))
(define fvs-count (apply + (map (λ (x) (if (cadr x) 1 0)) s-cache)))
(displayln "Solving...")
(define sol
(optimize #:minimize (list fvs-count)
#:guarantee (assert (cycle-free? graph))))
(if (unsat? sol)
(displayln "dude you failed")
(append
; leaves
(map car (filter (λ (x) (= (length x) 1)) graph))
; feedback vertex set
(map car
(filter
(λ (x)
(define u (evaluate (cadr x) sol))
(and (not (constant? u)) u))
s-cache))))
</code></pre>
<p>Tim scraped the Scratch Wiki’s workaround database for me, and though the
scraping is dodgy at best, my analysis showed that of the 103 blockspecs
mentioned, only about half are needed (Brian, of course, pointed out that in
Snap<em>!</em> all you need is lambda). So that answers <em>that</em> question. As for the
English language, well, I would almost certainly need a bigger computer to deal
with millions of words. I guess I could scrape from <a href="http://gcide.gnu.org.ua">GNU’s free
dictionary</a>, but that’s a project for another day.</p>
<p>Meanwhile, I want to note that this procedure doesn’t allow one word to have
multiple equivalent definitions, unlike both the propositional-provability
approach <em>and</em> the linear-algebra approach.</p>
<h3 id="conclusion">Conclusion</h3>
<p>If nothing else, I loved this little question because it forced me to think
deeply about how to model something new and unfamiliar in terms of math I
already knew.</p>
<p>My chemistry teacher used to put at least one “toolbox” question on his tests.
A toolbox question is one that doesn’t imply what tool you should use to solve
it: unlike standard textbook stoichiometry problems that can all be solved with
the same trick, a toolbox problem forces you to look at your toolbox and pick
the tools yourself.</p>
<p>Well, in the same spirit—and I’ve <a href="no-rules-in-math.html">used this metaphor
before</a>—I feel that high school mathematics has given
me an incredible set of tools, but not nearly enough opportunities to use them.
Sure, I’ve often had to solve linear equations and prove theorems in
propositional logic: but I have rarely <em>chosen</em> to do so in order to solve a
problem.</p>
<p>Maybe that’s why I care so much about finding math in the world around me.</p>
</section>
<div id="comment-breaker">◊ ◊ ◊</div>
</article>
<footer id="footer">
<div>
<ul>
<li><a href="https://github.com/kach">
Github</a></li>
<li><a href="feed.xml">
Subscribe (RSS feed)</a></li>
<li><a href="https://twitter.com/hardmath123">
Twitter</a></li>
<li><a href="https://creativecommons.org/licenses/by-nc/3.0/deed.en_US">
CC BY-NC 3.0</a></li>
</ul>
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-46120535-1', 'hardmath123.github.io');
ga('require', 'displayfeatures');
ga('send', 'pageview');
</script>
</footer>
</body>
</html>