<!DOCTYPE html>
<html>
<head>
<link rel="canonical" href="https://hardmath123.github.io/binarystrings.html"/>
<link rel="stylesheet" type="text/css" href="/static/base.css"/>
<title>Notes on Binary Strings - Comfortably Numbered</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<link rel="alternate" type="application/rss+xml" title="Comfortably Numbered" href="/feed.xml" />
<!--
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script>
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$']]}
});
</script>
-->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Um5gpz1odJg5Z4HAmzPtgZKdTBHZdw8S29IecapCSB31ligYPhHQZMIlWLYQGVoc" crossorigin="anonymous">
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-YNHdsYkH6gMx9y3mRkmcJ2mFUjTd0qNQQvY9VYZgQd7DcN7env35GzlmFaZ23JGp" crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-vZTG03m+2yp6N6BNi5iM4rW4oIwk5DfcNdFfxkk9ZWpDriOkXX8voJBFrAO7MpVl" crossorigin="anonymous"></script>
<script>
document.addEventListener("DOMContentLoaded", function() {
renderMathInElement(document.body, {
// customised options
// • auto-render specific keys, e.g.:
delimiters: [
{left: '$$', right: '$$', display: true},
{left: '$', right: '$', display: false},
{left: '\\begin{align}', right: '\\end{align}', display: true},
{left: '\\(', right: '\\)', display: false},
{left: '\\[', right: '\\]', display: true}
],
// • rendering keys, e.g.:
throwOnError : false
});
});
</script>
</head>
<body>
<header id="header">
<script src="static/main.js"></script>
<div>
<a href="/"><span class="left-word">Comfortably</span> <span class="right-word">Numbered</span></a>
</div>
</header>
<article id="postcontent" class="centered">
<section>
<h1>Notes on Binary Strings</h1>
<center><em><p>A memory dump of useful functions</p>
</em></center>
<h4>Thursday, November 13, 2014 · 3 min read</h4>
<p>One of the things I had to do for PicoCTF was learn how to wrangle binary
strings in various languages. The idea is that you think of a string as an
array of numbers instead of an array of characters. It’s only coincidental that
some of those numbers have alternate representations, such as “A”. The
alphabet-number correspondence is an established table. Look up
<a href="http://wikipedia.org/wiki/ASCII">ASCII</a>.</p>
<p>Each number is a byte (aka an <code>unsigned char</code>), so it ranges from 0 to 255.
This means it’s convenient to express them in hex notation: each number is two
hex digits, so <code>0xff</code> is 255.</p>
<p>Using this, we can turn strings into hex sequences (by doubling the number of
printed characters), and then turn the hex sequence into a decimal number. This
is great for crypto, because many algorithms (including RSA) can encrypt a
single number.</p>
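<p>As a rough sketch of that chain (written for Python 3, with variable names
that are just for illustration), here is a string going to a list of numbers,
then to hex, then to one big integer and back:</p>
<pre><code class="lang-python"># Python 3 sketch: bytes -> numbers -> hex digits -> one integer -> bytes
message = b'cow'

# A bytes object really is a sequence of numbers from 0 to 255.
byte_values = list(message)                               # [99, 111, 119]

# Two hex digits per byte, so the printed length doubles.
hex_digits = ''.join(format(b, '02x') for b in message)   # '636f77'

# Read the whole hex sequence as a single base-16 number.
as_number = int(hex_digits, 16)                           # 6516599

# Going back: unpack the integer into the same number of bytes.
round_trip = as_number.to_bytes(len(message), 'big')      # b'cow'
</code></pre>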
<p>We can also use the <a href="http://en.wikipedia.org/wiki/Base64">base64</a> scheme to turn
binary strings into printable strings. It uses the case-sensitive alphabet (52 letters),
digits (10), and <code>+</code> and <code>/</code> (2) as its 64 symbols. Each group of three bytes is
represented by four base64 symbols. Note that this means we need to pad the
input if its length isn’t a multiple of 3 bytes. The padding is indicated with <code>=</code> or
<code>==</code> at the end of the encoded message.</p>
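<p>To see the padding rule in action, here is a small Python 3 sketch using the
standard <code>base64</code> module. One leftover byte gets <code>==</code>, two leftover bytes
get <code>=</code>, and a full group of three needs no padding:</p>
<pre><code class="lang-python">import base64

one_byte = base64.b64encode(b'c')       # b'Yw=='  (two padding characters)
two_bytes = base64.b64encode(b'co')     # b'Y28='  (one padding character)
three_bytes = base64.b64encode(b'cow')  # b'Y293'  (no padding needed)
</code></pre>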
<p>This post summarizes some really useful functions for working with binary
strings.</p>
<h3 id="python">Python</h3>
<p>You can use hexadecimal literals in Python strings with a <code>\x</code> escape code:</p>
<pre><code class="lang-python">s = '\x63\x6f\x77'
</code></pre>
<p>To get this representation of a string that’s already in memory, use <code>repr</code>. It
will turn unprintable characters into their escape codes (though it will prefer
abbreviations like <code>\n</code> over hex if possible).</p>
<p>You can use <code>ord</code> to turn a character into a number, so <code>ord('x') == 120</code> (in
decimal! It’s equal to <code>0x78</code>). The opposite function is <code>chr</code>, which turns a
number into a character, so <code>chr(120) == 'x'</code>. Python allows hex literals, so
you can also directly say <code>chr(0x78) == 'x'</code>.</p>
<p>To convert a number to a hex string, use the (guesses, anyone?) <code>hex</code> function.
To go the other way, use <code>int(hex_number, 16)</code>:</p>
<pre><code class="lang-python">hex(3735928559) == '0xdeadbeef'
int('deadbeef', 16) == 3735928559
</code></pre>
<p>To convert a string to or from hex in Python 2, use <code>str.encode</code> and <code>str.decode</code> with the <code>'hex'</code> codec:</p>
<pre><code class="lang-python">>>> 'cow'.encode('hex')
'636f77'
>>> '636f77'.decode('hex')
'cow'
</code></pre>
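<p>The <code>'hex'</code> codec shown above only exists in Python 2. If you are on
Python 3, a rough equivalent (a sketch, not the only way to do it) uses
methods on <code>bytes</code> or the <code>binascii</code> module:</p>
<pre><code class="lang-python">import binascii

# Python 3: hex conversions live on bytes objects rather than on str.
encoded = b'cow'.hex()                          # '636f77'
decoded = bytes.fromhex('636f77')               # b'cow'

# binascii does the same job; hexlify returns bytes instead of str.
encoded_bytes = binascii.hexlify(b'cow')        # b'636f77'
decoded_again = binascii.unhexlify('636f77')    # b'cow'
</code></pre>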
<p>The pattern <code>hex(number).decode('hex')</code> is quite common (for example, in RSA
problems). Keep in mind that you need to strip the leading <code>0x</code> and possibly a
trailing <code>L</code> (which Python 2 appends to long integers) from the output of <code>hex</code>, and also make sure to pad with a leading
<code>0</code> if there is an odd number of hex digits.</p>
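<p>Spelled out, those cleanup steps might look like the sketch below. It is
written for Python 3 and <code>number_to_bytes</code> is a made-up helper name, not
something from the standard library:</p>
<pre><code class="lang-python">def number_to_bytes(number):
    """Turn a non-negative integer into a big-endian byte string via hex."""
    digits = hex(number)           # e.g. '0x636f77'
    digits = digits[2:]            # strip the leading '0x'
    digits = digits.rstrip('L')    # Python 2 longs end in 'L'; no-op on Python 3
    if len(digits) % 2 != 0:
        digits = '0' + digits      # pad to a whole number of bytes
    return bytes.fromhex(digits)


plaintext = number_to_bytes(6516599)   # b'cow'
padded = number_to_bytes(0x123)        # b'\x01\x23', thanks to the leading '0'
</code></pre>
<p>On Python 3 you can also skip the string juggling with
<code>number.to_bytes((number.bit_length() + 7) // 8, 'big')</code>.</p>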
<p>Finally, Python handles base64 with the <code>base64</code> module, but you can also just
use <code>str.encode('base64')</code> and <code>str.decode('base64')</code>. Keep in mind that it
tacks on trailing <code>\n</code>s. I don’t know why.</p>
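<p>For reference, here is a Python 3 sketch with the <code>base64</code> module. My
understanding is that the trailing newline comes from the MIME-style encoder,
which wraps its output into lines; <code>base64.encodebytes</code> behaves that way,
while <code>b64encode</code> leaves the newline off:</p>
<pre><code class="lang-python">import base64

token = base64.b64encode(b'cow')            # b'Y293', no trailing newline
original = base64.b64decode(token)          # b'cow'

# The MIME-flavoured encoder always ends its output with a newline.
with_newline = base64.encodebytes(b'cow')   # b'Y293\n'
</code></pre>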
<h3 id="javascript">JavaScript</h3>
<p>JavaScript is pretty similar. It supports <code>\x12</code> notation, and <code>0x123</code> hex
literals. The equivalents of <code>ord</code> and <code>chr</code> are <code>"a".charCodeAt(0)</code> and
<code>String.fromCharCode(12)</code>, respectively.</p>
<p>You can convert a hex string to decimal with <code>parseInt(hex_string, 16)</code>, and go
the other way with <code>a_number.toString(16)</code>:</p>
<pre><code class="lang-javascript">parseInt("deadbeef", 16) == 3735928559
(3735928559).toString(16) == 'deadbeef' // parentheses needed: a bare 3735928559.toString is a syntax error
</code></pre>
<p>Note the lack of <code>0x</code>.</p>
<p>Unfortunately, JavaScript has no built-in way to encode a string as a hex string or
decode it back, but it isn’t too hard to do on your own with some clever
regexes. The tricky part is knowing when to pad.</p>
<p>Browser JS has <code>atob</code> and <code>btoa</code> for base64 conversions (read them as
“ascii-to-binary” and “binary-to-ascii”). You can install both of those as
Node modules from npm: <code>npm install atob btoa</code>.</p>
<h3 id="bash">Bash</h3>
<p>For the sake of completeness, I wanted to mention how to use Bash to input
binary strings to programs. Use <code>echo</code> with the <code>-e</code> flag to parse hex escapes in string
literals, and <code>-n</code> to suppress the trailing <code>\n</code> (both of these are useful to
feed a binary a malformed string):</p>
<pre><code class="lang-bash">$ echo "abc\x78"
abc\x78
$ echo -e "abc\x78"
abcx
$ echo -ne "abc\x78"
abcx$ # the newline was suppressed so the prompt ran over
</code></pre>
<p>Alternatively, <code>printf</code> does pretty much the same thing as <code>echo -ne</code>.</p>
<p>Sometimes you want to be able to write more data after that, but the binary is
using <code>read()</code>. In those cases, it’s helpful to use <code>sleep</code> to fool <code>read</code> into
thinking you finished typing:</p>
<pre><code class="lang-bash">{ printf "bad_input_1\x00 mwahaha";
# the zero char signals end-of-string
# in C, which can be used to wreak all
# sorts of havoc. :)
sleep 0.1;
printf "bad_input_2";
sleep 0.1;
cat -; # arbitrary input once we have shell or something
} | something
</code></pre>
<p>Or, if you’re intrepid, you can use Python’s <code>subprocess</code> or Node’s
<code>child_process</code> to pipe input to the binary manually.</p>
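<p>Here is roughly what the <code>subprocess</code> route could look like in Python 3.
This is only a sketch: the <code>target</code> command is a stand-in I made up (a tiny
Python child that echoes its input) so the example runs anywhere, and you would
swap in the real binary:</p>
<pre><code class="lang-python">import subprocess
import sys
import time

# Stand-in for the target binary (an assumption for this sketch): a child
# process that copies everything on its standard input to standard output.
target = [sys.executable, '-c',
          'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

child = subprocess.Popen(target, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

child.stdin.write(b'bad_input_1\x00 mwahaha')
child.stdin.flush()       # push the first chunk through the pipe right away
time.sleep(0.1)           # give the target time to finish its read()
child.stdin.write(b'bad_input_2')
child.stdin.flush()

child.stdin.close()       # signal end of input
output = child.stdout.read()
child.wait()
</code></pre>
<p>The <code>flush</code> plus <code>sleep</code> pair plays the same role as the <code>sleep 0.1</code>
lines in the Bash version.</p>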
<p>UNIX comes with the <code>base64</code> command to encode the standard input. You can use
<code>base64 -D</code> to decode on macOS; the GNU coreutils version spells the flag <code>-d</code>.</p>
<h3 id="parting-tips">Parting tips</h3>
<p>Use <code>hex</code> when your binary string is a giant number, and use <code>base64</code> when
you’re simply turning a binary string into a printable one.</p>
<p>Use <code>wc -c</code> to get the byte count of a binary file.</p>
<p>Use <code>strings</code> to extract printable strings from a binary file, though ideally
<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=17512">not on untrusted files</a>.</p>
<p>Finally, use <code>od</code> or <code>xxd</code> to pretty-print binary strings along with their hex
and plaintext representations.</p>
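<p>If neither tool is around, a bare-bones imitation is easy to sketch in
Python 3. The <code>hexdump</code> function below is a hypothetical helper of my own, and
its layout only loosely follows what <code>xxd</code> prints:</p>
<pre><code class="lang-python">def hexdump(data, width=16):
    """Return a list of lines: offset, hex bytes, then printable text."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = ' '.join(format(byte, '02x') for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = ''.join(
            chr(byte) if byte in range(32, 127) else '.' for byte in chunk
        )
        # Pad the hex column so the text column lines up on short rows.
        hex_part = hex_part.ljust(width * 3 - 1)
        lines.append('{:08x}  {}  {}'.format(offset, hex_part, text_part))
    return lines


for line in hexdump(b'cow\x00 says moo\n'):
    print(line)
</code></pre>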
</section>
<div id="comment-breaker">◊ ◊ ◊</div>
</article>
<footer id="footer">
<div>
<ul>
<li><a href="https://github.com/kach">
Github</a></li>
<li><a href="feed.xml">
Subscribe (RSS feed)</a></li>
<li><a href="https://twitter.com/hardmath123">
Twitter</a></li>
<li><a href="https://creativecommons.org/licenses/by-nc/3.0/deed.en_US">
CC BY-NC 3.0</a></li>
</ul>
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-46120535-1', 'hardmath123.github.io');
ga('require', 'displayfeatures');
ga('send', 'pageview');
</script>
</footer>
</body>
</html>