SubC Compiler
By Nils M Holm, 2011--2016
Placed in the public domain
SUMMARY
SubC is a compiler for a (mostly) strict and sane subset of
C as described in "The C Programming Language", 2nd Ed (also
known informally as "ANSI C" or "C89").
A previous version of the compiler is described in great detail
in the book "Practical Compiler Construction", which can be
purchased at Lulu.com. See http://www.t3x.org/reload/ for
ordering information.
The SubC compiler can compile itself. Unlike many other small C
compilers, it does not bend the rules, though. Its code passes
"gcc -Wall -pedantic" with few or no warnings (depending on
the gcc version used). Of course, you can also bootstrap it with
other C compilers, such as Clang or PCC.
SubC is fast and simple. Its output is statically linked (where
available) and typically small (due to a non-bloated library). It
uses a simple optimizer on a per-expression basis.
SUPPORTED SYSTEMS
SubC generates code for GAS, the GNU assembler (except for the
DOS version, which emits TASM-style syntax). It targets the
following processors and operating systems:
    FreeBSD        386    armv6  x86-64
    Linux          386    -      x86-64
    NetBSD         386    -      x86-64
    OpenBSD        386%   -      -
    Windows/MinGW  386    -      -
    Darwin         -      -      x86-64%(/)
    DOS            8086!
    %    uses the syscall layer of the host libc
    *    untested
    !    experimental
    (/)  broken
Platforms tagged "untested" are not regularly tested by me
and are therefore subject to potential bit rot. You can help
me improve SubC by running "make tests" on an "untested"
platform and letting me know about the results.
Platforms using the system's libc as a thin system call layer
often cause build/stability problems due to the omnipresence of
the GNU libc, which is not "thin" at all. Expect trouble on
those systems!
Platforms tagged "broken" currently will not compile or run
properly for some reason. See the Todo file for details.
The DOS version brings its own toolchain, which can be found in
the s86/ directory, so no pre-existing DOS assembler or linker
is required to compile SubC programs on DOS.
Porting SubC to other 32-bit or 64-bit platforms should be
quite straightforward. See the file "Porting" and/or the book
for a general road map.
CHANGES TO THE BOOK VERSION
Note: The book version runs on FreeBSD/386 exclusively.
The current version uses an improved code generator, which
emits much smaller and faster code than the book compiler.
The techniques are described in the book, though.
The current version of the SubC compiler adds support for
the following parts of the C language to the version described
in "Practical Compiler Construction":
o &array is now valid syntax (you no longer have to write
&array[0]).
o The auto, register, and volatile keywords are recognized
(as no-ops). Yes, volatile is safe, because SubC does not
have register variables.
o enums may now be local.
o extern identifiers may now be declared locally.
o Prototypes may have the static storage class.
o There is support for structs and unions.
o jmp_buf is now a struct; setjmp() and longjmp() must be
called with &jmp_buf.
o FILEs are now structs and can no longer be mistaken for
ints by the type checker.
o The #error, #line, and #pragma commands have been added.
o There is a (non-standard) kprintf() function, which is
like fprintf(), but uses a file descriptor.
o There is now a (slightly incompatible) varargs mechanism.
Here is how it works:
    #include <varargs.h>
    void p(int a, int b, ...) {
        int first;
        void *ap;
        ap = _va_start(&b);         /* b is the last fixed parameter */
        first = (int) _va_arg(&ap); /* fetch the first variable argument */
        vprintf("other args: %d %d %d\n", ap);
        _va_end(&ap);
    }
o The vprintf(), vfprintf(), and vsprintf() functions have
been added to the runtime library.
o A broader subset of C expression syntax is accepted
in constant expression contexts. For example, pointer
variables can be initialized with NULL.
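For comparison, here is a sketch of a similar function written
against the standard C89 <stdarg.h> interface, which is the
mechanism that the SubC variant shown above deviates from. The
name format_rest() and its buffer argument are made up for this
illustration. The sketch is meant for a full C89 compiler and
has not been tried with SubC.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the first variable argument and
 * format the three ints that follow it into buf. The caller
 * must pass four ints after b and a sufficiently large buffer.
 */
int format_rest(char *buf, int a, int b, ...) {
    va_list ap;
    int first;

    va_start(ap, b);            /* standard: names the last fixed parameter */
    first = va_arg(ap, int);    /* standard: names the type to fetch */
    vsprintf(buf, "other args: %d %d %d", ap);
    va_end(ap);
    (void) a;                   /* unused, kept to mirror the example above */
    return first;
}
```

The visible differences are that the standard va_start() takes
the argument pointer and the parameter itself rather than the
parameter's address, and that va_arg() needs a type.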
DIFFERENCES BETWEEN SUBC (THIS VERSION) AND FULL C89
o The following keywords are not recognized:
const, double, float, goto, long, short, signed, typedef,
unsigned.
o There are only two primitive data types: the signed int and
the unsigned char; there are also void pointers, and there
is limited support for int(*)() (pointers to functions
of type int).
o No more than two levels of indirection are supported, and
arrays are limited to one dimension, i.e. valid declarators
are limited to x, x[], *x, *x[], **x (and (*x)()).
o K&R-style function declarations (with parameter declarations
between the parameter list and function body) are not
accepted.
o There are no ``const'' variables.
o There is no typedef.
o There are no unsigned integers, long integers, or signed
chars.
o Struct/union declarations must be separate from the
declarations of struct/union objects, i.e.
``struct p { int x, y; } q;'' will not work.
o Struct/union declarations must be global (struct and union
objects may be declared locally, though).
o There is no support for bit fields.
o Only ints, chars, and arrays of int and char can be
initialized in their declarations; pointers can be
initialized with 0 or NULL.
o Local arrays cannot have initializers.
o Local declarations are limited to the beginnings of function
bodies (they do not work in other compound statements).
o Arguments of prototypes must be named.
o There is no goto.
o There are no parameterized macros.
o The #if and #elif preprocessor commands are not recognized.
o The preprocessor does not accept multi-line commands.
o The preprocessor does not accept comments in (some) commands.
o The preprocessor does not recognize the # and ## operators.
o There may not be any blanks between the # that introduces
a preprocessor command and the subsequent command (e.g.:
"# define" would not be recognized as a valid command).
o The sizeof operator requires parentheses.
o Subscripting an integer with a pointer (e.g. 1["foo"]) is
not supported.
o Function pointers are limited to one single type, int(*)(),
and they have no argument types. Note that this declaration
will in fact generate a pointer to int(*)(void).
o There is no assert() due to the lack of parameterized macros.
o The atexit() mechanism is limited to one function (this may
even be covered by TCPL2).
o The signal() function returns int due to the lack of a more
sophisticated type system; the return value must be cast to
int(*)() manually.
o Most of the time-related functions are missing, in particular:
asctime(), gmtime(), localtime(), mktime(), and strftime().
o The clock() function is missing, because CLOCKS_PER_SEC
varies among systems.
o The ctime() function ignores the time zone.
o The varargs mechanism is slightly incompatible.
o The SubC compiler accepts // comments in addition to /* */.
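To illustrate a few of the restrictions above, here is a small
sketch that stays within the subset as described in this list:
the struct is declared globally and separately from its objects,
the prototype names its arguments, local declarations appear
only at the beginning of function bodies, sizeof is written with
parentheses, and a function takes the place of a parameterized
macro. All names are invented for the example, and it has only
been checked against the list, not against SubC itself.

```c
#include <string.h>

/* Struct declarations must be global and separate from objects. */
struct point {
    int x, y;
};

struct point origin;    /* the object is declared on its own */

/* Arguments of prototypes must be named. */
int manhattan(struct point *a, struct point *b);

/* There are no parameterized macros, so use a function. */
int iabs(int n) {
    return n < 0? -n: n;
}

int manhattan(struct point *a, struct point *b) {
    int dx, dy;         /* locals only at the beginning of the body */

    dx = a->x - b->x;
    dy = a->y - b->y;
    return iabs(dx) + iabs(dy);
}

int distance_from_origin(int x, int y) {
    struct point p;     /* struct objects may be local */

    memset(&origin, 0, sizeof(struct point));   /* sizeof with parentheses */
    p.x = x;
    p.y = y;
    return manhattan(&origin, &p);
}
```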
SELECTING A TARGET PLATFORM
The easiest way to prepare a build is to run the configure
script in this directory. Don't worry, it is just a simple
script that will figure out the host platform via uname and
link a few machine-dependent files into place.
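The configure script itself is a shell script. As a rough
illustration of the detection step only, the following C sketch
builds an "OS-CPU" name of the kind used under src/targets/
from the strings that "uname -s" and "uname -m" print. The
function name and the list of CPU aliases are assumptions made
for this example; the real script may recognize different
values, and not every name the function can produce needs to
exist as a directory.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: write a name such as "freebsd-386" to buf,
 * given the output of "uname -s" and "uname -m".
 * Returns 0 on success, -1 if the CPU is not recognized or
 * buf is too small.
 */
int target_name(const char *sysname, const char *machine,
                char *buf, size_t len)
{
    const char *cpu;
    size_t i, n;

    /* Assumed aliases; not taken from the configure script. */
    if (!strcmp(machine, "i386") || !strcmp(machine, "i686"))
        cpu = "386";
    else if (!strcmp(machine, "x86_64") || !strcmp(machine, "amd64"))
        cpu = "x86-64";
    else
        return -1;
    n = strlen(sysname);
    if (n + 1 + strlen(cpu) + 1 > len)
        return -1;
    for (i = 0; i < n; i++)
        buf[i] = (char) tolower((unsigned char) sysname[i]);
    buf[n] = '-';
    strcpy(buf + n + 1, cpu);
    return 0;
}
```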
If you want to configure the compiler manually, select one of
the target description files (cg*.c) in src/targets/cg and
symlink it to src/cg.c. Also link the corresponding header
file into place:
    (cd src && ln -fs targets/cg/cg386.c cg.c)
    (cd src && ln -fs targets/cg/cg386.h cg.h)
Next select the C startup (crt0) file for your OS and CPU type
from src/targets/OS-CPU/ and link it to src/lib/crt0.s, e.g.:
    (cd src/lib && \
     ln -fs ../targets/freebsd-386/crt0-freebsd-386.s \
            crt0.s)
If your OS/CPU combination is not supported, you might try
to port the compiler. See the file "Porting" for details.
You will also need some operating system-dependent
definitions, which are kept in files named "sys-OS-CPU.h"
in src/targets/OS-CPU/. Just symlink the appropriate file
to src/sys.h:
    (cd src && \
     ln -fs targets/freebsd-386/sys-freebsd-386.h sys.h)
Finally, select a limits-*.h file from src/targets/include/ that
reflects the machine word size of your target and link it to
src/include/limits.h:
    (cd src/include && \
     ln -fs ../targets/include/limits-32.h limits.h)
COMPILING THE COMPILER
The compiler sources are contained in the "src" directory,
so all the subsequent steps assume that this is your current
working directory. (I.e. do a "cd src" now.)
On a supported system, just type "make".
Without "make" the compiler can be bootstrapped by running:
    cc -o scc0 *.c
To compile and package the runtime library:
    ./scc0 -c lib/*.c
    ar -rc lib/libscc.a lib/*.o
    ranlib lib/libscc.a
To compile the startup module:
    as -o lib/crt0.o lib/crt0.s
To test the compiler, either run "make test" or perform the
following steps:
    ./scc0 -o scc1 *.c
    ./scc1 -o scc *.c
    cmp scc1 scc
There should not be any differences between the scc1 and scc
executables.
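The reasoning behind this test: scc1 and scc are both generated
by SubC from the same sources, so a correct compiler should
produce the same bytes in both stages. The check performed by
"cmp" can be sketched in C as follows. The function name is made
up for this illustration, and it uses const, so it is intended
for a full C compiler rather than for SubC.

```c
#include <stdio.h>

/*
 * Compare two files byte by byte.
 * Returns 1 if they are identical, 0 if they differ,
 * and -1 if either file cannot be opened.
 */
int same_file(const char *path1, const char *path2) {
    FILE *f1, *f2;
    int c1, c2;

    f1 = fopen(path1, "rb");
    if (f1 == NULL)
        return -1;
    f2 = fopen(path2, "rb");
    if (f2 == NULL) {
        fclose(f1);
        return -1;
    }
    /* Stop at the first mismatch or when both files end. */
    do {
        c1 = getc(f1);
        c2 = getc(f2);
    } while (c1 == c2 && c1 != EOF);
    fclose(f1);
    fclose(f2);
    return c1 == c2;
}
```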
INSTALLING THE COMPILER
The easy way would be to set up the PREFIX (and optionally
SCCDIR and BINDIR) variables in src/Makefile to suit your
taste and then run
    make dirs       # to create the directories
    make install
If you want to install the SubC compiler manually, you will
have to change the SCCDIR variable in the compiler itself.
It points to the base directory which will contain the SubC
headers and runtime library. SCCDIR defaults to ".", but can
be overridden on the command line:
    ./scc1 -o scc -D 'SCCDIR="INSTALLDIR"' *.c
(where INSTALLDIR is where the compiler will be installed.)
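A default that can be overridden with -D is commonly written
as a guarded macro definition. The following sketch shows the
pattern and how paths below SCCDIR could be derived from it.
The helper scc_path() is hypothetical, and the compiler's own
source may organize this differently.

```c
#include <stdio.h>
#include <string.h>

/* Default used when no -D 'SCCDIR="..."' option is given. */
#ifndef SCCDIR
#define SCCDIR "."
#endif

/*
 * Hypothetical helper: write "SCCDIR/sub" to buf, e.g. the
 * location of the headers for sub = "include".
 * Returns 0 on success, -1 if buf is too small.
 */
int scc_path(char *buf, size_t len, const char *sub) {
    if (strlen(SCCDIR) + 1 + strlen(sub) + 1 > len)
        return -1;
    sprintf(buf, "%s/%s", SCCDIR, sub);
    return 0;
}
```

With the default in place, scc_path(buf, sizeof(buf), "lib")
yields "./lib", which is why an uninstalled compiler has to be
run from the directory that contains its headers and library.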
You can place the 'scc' executable wherever you want, as long
as its location is covered by the PATH environment variable.
The headers (include/*) go to INSTALLDIR/include, the library
'lib/libscc.a' and the startup module 'lib/crt0.o' go to
INSTALLDIR/lib.
To test the installation just re-compile the compiler:
    rm scc && scc -o scc *.c
DOS SUPPORT
Please see the NOTES-DOS file!
WINDOWS SUPPORT
Please see the NOTES-WINDOWS file!
THANKS
To the Super Dimension Fortress (SDF.ORG) for providing
free shell accounts on 64-bit NetBSD machines.
To Bakul Shah for granting me remote access to a 64-bit
FreeBSD system and a Linux VM.
To "minux" for porting the runtime module to Linux/x86-64.
To Jean-Marc Lienher (cod5.org) for porting the runtime module
to MinGW Windows/386.
To Romain LWPB for porting the runtime module to OpenBSD/386
and Darwin/x86-64 as well as for modifying the x86-64 code
generator to emit proper code for Darwin.
To everybody who test-drove SubC and submitted bug reports.
To the Unknown Hacker for various minor and not so minor
contributions.
CONTACT
Send feedback, suggestions, etc. to:
n m h @ t 3 x . o r g
See http://t3x.org/contact.html for current ways through my
spam filter.