// kate: font Liberation Sans; font-size 16; syntax None; bom off; indent-mode none;
.bookmark hello-world
+ The Scalable C Language
In this chapter we'll look at the C style we use in Scalable C. Like all styles it's a mix of taste and pragmatism. I'll explain this using the problem-solution approach. This lets you critique our decisions, and improve on our answers.
++ Problem: blank page syndrome
C has few abstractions. It's a blank page language: you can write code in any shape and form. Yet this creates many problems. The worst problem is that every developer does it their own way. Every project is unique. Often, even inside a project there is little or no consistency.
The economics work against creating new projects. It is cheaper and easier to extend existing ones, as you can use the work already done. This is a Bad Thing. Creating new projects must cost nothing. This frees us to experiment, reshape, copy, and learn.
One reason git beat old Subversion hands down is that it erased the cost of creating a code repository. In the Old Times, creating a repository and setting it up for remote access was days of work. In my firm we could only afford one repository (we were poor if not humble). All projects sat inside that.
To fix blank page syndrome, we look at C projects and we realize, they could all look much the same. Sure, they all look different today. Yet that's just historical accident. With a little care and design we can model them all around the same template. Then, to create a new project, we just grab an empty template.
//Solution: use a standard project template.//
We already saw the basics for that:
* One project = one git source code repository. Is that obvious? It wasn't, a few years ago.
* Each project has a unique name. The name space is GitHub, though it can be a given language community. I doubt that Java developers care what names Perl projects use.
++ Problem: how do I explain my project to others?
You could hire a designer, and build a beautiful web site. Yet the essence of "scalable" is So Cheap It Costs You Nothing To Fail. Hand-crafted web sites aren't scalable.
GitHub to the rescue: stick a README in your project root, and it appears on your project's home screen.
//Solution: write a minimal README.//
You'll want to use {{README.md}}, which uses Markdown formatting. Your README has to explain at least:
* The goal of the project (or better, the broad problem it aims to solve).
* The license (under what terms people can use, distribute, and remix the code).
* The contribution policy (how people can contribute their patches).
And then if you like:
* A style guide (what the code should look like).
* How to use the project's tools and APIs.
++ Problem: my public project has no license
Many public projects on GitHub don't use a license. Don't follow their example. Without a license, others cannot use, distribute, or remix your code. It doesn't matter that you've published it. If your code has no license, only uninformed people use it, or send you patches. The failure to license code the right way can kill a project.
For reasons I'll explain in the dialectics, I recommend the Mozilla Public License version 2.0 (MPLv2) for public works.
//Solution: use the Mozilla Public License version 2.0.//
Copy [https://www.mozilla.org/en-US/MPL/2.0/ the whole license text] into a file called {{LICENSE}}. Put this into the root directory of your project. Then, add the following blurb to the header of every source file:
[[code]]
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
[[/code]]
Remember this lesson:
> "Most people do X" is not a recipe for success.
++ Problem: how do I manage copyrights?
I'll assume you are making public software, and you accepted my recommendation to use the MPLv2. We now come to the question of ownership. The copyright to any non-trivial work (thus, ownership of code) lies with its author, a person or business. By default, no-one can use or distribute the work without the owner's OK.
A license grants others the rights to use, remix, and distribute the work under certain conditions. It is like putting up a sign saying, "You may walk on my lawn if you don't damage it."
Asking contributors to give copyrights to a project is clumsy and ponderous. It is simpler that they license their contributions under the project license. This creates a collective work owned by many people, under a single license. If you use the MPLv2 and the GitHub fork and merge model, then patches are by default also licensed under MPLv2.
Thus, you can merge them without asking the contributor for a license grant, and without risk.
You do need to watch out for "unsafe" patches. This means, ones that change the project LICENSE or the blurb in any source, or which add sources with new blurbs.
//Solution: everyone keeps ownership of their own copyrights.//
A key side-effect to this arrangement is that it is //expensive// to change the license on an existing work with many owners. You need explicit permission from every contributor. Or, you must rewrite or remove their patches. This side-effect is often desirable, as it is a poison pill against hostile takeover.
++ Problem: how do I manage contributions?
You need a way to collect patches and merge them onto master. Some projects use email lists. Some projects have maintainers who pick patches, review them, merge the ones they like.
You need to avoid commits straight to master, as these are silent. It is more fun to have a ping-pong between the person who wrote a patch, and another human. This is a nominal maintainer.
My pattern for success is to get "pull requests" onto master, then to merge them as fast as possible. One can discuss them after merging.
//Solution: use pull requests and merge with haste.//
I'll explain the "with haste" part in the dialectics of this chapter. There are a few rules:
* You never merge your own pull requests. Every project needs at least two minds.
* It is better to make a new pull request with changes, than to discuss a commit. The former creates a team; the latter creates an argument.
* Continuous integration testing (CI) is a Good Idea yet it's not essential. Errors are an opportunity for others to get involved.
* The only good reason to refuse a change is, "the author is a bad actor and we banned them."
Remember this lesson:
> People are more important than code.
++ Problem: how do I keep a consistent code style?
It is painful to read code that has no style. A good project looks like it has a single author. Consistency is gold. Yet every contributor comes with their own habits.
One common answer is to clean up code using a code beautifier. This does create a consistent style. Yet that does more harm than good, in my experience. It turns out that "cannot respect project style" is key data for detecting bad actors. It's a specific case of their general disrespect for social norms and rules.
Thus it is better to document the project's style, and ask people to respect it. They won't, and so you can fix their patches and they should learn. If they don't, over time, you start to build a case for banning them.
//Solution: use a style guide document.//
You should be totalitarian about style. Every space and dot matters. Compare these two fragments of C:
```
int i;
for( i=0 ; i<10; i++ )
{
    printf ("%d\n", i);
}
```
and
```
int counter;
for (counter = 0; counter < 10; counter++)
    printf ("%d\n", counter);
```
Remember this lesson:
> Consistency matters.
I think there are some basic rules, such as using whitespace and punctuation as we do in English. Code should be compact as screen space is always precious, yet not cryptic. It makes no sense to use short variable names like 'i' and then put { on a line by itself. I'll come to the specifics of a Good Style for C as we continue.
++ Problem: where do I put my sources?
Finally, a non-contentious problem.
//Solution: put headers into include, and sources into src.//
If we have private headers (that only sources in this project use), place them in {{src}} as well. This way, {{include}} contains our public API.
++ Problem: how do I organize my code?
Even a C application (a command-line tool, perhaps) needs some internal structure. Some tools exist as massive single C files. It's not a good way to work. It is far better to build up libraries, which the final application uses.
For example, I've written a messaging broker called Malamute. It's a C application. Here is the command line {{malamute.c}} tool (stripped down to show the essence):
[[code]]
#include <malamute.h>

int main (void)
{
    ...
    zactor_t *server = zactor_new (mlm_server, "Malamute");
    ...
    zactor_destroy (&server);
    return 0;
}
[[/code]]
All the actual server code is in a class called {{mlm_server}}. The command line tool parses arguments, mucks about with configs, then starts the server. It runs until interrupted, then destroys the server (ending it).
This is a clean and powerful way to write services and other code. In fact, all C code except the thin user interface.
//Solution: organize your code into classes.//
Remember this lesson:
> Everything is a class. You can definitely make singleton methods (which do not work on a specific instance).
By freaky coincidence, we called [http://rfc.zeromq.org/spec:21 the style guide for Scalable C] "CLASS." What can I say... acronyms came back into fashion around 2001.
++ Problem: what compilers can I rely on?
In general, every C compiler worth using will support the C99 standard. We use two specific C99 features a lot: in-line declarations and in-line comments.
So we can write this:
[[code]]
// Declare and initialize list in one step
zlist_t *list = zlist_new ();
[[/code]]
Instead of the old C89 style:
[[code]]
/* All declarations at start of function */
zlist_t *list;
...
/* Code starts after all declarations */
list = zlist_new ();
[[/code]]
On Windows, Microsoft never got around to upgrading their C compiler to C99, so we have to use the misnamed "Visual" C++. Luckily C++ is almost a pure superset of C99. (Some unkind folks say that the C99 committee stole the few bits of C++ that weren't utter mind rotting garbage. That seems unfair. "Stole" is such a harsh word.)
//Solution: use C99 on real operating systems, and C++ on Windows.//
And further, only use C99 syntax that is a pure subset of C++. Otherwise, no portability. It is rather useful to be able to use C++ compilers to build your projects.
Remember this lesson:
> Don't use C++ keywords like {{class}} as variables.
++ Problem: how do I name my source files?
Let me ask you a question. Imagine I show you this code:
```
zactor_t *server = zactor_new (mlm_server, "Malamute");
```
Better still, don't imagine it, since I just showed you the code. Twice, since you weren't paying attention the first time. Where would you expect to find the method called {{zactor_new}}?
The best solutions to problems are the most obvious ones, if they work. This takes out the guesswork. The most obvious place to find this method is in a file called {{src/zactor.c}}. It would be bizarre to put every method into its own source file. It would be silly to put more than one public class into one source file. (While it is obvious to put private classes into the source file that uses them.)
//Solution: use the class name as the source file name.//
So for a class called {{zactor}} we want {{src/zactor.c}} with the code, and {{include/zactor.h}} with the public API. That is, function prototypes, typedefs, and constants.
Remember this lesson:
> Be fanatic about consistency. Your users will love you for surprising them in nice ways only.
++ Problem: I need to name my classes
Naming is like all hard problems: break it down, and it becomes easy. As often, look for the obvious and most usable answers rather than the "best" or "most consistent" answers.
A "best" name for a human is a 12-digit number that encodes their date of birth and acts like a global roaming phone number. Yet it is neither obvious nor usable. A person needs a unique name within their close family (a "personal name"). Then, a name that identifies them to strangers (a "family name"). Then, decoration to make their name unique (middle initials, titles). Then, short names for their social networks (GitHub login).
When choosing a name, the more often we use a name, the shorter it should be. This is why we like short personal names, and tolerate long family names. The other way around is surprising to us.
A class needs a unique name within its library. Try to find a single word that expresses what the class does. It then needs a family name that identifies it to strangers. We use this family name most often of all, so it must be even shorter than the class name.
//Solution: use a unique prefix for classes in a project.//
You do not need //global// uniqueness. Somewhere out there, people may be writing C code with the same class names. That is fine so long as your prospective users aren't pulling in both libraries.
The prefix I used for CZMQ was "z" since this started life as a ZeroMQ wrapper, and I wanted the shortest possible prefix. For Zyre I chose "zyre" since that is short, and unique, and clear. For Malamute I chose "mlm" since "malamute" felt too long.
I'll use "myp" as the prefix, in example code that follows. We usually use an underscore between the prefix and the rest of the name.
Remember this lesson:
> Use simple English words for class names, then prefix them with the project prefix.
++ Problem: how do we invoke class methods?
C has no support for classes. So we have to invent this. People have tried various approaches. One way is to create an object that contains pointers to functions. You might hope to invoke methods like this:
```
myobject->method (arguments)
```
Except the method still needs the object to work with, so it looks like this:
```
myobject->method (myobject, arguments)
```
In theory you could get rid of the myobject argument. You'd need to create a structure that holds the object reference together with each method pointer. If we were generating code, this is how I might do it. Yet we want a design that fits our hand, and which is simple and obvious. Code generation often adds too much of its own complexity.
//Solution: construct a full method name out of project prefix, class, and method.//
So we get:
```
myp_myclass_mymethod (myobject)
```
From experience, people get this style at once, and it works. It is a little more to type. Yet it has the advantage that construction, destruction, and methods all have a consistent style. Take a look at this fragment, without comments or explanation:
[[code]]
mlm_client_t *writer = mlm_client_new ();
mlm_client_set_plain_auth (writer, "writer", "secret");
mlm_client_connect (writer, "tcp://127.0.0.1:9999", 1000, "writer");
mlm_client_set_producer (writer, "weather");
mlm_client_sendx (writer, "temp.moscow", "10", NULL);
mlm_client_sendx (writer, "temp.london", "15", NULL);
mlm_client_sendx (writer, "temp.madrid", "32", NULL);
mlm_client_destroy (&writer);
[[/code]]
Remember this lesson:
> The eye likes patterns in columns. Use this to your advantage.
++ Problem: how do we isolate our objects?
The natural way to represent a random constructed "thing" in C is a structure. You can, as POSIX often does, make these structures public, and document them. The problem with this is that it creates a complex and fragile contract. What happens if the caller modifies a field? How do you extend and evolve the structure over time?
//Solution: use an opaque structure, and getter-setter methods.//
C lets us make "opaque structures" which callers know nothing about except their name. In the public header file {{include/myp_myclass.h}}, we write:
```
typedef struct _myp_myclass_t myp_myclass_t;
```
In the class source file {{src/myp_myclass.c}} we define the structure and provide methods to work with it:
```
struct _myp_myclass_t {
    ...
    char *myprop;
    ...
};

// Get myprop property. Note that it's defined as 'const' so
// the caller cannot modify it.
const char *
myp_myclass_myprop (myp_myclass_t *self)
{
    assert (self);
    return self->myprop;
}

// Set myprop property
void
myp_myclass_set_myprop (myp_myclass_t *self, const char *myprop)
{
    assert (self);
    assert (myprop);
    // Copy before freeing, in case myprop points at our current value
    char *copy = strdup (myprop);
    assert (copy);
    free (self->myprop);
    self->myprop = copy;
}
```
++ Problem: how do we manage memory?
C has no garbage collection, and it's not something you can add into a language. Yet allowing random blocks of memory and strings to float around your code is fragile. It leads to fuzzy internal contracts, memory leaks, bugs.
After much experimentation, we learned how to hide almost all memory management inside classes. That is:
* Every class has a constructor and a destructor.
* The constructor allocates the object instance.
* Further methods can allocate properties and object structures (lists, and such).
* When you call the destructor, it frees all memory that the class allocated.
The caller never sees this work, it hides inside the class. This means we can change it as we like, so long as we don't change the methods (the class API).
//Solution: hide all allocations inside the class.//
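To make the list above concrete, here is a minimal sketch of a class that owns all of its memory. The class name {{myp_names}} and its methods are hypothetical, invented for this illustration. It uses plain {{calloc}}, {{realloc}}, and {{free}} rather than any CZMQ helper, so that it stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

//  Hypothetical class: a growable list of names. The caller never
//  sees malloc, realloc, or free; all of that stays inside the class.
typedef struct _myp_names_t myp_names_t;

struct _myp_names_t {
    char **items;       //  Array of owned string copies
    size_t size;        //  Number of items in use
    size_t limit;       //  Number of slots allocated
};

//  Constructor allocates the object instance, zero-filled
myp_names_t *
myp_names_new (void)
{
    myp_names_t *self = (myp_names_t *) calloc (1, sizeof (myp_names_t));
    assert (self);
    return self;
}

//  Further methods allocate whatever the object needs to grow
void
myp_names_append (myp_names_t *self, const char *name)
{
    assert (self);
    assert (name);
    if (self->size == self->limit) {
        size_t limit = self->limit? self->limit * 2: 4;
        char **items = (char **) realloc (self->items, limit * sizeof (char *));
        assert (items);
        self->items = items;
        self->limit = limit;
    }
    char *copy = (char *) malloc (strlen (name) + 1);
    assert (copy);
    strcpy (copy, name);
    self->items [self->size++] = copy;
}

//  Return number of names held
size_t
myp_names_size (myp_names_t *self)
{
    assert (self);
    return self->size;
}

//  Return the name at index, or NULL if there is no such item
const char *
myp_names_item (myp_names_t *self, size_t index)
{
    assert (self);
    return index < self->size? self->items [index]: NULL;
}

//  Destructor frees everything the class allocated
void
myp_names_destroy (myp_names_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        myp_names_t *self = *self_p;
        for (size_t index = 0; index < self->size; index++)
            free (self->items [index]);
        free (self->items);
        free (self);
        *self_p = NULL;
    }
}
```

The growth policy (doubling, starting at four slots) is an arbitrary choice. The point is that we could swap it for something else without touching any calling code.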
Remember this lesson:
> The power of abstraction comes from hiding irrelevant details.
++ Problem: how do we return freshly-allocated data?
Here is a method that returns a fresh buffer holding some content:
[[code]]
byte *
myp_myclass_content (size_t *content_size)
{
    ...
    *content_size = ...
    byte *content = malloc (*content_size);
    ...
    return content;
}
[[/code]]
The author wants to return a buffer, yet also needs to return the buffer size. So, they add an argument which is a pointer to a returned content_size.
When you call this method, it's not immediately obvious what it's doing:
```
size_t content_size;
byte *content = myp_myclass_content (&content_size);
...
free (content);
```
If we're designing from the user's perspective (always a better idea), we'd want to get a buffer object that we could destroy. We don't need to invent a buffer type, since CZMQ gives us a zchunk class. So, we can write:
```
zchunk_t *content = myp_myclass_content ();
...
zchunk_destroy (&content);
```
Which is rather cleaner. It is also fully abstract. Perhaps zchunk consists just of a size and data. As it turns out, it has other, useful properties. Such as, the ability to resize chunks and append data to them.
//Solution: return objects, not blocks of memory.//
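As a rough illustration, here is what a minimal chunk class could look like if you had to write one yourself. The {{myp_chunk}} class is a hypothetical stand-in that holds only a size and data. It is not the CZMQ zchunk API, and the "hello" content is a placeholder for whatever the real method would produce.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

//  Hypothetical stand-in for a chunk class; this is not the CZMQ zchunk API
typedef struct _myp_chunk_t myp_chunk_t;

struct _myp_chunk_t {
    size_t size;            //  Number of bytes held
    unsigned char *data;    //  Owned copy of the data
};

//  Create a chunk holding a copy of the supplied data
myp_chunk_t *
myp_chunk_new (const void *data, size_t size)
{
    assert (data || size == 0);
    myp_chunk_t *self = (myp_chunk_t *) calloc (1, sizeof (myp_chunk_t));
    assert (self);
    //  Allocate at least one byte so that data is never NULL
    self->data = (unsigned char *) malloc (size? size: 1);
    assert (self->data);
    if (size)
        memcpy (self->data, data, size);
    self->size = size;
    return self;
}

size_t
myp_chunk_size (myp_chunk_t *self)
{
    assert (self);
    return self->size;
}

const unsigned char *
myp_chunk_data (myp_chunk_t *self)
{
    assert (self);
    return self->data;
}

void
myp_chunk_destroy (myp_chunk_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        myp_chunk_t *self = *self_p;
        free (self->data);
        free (self);
        *self_p = NULL;
    }
}

//  The content method, reworked to return a single object.
//  The caller owns the chunk and destroys it when done.
myp_chunk_t *
myp_myclass_content (void)
{
    static const char content [] = "hello";
    return myp_chunk_new (content, strlen (content));
}
```

The caller gets size and data from one object, and has exactly one thing to destroy.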
The only exception that works is strings, which are a native C object. It is safe to return a fresh string and tell the caller to free it when done. Inventing a more abstract string type is fun, yet it breaks the standard C library. I don't recommend doing it.
Remember this lesson:
> A method should return a single value, or nothing at all.
++ Problem: how do we pass the object to methods?
Not all methods work on objects. Some are "singletons" which just means "not a class method but that other kind of thing we used to call a 'function' and now call 'singletons'."
Apart from singletons, all methods take an object reference. This is a pointer. It is the thing that constructors (the _new method) return. As objects are abstract and hidden inside their classes, we work with them only via methods. There are exceptions -- private classes -- that I'll explain later.
In C there is no real convention for the order of arguments. The standard C library often puts destination arguments first. This perhaps comes from right-to-left assignment. That in turn is a hangover from assembler: {{MOV X, Y}}. A good designer aims to make the order obvious, unsurprising. Yet that can lead to inconsistency. What's the obvious order for "plot X,Y on map M?" Is it {{mylib_plot (x, y, map)}}?
The obvious rule when we imitate objects is to pass the object reference as first argument. So we'd say {{mymap_plot (map, x, y)}}.
//Solution: pass the object reference as first argument to methods.//
Remember this lesson:
> Don't surprise your future self.
++ Problem: what do we call the object reference, in a method?
//Solution: use 'self' inside methods to refer to the object reference.//
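Here is a small sketch of the map plotting example, with the object reference as first argument and named 'self' in every method. The {{myp_map}} class, its fixed 8x8 size, and the {{myp_map_is_set}} method are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MYP_MAP_WIDTH   8
#define MYP_MAP_HEIGHT  8

//  Hypothetical map class: a small grid of on/off cells
typedef struct _myp_map_t myp_map_t;

struct _myp_map_t {
    bool cells [MYP_MAP_HEIGHT][MYP_MAP_WIDTH];
};

myp_map_t *
myp_map_new (void)
{
    myp_map_t *self = (myp_map_t *) calloc (1, sizeof (myp_map_t));
    assert (self);
    return self;
}

//  Object reference comes first, and is always called 'self'
void
myp_map_plot (myp_map_t *self, int x, int y)
{
    assert (self);
    assert (x >= 0 && x < MYP_MAP_WIDTH);
    assert (y >= 0 && y < MYP_MAP_HEIGHT);
    self->cells [y][x] = true;
}

//  Return true if the cell at x, y has been plotted
bool
myp_map_is_set (myp_map_t *self, int x, int y)
{
    assert (self);
    assert (x >= 0 && x < MYP_MAP_WIDTH);
    assert (y >= 0 && y < MYP_MAP_HEIGHT);
    return self->cells [y][x];
}

void
myp_map_destroy (myp_map_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        free (*self_p);
        *self_p = NULL;
    }
}
```

Every call then reads the same way: class, method, object, then the rest.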
Remember this lesson:
> Don't use C++ keywords like {{this}} as we need to be nice to C++ compilers.
++ Problem: how does a constructor work?
A constructor must allocate the memory for an object, and then initialize it. This is easy to do once you've learned a few subtle and non-obvious rules:
* Try to keep constructors simple, and only pass arguments if it is a natural part of the constructor.
* Use the zmalloc macro to allocate and nullify memory. It means you don't need to initialize individual properties. This is like calloc with some extra wrapping. Take a look at {{czmq_prelude.h}} if you want to know more.
* Aim to initialize all properties to null/zero/false/empty by default. This means choosing names with care. For example if you have an active yes/no property, and the object starts active, then use "disabled" instead of "active" as property name.
* If your object contains large blocks of memory, do not use zmalloc as it takes more time. Instead, use malloc and then initialize properties one by one.
* If memory allocation fails, in general, give up with an assertion. In specific cases you can hope to catch and deal with the error. Most often you can't. Too little memory is a configuration error in most cases.
//Solution: use the standard constructor style.//
So let's look at the standard constructor style:
[[code]]
struct _myp_myclass_t {
    char *myprop;
    zlist_t *children;
};

myp_myclass_t *
myp_myclass_new (void)
{
    myp_myclass_t *self = (myp_myclass_t *) zmalloc (sizeof (myp_myclass_t));
    assert (self);
    self->children = zlist_new ();
    return self;
}
[[/code]]
Note how the code does a cast from zmalloc. We need this on Windows to keep the C++ compiler happy.
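If you want to try the constructor rules without CZMQ, here is a sketch that uses plain {{calloc}} in place of zmalloc. The {{myp_timer}} class and its properties are hypothetical. It shows the "disabled, not active" naming trick: since the object starts out zero-filled, a new timer is active without any explicit initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

//  Hypothetical class, used only to show zero-by-default properties
typedef struct _myp_timer_t myp_timer_t;

struct _myp_timer_t {
    bool disabled;      //  Zero means enabled, so new timers start active
    int ticks;          //  Starts at zero
    char *label;        //  Starts as NULL, meaning "no label"
};

myp_timer_t *
myp_timer_new (void)
{
    //  calloc returns zero-filled memory. On mainstream platforms this
    //  reads back as false, 0, and NULL, so we set no properties here.
    myp_timer_t *self = (myp_timer_t *) calloc (1, sizeof (myp_timer_t));
    assert (self);
    return self;
}

bool
myp_timer_active (myp_timer_t *self)
{
    assert (self);
    return !self->disabled;
}

void
myp_timer_disable (myp_timer_t *self)
{
    assert (self);
    self->disabled = true;
}

int
myp_timer_ticks (myp_timer_t *self)
{
    assert (self);
    return self->ticks;
}

void
myp_timer_destroy (myp_timer_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        myp_timer_t *self = *self_p;
        free (self->label);     //  free (NULL) is harmless
        free (self);
        *self_p = NULL;
    }
}
```

Had we called the property "active", the constructor would need an extra line, and every new property like it would need another.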
++ Problem: how does a destructor work?
A destructor does the opposite of the constructor. That's a comfortable statement, isn't it?
Yet it's not obvious. The biggest gotcha with destructors in C is how to make them idempotent. It is something the standard C library got wrong. Let me show you:
```
byte *buffer = malloc (100);
free (buffer);
...
free (buffer);
```
//Wham!// You have corrupted the heap. What happens next is anyone's guess. The standard advice is to add {{buffer = NULL;}} after the free. Yet if a developer is weak enough to lose track of their pointers, will they remember to nullify them? No, they won't.
We need a style that removes the guesswork. It's easy and it works well. My team invented this (as far as I know) in 2006. It was part of another [http://www.openamq.org/doc:tech-icl object oriented C language] as a platform for OpenAMQ:
```
safe_free (&buffer);
```
//Solution: pass a pointer to the object reference, so the destructor can nullify it.//
This gives us the following destructor template:
[[code]]
void
myp_myclass_destroy (myp_myclass_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        myp_myclass_t *self = *self_p;
        free (self->myprop);
        zlist_destroy (&self->children);
        free (self);
        *self_p = NULL;
    }
}
[[/code]]
Remember this lesson:
> If you see '&' before an argument, that means "destructive"
The normal use for '&' is to return values by reference. That is a bad idea in most cases, as I'll explain later.
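To see the idempotent destructor at work without any library, here is a self-contained sketch. The {{myp_note}} class is hypothetical and holds just one owned string. The pattern is the same as in the template above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

//  Hypothetical class holding one owned string
typedef struct _myp_note_t myp_note_t;

struct _myp_note_t {
    char *text;         //  Owned copy of the note text
};

myp_note_t *
myp_note_new (const char *text)
{
    assert (text);
    myp_note_t *self = (myp_note_t *) calloc (1, sizeof (myp_note_t));
    assert (self);
    self->text = (char *) malloc (strlen (text) + 1);
    assert (self->text);
    strcpy (self->text, text);
    return self;
}

const char *
myp_note_text (myp_note_t *self)
{
    assert (self);
    return self->text;
}

//  Destructor takes a pointer to the caller's reference, frees the
//  object, and nullifies the reference. A second call does nothing.
void
myp_note_destroy (myp_note_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        myp_note_t *self = *self_p;
        free (self->text);
        free (self);
        *self_p = NULL;
    }
}
```

Calling {{myp_note_destroy (&note)}} twice in a row is safe, since the second call finds a NULL reference and returns at once.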
++ Problem: how do we deal with exceptions?
Speaking of exhaustion, let's discuss what we do when things don't work as planned. Classic C error handling assumes we're tired/dumb enough to make silly requests, yet smart enough to handle complex responses. I've used plenty of systems that returned dozens of different error codes. It becomes a leaky and fuzzy contract.
The theory that rich exception handling makes the world a better place is widespread. It's a bogus theory, in my experience. Simplicity is always better than complexity.
To get to specific answers, we must untangle the different kinds of failure in software. We can then deal with them one-by-one.
//Solution: use simple, foolproof exception handling.//
Let's break down the kinds of exceptions we tend to hit, and solve each one in the simplest way.
+++ Problem: nothing to report
In a real time system, "nothing" is such a common case that it's not exceptional. The simplest solution is to return "nothing" to the caller. If there are different kinds of "nothing" that we must distinguish, turn these into meaningful pieces of the API.
While you may feel compelled to tell the caller //why// nothing happened ("timeout error!"), this is like talking to strangers about your private life. It's what you don't say that lets people respect you.
//Solution: return NULL or zero.//
Examples:
* Return next item on list, or NULL if there are no more.
* Return next message received, or NULL if there is none.
* Return number of network interfaces, or zero if there is no networking.
When you do this well, your API fits like a soft glove. For instance, imagine these two methods for iterating through the users in a group:
```
myp_user_t *myp_group_first (myp_group_t *group);
myp_user_t *myp_group_next (myp_group_t *group);
```
Here is how I print the names of each user in a group:
[[code]]
myp_user_t *user = myp_group_first (group);
while (user) {
    printf ("%s\n", myp_user_name (user));
    user = myp_group_next (group);
}
[[/code]]
Which is tidy, safe and hard to get wrong.
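Here is one way the class behind such an iterator could work, as a sketch. To keep it short I've simplified: this hypothetical {{myp_group}} holds user names as plain strings rather than {{myp_user_t}} objects, has a fixed capacity, and borrows the strings rather than copying them. The part that matters is the cursor inside the object, and the NULL that means "no more."

```c
#include <assert.h>
#include <stdlib.h>

#define MYP_GROUP_MAX 16

//  Hypothetical, simplified group class that iterates over names
typedef struct _myp_group_t myp_group_t;

struct _myp_group_t {
    const char *names [MYP_GROUP_MAX];  //  Borrowed; caller keeps strings alive
    size_t size;                        //  Number of names stored
    size_t cursor;                      //  Position of next name to return
};

myp_group_t *
myp_group_new (void)
{
    myp_group_t *self = (myp_group_t *) calloc (1, sizeof (myp_group_t));
    assert (self);
    return self;
}

//  Add a user name; returns 0 if OK, -1 if the group is full
int
myp_group_add (myp_group_t *self, const char *name)
{
    assert (self);
    assert (name);
    if (self->size == MYP_GROUP_MAX)
        return -1;
    self->names [self->size++] = name;
    return 0;
}

//  Return next name, or NULL if there are no more
const char *
myp_group_next (myp_group_t *self)
{
    assert (self);
    if (self->cursor < self->size)
        return self->names [self->cursor++];
    return NULL;
}

//  Return first name, or NULL if the group is empty
const char *
myp_group_first (myp_group_t *self)
{
    assert (self);
    self->cursor = 0;
    return myp_group_next (self);
}

void
myp_group_destroy (myp_group_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        free (*self_p);
        *self_p = NULL;
    }
}
```

An empty group needs no special case in the caller: {{myp_group_first}} returns NULL and the loop body never runs.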
Remember this lesson:
> Design your API so that it's a pleasure to use.
+++ Problem: caller passed us garbage
Library authors (as we strive to be, when we write C) get this a lot. Things crash with weird errors. It's always our fault. We hunt and dig, and finally we discover the cause. The calling code, our dear users, passed us garbage. We didn't check it, and our own state got corrupted.
Even the standard C libraries have this problem. What does code do, if you call {{free ()}} twice on the same pointer? The results are not defined. It may do nothing. It may crash immediately. It may run a while, then start to do strange stuff.
Passing garbage to library functions is a common mistake, especially with beginners. There are three things you should aim to do, as library author:
* Design your APIs to remove the potential for obvious mistakes.
* Be cynical about what people give you, and use techniques to detect mistakes.
* When you detect a mistake in your calling code, assert immediately and without pity.
//Solution: detect garbage, then fail fast.//
I've explained our destructor pattern, and how we nullify the caller's reference. This fixes the common mistake of trying to work with a destroyed object. Code can still do that, and it will pass NULL to a method.
It is trivial and costs nothing to check for NULL, so you will see this in all well-written methods:
[[code]]
void *
myp_myclass_mymethod (myp_myclass_t *self)
{
    assert (self);
    ...
}
[[/code]]
Since we use strong types, it is hard to pass random data to a method. One must do extra work like adding a cast. That excludes innocent mistakes.
Why assert, instead of returning an error code? There are a few good reasons:
* If a developer is making such mistakes, they won't be capable of handling errors.
* If the code is faulty, it is irresponsible to continue running it. Bad Things can happen.
* The fastest way to fix the problem is to assert and tell the developer exactly when it broke.
An assert that creates a core dump and call stack gives a developer the means to fix common mistakes.
Remember this lesson:
> Developers make mistakes. You cannot expect perfection. Asserts are a good teacher.
+++ Problem: the outside world passed us garbage
We assert when calling code makes mistakes so that production code //should always work//. Do not assert when the outside world gets it wrong.
Here's an example to illustrate. We're writing a HTTP server. It has a routine to parse a HTTP request and return us all the values in a neat hash table. Now, the outside world (arbitrary browsers) can and will often send us garbage. Our parsing routine //must never crash//. Rather, it should treat garbage recognition as its main job.
If [https://xkcd.com/327/ little Bobby Tables] taught us anything, it is that all data received from the outside world is toxic garbage until proven otherwise. Any fool can write a parser for correct input. The real art in parser writing is to deal with garbage.
//Solution: treat garbage as the problem to solve.//
How you deal with garbage input depends on how well you know the culprit:
* When you get garbage from total strangers on the Internet, you discard it.
* When you get garbage from your dear users, you try to tell them what they did wrong. Then you discard it.
So in the second case we return an "invalid" response to the caller, and provide the details via some other means. Here is how I'd design this for an HTTP parser:
[[code]]
// http_client_t holds a connection to a remote web browser
// client is an instance of that class
http_request_t *request = http_client_parse (client);
if (request) {
    ... start to process the request
}
else {
    zsys_debug ("invalid HTTP request from %s: %s",
                http_client_address (client),
                http_client_parse_error (client));
    http_client_destroy (&client);
}
[[/code]]
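To make "garbage recognition as the main job" concrete, here is a rough sketch of a parser for just the HTTP request line. It is not the {{http_client}} API shown above. The {{request_line_t}} type, the function names, and the size limits are all assumptions made for this example. Note the split: a NULL argument is a caller mistake, so it asserts; a bad line is outside garbage, so it returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQUEST_METHOD_MAX  8       //  Assumed limits, for illustration
#define REQUEST_PATH_MAX    256

typedef struct {
    char method [REQUEST_METHOD_MAX];
    char path [REQUEST_PATH_MAX];
} request_line_t;

//  Copy one space-delimited token into a bounded buffer. Returns a
//  pointer just past the token, or NULL if the token is empty, too
//  long, or contains control characters.
static const char *
s_copy_token (const char *source, char *dest, size_t dest_size)
{
    size_t length = 0;
    while (source [length] && source [length] != ' ') {
        unsigned char ch = (unsigned char) source [length];
        if (ch < 0x20 || ch == 0x7F || length + 1 >= dest_size)
            return NULL;
        dest [length] = source [length];
        length++;
    }
    if (length == 0)
        return NULL;
    dest [length] = 0;
    return source + length;
}

//  Parse "METHOD SP PATH SP HTTP/1.x"; return 0 if valid, -1 if garbage
int
request_line_parse (request_line_t *self, const char *line)
{
    assert (self);
    assert (line);
    const char *cursor =
        s_copy_token (line, self->method, sizeof (self->method));
    if (!cursor || *cursor++ != ' ')
        return -1;
    cursor = s_copy_token (cursor, self->path, sizeof (self->path));
    if (!cursor || *cursor++ != ' ')
        return -1;
    if (self->path [0] != '/')
        return -1;
    if (strcmp (cursor, "HTTP/1.0") && strcmp (cursor, "HTTP/1.1"))
        return -1;
    return 0;
}
```

Most of the code is about rejecting input, and very little is about accepting it. That ratio is normal for a parser that faces the Internet.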
Remember this lesson:
> Some garbage is malicious, and some is just ignorant.
+++ Problem: bad input caused my code to crash
The security industry calls such vulnerabilities "lunch." Don't feed the security industry.
//Solution: be paranoid about foreign data.//
There are a few basic rules to observe:
* Always treat compiler warnings as fatal. Modern C compilers do a good job of telling you if your code looks like it is doing stupid things. Listen to the compiler.
* Don't assign void pointers to typed pointers without a cast. Dereferencing the wrong pointer type will cause trouble. The cast is optional in C99, yet it forces you to double-check your code. C++ (as on Windows) insists on the cast.
* Do compile your code on different platforms, often. Different compilers catch different mistakes.
* Always use {{return}} with a value in non-void functions (and never return a value from void functions).
* Never use a variable as a format string in {{printf}}-style calls. It invites disaster. A good compiler will complain if you try to do this.
* When you read input from the network, assume the sender is a malicious psychopath. If the input is too long, chop it and throw away the excess.
* Learn which library functions are unsafe, {{gets ()}} for example. Again, good compilers will warn you. Use 'man' to learn about library calls.
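Two of these rules are easy to show in code: chop over-long input, and never let foreign data become a format string. This is a sketch only; {{input_chop}} and {{input_log}} are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

//  Copy untrusted input into a fixed buffer, chopping any excess.
//  Returns the number of bytes thrown away.
size_t
input_chop (char *dest, size_t dest_size,
            const char *input, size_t input_size)
{
    assert (dest && dest_size > 0 && input);
    size_t keep = input_size < dest_size - 1? input_size: dest_size - 1;
    memcpy (dest, input, keep);
    dest [keep] = 0;
    return input_size - keep;
}

//  Format untrusted text through a constant format string; the
//  untrusted text is only ever an argument, never the format
int
input_log (char *dest, size_t dest_size, const char *untrusted)
{
    assert (dest && untrusted);
    return snprintf (dest, dest_size, "received: %s", untrusted);
}
```

If a sender puts {{%s%s%n}} in their data, {{input_log}} copies it through as plain text. Passing the same data as the format itself would be the disaster the rule warns about.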
+++ Problem: our own state is garbage
As well as checking for caller mistakes, we use asserts to check internal consistency. After all, we also make errors in our code, at a constant rate. These often show up as data with impossible values.
//Solution: use asserts to catch impossible conditions.//
Some people may complain that a library filled with {{assert}} statements is untrustworthy. Ignore such people. They are poor contributors, and worse clients. The truth is that a C library which does not use assertions to self-check is unreliable.
Remember this lesson:
> The faster you fail, the faster you can recover.
When you use assertions, do no work in an assertion (a so-called "side-effect"). Naive users looking for a cheap yet meaningless kick may remove assertions. Any side-effects also disappear. This is an example of what //not// to do:
[[code]]
// This is unsafe as the whole assert () may disappear
// if the user is foolish
assert (myp_myclass_dowork (thing) != -1);
[[/code]]
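The safe form does the work first, then asserts on the result. A minimal sketch, where {{myp_myclass_dowork}} is a stand-in that takes no arguments and counts its calls so we can see the work happen:

```c
#include <assert.h>

static int s_calls = 0;         //  Counts calls to the method below

//  Hypothetical method that does real work and returns 0 or -1
static int
myp_myclass_dowork (void)
{
    s_calls++;
    return 0;
}

//  Returns the number of times the work has been done so far
int
safe_caller (void)
{
    //  Do the work outside the assertion, then assert on the result
    int rc = myp_myclass_dowork ();
    assert (rc != -1);
    (void) rc;                  //  Avoids an unused warning under NDEBUG
    return s_calls;
}
```

If someone builds with {{NDEBUG}}, the check disappears, yet the call to {{myp_myclass_dowork}} still runs.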
+++ Problem: a library misbehaved
A working piece of code can stop working for the stupidest reasons. One classic cause is when a sub-library changes its behavior. ZeroMQ used to be guilty of this until we banned such changes. (Changing a version number doesn't help applications that break.)
The user can't do much except complain and report an error message to the developers. Then the wailing and gnashing of teeth begins. After a while, maybe, there is a new release that works again.
//Solution: if components don't behave as documented, assert.//
Remember this lesson:
> Make sure you blame the library in question, in any error message.
+++ Problem: system ran out of resources
This is, I think, the hardest problem to handle. Most developers are not aware of the specific limits of every operating system. On OS X there is a default limit of 255 sockets per process. A busy server will soon run out.
In theory a server can adapt its behavior to the capabilities of the system. Yet in practice that is close to impossible. Even if your code handles "out of memory" failures, modern systems use virtual memory. Long before {{malloc}} calls start to fail, your program is thrashing in and out of swap.
Trying to recover from resource exhaustion makes code more complex. That makes it more fragile, and more likely to have hidden errors. This is not a good path towards stable, long-running code.
//Solution: if you do run out of memory, assert.//
There are several winning strategies to deal with resource exhaustion:
* Print a helpful error message, then assert. This forces someone to re-tune the system.
* Preallocate all resources (sockets, memory, threads) in a pool, then work only from that pool.
* Use deliberate strategies to reduce resource consumption, such as bounded queues.
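As a sketch of the last two strategies together, here is a bounded queue that allocates all its memory up front. The {{bounded_queue}} names and the limit of 4 are assumptions chosen for the example; a real system would tune the limit.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_LIMIT 4           //  Assumed limit; tune for the real system

//  Fixed-size ring buffer: nothing is allocated after creation
typedef struct {
    int items [QUEUE_LIMIT];
    size_t head;                //  Index of the oldest item
    size_t size;                //  Number of items held
} bounded_queue_t;

//  Return number of items in the queue
size_t
bounded_queue_size (bounded_queue_t *self)
{
    assert (self);
    return self->size;
}

//  Return 0 if the item was queued, -1 if the queue is full
int
bounded_queue_push (bounded_queue_t *self, int item)
{
    assert (self);
    if (self->size == QUEUE_LIMIT)
        return -1;
    self->items [(self->head + self->size) % QUEUE_LIMIT] = item;
    self->size++;
    return 0;
}

//  Return the oldest item; popping an empty queue is a caller mistake
int
bounded_queue_pop (bounded_queue_t *self)
{
    assert (self);
    assert (self->size > 0);
    int item = self->items [self->head];
    self->head = (self->head + 1) % QUEUE_LIMIT;
    self->size--;
    return item;
}
```

When the queue is full, the producer gets -1 and has to decide what to do: drop, wait, or push back on its own caller. The queue itself never grows.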
Remember this lesson:
> When your system runs above 50% capacity, it is already overloaded. Always aim for under 50% use of disk, memory, CPU, and network.
++ Problem: we need consistent return values
I've already argued against returning values via parameters. In C, functions return one thing. Here are the rules that work best, in my experience:
* Return nothing.
* Return success/failure as {{int}}, with values zero and -1.
* Return yes/no as {{bool}}, with values true and false (works best if the method takes the form of a question).
* Return a fresh string to the caller as {{char *}}; caller owns and must free such strings.
* Return a constant string to the caller as {{const char *}}; the caller may not change or free these.
* Return an ordinal value (positions, quantities, indexes) as {{size_t}}.
* Return an object property (works best if the method has the name of the property).
* Return other integer values using the least surprising type.
* Return a composed value (list, hash, array, buffer) as a fresh object instance. Try to avoid returning composed values that the user may //not// change, as this is asking for trouble.
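To see several of these rules side by side, here is a small sketch. The {{label}} class is invented for this example, with one method per rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

//  Hypothetical "label" class, invented to show one rule per method
typedef struct {
    char text [32];
} label_t;

//  Success/failure as int: 0 if stored, -1 if the text does not fit
int
label_set (label_t *self, const char *text)
{
    assert (self && text);
    if (strlen (text) >= sizeof (self->text))
        return -1;
    strcpy (self->text, text);
    return 0;
}

//  Yes/no as bool; the method name is a question
bool
label_is_empty (label_t *self)
{
    assert (self);
    return self->text [0] == 0;
}

//  Quantity as size_t
size_t
label_size (label_t *self)
{
    assert (self);
    return strlen (self->text);
}

//  Property as const char *; the caller may not change or free it
const char *
label_text (label_t *self)
{
    assert (self);
    return self->text;
}

//  Fresh string as char *; the caller owns it and must free it
char *
label_dup (label_t *self)
{
    assert (self);
    char *copy = (char *) malloc (strlen (self->text) + 1);
    assert (copy);
    strcpy (copy, self->text);
    return copy;
}
```

A caller can guess what each method returns from its name and type alone, which is the whole point of having rules.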
Remember this lesson:
> Design your APIs by using them. Be intolerant when an API is irritating.
++ Problem: how do I export my APIs?
After lots of writing, compiling, testing, cursing, and repeating, you get two things. One is a "library file" that contains your precious "object code," which is the compiled version of your source code. These terms were invented by mad scientists at IBM in the 1970s. The other is the set of header files that describe your API.
Libraries come in two flavors: static {{libmyp.a}} and dynamic {{libmyp.so}} on Linux. If you are curious, use the {{file}} command to ask Linux what any given file is. Here's the kind of fun you can have with {{file}}:
```
$ file /usr/local/lib/libmyp.la
/usr/local/lib/libmyp.la: libtool library file,
$ file /usr/local/lib/libmyp.a
/usr/local/lib/libmyp.a: current ar archive
$ file /usr/local/lib/libmyp.so
/usr/local/lib/libmyp.so: symbolic link to `libmyp.so.0.0.1'
$ file /usr/local/lib/libmyp.so.0.0.1
/usr/local/lib/libmyp.so.0.0.1: ELF 64-bit LSB
shared object, x86-64, version 1 (SYSV),
dynamically linked, BuildID[sha1]=007...
not stripped
```
I'll explain in [#zproject] how we build and install these. Don't stress, it's simpler than you might think. (Hint: magic.)
As well as these library files, your users need header files to define prototypes for all the methods you export.
//Solution: export your API as a single public header file.//
In practice we use one main header file plus one header file per class. Take a look at {{/usr/local/include}} and you'll see what I mean. If this mass of header files distresses you, take a pill. There is no cost. In older projects we used to generate single project header files with all classes included inline. That turns out to be more work than it's worth.
The project header file goes into {{include/myproject.h}}. The library files will be {{libmyp.something}}.
Your project may also produce command line tools (aka "binaries" or "mains"). You may want to install some of these too.
Remember this lesson:
> Give your users a single header file that does everything.
This means, for instance, including all dependent header files. It's just polite.
++ Problem: how do I version my API?
This is one of the harder problems to solve, and people have been gleefully solving it badly for a long time.
Look at the Smart People's Choice for Versioning, aka [http://semver.org/ Semantic Versioning]. It starts by saying, "increment the major version when you make incompatible API changes." Yay, breaking user space is legal, yay!
This teaches us an important lesson about the stupidity of smart people. Breaking user space is not OK. It doesn't matter what numbers you stick on things. Yes, vendors do this all the time. No, it's still not OK.
There are several difficulties in versioning an API:
* Different pieces of the API evolve at different speeds. Some are stable while others are experimental. So, sticking a single version number on the API is like giving a family of thirteen children a single first name. It's so simple, yet so wrong.
* Software versions are often a marketing tool. People like to see general progress. So, smart projects make new releases to create buzz. It is a valid problem: no buzz, no users. Yet it has nothing to do with API versions.
* Shareable libraries, under Linux, get named with an "ABI version" which has nothing to do with the software version. Ah, and sometimes the library version is just one digit. And sometimes it is three digits. It depends on what distribution you use.
The science of API versioning has a way to go. [http://hintjens.com/blog:85 I've proposed] that we version individual methods and classes using a "software bill of materials." As you'll learn later, we're developing the tools for this.
For today, the best solution we've found is to not break APIs that people depend on.
//Solution: don't break user space.//
If you do need to change stable APIs, do it by adding new classes and methods, and deprecating the old ones.
This means a new version of your library is always backwards compatible with older ones. At least where it matters. Then, the actual numbers you use become secondary.
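In code, "add and deprecate" can be as simple as keeping the old method as a thin wrapper around the new one. A sketch, with both {{myp_buffer_count}} methods invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

//  New method, added in a later release: takes an explicit length,
//  so it also works on data that is not null-terminated
size_t
myp_buffer_count_n (const char *data, size_t length, char wanted)
{
    assert (data);
    size_t count = 0;
    size_t index;
    for (index = 0; index < length; index++)
        if (data [index] == wanted)
            count++;
    return count;
}

//  Old method, kept so existing callers still compile and run.
//  Deprecated: new code should call myp_buffer_count_n ().
size_t
myp_buffer_count (const char *data, char wanted)
{
    assert (data);
    return myp_buffer_count_n (data, strlen (data), wanted);
}
```

Old applications keep working without a recompile, and the old method costs us a few lines to maintain.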
Remember this lesson:
> Versioning is an unsolved mess.
++ Problem: I need to define my software version somewhere
Ignoring the ABI version (as far as we can) makes life simpler. The ABI/API problem [https://github.com/zeromq/zproject/issues/409 will come back to bite us again]. One thing at a time though. It's our software version that people care most about. We need a way to stamp this into the code.
//Solution: define the version in your public header file.//
Here is our standard way of doing this:
[[code]]
// MYPROJ version macros for compile-time API detection
#define MYPROJ_VERSION_MAJOR 1
#define MYPROJ_VERSION_MINOR 0
#define MYPROJ_VERSION_PATCH 0
#define MYPROJ_MAKE_VERSION(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))
#define MYPROJ_VERSION \
    MYPROJ_MAKE_VERSION(MYPROJ_VERSION_MAJOR, \
                        MYPROJ_VERSION_MINOR, \
                        MYPROJ_VERSION_PATCH)
[[/code]]
Once we've defined it like this, we can extract the version number in build scripts, and use it in the API.
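Here is a sketch of what "use it in the API" can look like. It repeats the macros so it stands alone. The {{myproj_version}} method and the {{MYPROJ_HAS_STABLE_API}} flag are assumptions made for this example.

```c
#include <assert.h>

#define MYPROJ_VERSION_MAJOR 1
#define MYPROJ_VERSION_MINOR 0
#define MYPROJ_VERSION_PATCH 0
#define MYPROJ_MAKE_VERSION(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))
#define MYPROJ_VERSION \
    MYPROJ_MAKE_VERSION(MYPROJ_VERSION_MAJOR, \
                        MYPROJ_VERSION_MINOR, \
                        MYPROJ_VERSION_PATCH)

//  Compile-time detection: applications test the headers they build with
#if MYPROJ_VERSION >= MYPROJ_MAKE_VERSION (1, 0, 0)
#   define MYPROJ_HAS_STABLE_API 1      //  Hypothetical feature flag
#else
#   define MYPROJ_HAS_STABLE_API 0
#endif

//  Hypothetical method: report the version of the library we linked
//  with, which may differ from the headers we compiled against
int
myproj_version (void)
{
    //  The encoding only works while minor and patch stay below 100
    assert (MYPROJ_VERSION_MINOR < 100 && MYPROJ_VERSION_PATCH < 100);
    return MYPROJ_VERSION;
}
```

With this encoding, version 1.2.3 becomes 10203, so versions compare as plain integers.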
Remember this lesson:
> Put the version number in a single place only, or you will make mistakes as you change it.
++ Problem: my users demand documentation
As they should. Documentation makes or breaks a project. We all know this: shitty docs means shitty code. Look at the code someone writes, and you get an instant "like" or "dislike" emotion. Pay attention to this emotion! It will save you from pain, if you listen to it.
People have tried to automate API documentation using tools like doxygen. The results [http://www.alsa-project.org/alsa-doc/alsa-lib/ tend to be mediocre]. Look at [http://api.zeromq.org/czmq3-0:_start CZMQ's documentation]. It's far simpler and yet at once familiar.
As I keep saying, when we write C, we build APIs. That means we talk to other programmers. The most accurate language for explaining a C API is more C. Period.
When we reach for documentation we are looking for something specific. The documentation must give us the fastest path to this answer. No waffle or preamble.
In an ideal world, the answers lie in the source code. Reading source code is not a failure of documentation. It is a success of style. This chapter is all about structure and readability. The goal is to produce source code that people can enjoy reading for profit.
Code is language, and the classes and methods we write are a form of literature. I'm not being poetic. This is key to writing systems that survive over the long term.
//Solution: focus on code quality, and extract key pieces as documentation.//
The key pieces we need are:
* The public API for a class and method. This must show the prototype, plus a few lines of explanation. It does not need to be pretty in the "ooh sans-serif and pastels!" sense. In fact, if it looks like C code it's easier to read and understand.
* Examples of using the API. These must be simple, reusable, and clean. Also, they must work. That means they must be part of the project, built and tested with the classes.
External examples are also great, especially if you want to build larger teaching projects. I've done a lot of this. Yet it comes second to API man pages. People need to learn one step at a time.
Remember this lesson:
> The best way to teach code is to show code.
++ Problem: how do I test my API?
When someone says "trust me, I've tested it," your natural reaction should be cynical. So tests that are part of a project are only good up to a point. Any smart user builds their own tests.
Yet we need to know if a patch broke something. When we work in groups, this translates to "I trust your patch so long as it didn't break our test cases." In the ZeroMQ core library we turned this around to encourage people to write test cases. "If you write a test case for method X, there's less chance someone will break it in the future."
When working with others, test cases are a form of insurance. They also teach users how to use the API. More users means extra lives. The more thousands of people use a piece of code, the better its chances of survival.
//Solution: every class has a test method.//
We can then call the test methods when we do "make check" and in continuous integration testing. This turns out to be a good place to stick our example code too.
The test method needs no error handling. If any given test fails, it asserts. This kills the crab and makes sure someone steps up to fix things. Or not, if no-one cares. Both are valid scenarios.
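Here is a sketch of a tiny class with its test method. The {{myp_counter}} class is invented for this example; the {{_test (bool verbose)}} signature follows the CZMQ style.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

//  Hypothetical class, invented for illustration
typedef struct _myp_counter_t myp_counter_t;

struct _myp_counter_t {
    size_t value;
};

myp_counter_t *
myp_counter_new (void)
{
    myp_counter_t *self = (myp_counter_t *) calloc (1, sizeof (myp_counter_t));
    assert (self);
    return self;
}

void
myp_counter_destroy (myp_counter_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        free (*self_p);
        *self_p = NULL;
    }
}

void
myp_counter_increment (myp_counter_t *self)
{
    assert (self);
    self->value++;
}

size_t
myp_counter_value (myp_counter_t *self)
{
    assert (self);
    return self->value;
}

//  Self test of this class; no error handling, any failure asserts
void
myp_counter_test (bool verbose)
{
    if (verbose)
        printf (" * myp_counter: ");

    myp_counter_t *counter = myp_counter_new ();
    assert (counter);
    assert (myp_counter_value (counter) == 0);
    myp_counter_increment (counter);
    myp_counter_increment (counter);
    assert (myp_counter_value (counter) == 2);
    myp_counter_destroy (&counter);
    assert (counter == NULL);

    if (verbose)
        printf ("OK\n");
}
```

The body of the test reads like a short tutorial for the class, which is why it doubles as example code.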
Remember this lesson:
> When writing a test method, you are teaching others how to use the API. Make it readable.
++ Problem: how do I actually produce the docs?
This rule applies to generated documentation: garbage in, garbage out. We still want to generate the docs, for several reasons:
* It is the safest and fastest way to produce accurate docs.
* It lets us produce many targets from the same inputs.
* It encourages a literate coding style.
* It exposes poor code, so we can fix or remove it.
In technical terms:
* We scan the class sources and headers for specific sections of code and text.
* We merge these with templates to produce text files in various formats.
* We call external tools like {{asciidoc}} to convert these into further formats.
* We publish the results on-line, or in our git repository, or as man pages.
We use a tool [https://github.com/zeromq/gitdown called gitdown] to do all this. It also produces a detailed {{README.md}} file with class and method documentation. Install that tool: you will appreciate it, and we'll depend on it later.
I need to explain how to tag your sources to tell {{gitdown}} what is what. Each tag sits on a line by itself, with or without a comment:
* In the class header, mark the public API with {{@interface}}, ending with {{@end}}.
* In your class source, explain the class using {{@header}} to mark a summary, {{@discuss}} for details, and {{@end}} to finish.
* In the test method, mark example code with {{@selftest}} and {{@end}}.
Take a look at any CZMQ source or header to see what I mean. It looks like this (from {{zuuid.h}}):
[[code]]
// @interface
// Create a new UUID object.
CZMQ_EXPORT zuuid_t *
zuuid_new (void);
// Create UUID object from supplied 16-byte value.
CZMQ_EXPORT zuuid_t *
zuuid_new_from (const byte *source);
...
// Self test of this class.
CZMQ_EXPORT void
zuuid_test (bool verbose);
// @end
[[/code]]
And this (from {{zuuid.c}}):
[[code]]
@header
The zuuid class generates universally-unique IDs (UUIDs) and provides
methods for working with them. A UUID is a 16-byte blob, which we print
as 32 hex chars.
@discuss
If you build CZMQ with libuuid, on Unix/Linux, it will use that
library. On Windows it will use UuidCreate(). Otherwise it will use a
random number generator to produce convincing imitations of UUIDs.
Android has no uuid library so we always use random numbers on that
platform.
@end
[[/code]]
And later,
[[code]]
// @selftest
// Simple create/destroy test
assert (ZUUID_LEN == 16);
assert (ZUUID_STR_LEN == 32);
zuuid_t *uuid = zuuid_new ();
assert (uuid);
assert (zuuid_size (uuid) == ZUUID_LEN);
assert (strlen (zuuid_str (uuid)) == ZUUID_STR_LEN);
zuuid_t *copy = zuuid_dup (uuid);
assert (streq (zuuid_str (uuid), zuuid_str (copy)));
...
zuuid_destroy (&uuid);
// @end
[[/code]]
Remember this lesson:
> Literate code is good code. This means, write the code as if you are documenting it.
++ Problem: I need private classes
Any realistic project needs private classes. Not every API is worth exporting, or desirable to export. There are two main cases we need to cover:
* Classes shared by other classes in the project, yet deemed too "internal" to offer to users.
* Classes used in a single source file only.
In both cases, keeping the class private lets us change it as we like.
+++ Problem: my library has private classes
A private class can follow almost the same style as a public class, except:
* Its header file should be in {{src}} and not in {{include}}.
* The project header file won't include it.
So we need a second include file in {{src}} that includes all private class headers.
//Solution: use two project headers, one public and one private.//
In CZMQ we call these {{include/czmq_library.h}} and {{src/czmq_classes.h}}. The project source files use the private project header. Calling applications use the public project header.
Remember this lesson:
> Your exported API is in {{include}}. All other sources go into {{src}}.
+++ Problem: my source file has private classes
When we start to manage data structures, we often need classes to hold individual pieces. It is simplest to write these in the source file. We can get away with less abstraction, and less work.
We define a private class as a structure: