-
Notifications
You must be signed in to change notification settings - Fork 381
VIP 19: Declare ENUM types in VCC for VMODs
The final solution ended up slightly different than proposed, and this seems a good place to explain why.
The C-language enums are not a type, they are merely labels stuck to particular numerical values, and just about the only advantage they have over #defines is that you don't have to pay attention to the values as a programmer, but can trust the compiler to pick distinct values.
That is of course a tangible benefit, but one almost always needs a enum_foo_2_string()
function for reporting, which need to enumerate all the values, and kept in sync with the enum's definition, and while the #include "tbl/bla" trick can help automate that, the economy of using C enums quickly become fata morgana.
I have increasingly started to use const char *enum_foo = "foo";
like values instead of enums, because they have the exact same performance, they don't mingle unnoticed with integers and they are self documenting.
VCL ENUMs were created halfway along this path, and therefore they were defined as "const char *" pointing to a string of their label, which a strcmp(3) could compare on.
What was done in c91a6cedf2dfffc84344554a206fc75f57ee6177 was to make sure that when ever a particular enum value is given to a VMOD function it will always be the same pointer, a pointer which is also available to the VMOD code in the vcc_if.h/c
files, thereby allowing a pointer comparison to replace the strcmp(3) call.
The downside of the const char *
pattern is that it cannot be used to index arrays.
The truly perfect form of an enum would be a struct enum_bla { int number; const char *name; }
type of construct, which would could do both functions, but this would require an ISO-C ephipany and take 10-20 years.
If the canonical pointers lived in an array, an integer can be produced by subtracting the base of the array from the value in question, but for now this idea is shelved here until we really need it.
With respect to the bogus default values, that was simply a bug, and it is fixed now.
Extend VCC to support declarations for ENUM types, something like this:
$Module foo 3 foos and bars in VCL
$Enum BAR { BAZ, QUUX }
The vmodtool would turn this into a C enum declaration in vcc_if.h
:
enum vmod_foo_BAR_e {
BAZ,
QUUX
};
Reminiscent of the way VMOD objects are translated to C struct types that are named with the vmod_foo_
prefix.
Probably even better: initialize the first enum to 0, and include a __MAX_
enum that is only visible in the C code, which can be used for array dimensions:
enum vmod_foo_BAR_e {
BAZ = 0,
QUUX,
__MAX_VMOD_FOO_BAR_E,
};
The ENUM type can then be used in the VMOD as a data type in function and method declarations:
$Function VOID do_foo(BAR)
$Function VOID do_baz_by_default(BAR bar=BAZ)
This would translate in the C interface to:
VCL_VOID vmod_foo_do_foo(VRT_CTX, enum vmod_foo_BAR_e);
VCL_VOID vmod_foo_do_baz_by_default(VRT_CTX, enum vmod_foo_BAR_e);
This would probably have to go along with removing the current VCL_ENUM
type, making it a breaking change. So it would be a major release change for VRT and Varnish, and would require VMODs currently using ENUMs to be changed.
The VCL_ENUM type has proven to be a useful way to keep VMOD interfaces simple -- operations that differ in a "small" and specific way can be implemented with a single function or method, with an ENUM argument that specifies the difference. In the Varnish standard distribution, the shard director and VMOD blob both make use of ENUMs for this purpose. Third-party VMODs have also taken advantage of ENUMs this way (for example here, here, here, here, here and here).
But the implementation of VCL_ENUM as typedef const char *
, that is as C strings, is problematic for several reasons.
A C enum is the natural C type to correspond to an ENUM in VCL; all of the VMODs named above begin functions with ENUM arguments by parsing the string passed in for a VCL_ENUM and determining a C enum value for it. This minor detail becomes an unnecessarily costly task.
A sequence of if (strcmp) else if (strcmp) else if (strcmp)
is ugly (although it could be captured with a clever macro), and is likely to use up a lot of CPU cycles inefficiently if the VMOD function with the ENUM is called often (say on every request).
The shard director and VMOD blob both have generated code for parsing ENUM values (see shard_parse_vcc_enums.c
and parse_encoding.c
). This does the job efficiently (the parsers are state machines), but the code is generated out-of-tree (since both of them began life as third-party VMODs), and the code is opaque, "you are not expected to understand this" stuff (gotos jumping around between switch statements). Something ought to replace that code in the Varnish source sooner or later.
A VMOD doesn't have to parse the ENUM "strings" all the way to the end; they could, for example, only search as far as the ENUM symbols become unique. One trick is to choose symbol names that each begin with a different letter (see, for example, the way VMOD blob parses the case
ENUM, with values DEFAULT
, LOWER
and UPPER
).
That doesn't help much for ENUM symbols with common prefixes (for example BASE64
, BASE64URL
and BASE64URLNOPAD
in VMOD blob). Picking symbol names with different initials just to make the parsing easier is forced and unnatural.
And parsing the strings all the way to the end is more in the spirit of defensive, assertion-happy coding that is applied in Varnish. The ENUMs-as-strings will have just the values they're supposed to have, as long as code generation from VCL works the way it's supposed to. If a change is made that introduces a bug, so that an incorrect string is passed in for an ENUM, better to explode with an assert right away to expose the error.
The gist of all this is: VMODs really are obliged to parse the strings passed in for ENUMs to ensure robustness in the long run, but that is an unnecessarily costly operation. Efficient parsers, like the state machine, are less costly but also less maintainable.
The shard director uses one set of ENUMs to represent hash algorithms, and the same set is repeated in the declarations of two object methods.
The blob VMOD uses one set of ENUMs to represent bin2text encoding/decoding schemes, and its declaration is repeated for three functions, an object constructor and an object method.
A number of the third-party VMODs mentioned above also declare the same ENUM set repeatedly for several functions and methods.
It would make more sense to declare an ENUM type once, and re-use the type.
A VCL_ENUM argument in a VCC function or method declaration can be specified with a default value:
$Function VOID foo(ENUM {BAZ, QUUX} bar="BAZ")
You can also specify that NULL is passed in for the const char *
by default:
$Function VOID foo(ENUM {BAZ, QUUX} bar=0)
The 0
actually means that NULL is passed in for the pointer, unless otherwise specified in the VCL call.
But when docs are generated for the VMOD, users see exactly those default values in the generated VCL prototype declarations, and it looks like type violations:
VOID foo(ENUM {BAZ, QUUX} bar="BAZ")
VOID foo(ENUM {BAZ, QUUX} bar=0)
The default for what should be a constant ENUM value is presented to the user to look like a STRING or an INT. VCL users who are not familiar with the innards of the C implementation of VMODs can be forgiven for finding this confusing.
It should be very straightforward to update the vmodtool so that it generates the C enum types, and uses those types in C function declarations.
The more difficult problem probably arises from the fact that this will be a breaking change, requiring existing third-party VMODs that use VCL_ENUM to change. We'll just have to document that fact prominently, and write some docs to help guide VMOD authors through the transition.