Customizable binary data parser
The goal is to parse various binary formats (e.g. network protocols and data files) in a flexible way that suits rapid development.
Download k.h and libq.a from the Kx website.
An example build script is in b.cmd. You might have to fix the -I and -L parameters so they point to the directories of k.h and libq.a.
Requires KDB 3.5 for the enhanced lambda metadata. Load the library with:
\l path/to/qbinparse/qbinparse.q
The parser uses a simple language to describe the schema of the data.
The schema is a series of record definitions, each of the form record name fields end.
Each field definition has the form field name type.
The possible types are:
- byte, char, short, int, long, real, float: same meaning as in q
- uint, ushort: unsigned value that is represented in q with the next bigger integer type (a ushort is returned as an int and a uint is returned as a long). There is no ulong because there is no integer type in q that is able to represent all of its values.
- dotnetVarLengthInt: an integer that is stored in a way compatible with .NET's variable-length integer serialization format (each byte encodes 7 bits of the original integer, with the most significant bit indicating that there are more bytes to follow)
- record recordName: a nested record
- array elementType size: an array
  - elementType can be one of the atomic types or record; multi-dimensional arrays are currently not supported
  - size can be specified as:
    - x number: constant length
    - xv fieldName: length is the value of the specified field
    - xz: zero-terminated string
    - tpb number: array has a guard byte with the value number after it
    - tps number: array has a guard short with the value number after it
    - tpi number: array has a guard int with the value number after it
    - repeat: array extends up to the end of the available input; this should be the last element in the main record or used in a parsedArray
- parsedArray size elementType: an array with internal structure. The size specifies the number of bytes the array takes up, then the parsing process is recursively called on the array. elementType must be a full field type, typically an array or record. If a regular array is used within a parsedArray, it will have its own size, which can be repeat to make it cover the entire parsedArray.
- case fieldName val1 rec1 val2 rec2 ... [default recD]: a variable-type field that is parsed as one of the specified records based on the value of the tag field. valN are either integers or four-character strings. An optional default case can be added that covers values not listed in the cases.
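The dotnetVarLengthInt encoding described above can be illustrated outside q. The following is a minimal Python sketch, not the library's implementation; the function names are hypothetical, and it assumes the least significant 7-bit group is stored first, as .NET's BinaryWriter.Write7BitEncodedInt does.

```python
from typing import Tuple


def decode_dotnet_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a 7-bits-per-byte integer starting at pos.

    Returns (value, next_pos). Assumes the lowest 7-bit group comes first.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("input ended inside a variable-length integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:             # high bit clear: this was the last byte
            return value, pos
        shift += 7


def encode_dotnet_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, lowest group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set high bit: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


print(encode_dotnet_varint(300).hex())      # ac02
print(decode_dotnet_varint(b"\xac\x02"))    # (300, 2)
```

Values below 128 take one byte; each further 7 bits of magnitude costs one more byte.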
In addition, the type may be preceded by an operator. The following operator is supported:
- recSize: the field contains the record size. During parsing, the "end of record" (used to determine which fields run past the end of the input and how much data a repeat field can consume) is set according to this size.
First compile the schema:
schema:.binp.compileSchema schemaStr;
Then use the compiled schema on the data:
.binp.parse[schema;0x0000;`mainType]
Unparsing with .binp.unparse is the inverse operation of .binp.parse:
.binp.unparse[schema;`a`b!1 2;`mainType]
An array of 4 ints:
schemaStr:"
record simple
field nums array int x 4
end";
schema:.binp.compileSchema schemaStr;
.binp.parse[schema;0x01000000020000000300000004000000;`simple]
Returns: enlist[`nums]!enlist 1 2 3 4i
The inverse operation:
q).binp.unparse[schema;enlist[`nums]!enlist 1 2 3 4;`simple]
0x01000000020000000300000004000000
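The bytes in this example yield 1 2 3 4 only when each group of four bytes is read as a little-endian 32-bit integer. As an independent cross-check of that byte layout, here is a short Python sketch using the standard struct module; it is an illustration only and not part of qbinparse.

```python
import struct

# The same 16 bytes as in the q example above.
raw = bytes.fromhex("01000000020000000300000004000000")

# "<4i" means: little-endian, four signed 32-bit integers.
nums = struct.unpack("<4i", raw)
print(nums)  # (1, 2, 3, 4)

# Packing the values again reproduces the original bytes.
print(struct.pack("<4i", *nums) == raw)  # True
```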
A string with a two-byte length prepended:
schemaStr:"
record stringWithShortLen
field length short
field str array char xv length
end";
schema:.binp.compileSchema schemaStr;
.binp.parse[schema;0x050048656c6c6f;`stringWithShortLen]
Returns: `length`str!(5h;"Hello")
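To make the layout of a length-prefixed string concrete, the sketch below builds and reads one in Python. It is a hedged illustration of the format only: parse_short_len_string is a hypothetical helper, the little-endian byte order is an assumption taken from the byte layout of the examples, and the sketch raises an exception where qbinparse would instead store an error symbol in the field.

```python
import struct


def parse_short_len_string(data: bytes) -> dict:
    """Read a 2-byte little-endian length, then that many ASCII characters."""
    if len(data) < 2:
        raise ValueError("not enough input for the length field")
    (length,) = struct.unpack_from("<h", data, 0)
    end = 2 + length
    if length < 0 or end > len(data):
        raise ValueError("array runs past the input")
    return {"length": length, "str": data[2:end].decode("ascii")}


# Build the input so the prefix always matches the payload length.
payload = b"Hello"
data = struct.pack("<h", len(payload)) + payload

print(data.hex())                    # 050048656c6c6f
print(parse_short_len_string(data))  # {'length': 5, 'str': 'Hello'}
```

The length prefix counts only the characters that follow it, not the two bytes of the prefix itself.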
See also examples/parse.q for parsing and examples/unparse.q for unparsing.
Parsing failures don't throw errors but instead return a partial object with the error inserted into the value of the problematic field as a symbol. Possible errors include:
- endOfBuffer: attempt to read a field when the read position is already at the end of the input
- arrayRunsPastInput: an array has a size that would make it cover data past the end of the input
- tooLargeArray: the array size wouldn't fit into 32 bits
- noCaseMatch: a case field encountered an input value that is not among the cases and there is no default case
Furthermore, if there are extra bytes left over after parsing the main record, the leftover bytes are added to the record in a field named xxxRemainingData. This is also considered a type of error, and in particular the .binp.unparse function will ignore this field. To describe a format that allows garbage, padding, or irrelevant data at the end, use an array byte repeat field as the last field to capture all the remaining bytes.
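The "errors as field values" convention can be sketched in a few lines of Python. This is an illustration of the idea under stated assumptions, not the library's code: parse_pair and its two-field record (a short named a, then an int named b) are hypothetical, the byte order is assumed to be little-endian, and the sketch reports endOfBuffer whenever too few bytes remain, which simplifies the library's actual conditions.

```python
import struct


def parse_pair(data: bytes) -> dict:
    """Parse a hypothetical record: field a (short), then field b (int)."""
    result = {}
    pos = 0
    for name, fmt in (("a", "<h"), ("b", "<i")):
        size = struct.calcsize(fmt)
        if pos + size > len(data):
            result[name] = "endOfBuffer"  # error stored as the field's value
            return result                 # partial object; nothing is raised
        (result[name],) = struct.unpack_from(fmt, data, pos)
        pos += size
    if pos < len(data):
        # Leftover bytes are reported in a dedicated field.
        result["xxxRemainingData"] = data[pos:]
    return result


print(parse_pair(bytes.fromhex("010002000000")))    # {'a': 1, 'b': 2}
print(parse_pair(bytes.fromhex("0100")))            # {'a': 1, 'b': 'endOfBuffer'}
print(parse_pair(bytes.fromhex("010002000000ff")))  # adds 'xxxRemainingData': b'\xff'
```

With this convention a caller has to inspect the returned fields for error markers instead of relying on an exception.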