I am building a streaming application that ingests messages of varying schemas and analyses a subset of their fields, chosen using information held in each message's schema.
For example, any producer can publish a message to the topic this service consumes from (as long as the message is Avro-encoded and its schema is registered in the shared Schema Registry), and the app will analyse all string fields and publish the results.
Decoding the message works well using Faust's `raw` codec and this library's `AvroMessageSerializer`, but this doesn't give access to the schema definition, which is only retrieved inside the `MessageSerializer.decode_message` method.
To receive a message and access both the decoded message and its schema, I have extracted the relevant part of `MessageSerializer.decode_message` into a standalone function:

    import struct

    from schema_registry.serializers.errors import SerializerError
    from schema_registry.serializers.message_serializer import ContextStringIO, MAGIC_BYTE


    def decode_schema_id(message: bytes) -> int:
        """
        Decode the schema ID from a message from kafka that has been encoded
        for use with the schema registry.

        This function is an extension to the python-schema-registry-client,
        which only provides the deserialised message.

        Args:
            message: message to be decoded

        Returns:
            int: The ID of the schema the message was encoded with
        """
        # The header is 1 magic byte plus a 4-byte schema ID, so anything
        # of 5 bytes or fewer cannot carry a payload.
        if len(message) <= 5:
            raise SerializerError("message is too small to decode")

        with ContextStringIO(message) as payload:
            # ">bI": big-endian signed byte (magic) + unsigned 32-bit int (schema ID)
            magic, schema_id = struct.unpack(">bI", payload.read(5))
            if magic != MAGIC_BYTE:
                raise SerializerError("message does not start with magic byte")
            return schema_id

From the message bytes I can then read the schema ID, use the `SchemaRegistryClient` to fetch the schema, and use the `AvroMessageSerializer` to get the decoded message.
I would like to be able to get the schema ID and/or the schema for a decoded message through the `MessageSerializer` API.
Access could be provided through additional methods like the one above. A nicer API might be for `MessageSerializer.decode_message` to return an object containing the `schema_id`, the `schema` and the `payload` rather than just the `payload`, especially given that all three are already extracted or retrieved within this method.
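
For reference, the workaround described above could be sketched roughly as follows. This is only an illustration: `read_schema_id` and `decode_with_schema` are hypothetical helper names, the header check is re-implemented with the standard library so the snippet stands alone, and it assumes `client` exposes `get_by_id(schema_id)` and `serializer` exposes `decode_message(message)`, as I understand `SchemaRegistryClient` and `AvroMessageSerializer` to do.

```python
import struct
from typing import Any, Tuple

MAGIC_BYTE = 0  # assumed: first byte of the schema-registry wire format


def read_schema_id(message: bytes) -> int:
    """Hypothetical helper: read the 4-byte schema ID that follows the magic byte."""
    if len(message) <= 5:
        raise ValueError("message is too small to decode")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("message does not start with magic byte")
    return schema_id


def decode_with_schema(message: bytes, client: Any, serializer: Any) -> Tuple[int, Any, Any]:
    """Hypothetical helper: return (schema_id, schema, payload) for one message.

    Assumes client.get_by_id(schema_id) returns the registered schema and
    serializer.decode_message(message) returns the decoded payload.
    """
    schema_id = read_schema_id(message)
    schema = client.get_by_id(schema_id)
    payload = serializer.decode_message(message)
    return schema_id, schema, payload
```

Note that the header is parsed twice in this arrangement (once here and once inside `decode_message`), which is part of the motivation for the request below.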
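
To make the second suggestion concrete, here is a minimal sketch of what such a return object might look like. It is not the library's API: `DecodedMessage` and `decode_to_result` are hypothetical names, and `get_schema` and `decode_body` are placeholder callables standing in for the registry lookup and Avro decoding that `decode_message` already performs internally.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable

MAGIC_BYTE = 0  # assumed: first byte of the schema-registry wire format


@dataclass(frozen=True)
class DecodedMessage:
    """Hypothetical result type bundling everything the decoder already knows."""
    schema_id: int
    schema: Any
    payload: Any


def decode_to_result(
    message: bytes,
    get_schema: Callable[[int], Any],
    decode_body: Callable[[Any, bytes], Any],
) -> DecodedMessage:
    """Parse the header once, then return schema_id, schema and payload together.

    get_schema and decode_body are placeholders for the registry lookup and
    the Avro decoding step; they are not real library functions.
    """
    if len(message) <= 5:
        raise ValueError("message is too small to decode")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("message does not start with magic byte")
    schema = get_schema(schema_id)
    payload = decode_body(schema, message[5:])
    return DecodedMessage(schema_id=schema_id, schema=schema, payload=payload)
```

A consumer could then read `result.schema` to find the string fields and `result.payload` for their values, without touching the raw bytes itself. Returning a new type would change the existing return value of `decode_message`, so a separate method may be the less disruptive option.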