-
Notifications
You must be signed in to change notification settings - Fork 737
Databus 2.0 relay design
Databus Relays are responsible for
- Reading changed rows from the Databus sources in the source database (or other relays chained) and serialize them as Databus data change events in an in-memory buffer
- Listening for requests from Databus Clients (Bootstrap Producers, other chained relays or event consumers) and stream new Databus data change events to the clients.
Here is a high-level overview of the relay.
The relay has one or more circular event buffers that store the databus events in order of increasing system change numbers (SCNs). The memory for these buffers may be either direct or memory-mapped backed by a file system. Each buffer also has a corresponding in-memory sparse index called the SCN index, and a MaxSCN reader/writer. The MaxSCN reader/writer periodically persists the value of the highest SCN seen in the events pulled into the relay. The relays fields requests from databus clients via a request processor, which listens to a Netty channel. The relay exposes a RESTful interface both to pull events and to manage the relay itself.
If the contents of the event buffer are configured to be memory-mapped, then the relay also saves the SCN index and some metadata upon shutdown, so that the events are preserved upon restart.
The circular event buffers are discussed in the page on Event Buffer Design.
The Database Event Producer periodically polls the source primary Oracle database for changes in Databus sources. If it detects such changes, it reads all changed rows in those sources and converts them to Avro records.
The conversion from JDBC RowSets to AvroRecords is done automatically based on Avro Schemas stored in the Schema Registry. The schemas are automatically generated from the Oracle schema for the Databus sources.
The Avro records are serialized into Databus events (see Javadocs generated with the class DbusEvent) which contain Databus-specific meta data and the data payload with the binary serialization of the Avro records.
The AbstractEventProducer class provides a framework for developing event producers. OracleEventProducer generates DbusEvents while reading from an Oracle database that has been modified to contain the txlog table and triggers to update the txlog table.
The MaxSCN Reader/Writer is used to keep track of the progress of the Database Event Producer (DBEP). The reader is used during the start of the DBEP. If no starting SCN has been specified, one will be read using the reader so the DBEP can continue where it left last.
After each batch of updates that is read from the DBEP and stored in the relay event buffer, the DBEP stores using the MaxSCN writer the last SCN that has been processed.
Currently, the MaxSCN reader/writer stores the SCN locally in a file. An example of the max scn file contents is:
5984592768094:Thu Dec 13 00:54:07 UTC 2012
Other implementations can use a RDBMS or Zookeeper.
See Databus V2 Protocol.
JMX support comes standard with the Jetty container. We expose a variety of MBeans for monitoring the relay. Please refer the the Javadocs of the following classes:
ContainerStatsMBean
DbusEventsTotalStatsMBean
DbusEventsStatisticsCollectorMBean
The schema registry is a package that has the schemas for all database tables that are known to Databus.
After performing gradle assemble
, unpack the generated tarball from build/databus2-cmdline-tools-pkg/distributions/
and cd into the created directory. Then:
$ ./bin/dbus2-avro-schema-gen.sh -namespace com.linkedin.events.example.person -recordName Person \ -viewName "sy\$person" -avroOutDir /path/to/work/trunk/schemas_registry -avroOutVersion 1 \ -javaOutDir /path/to/work/trunk/databus2-example-events/databus2-example-person/src/main/java \ -userName person -password person Processed command line arguments: recordName=Person avroOutVersion=1 viewName=sy$person javaOutDir=/path/to/work/trunk/databus2-example-events/databus2-example-person/src/main/java avroOutDir=/path/to/work/trunk/schemas_registry userName=person password=person namespace=com.linkedin.events.example.person Generating schema for sy$person Processing column sy$person.TXN:NUMBER Processing column sy$person.KEY:NUMBER Processing column sy$person.FIRST_NAME:VARCHAR2 Processing column sy$person.LAST_NAME:VARCHAR2 Processing column sy$person.BIRTH_DATE:DATE Processing column sy$person.DELETED:VARCHAR2 Generated Schema: { “name” : “Person_V1”, “doc” : “Auto-generated Avro schema for sy$person. Generated at Dec 04, 2012 05:07:05 PM PST”, “type” : “record”, “meta” : “dbFieldName=sy$person;”, “namespace” : “com.linkedin.events.example.person”, “fields” : [ { "name" : "txn", "type" : [ "long", "null" ], "meta" : "dbFieldName=TXN;dbFieldPosition=0;" }, { "name" : "key", "type" : [ "long", "null" ], "meta" : "dbFieldName=KEY;dbFieldPosition=1;" }, { "name" : "firstName", "type" : [ "string", "null" ], "meta" : "dbFieldName=FIRST_NAME;dbFieldPosition=2;" }, { "name" : "lastName", "type" : [ "string", "null" ], "meta" : "dbFieldName=LAST_NAME;dbFieldPosition=3;" }, { "name" : "birthDate", "type" : [ "long", "null" ], "meta" : "dbFieldName=BIRTH_DATE;dbFieldPosition=4;" }, { "name" : "deleted", "type" : [ "string", "null" ], "meta" : "dbFieldName=DELETED;dbFieldPosition=5;" } ] } Avro schema will be saved in the file: /path/to/work/trunk/schemas_registry/com.linkedin.events.example.person.Person.1.avsc Generating Java files in the directory: /path/to/work/trunk/databus2-example-events/databus2-example-person/src/main/java Done.
(Note that /path/to/work
will be whatever path prefix you used in the -avroOutDir
and -javaOutDir
options.)