MySQL Binary Log connector.
Initially the project was started as a fork of open-replicator, but it ended up as a complete rewrite. Key differences/features:
- automatic binlog filename/position | GTID resolution
- resumable disconnects
- pluggable failover strategies
- JMX exposure (optionally with statistics)
- availability in Maven Central
- no third-party dependencies
- binlog_checksum=CRC32 support (for MySQL 5.6.2+ users)
- test suite over different versions of MySQL releases
If you are looking for something similar in other languages - check out siddontang/go-mysql (Go), noplay/python-mysql-replication (Python).
Get the latest JAR(s) from here. Alternatively, you can include the following Maven dependency (available through Maven Central):
<dependency>
    <groupId>com.github.shyiko</groupId>
    <artifactId>mysql-binlog-connector-java</artifactId>
    <version>0.2.4</version>
</dependency>
The latest development version is always available through the Sonatype Snapshots repository (as shown below).
<dependencies>
    <dependency>
        <groupId>com.github.shyiko</groupId>
        <artifactId>mysql-binlog-connector-java</artifactId>
        <version>0.2.5-SNAPSHOT</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>sonatype-snapshots</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>
File binlogFile = ...
BinaryLogFileReader reader = new BinaryLogFileReader(binlogFile);
try {
    for (Event event; (event = reader.readEvent()) != null; ) {
        ...
    }
} finally {
    reader.close();
}
PREREQUISITES: Whichever user you plan to use for the BinaryLogClient, it MUST have the REPLICATION SLAVE privilege. Unless you specify binlogFilename/binlogPosition yourself (in which case automatic resolution won't kick in), you'll need REPLICATION CLIENT granted as well.
BinaryLogClient client = new BinaryLogClient("hostname", 3306, "username", "password");
client.registerEventListener(new EventListener() {
    @Override
    public void onEvent(Event event) {
        ...
    }
});
client.connect();
By default, BinaryLogClient starts from the current (at the time of connect) master binlog position. If you wish to kick off from a specific filename or position, use client.setBinlogFilename(filename) + client.setBinlogPosition(position).
client.connect() is blocking (meaning that the client will listen for events in the current thread). client.connect(timeout), on the other hand, spawns a separate thread.
You might need to control event deserialization for several reasons: you don't want to waste time deserializing events you won't need; there is no EventDataDeserializer defined for the event type you are interested in (or there is, but it contains a bug); you want certain types of events to be deserialized in a different way (perhaps *RowsEventData should contain the table name and not the id?); etc.
EventDeserializer eventDeserializer = new EventDeserializer();
// do not deserialize EXT_DELETE_ROWS event data, return it as a byte array
eventDeserializer.setEventDataDeserializer(EventType.EXT_DELETE_ROWS,
    new ByteArrayEventDataDeserializer());
// skip EXT_WRITE_ROWS event data altogether
eventDeserializer.setEventDataDeserializer(EventType.EXT_WRITE_ROWS,
    new NullEventDataDeserializer());
// use custom event data deserializer for EXT_DELETE_ROWS
eventDeserializer.setEventDataDeserializer(EventType.EXT_DELETE_ROWS,
    new EventDataDeserializer() {
        ...
    });
BinaryLogClient client = ...
client.setEventDeserializer(eventDeserializer);
MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
BinaryLogClient binaryLogClient = ...
ObjectName objectName = new ObjectName("mysql.binlog:type=BinaryLogClient");
mBeanServer.registerMBean(binaryLogClient, objectName);
// following bean accumulates various BinaryLogClient stats
// (e.g. number of disconnects, skipped events)
BinaryLogClientStatistics stats = new BinaryLogClientStatistics(binaryLogClient);
ObjectName statsObjectName = new ObjectName("mysql.binlog:type=BinaryLogClientStatistics");
mBeanServer.registerMBean(stats, statsObjectName);
- data of numeric types (tinyint, etc.) is always returned signed(!), regardless of whether the column definition includes the "unsigned" keyword
- data of var*/*text/*blob types is always returned as a byte array (for var* this is true starting from [email protected]).
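If a column is known to be unsigned, or is known to hold text, the returned value can be reinterpreted on the consumer side. The following is a minimal sketch that uses only the JDK. ColumnValues and its methods are hypothetical helpers (not part of this library), and both the byte width of each integer column and the character set of text columns are assumed to be known from the table definition.

```java
import java.math.BigInteger;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mysql-binlog-connector-java.
final class ColumnValues {

    private ColumnValues() {
    }

    // Reinterprets a signed value read from a 1-4 byte integer column
    // (tinyint=1, smallint=2, mediumint=3, int=4) as its unsigned counterpart.
    static long toUnsigned(long signedValue, int widthInBytes) {
        if (widthInBytes < 1 || widthInBytes > 4) {
            throw new IllegalArgumentException("width must be 1..4 bytes: " + widthInBytes);
        }
        long mask = (1L << (widthInBytes * 8)) - 1;
        return signedValue & mask;
    }

    // bigint unsigned does not fit into a long, so widen to BigInteger.
    static BigInteger toUnsignedBigint(long signedValue) {
        return new BigInteger(Long.toUnsignedString(signedValue));
    }

    // Decodes var*/*text bytes; the charset must match the column definition.
    static String toText(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned(-1, 1));
        System.out.println(toUnsignedBigint(-1L));
        System.out.println(toText(new byte[] {104, 105}, StandardCharsets.UTF_8));
    }
}
```

Masking by width keeps positive values unchanged, so the helper is safe to apply to every value of a column that is declared unsigned.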
Q. What does a typical transaction look like?
A. GTID event (if gtid_mode=ON) -> QUERY event with "BEGIN" as sql -> ... -> XID event | QUERY event with "COMMIT" or "ROLLBACK" as sql.
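The event sequence in the answer above can be used to group a stream of events into transactions. Here is a rough sketch of that idea. It deliberately does not use the library's Event classes: SimpleEvent and Kind are hypothetical stand-ins that carry only an event kind and (for QUERY) the sql text. Statements that are logged outside of a BEGIN ... COMMIT pair (DDL, for instance) are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of mysql-binlog-connector-java.
final class TransactionGrouper {

    enum Kind { GTID, QUERY, ROWS, XID }

    // Simplified stand-in for an event: kind plus sql text (QUERY only).
    static final class SimpleEvent {
        final Kind kind;
        final String sql;

        SimpleEvent(Kind kind, String sql) {
            this.kind = kind;
            this.sql = sql;
        }
    }

    static boolean startsTransaction(SimpleEvent e) {
        return e.kind == Kind.GTID
            || (e.kind == Kind.QUERY && "BEGIN".equalsIgnoreCase(e.sql));
    }

    static boolean endsTransaction(SimpleEvent e) {
        return e.kind == Kind.XID
            || (e.kind == Kind.QUERY
                && ("COMMIT".equalsIgnoreCase(e.sql) || "ROLLBACK".equalsIgnoreCase(e.sql)));
    }

    // Returns completed transactions; events seen before the first
    // GTID/BEGIN and events of an unfinished transaction are left out.
    static List<List<SimpleEvent>> group(List<SimpleEvent> events) {
        List<List<SimpleEvent>> transactions = new ArrayList<>();
        List<SimpleEvent> current = null;
        for (SimpleEvent e : events) {
            if (current == null) {
                if (!startsTransaction(e)) {
                    continue;
                }
                current = new ArrayList<>();
            }
            current.add(e);
            if (endsTransaction(e)) {
                transactions.add(current);
                current = null;
            }
        }
        return transactions;
    }

    public static void main(String[] args) {
        List<SimpleEvent> events = Arrays.asList(
            new SimpleEvent(Kind.GTID, null),
            new SimpleEvent(Kind.QUERY, "BEGIN"),
            new SimpleEvent(Kind.ROWS, null),
            new SimpleEvent(Kind.XID, null));
        System.out.println(group(events).size());
    }
}
```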
Q. EventData for inserted/updated/deleted rows has no information about table (except for some weird id). How do I make sense out of it?
A. Each WriteRowsEventData/UpdateRowsEventData/DeleteRowsEventData event is preceded by TableMapEventData, which contains the schema & table name. If for some reason you need to know column names (types, etc.), the easiest way is to run
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, ORDINAL_POSITION, COLUMN_DEFAULT, IS_NULLABLE,
DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE,
CHARACTER_SET_NAME, COLLATION_NAME from INFORMATION_SCHEMA.COLUMNS;
# see https://dev.mysql.com/doc/refman/5.6/en/columns-table.html for more information
(yes, binary log DOES NOT include that piece of information).
You can find a JDBC snippet here.
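To turn the table id carried by row events back into a table name, it is usually enough to remember the most recent table map event seen for each id. A minimal sketch of such a cache follows. TableNameResolver is a hypothetical class, and it takes plain values rather than library types because the accessor names of TableMapEventData and *RowsEventData are not shown in this document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of mysql-binlog-connector-java.
final class TableNameResolver {

    private final Map<Long, String> tablesById = new HashMap<>();

    // Call for every table map event: remembers which schema.table
    // the id currently refers to (a later mapping replaces an earlier one).
    void onTableMap(long tableId, String database, String table) {
        tablesById.put(tableId, database + "." + table);
    }

    // Call for a rows event: returns "schema.table", or null if no
    // table map event has been seen for that id yet.
    String resolve(long tableId) {
        return tablesById.get(tableId);
    }

    public static void main(String[] args) {
        TableNameResolver resolver = new TableNameResolver();
        resolver.onTableMap(42L, "shop", "orders");
        System.out.println(resolver.resolve(42L));
    }
}
```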
There are two entry points - BinaryLogClient (which you can use to read binary logs from a MySQL server) and BinaryLogFileReader (for offline log processing). Both of them rely on EventDeserializer to deserialize the stream of events. Each Event consists of an EventHeader (containing, among other things, a reference to the EventType) and EventData. The aforementioned EventDeserializer has one EventHeaderDeserializer (EventHeaderV4Deserializer by default) and a collection of EventDataDeserializers. If there is no EventDataDeserializer registered for some particular type of Event, the default EventDataDeserializer kicks in (NullEventDataDeserializer).
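The "registered deserializer, otherwise default" lookup described above can be pictured as a map with a fallback. The sketch below is only an illustration of that pattern and is not the library's implementation: DeserializerRegistry is hypothetical, event types are plain strings, and the fallback simply returns null to play the role of NullEventDataDeserializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not part of mysql-binlog-connector-java.
final class DeserializerRegistry {

    private final Map<String, Function<byte[], Object>> byType = new HashMap<>();

    // Stand-in for the default deserializer: produces no event data.
    private final Function<byte[], Object> fallback = payload -> null;

    void register(String eventType, Function<byte[], Object> deserializer) {
        byType.put(eventType, deserializer);
    }

    // Uses the deserializer registered for the type, or the fallback if none is.
    Object deserialize(String eventType, byte[] payload) {
        return byType.getOrDefault(eventType, fallback).apply(payload);
    }

    public static void main(String[] args) {
        DeserializerRegistry registry = new DeserializerRegistry();
        registry.register("QUERY", payload -> new String(payload, StandardCharsets.UTF_8));
        System.out.println(registry.deserialize("QUERY", "BEGIN".getBytes(StandardCharsets.UTF_8)));
        System.out.println(registry.deserialize("XID", new byte[] {1}));
    }
}
```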
For insight into the internals of MySQL, look here. The MySQL Client/Server Protocol and The Binary Log sections are particularly useful as reference documentation for the **.binlog.network and **.binlog.event packages.
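As a small taste of what those sections describe, the following sketch decodes the 19-byte version 4 event header (little-endian: 4-byte timestamp, 1-byte type code, 4-byte server id, 4-byte event length, 4-byte next position, 2-byte flags). It is written from the MySQL documentation rather than taken from this library, and EventHeaderV4Sketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not part of mysql-binlog-connector-java.
final class EventHeaderV4Sketch {

    static final int LENGTH = 19;

    final long timestampSeconds; // seconds since the Unix epoch
    final int typeCode;          // numeric event type
    final long serverId;
    final long eventLength;      // header + data, in bytes
    final long nextPosition;     // position of the next event in the log
    final int flags;

    private EventHeaderV4Sketch(ByteBuffer buf) {
        // All fields are unsigned on the wire, so widen and mask them.
        timestampSeconds = buf.getInt() & 0xFFFFFFFFL;
        typeCode = buf.get() & 0xFF;
        serverId = buf.getInt() & 0xFFFFFFFFL;
        eventLength = buf.getInt() & 0xFFFFFFFFL;
        nextPosition = buf.getInt() & 0xFFFFFFFFL;
        flags = buf.getShort() & 0xFFFF;
    }

    static EventHeaderV4Sketch parse(byte[] header) {
        if (header.length < LENGTH) {
            throw new IllegalArgumentException("need " + LENGTH + " bytes, got " + header.length);
        }
        return new EventHeaderV4Sketch(ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN));
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocate(LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        raw.putInt(1400000000).put((byte) 2).putInt(1).putInt(100).putInt(220).putShort((short) 0);
        EventHeaderV4Sketch header = parse(raw.array());
        System.out.println(header.typeCode + " " + header.eventLength + " " + header.nextPosition);
    }
}
```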
Some of the OSS built on top of mysql-binlog-connector-java: shyiko/rook (generic Change Data Capture (CDC) toolkit), mardambey/mypipe (MySQL to Apache Kafka replicator), ngocdaothanh/mydit (MySQL to MongoDB replicator).
It's also used on a large scale in MailChimp. You can read about it here.
git clone https://github.com/shyiko/mysql-binlog-connector-java.git
cd mysql-binlog-connector-java
mvn # shows how to build, test, etc. project
In lieu of a formal styleguide, please take care to maintain the existing coding style.
Executing mvn checkstyle:check within the project directory should not produce any errors.
If you are willing to install vagrant (required by integration tests), it's highly recommended to check (with mvn clean verify) that there are no test failures before sending a pull request.
Additional tests for any new or changed functionality are also very welcome.