
QueuedSink and ThreadPoolQueuedSink

metacret edited this page Sep 9, 2014 · 1 revision

QueuedSink

QueuedSink is the abstract base class for sinks that buffer messages in a queue. Internally, the writeTo(MessageContainer) method puts the message in the queue; a background thread polls messages from the queue in batches and calls the implementation of write(List<Message>). QueuedSink takes the following properties.

| Property | Description | Type | Default |
| --- | --- | --- | --- |
| queue4Sink | Which type of queue to use | MessageQueue4Sink | Memory-based queue with a capacity of 10,000 |
| batchSize | Same as queue.buffering.max.messages in the Kafka Producer config | int | 1000 |
| batchTimeout | Same as queue.buffering.max.ms in the Kafka Producer config | int | 1000 (ms) |
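
The queue-and-batch behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Suro's actual QueuedSink: the class name BatchingSinkSketch, its constructor arguments, and the use of ArrayBlockingQueue in place of MessageQueue4Sink are all assumptions made for the example. A batch is handed to write() when it reaches batchSize messages or when batchTimeoutMs has elapsed, whichever comes first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a queued, batching sink (not the Suro class).
abstract class BatchingSinkSketch<T> {
    private final BlockingQueue<T> queue;   // stands in for queue4Sink
    private final int batchSize;            // flush when this many messages are buffered
    private final long batchTimeoutMs;      // flush at least this often
    private final Thread worker;
    private volatile boolean running = true;

    BatchingSinkSketch(int capacity, int batchSize, long batchTimeoutMs) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.batchTimeoutMs = batchTimeoutMs;
        this.worker = new Thread(this::drainLoop, "batching-sink-sketch");
        this.worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    // Analogue of writeTo: only enqueues; returns false if the queue is full.
    boolean writeTo(T message) {
        return queue.offer(message);
    }

    // Analogue of write(List<Message>): subclasses do the actual IO here.
    protected abstract void write(List<T> batch);

    private void drainLoop() {
        List<T> batch = new ArrayList<>(batchSize);
        long deadline = System.currentTimeMillis() + batchTimeoutMs;
        while (running || !queue.isEmpty()) {
            try {
                long wait = Math.max(1, deadline - System.currentTimeMillis());
                T msg = queue.poll(wait, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    batch.add(msg);
                    // Take whatever else is already queued, up to batchSize.
                    queue.drainTo(batch, batchSize - batch.size());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop polling; the remaining batch is flushed below
            }
            boolean full = batch.size() >= batchSize;
            boolean timedOut = System.currentTimeMillis() >= deadline;
            if (full || timedOut || !running) {
                flush(batch);
                deadline = System.currentTimeMillis() + batchTimeoutMs;
            }
        }
        flush(batch);
    }

    private void flush(List<T> batch) {
        if (!batch.isEmpty()) {
            write(new ArrayList<>(batch)); // hand over a copy, then reuse the buffer
            batch.clear();
        }
    }

    // Stops the worker after it has drained the queue; may take up to batchTimeoutMs.
    void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch the caller of writeTo never blocks on IO; only the background thread calls write(), so a subclass does not need a thread-safe writer.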

ThreadPoolQueuedSink

When the sink's IO client is thread safe, as with the ElasticSearch TransportClient or the Kafka 0.8 Producer, the IO part can be run in a thread pool. ThreadPoolQueuedSink extends QueuedSink for this purpose. LocalFileSink is not a ThreadPoolQueuedSink because it uses the Hadoop SequenceFile.Writer, which is not thread safe.

| Property | Description | Type | Default |
| --- | --- | --- | --- |
| jobQueueSize | Capacity of the workQueue argument of the ThreadPoolExecutor constructor | int | 1000 |
| corePoolSize | corePoolSize argument of the ThreadPoolExecutor constructor | int | 3 |
| maxPoolSize | maximumPoolSize argument of the ThreadPoolExecutor constructor | int | 10 |
| jobTimeout | How long, in milliseconds, to wait when closing the sink before terminating jobs forcefully | int | INFINITE |
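
As a rough illustration of how these four properties could map onto java.util.concurrent.ThreadPoolExecutor, consider the sketch below. It is not taken from the Suro source: the class SinkPoolSketch, its method names, the 10-second keep-alive, and the CallerRunsPolicy rejection handler are assumptions chosen for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing where each property would be used.
final class SinkPoolSketch {
    private SinkPoolSketch() {}

    static ThreadPoolExecutor newPool(int jobQueueSize, int corePoolSize, int maxPoolSize) {
        return new ThreadPoolExecutor(
                corePoolSize,                                   // corePoolSize
                maxPoolSize,                                    // maximumPoolSize
                10, TimeUnit.SECONDS,                           // keep-alive: assumed value
                new ArrayBlockingQueue<Runnable>(jobQueueSize), // workQueue capacity
                new ThreadPoolExecutor.CallerRunsPolicy());     // assumed rejection policy
    }

    // Waits up to jobTimeoutMs for queued jobs to finish, then forces termination.
    // Returns true if the pool terminated on its own within the timeout.
    static boolean closePool(ThreadPoolExecutor pool, long jobTimeoutMs) {
        pool.shutdown(); // stop accepting new jobs, keep running queued ones
        try {
            if (pool.awaitTermination(jobTimeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // interrupt running jobs and discard queued ones
        return false;
    }
}
```

Note that with a bounded work queue, ThreadPoolExecutor starts threads beyond corePoolSize only once the queue is full, so with a queue capacity of 1000 the pool may stay at 3 threads under moderate load. An INFINITE jobTimeout could be approximated in this sketch by passing Long.MAX_VALUE.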