Release v0.9.4-redis-connect

Pre-release
@viragtripathi viragtripathi released this 04 Oct 23:55
48ed69e

🚀 Changelog

🧰 Enhancements

  • Enhanced REDIS_STREAMS_SINK to support partitioning by publishing to separate Redis Stream keys. Partitioning can be applied at the Partitioned Job level (to scale out the source), at the Target Sink level (to scale out the target), or both.
  • Enhanced REDIS_STREAMS_SINK to pass through the timestamp which denotes when the source committed the transaction to its change log/table. This includes maintaining a unique sequence for transactions that occurred within the same timestamp. With this capability, users can maintain exact ordering as captured at the source, across partitioned Redis Stream keys, even if they arrive out of order, simply by reordering them within the target database without concern about managing conflicts.
  • Enhanced REDIS_STREAMS_SINK with optional maxLength configuration so the target Redis database can be protected from potentially running out of memory in the event the Redis Stream's consumer stops managing the stream's length.
  • Enhanced Initial Load to support credentials rotation without having to restart tasks. This is particularly useful for very long-running initial loads and periodic ETL processes that might overlap with credential rotation schedules.
  • Enhanced Initial Load tasks to support the stop process with full feature parity to partitioned stream jobs. This includes handling cascading failure scenarios and graceful failure of all partitions in the event of a single partition failure. It is particularly useful for testing in development environments.
  • Enhanced Initial Load tasks to support the use of RowIndex as the primary key. This is particularly useful for ETL processes that replicate data from aggregated reporting tables which do not have a primary key.
  • Enhanced Initial Load tasks to support a customWhereClause configuration that is seamlessly added to the underlying select statement used for initial load. This is compatible with each variation of initial load configuration (primary key, RowIndex, and pass through).
  • Enhanced Initial Load tasks with circuit breaker protection and connection retry logic for parity with stream jobs.
  • Enhanced Initial Load tasks to quiesce each stage for all events published to its pipeline before notification of its completion, release of resources, and status update in the Job Manager database. This is particularly useful for long-running custom stages and pipelines with large buffer sizes which might require prolonged durations to fully quiesce.
  • Enhanced Initial Load tasks with new transition types so there is distinction between tasks that were COMPLETED, manually STOPPED, or abruptly FAILED.
  • Enhanced Initial Load tasks to share a data source across partitions, significantly reducing connection overhead. This is only supported for JDBC-based tasks.
  • Added new REST (including Swagger) and CLI endpoints to access Job and Task (Initial Load) transition logs without having to access the Redis CLI or RedisInsight directly.
  • Enhanced stop/remove processes with quiesce capability so all events published to their pipeline (per partition) fully process each stage before shutdown, whether the shutdown is due to a graceful stop or a failure event. This includes bypass logic for failure cases in which the root cause would prevent writing to the target, which avoids waiting for each event to time out.
  • Improved orchestration for stop process while backpressure protection is occurring (e.g. there is more load than the system was configured to handle) so all events already published to their pipeline (per partition) fully process each stage before shutdown.
  • Improved orchestration for graceful failures in the transformation layer in order to avoid race conditions between producer and stage(s) threads.
  • Enhanced credentials rotation by adding a stop process to handle failed connections with new credentials. This avoids downstream connection exceptions that are harder to troubleshoot once the former credentials expire.
  • Enhanced stop/restart/migrate processes with parallelization so all partitions owned by a single Redis Connect Instance can begin their quiesce process at the same time instead of serially.
  • Added support for RAW and CLOB column types.
  • Added BaseCustomStageHandler to standardize logging and exception handling. Users now only need to extend this new handler and implement a single method when creating a custom stage.
  • Bumped Debezium release version to v1.9.6.Final, which includes our requested fix for RDB sources to parse JSON data without the constraint on CLOB columns.
  • Added various new validations to avoid corner case misconfigurations.
  • Improved exception handling and logging in various places for easier root-cause analysis.
  • Changed default pipeline buffer size to 4096 to avoid unnecessarily prolonged quiesce cycles.
  • (Not backward compatible) Renamed REDIS_STREAM_SINK to REDIS_STREAMS_SINK.
  • (Not backward compatible) Changed default value for snapshot.mode for all Debezium supported sources from "initial" to either "never" or "schema_only". This avoids using Debezium's initial load snapshot process, which is slow and does not scale efficiently. For development environments, users can manually set the snapshot.mode back to "initial" since they test on small tables. For production environments, Redis Connect's initial load process should be used to scale independently from the stream process.
  • (Deprecated) Deprecated the Initial Load selectQuery and countQuery configurations. In their place, a new framework removes the need for users to create complex nested queries. Instead, each query is customized to user preferences based on Boolean fields and customWhereClause, and is generated according to the unique semantics of each source's SQL support.
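
The timestamp pass-through noted above for REDIS_STREAMS_SINK suggests that a consumer reading from several partitioned Redis Stream keys can restore source order by sorting on the source commit timestamp, with the per-timestamp sequence as a tie-breaker. The following is a minimal sketch of that idea only, not Redis Connect's implementation. The field names source_ts_ms and sequence, the stream keys, and the payloads are all hypothetical, since the release notes do not specify the entry format.

```python
from typing import Iterable, List, NamedTuple


class ChangeEvent(NamedTuple):
    source_ts_ms: int  # hypothetical name: source commit timestamp in ms
    sequence: int      # hypothetical name: tie-breaker within one timestamp
    stream_key: str    # Redis Stream key the event was read from (made up)
    payload: str       # stand-in for the change record


def restore_source_order(events: Iterable[ChangeEvent]) -> List[ChangeEvent]:
    """Sort events by (commit timestamp, sequence), ignoring arrival order."""
    return sorted(events, key=lambda e: (e.source_ts_ms, e.sequence))


# Events as they might arrive from two partitioned stream keys, out of order.
arrived = [
    ChangeEvent(1000, 2, "emp:1", "insert id=8"),
    ChangeEvent(1005, 1, "emp:0", "delete id=3"),
    ChangeEvent(1000, 1, "emp:0", "insert id=7"),
]

ordered = restore_source_order(arrived)
for event in ordered:
    print(event.source_ts_ms, event.sequence, event.stream_key, event.payload)
```

Because the sort key is a (timestamp, sequence) pair, two events committed within the same timestamp still come out in a deterministic order regardless of which stream key delivered them first.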

🐛 Bug Fixes

  • Fixed checkpoint transactionality to work on clustered Redis databases.
  • Replaced scanning every file within the user-provided credentials directory, during the credentials rotation process, with a direct read on only the file/job that is having its credentials rotated. This avoids impacting every job (noisy neighbor) during each individual credentials rotation cycle.

Tested Versions

  • Java: 11+
  • Redis Connect: 0.9.x
  • DB2 (Initial Loader): Database 11.5.x; JDBC Driver 11.5.6.0
  • Files (Initial Loader): CSV
  • MongoDB (CDC and Initial Loader): Database 4.4+; Driver 4.3.3
  • MySQL (CDC and Initial Loader): Database 5.7, 8.0.x; JDBC Driver 8.0.28
  • Oracle (CDC and Initial Loader): Database 11g, 12c, 19c, 21c; JDBC Driver 12.2.0.1, 19.8.0.0, 21.1.0.0; Adapter logminer
  • PostgreSQL (CDC and Initial Loader): Database 10, 11, 12, 13, 14; JDBC Driver 42.3.5; Plug-ins pgoutput
  • SQL Server (CDC and Initial Loader): Database 2017, 2019; JDBC Driver 9.4.1.jre8
  • Vertica (Initial Loader): Database 11.1.0-0; JDBC Driver 11.1.0-0, 12.0.1-0