- Dependencies: Minimize dependencies of core installation, defer `polars` to `cratedb-toolkit[io]`.
- MongoDB: Added Zyp transformations to the CDC subsystem, making it more symmetric to the full-load procedure.
- Query Converter: Added very basic expression converter utility with CLI interface
- DynamoDB: Added query expression converter for relocating object references, to support query migrations after the breaking change with the SQL DDL schema, by v0.0.27.
- IO: Improved `BulkProcessor` when running per-record operations by also checking `rowcount` for handling `INSERT OK, 0 rows` responses
- MongoDB: Fixed BSON decoding of `{"$date": 1180690093000}` timestamps by updating to commons-codec 0.0.21.
- Testcontainers: Don't always pull the OCI image before starting. It is unfortunate in disconnected situations.
- MongoDB: Updated to pymongo 4.9
- DynamoDB: Change CrateDB data model to use (`pk`, `data`, `aux`) columns. Attention: This is a breaking change.
- MongoDB: Configure `MongoDBCrateDBConverter` after updating to commons-codec 0.0.18
- DynamoDB CDC: Fix `MODIFY` operation to also propagate deleted attributes
- Table Loader: Improved conditional handling of "transformation" parameter
- Table Loader: Improved status reporting and error logging in `BulkProcessor`
- MongoDB: Improve error reporting
- MongoDB Full: Polars' `read_ndjson` doesn't load MongoDB JSON data well, use `fsspec` and `orjson` instead
- MongoDB Full: Improved initialization of transformation subsystem
- MongoDB Adapter: Improved performance when computing collection cardinality by using `collection.estimated_document_count()`
- MongoDB Full: Optionally use `limit` parameter as number of total records
- MongoDB Adapter: Evaluate `_id` filter field by upcasting to `bson.ObjectId`, to convey a filter that makes `ctk load table` process a single document, identified by its OID
- MongoDB Dependencies: Update to commons-codec 0.0.17
- MongoDB Full: Refactor transformation subsystem to `commons-codec`
- MongoDB: Update to commons-codec v0.0.16
- MongoDB: Unlock processing multiple collections, either from server database, or from filesystem directory
- MongoDB: Unlock processing JSON files from HTTP resource, using `https+bson://`
- MongoDB: Optionally filter server collection using MongoDB query expression
- MongoDB: Improve error handling wrt. bulk operations vs. usability
- DynamoDB CDC: Add `ctk load table` interface for processing CDC events
- DynamoDB CDC: Accept a few more options for the Kinesis Stream: batch-size, create, create-shards, start, seqno, idle-sleep, buffer-time
- DynamoDB Full: Improve error handling wrt. bulk operations vs. usability
- MongoDB: Rename columns with leading underscores to use double leading underscores
- MongoDB: Add support for UUID types
- MongoDB: Improve reading timestamps in previous BSON formats
- MongoDB: Fix processing empty arrays/lists. By default, assume `TEXT` as inner type.
- MongoDB: For `ctk load table`, use "partial" scan for inferring the collection schema, based on the first 10,000 documents.
- MongoDB: Skip leaking `UNKNOWN` fields into SQL DDL. This means relevant column definitions will not be included into the SQL DDL.
- MongoDB: Make `ctk load table` use the `data OBJECT(DYNAMIC)` mapping strategy.
- MongoDB: Sanitize lists of varying objects
- MongoDB: Add treatment option for applying special treatments to certain items on real-world data
- MongoDB: Use pagination on source collection, for creating batches towards CrateDB
- MongoDB: Unlock importing MongoDB Extended JSON files using `file+bson://...`
- DynamoDB: Add special decoding for varied lists. Store them into a separate `OBJECT(IGNORED)` column in CrateDB.
- DynamoDB: Add pagination support for `full-load` table loader
- DMS/DynamoDB: Fix table name quoting within CDC processor handler
- MongoDB: Fix and verify Zyp transformations
- DMS/DynamoDB/MongoDB I/O: Use SQL with parameters instead of inlining values
- Dependencies: Unpin commons-codec, to always use the latest version
- Dependencies: Unpin lorrystream, to always use the latest version
- MongoDB: Improve type mapper by discriminating between `INTEGER` and `BIGINT`
- MongoDB: Improve type mapper by supporting BSON `DatetimeMS`, `Decimal128`, and `Int64` types
- Processor: Updated Kinesis Lambda processor to understand AWS DMS
- MongoDB: Fix missing output on STDOUT for `migr8 export`
- MongoDB: Improve timestamp parsing by using `python-dateutil`
- MongoDB: Converge `_id` input field to `id` column instead of dropping it
- MongoDB: Make user interface use stderr, so stdout is for data only
- MongoDB: Make `migr8 extract` write to stdout by default
- MongoDB: Make `migr8 translate` read from stdin by default
- MongoDB: Improve user interface messages
- MongoDB: Strip single leading underscore character from all top-level fields
- MongoDB: Map OID types to CrateDB TEXT columns
- MongoDB: Make `migr8 extract` and `migr8 export` accept the `--limit` option
- MongoDB: Fix indentation in prettified SQL output of `migr8 translate`
- MongoDB: Add capability to give type hints and add transformations
- Dependencies: Adjust code for lorrystream version 0.0.3
- Dependencies: Update to lorrystream 0.0.4 and commons-codec 0.0.7
- DynamoDB: Add table loader for full-load operations
- `ctk load table`: Added support for MongoDB Change Streams
- Fix dependency with the `kaggle` package, downgrade to `kaggle==1.6.14`
- DynamoDB CDC: Add demo to support reading DynamoDB change data capture
- IO: Added the `if-exists` query parameter by updating to influxio 0.4.0.
- Rockset: Added CrateDB Rockset Adapter, an HTTP API emulation layer
- MongoDB: Added adapter amalgamating PyMongo to use CrateDB as backend
- SQLAlchemy: Clean up and refactor SQLAlchemy polyfills to `cratedb_toolkit.util.sqlalchemy`
- CFR: Build as a self-contained program using PyInstaller
- CFR: Publish self-contained application bundle to GitHub Workflow Artifacts
- Add `ctk cfr` and `ctk wtf` diagnostics programs
- Remove support for Python 3.7
- SQLAlchemy dialect: Use `sqlalchemy-cratedb>=0.37.0`. This includes the fix to the `get_table_names()` reflection method.
- Dependencies: Migrate from `crate[sqlalchemy]` to `sqlalchemy-cratedb`
- Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by using `ssl=true` query argument also for `influxdb2://` source URLs.
- Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by propagating `ssl=true` query argument. Update dependencies to `influxio>=0.2.1,<1`.
- Dependencies: Unpin upper version bound of `dask`. Otherwise, compatibility issues cannot be resolved quickly, like with Python 3.11.9. dask/dask#11038
- Dependencies: Use `dask[dataframe]`
- datasets: Fix compatibility with Python 3.7
- datasets: Fix dataset loader
- Added `cratedb_toolkit.datasets` subsystem, for acquiring datasets from cratedb-datasets and Kaggle.
- Do not always activate pytest11 entrypoint to pytest fixture `cratedb_service`, as it depends on the `testcontainers` package, which is not always installed.
- Packaging: Use `cloud` extra to install relevant packages
- Dependencies: Add `testing` extra, which installs `testcontainers` only
- Testing: Export `cratedb_service` fixture as pytest11 entrypoint
- Sandbox: Reduce number of extras by just using `all`
- Add SQL runner utility primitives to `io.sql` namespace
- Add `import_csv_pandas` and `import_csv_dask` utility primitives
- data: Add subsystem for "loading" data.
- Add SDK and CLI for CrateDB Cloud Data Import APIs
  `ctk load table ...`
- Add `migr8` program from previous repository
- InfluxDB: Add adapter for `influxio`
- MongoDB: Add `migr8` program from previous repository
- MongoDB: Improve UX by using `ctk load table mongodb://...`
- load table: Refactor to use more OO
- Add `examples/cloud_import.py`
- Adapt testcontainers to be agnostic of the testing framework. Thanks, @pilosus.
- CLI: Upgrade to `click-aliases>=1.0.2`, fixing erroring out when no group aliases are specified.
- Add support for Python 3.12
- SQLAlchemy: Improve UNIQUE constraints polyfill to accept multiple column names, for emulating unique composite keys.
- SQLAlchemy: Add a few patches and polyfills, which do not fit well into the vanilla Python driver / SQLAlchemy dialect.
- Retention: Refactor strategies `delete`, `reallocate`, and `snapshot`, to standalone variants.
- Retention: Bundle configuration and runtime settings into `Settings` entity, and use more OO instead of weak dictionaries: Add `RetentionStrategy`, `TableAddress`, and `Settings` entities, to improve information passing throughout the application and the SQL templates.
- Retention: Add `--schema` option, and `CRATEDB_EXT_SCHEMA` environment variable, to configure the database schema used to store the retention policy table. The default value is `ext`.
- Retention: Use fully qualified table names everywhere.
- Retention: Fix: Compensate for `DROP REPOSITORY` now returning `RepositoryMissingException` when the repository does not exist. With previous versions of CrateDB, it was `RepositoryUnknownException`.
- Import "data retention" implementation from https://github.com/crate/crate-airflow-tutorial. Thanks, @hammerhead.