This provides the source files for MariaDB's ColumnStore bulk loader plugin, used to inject data into ColumnStore via PDI.
The plugin was designed for the following software composition:
- OS: Ubuntu 16.04, RHEL/CentOS+ 7, Windows 10
- MariaDB ColumnStore >= 1.2.0
- MariaDB Java Database client* >= 2.2.1
- Java >= 8
- PDI >= 7
+ Not officially supported by Pentaho.
* Only needed if you want to execute DDL.
Follow these steps to build the plugin from source.
The following requirements need to be installed prior to building:
- MariaDB AX Bulk Data Adapters 1.2.0 or higher (a DEB/RPM package is provided by MariaDB)
- Java SDK 8 or higher
- chrpath (only on Linux)
Ubuntu:
sudo apt-get install chrpath
RHEL/CentOS:
sudo yum install chrpath
To build the plugin from source on Linux, execute the following commands:
git clone https://github.com/mariadb-corporation/mariadb-columnstore-data-adapters.git
cd mariadb-columnstore-data-adapters/kettle-columnstore-bulk-exporter-plugin
./gradlew [-PmcsapiLibPath="include this custom mcsapi path"] [-Pversion="x.y.z"] plugin
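For illustration, an invocation with both optional properties set might look as follows; the mcsapi library path and the version number are placeholders and need to be adapted to your installation:
# hypothetical values: adapt the mcsapi path and version to your setup
./gradlew -PmcsapiLibPath="/usr/local/mariadb/columnstore-api/lib" -Pversion="1.2.3" plugin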
The built plugin can be found in build/distributions/
NOTE:
- The generated plugin archive's name doesn't contain release and OS information if built manually rather than through CMake.
To build the plugin from source on Windows, you first have to execute the following commands:
git clone https://github.com/mariadb-corporation/mariadb-columnstore-data-adapters.git
cd mariadb-columnstore-data-adapters/kettle-columnstore-bulk-exporter-plugin
gradlew.bat -b "build_win.gradle" -Pversion=${VERSION} -PmcsapiRuntimeLibrary=${MCSAPI_RUNTIME_LIBRARY} -PmcsapiLibxml2RuntimeLibrary=${MCSAPI_LIBXML2_RUNTIME_LIBRARY} -PmcsapiLibiconvRuntimeLibrary=${MCSAPI_LIBICONV_RUNTIME_LIBRARY} -PmcsapiLibuvRuntimeLibrary=${MCSAPI_LIBUV_RUNTIME_LIBRARY} -PjavamcsapiLibraryPath=${JAVA_MCSAPI_LIBRARY_PATH} -PjavamcsapiRuntimeLibrary=${JAVA_MCSAPI_RUNTIME_LIBRARY} plugin
NOTES:
- You have to substitute all variables according to your mcsapi installation. It is probably easier to build the PDI plugin through CMake from the top-level directory.
- The generated plugin archive's name doesn't contain release and OS information if built manually rather than through CMake.
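For orientation, a manual invocation with all variables substituted could look like the sketch below; every path and the version number are placeholders and must be adapted to your mcsapi installation:
rem Hypothetical example: all paths and the version are placeholders.
gradlew.bat -b "build_win.gradle" -Pversion="1.2.3" ^
 -PmcsapiRuntimeLibrary="C:/mcsapi/lib/mcsapi.dll" ^
 -PmcsapiLibxml2RuntimeLibrary="C:/mcsapi/lib/libxml2.dll" ^
 -PmcsapiLibiconvRuntimeLibrary="C:/mcsapi/lib/libiconv.dll" ^
 -PmcsapiLibuvRuntimeLibrary="C:/mcsapi/lib/libuv.dll" ^
 -PjavamcsapiLibraryPath="C:/mcsapi/lib" ^
 -PjavamcsapiRuntimeLibrary="C:/mcsapi/lib/javamcsapi.dll" ^
 plugin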
The following steps are necessary to install the ColumnStore bulk loader plugin:
- build the plugin from source or download it from our website
- extract the archive mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip into your PDI installation directory $PDI-INSTALLATION/plugins (see the example after this list).
- copy MariaDB's JDBC Client mariadb-java-client-2.2.x.jar into PDI's lib directory $PDI-INSTALLATION/lib.
- install the additional library dependencies
Ubuntu:
sudo apt-get install libuv1
RHEL/CentOS:
sudo yum install epel-release
sudo yum install libuv
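For illustration, on Linux the extraction and copy steps might look like this, assuming a hypothetical PDI installation directory /opt/data-integration:
# hypothetical PDI installation path: adapt to your setup
unzip mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip -d /opt/data-integration/plugins/
cp mariadb-java-client-2.2.*.jar /opt/data-integration/lib/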
The Visual C++ Redistributable for Visual Studio 2015 (x64) is required to use the Bulk Write SDK.
By default the plugin tries to use ColumnStore's default configuration /usr/local/mariadb/columnstore/etc/Columnstore.xml to connect to the ColumnStore instance through the Bulk Write SDK.
Individual configurations can be assigned within each block.
Information on how to change the Columnstore.xml configuration file to connect to remote ColumnStore instances can be found in our Knowledge Base.
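If the ColumnStore instance runs on a remote machine, one possible approach (a sketch, assuming SSH access; the host name and target path are placeholders) is to copy its Columnstore.xml locally, adjust the IP addresses in it as described in the Knowledge Base, and point the block's configuration at that copy:
# hypothetical host and target path
scp user@columnstore-host:/usr/local/mariadb/columnstore/etc/Columnstore.xml /etc/columnstore/Columnstore.xml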
All continuous integration test jobs are in the test directory and can be run through the regression suite, loaded manually into Kettle, or executed through the test scripts.
On Linux the test script can be manually invoked through:
./test/test.sh [path_to_the_pdi_connector_to_test] [-v]
On Windows it can be invoked through:
powershell -File .\test\test.ps1 [-csPdiPlugin path_to_the_pdi_connector_to_test]
The test script will download PDI 7.1 and 8.1, install the built plugin and MariaDB JDBC driver, and execute the tests residing in the tests sub-directories.
You might have to change the database connection properties set in job.parameter or job.parameter.win, according to your ColumnStore setup.
On Windows 10 the default test configuration uses the environment variables MCSAPI_CS_TEST_IP, MCSAPI_CS_TEST_PASSWORD, MCSAPI_CS_TEST_USER, and COLUMNSTORE_INSTALL_DIR.
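For example, these variables could be set in PowerShell before invoking the test script; all values below are placeholders:
# placeholder values: adapt to your ColumnStore test instance
$env:MCSAPI_CS_TEST_IP = "192.0.2.10"
$env:MCSAPI_CS_TEST_USER = "root"
$env:MCSAPI_CS_TEST_PASSWORD = "secret"
$env:COLUMNSTORE_INSTALL_DIR = "C:\Program Files\MariaDB\ColumnStore"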
By default the test scripts use the built Kettle Columnstore plugin build/distributions/mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip. A specific Kettle Columnstore plugin can be specified as an optional command line argument.
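For example, on Linux the tests could be run against a freshly built archive like this; the archive name is a placeholder and depends on your build:
# hypothetical archive name
./test/test.sh build/distributions/mariadb-columnstore-kettle-bulk-exporter-plugin-1.2.3.zip -v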
One test job runs a basic ingestion test of all datatypes into ColumnStore and InnoDB tables and compares the results.
Another test job ingests two CSV files into ColumnStore and checks whether the count of injected rows matches the line count of the CSV files. The number of ingestion loops to run can be adapted in job.parameter.
The plugin currently can't handle blob datatypes, and it only supports multiple inputs to one block if the input field names are equal for all input sources.