MariaDB ColumnStore - Pentaho Data Integration - Bulk Loader Plugin

This repository provides the source files for MariaDB's ColumnStore bulk loader plugin, which injects data into ColumnStore via PDI.

Compatibility notice

This plugin was designed for the following software stack:

  • OS: Ubuntu 16.04, RHEL/CentOS+ 7, Windows 10
  • MariaDB ColumnStore >= 1.2.0
  • MariaDB Java Database client* >= 2.2.1
  • Java >= 8
  • PDI >= 7

+not officially supported by Pentaho.

*only needed if you want to execute DDL.

Building the plugin from source

Follow these steps to build the plugin from source.

Requirements

These requirements need to be installed before building:

  • MariaDB AX Bulk Data Adapters 1.2.0 or higher (a DEB/RPM package is provided by MariaDB)
  • Java SDK 8 or higher
  • chrpath (only on Linux)
sudo apt-get install chrpath   # Ubuntu
sudo yum install chrpath       # RHEL/CentOS

Build process on Linux

To build the plugin from source, execute the following commands:

git clone https://github.com/mariadb-corporation/mariadb-columnstore-data-adapters.git
cd mariadb-columnstore-data-adapters/kettle-columnstore-bulk-exporter-plugin
./gradlew [-PmcsapiLibPath="include this custom mcsapi path"] [-Pversion="x.y.z"] plugin

The built plugin can be found in build/distributions/
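
As a minimal sketch, the built archive can be located from a script as shown below. The archive name pattern is the one used elsewhere in this README. The function name `find_plugin_archive` and the choice to return the first match are assumptions made for illustration and are not part of the build.

```shell
# Hypothetical helper (not part of the repository): print the path of the
# plugin archive found in a build output directory.
# Usage: find_plugin_archive [directory]   (defaults to build/distributions)
find_plugin_archive() {
    dist_dir="${1:-build/distributions}"
    for archive in "$dist_dir"/mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip; do
        # If the glob matches nothing, the loop sees the literal pattern,
        # so check that the candidate is a real file.
        if [ -f "$archive" ]; then
            printf '%s\n' "$archive"
            return 0
        fi
    done
    echo "no plugin archive found in $dist_dir" >&2
    return 1
}
```

Calling `find_plugin_archive` without arguments from the plugin directory looks in build/distributions/ and returns a non-zero status if no archive has been built yet.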

NOTE:

  • The name of the generated plugin archive doesn't contain release and OS information if the plugin is built manually rather than through cmake.

Build process on Windows

To build the plugin from source, execute the following commands:

git clone https://github.com/mariadb-corporation/mariadb-columnstore-data-adapters.git
cd mariadb-columnstore-data-adapters/kettle-columnstore-bulk-exporter-plugin
gradlew.bat -b "build_win.gradle" -Pversion=${VERSION} -PmcsapiRuntimeLibrary=${MCSAPI_RUNTIME_LIBRARY} -PmcsapiLibxml2RuntimeLibrary=${MCSAPI_LIBXML2_RUNTIME_LIBRARY} -PmcsapiLibiconvRuntimeLibrary=${MCSAPI_LIBICONV_RUNTIME_LIBRARY} -PmcsapiLibuvRuntimeLibrary=${MCSAPI_LIBUV_RUNTIME_LIBRARY} -PjavamcsapiLibraryPath=${JAVA_MCSAPI_LIBRARY_PATH} -PjavamcsapiRuntimeLibrary=${JAVA_MCSAPI_RUNTIME_LIBRARY} plugin

NOTES:

  • You have to substitute all variables according to your mcsapi installation. It is probably easier to build the PDI plugin through cmake from the top level directory.
  • The name of the generated plugin archive doesn't contain release and OS information if the plugin is built manually rather than through cmake.

Installation of the plugin in PDI / Kettle

The following steps are necessary to install the ColumnStore bulk loader plugin.

  1. build the plugin from source or download it from our website
  2. extract the archive mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip into your PDI installation directory $PDI-INSTALLATION/plugins.
  3. copy MariaDB's JDBC Client mariadb-java-client-2.2.x.jar into PDI's lib directory $PDI-INSTALLATION/lib.
  4. install the additional library dependencies
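
Steps 2 and 3 above can be scripted on Linux roughly as follows. This is a sketch under stated assumptions: the function name `install_columnstore_plugin` and its three positional arguments are hypothetical, and it assumes the `unzip` utility is installed. It only checks that the target looks like a PDI installation (has `plugins` and `lib` sub-directories) before copying anything.

```shell
# Hypothetical helper (not part of the repository).
# Usage: install_columnstore_plugin <plugin.zip> <mariadb-java-client.jar> <pdi-installation-dir>
install_columnstore_plugin() {
    plugin_zip="$1"
    jdbc_jar="$2"
    pdi_dir="$3"

    if [ ! -f "$plugin_zip" ] || [ ! -f "$jdbc_jar" ]; then
        echo "plugin archive or JDBC jar not found" >&2
        return 1
    fi
    if [ ! -d "$pdi_dir/plugins" ] || [ ! -d "$pdi_dir/lib" ]; then
        echo "$pdi_dir does not look like a PDI installation" >&2
        return 1
    fi

    # Step 2: extract the plugin archive into PDI's plugins directory.
    unzip -o -q "$plugin_zip" -d "$pdi_dir/plugins" || return 1
    # Step 3: copy the MariaDB JDBC client into PDI's lib directory.
    cp "$jdbc_jar" "$pdi_dir/lib/"
}
```

The library dependencies from step 4 still have to be installed separately for your platform.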

Ubuntu dependencies

sudo apt-get install libuv1

CentOS dependencies

sudo yum install epel-release
sudo yum install libuv

Windows 10 dependencies

The Visual C++ Redistributable for Visual Studio 2015 (x64) is required to use the Bulk Write SDK.

Configuration

By default the plugin tries to use ColumnStore's default configuration /usr/local/mariadb/columnstore/etc/Columnstore.xml to connect to the ColumnStore instance through the Bulk Write SDK.

Individual configurations can be assigned within each block.

Information on how to change the Columnstore.xml configuration file to connect to remote ColumnStore instances can be found in our Knowledge Base.
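
Before starting PDI, it can help to confirm that the configuration file the plugin will look for is actually readable. The following is a small sketch; the function name `check_columnstore_config` is hypothetical, and the default path is the one given above.

```shell
# Hypothetical helper (not part of the repository): verify that a
# Columnstore.xml file exists and is readable by the current user.
# Usage: check_columnstore_config [path]
check_columnstore_config() {
    config_path="${1:-/usr/local/mariadb/columnstore/etc/Columnstore.xml}"
    if [ -r "$config_path" ]; then
        echo "using $config_path"
    else
        echo "cannot read $config_path" >&2
        return 1
    fi
}
```

This only checks file access. It does not validate the contents of the file or test the connection to the ColumnStore instance.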

Testing

All continuous integration test jobs are in the test directory. They can be run through the regression suite, loaded manually into Kettle, or executed through the test scripts.

On Linux the test script can be manually invoked through:

./test/test.sh [path_to_the_pdi_connector_to_test] [-v]

On Windows through:

powershell -File .\test\test.ps1 [-csPdiPlugin path_to_the_pdi_connector_to_test]

The test script will download PDI 7.1 and 8.1, install the built plugin and MariaDB JDBC driver, and execute the tests residing in the tests sub-directories.

You might have to change the database connection properties set in job.parameter or job.parameter.win, according to your ColumnStore setup.

On Windows 10 the default test configuration uses the environment variables MCSAPI_CS_TEST_IP, MCSAPI_CS_TEST_PASSWORD, MCSAPI_CS_TEST_USER, and COLUMNSTORE_INSTALL_DIR.

By default the test scripts use the built Kettle Columnstore plugin build/distributions/mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip.
A specific Kettle Columnstore plugin can be specified as an optional command-line argument.

all-datatype-ingestion-test

This job runs a basic ingestion test of all datatypes into ColumnStore and InnoDB tables and compares the results.

csv-ingestion-test

Ingests two CSV files into ColumnStore and checks whether the count of injected rows matches the line count of the CSV files. The number of ingestion loops to run can be adjusted in job.parameter.
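
To reproduce the file side of this check by hand, the combined line count of the CSV files can be computed as sketched below. The function name `total_csv_lines` is hypothetical and this is not the code the test job itself uses. The database side of the comparison would come from a row count on the target table, which is not shown here.

```shell
# Hypothetical helper (not part of the test suite): print the combined
# number of lines in the given files.
# Note: wc -l counts newline characters, so a final line without a
# trailing newline is not counted.
# Usage: total_csv_lines <file> [<file> ...]
total_csv_lines() {
    total=0
    for csv_file in "$@"; do
        lines=$(wc -l < "$csv_file") || return 1
        total=$((total + lines))
    done
    echo "$total"
}
```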

Limitations

The plugin currently can't handle BLOB datatypes, and it supports multiple inputs to one block only if the input field names are identical across all input sources.