MarkLogic Content Pump (mlcp) and Gradle
The `MlcpTask` class allows you to invoke MarkLogic Content Pump (mlcp) via a Gradle task.

One benefit of using `MlcpTask` instead of `JavaExec` is that `MlcpTask` uses the mlHost, mlUsername (or mlRestAdminUsername), and mlPassword (or mlRestAdminPassword) properties by default; these are defined in the `mlAppConfig` instance that ml-gradle creates. Another benefit is that you don't need to download mlcp and put the executable on your path: you can run mlcp from anywhere, because all of mlcp's libraries are downloaded by Gradle. This is also handy for running mlcp on a CI server such as Jenkins.
`MlcpTask` also provides task properties for most of mlcp's command-line arguments. These are just syntactic sugar: since `MlcpTask` extends `JavaExec`, you can always pass arguments through `JavaExec`'s `args` property.
Note that you don't need `MlcpTask` to use mlcp at all. You can use `JavaExec` instead and configure all of the command-line arguments yourself. In particular, if you use an MLCP options file to specify arguments for MLCP, the syntactic sugar provided by `MlcpTask` won't help.
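For comparison, the following is a minimal sketch of the plain `JavaExec` approach with an options file. It assumes the `mlcp` configuration and dependency from the example further down, mlcp's entry point class `com.marklogic.contentpump.ContentPump`, and a hypothetical options file named `mlcp-import-options.txt` in the project directory. The `mainClass` property requires Gradle 6.4 or higher.

```groovy
// Sketch: run mlcp via plain JavaExec and an options file instead of MlcpTask.
// tasks.register configures lazily, so this can appear before the configurations block.
tasks.register("importWithOptionsFile", JavaExec) {
  // Reuses the "mlcp" configuration declared for MlcpTask
  classpath = configurations.mlcp
  // mlcp's entry point class
  mainClass = "com.marklogic.contentpump.ContentPump"
  // Hypothetical options file holding host, port, credentials, and other mlcp options,
  // so no password appears in build.gradle or in these args
  args = ["IMPORT", "-options_file", "mlcp-import-options.txt"]
}
```

Because nothing is read from `mlAppConfig`, every connection setting has to live in the options file itself.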
The behavior of Gradle's `JavaExec` task changed between Gradle 6.3 and 6.4 such that, if you wish to use `MlcpTask` in ml-gradle 4.3.2 or higher, you must use at least Gradle 6.4.
If you are using Gradle 7.0 or higher, you must use ml-gradle 4.3.1 or higher.
If you are using ml-gradle 4.2.x or older, it is recommended to use at least Gradle 6, but Gradle 5 and possibly Gradle 4 may work as well.
Below is an example of using `MlcpTask` and pulling in the mlcp dependencies. See the mlcp-project build file for a more complete example, which shows both import and export tasks:
```groovy
plugins {
  id "com.marklogic.ml-gradle" version "5.0.0"
}

repositories {
  mavenCentral()
  // This MarkLogic-specific repository is only needed for older versions of MLCP. If you receive an error that Gradle
  // cannot download version "1.5.2-marklogic" of the "commons-csv" dependency, then add this. Otherwise, it can be omitted.
  // maven { url "https://developer.marklogic.com/maven2/" }
}

configurations {
  mlcp {
    // MLCP 11.1.0 and higher requires this modification.
    attributes {
      attribute(TargetJvmEnvironment.TARGET_JVM_ENVIRONMENT_ATTRIBUTE, objects.named(TargetJvmEnvironment.class, TargetJvmEnvironment.STANDARD_JVM))
    }
  }
}

dependencies {
  mlcp "com.marklogic:mlcp:11.3.0"
}

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  database = "my-database"
  input_file_path = "my-input-file.txt"
  input_file_type = "delimited_text"
  output_collections = "my-collection"
  // Can also override the default properties
  // username = "some-other-username"
  // etc.
}
```
If you need to pass arguments to MLCP that are not yet available as `MlcpTask` properties, use the `args` property that `MlcpTask` inherits from Gradle's `JavaExec` task:

```groovy
args = ["-ssl_protocol", "TLSv1.2"]
```
You can specify any number of arguments this way.
See Dynamically creating tasks for tips on reducing duplication across many MLCP tasks.
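As a rough sketch of that idea, the following registers one `MlcpTask` per entry in a list. The dataset names, the `data/...` input layout, the `documents` input type, and the generated task names are all hypothetical; only the `MlcpTask` properties come from the example above.

```groovy
// Hypothetical datasets; each one gets its own import task, e.g. "importCustomers"
def datasets = ["customers", "orders"]

datasets.each { name ->
  tasks.register("import${name.capitalize()}", com.marklogic.gradle.task.MlcpTask) {
    classpath = configurations.mlcp
    command = "IMPORT"
    database = "my-database"
    // Hypothetical layout: data/customers, data/orders, ...
    input_file_path = "data/${name}"
    input_file_type = "documents"
    // Each dataset lands in a collection named after it
    output_collections = name
  }
}
```

Running `gradle importCustomers` would then import only the customers data; settings shared by every task are written once inside the loop.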
When you depend on MLCP via a Gradle dependency, you don't get the default logging configuration file that ships in the MLCP zip distribution, so you won't see any logging from MLCP. You can fix this by adding the following to your `build.gradle` file:
```groovy
dependencies {
  mlcp 'com.marklogic:mlcp:11.3.0'
  mlcp 'ch.qos.logback:logback-classic:1.3.14'
  mlcp files('lib')
}
```
You can then add a `logback.xml` file to the `./lib` directory to configure MLCP logging, for example:
```xml
<configuration>
  <statusListener class="ch.qos.logback.core.status.NopStatusListener"/>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="WARN">
    <appender-ref ref="STDOUT"/>
  </root>
  <logger name="com.marklogic" level="INFO" additivity="false">
    <appender-ref ref="STDOUT"/>
  </logger>
</configuration>
```
If you execute an instance of `MlcpTask` with Gradle's info or debug logging enabled, all of the arguments passed to MLCP, including passwords, will be logged by the `JavaExec` parent class. To avoid this, choose one of the following options:

- Don't use info or debug logging when running an instance of `MlcpTask`, or any instance of `JavaExec` where passwords are passed as plaintext.
- Use an MLCP options file. In that case, just use `JavaExec` so that you don't inherit the unwanted behavior where `MlcpTask` automatically sets a password based on `mlRestAdminPassword`.

Note that if neither info nor debug logging is enabled, `MlcpTask` will print all of the non-password arguments passed to it.
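If you want a build to enforce the first option, one possible sketch (a build-side guard of our own, not something `MlcpTask` provides) is to make every `MlcpTask` fail fast when info or debug logging is enabled, before any arguments are logged:

```groovy
// Sketch: refuse to run any MlcpTask under --info or --debug, because JavaExec
// would then log every mlcp argument, including the password.
tasks.withType(com.marklogic.gradle.task.MlcpTask).configureEach { mlcpTask ->
  // doFirst runs before the task's own exec action
  mlcpTask.doFirst {
    // isInfoEnabled() is also true when debug logging is on
    if (mlcpTask.logger.isInfoEnabled()) {
      throw new GradleException("Refusing to run ${mlcpTask.name} with info/debug logging, as mlcp arguments (including passwords) would be logged")
    }
  }
}
```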
Be aware that `MlcpTask` defaults to port 8000. If you specify a transform parameter in your `MlcpTask`, you will need to set the `port` parameter to the port of your XDBC server, or of a REST server that supports XDBC requests.
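A minimal sketch of that situation follows. It assumes your application's REST server port is available as `mlAppConfig.restPort` (populated from `mlRestPort`) and uses a hypothetical transform module path, assumed to be deployed to that server's modules database. The transform is passed through `args` here; `MlcpTask` may also expose a dedicated property for it.

```groovy
task importWithTransform(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  // Use the app's REST server instead of the default port 8000, so mlcp resolves
  // the transform against that server's modules database
  port = mlAppConfig.restPort
  database = "my-database"
  input_file_path = "my-input-file.txt"
  input_file_type = "delimited_text"
  // Hypothetical transform module path
  args = ["-transform_module", "/transforms/my-transform.sjs"]
}
```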
New in ml-gradle 2.6.0: you can set the `logOutputUri` parameter to define the URI of a document that mlcp log output is written to:
```groovy
task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
}
```
And new in 3.12.0: you can provide a custom `DatabaseClient` to control which database the log output is written to (it defaults to `mlAppConfig.newDatabaseClient()`):
```groovy
task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
  logClient = mlAppConfig.newModulesDatabaseClient() // Just notional; reference or construct any DatabaseClient you want
}
```
When running mlcp via Gradle on Windows, you're likely to see the following message logged:

```
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
```

It reads as an exception, but unless you're using certain Hadoop-based features within mlcp, you can safely ignore it. If MLCP instead throws an error later on, you should likely use the MLCP standalone distribution rather than `MlcpTask`.
You can also suppress the message with the following steps:

- Create a dummy `lib\bin\winutils.exe` file in your project.
- Add the following to your task of type `MlcpTask`:

```groovy
systemProperties = ["hadoop.home.dir" : "$project.rootDir/lib"]
```
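Putting both steps together, here is a sketch that creates the empty placeholder file before an mlcp import runs and points `hadoop.home.dir` at its parent directory. The task names `createDummyWinutils` and `importOnWindows` are hypothetical.

```groovy
// Creates an empty lib/bin/winutils.exe placeholder; it exists only so that
// Hadoop finds a file at the expected location and stops logging the message
tasks.register("createDummyWinutils") {
  doLast {
    def winutils = new File(project.rootDir, "lib/bin/winutils.exe")
    winutils.parentFile.mkdirs()
    winutils.createNewFile()
  }
}

task importOnWindows(type: com.marklogic.gradle.task.MlcpTask) {
  dependsOn "createDummyWinutils"
  classpath = configurations.mlcp
  command = "IMPORT"
  database = "my-database"
  input_file_path = "my-input-file.txt"
  input_file_type = "delimited_text"
  // Hadoop looks for bin\winutils.exe under this directory
  systemProperties = ["hadoop.home.dir" : "$project.rootDir/lib"]
}
```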