## 2.3.0
This minor release provides significant new functionality in support of the 1.0.0 release of the new MarkLogic Flux data movement tool. Much of this functionality is documented in the Flux documentation, and complete documentation of all the new options will soon be available in this repository's documentation as well. In the meantime, the new options in this release are listed below.
### Read Options
- `spark.marklogic.read.javascriptFile` and `spark.marklogic.read.xqueryFile` allow custom code to be read from a file path.
- `spark.marklogic.read.partitions.javascriptFile` and `spark.marklogic.read.partitions.xqueryFile` allow custom code for defining partitions to be read from a file path.
- Can now read document rows by specifying a list of newline-delimited URIs via the `spark.marklogic.read.documents.uris` option (see the first sketch after this list).
- Can now read rows containing semantic triples in MarkLogic via `spark.marklogic.read.triples.graphs`, `spark.marklogic.read.triples.collections`, `spark.marklogic.read.triples.query`, `spark.marklogic.read.triples.stringQuery`, `spark.marklogic.read.triples.uris`, `spark.marklogic.read.triples.directory`, `spark.marklogic.read.triples.options`, `spark.marklogic.read.triples.filtered`, and `spark.marklogic.read.triples.baseIri` (see the triples sketch below).
- Can now read Flux and MLCP archives by setting `spark.marklogic.read.files.type` to `archive` or `mlcp_archive` (see the archive sketch below).
- Can control which categories of metadata are read from Flux archives via `spark.marklogic.read.archives.categories`.
- Can now specify the encoding of a file to read via `spark.marklogic.read.files.encoding`.
- Can now see progress logged while reading data from MarkLogic via `spark.marklogic.read.logProgress`.
- Can specify whether to fail on a file read error via `spark.marklogic.read.files.abortOnFailure`.
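
As a quick illustration of reading document rows by URI, here is a minimal PySpark sketch. The `marklogic` format name and the `spark.marklogic.client.uri` option come from the connector's existing documentation; the connection string, URIs, and the numeric `logProgress` interval are placeholder assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("marklogic")
    # Hypothetical connection string of the form user:password@host:port.
    .option("spark.marklogic.client.uri", "my-user:my-password@localhost:8003")
    # Read only the documents identified by this newline-delimited URI list.
    .option("spark.marklogic.read.documents.uris",
            "/example/doc1.json\n/example/doc2.json")
    # Assumed to log progress after each interval of rows is read.
    .option("spark.marklogic.read.logProgress", "10000")
    .load()
)
df.show()
```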
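A similar sketch for the new triples options, continuing the session above; the graph IRI and collection name are placeholders, and passing a single value per option is the only usage assumed here.

```python
# Read rows containing semantic triples from MarkLogic. Each of the other new
# spark.marklogic.read.triples.* options can be set the same way.
triples_df = (
    spark.read.format("marklogic")
    .option("spark.marklogic.client.uri", "my-user:my-password@localhost:8003")
    # Placeholder graph IRI and collection name.
    .option("spark.marklogic.read.triples.graphs", "http://example.org/graph")
    .option("spark.marklogic.read.triples.collections", "my-triples")
    .load()
)
```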
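And a sketch of reading a Flux archive from disk. That archives are read by pointing `load()` at a file path, and the metadata category names shown, are assumptions rather than values confirmed by this changelog.

```python
archive_df = (
    spark.read.format("marklogic")
    # Treat the files at the load path as Flux archives.
    .option("spark.marklogic.read.files.type", "archive")
    # Assumed category names; restores only these kinds of metadata.
    .option("spark.marklogic.read.archives.categories", "collections,permissions")
    .load("/path/to/archives")
)
```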
### Write Options
- `spark.marklogic.write.threadCount` has been altered to reflect the common user understanding of "number of threads used to connect to MarkLogic". If you need to specify a thread count per partition, use `spark.marklogic.write.threadCountPerPartition` (see the first sketch after this list).
- Can now see progress logged while writing data to MarkLogic via `spark.marklogic.write.logProgress`.
- `spark.marklogic.write.javascriptFile` and `spark.marklogic.write.xqueryFile` allow custom code to be read from a file path.
- Setting `spark.marklogic.write.archivePathForFailedDocuments` to a file path will result in any failed documents being added to an archive zip file at that path.
- `spark.marklogic.write.jsonRootName` allows a root field to be added to a JSON document constructed from an arbitrary row.
- `spark.marklogic.write.xmlRootName` and `spark.marklogic.write.xmlNamespace` allow an XML document to be constructed from an arbitrary row (see the XML sketch below).
- Options starting with `spark.marklogic.write.json.` configure how the connector serializes a Spark row into a JSON object.
- Can use `spark.marklogic.write.graph` and `spark.marklogic.write.graphOverride` to specify the graph when writing RDF triples to MarkLogic (see the graph sketch below).
- Deprecated `spark.marklogic.write.fileRows.documentType` in favor of `spark.marklogic.write.documentType` for forcing a document type on documents written to MarkLogic with an extension unrecognized by MarkLogic.
- Can use `spark.marklogic.write.files.prettyPrint` to pretty-print JSON and XML files written by the connector.
- Can use `spark.marklogic.write.files.encoding` to write files in a different encoding.
- Can use `spark.marklogic.write.files.rdf.format` to specify an RDF format when writing triples to RDF files.
- Can use `spark.marklogic.write.files.rdf.graph` to specify a graph when writing RDF files.
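
To illustrate the write options, a minimal PySpark sketch of writing rows back to MarkLogic, reusing the `df` from the read sketches; the thread count, progress interval, and archive path are placeholder assumptions.

```python
(
    df.write.format("marklogic")
    .option("spark.marklogic.client.uri", "my-user:my-password@localhost:8003")
    # Total number of threads used to connect to MarkLogic (the new 2.3.0 meaning).
    .option("spark.marklogic.write.threadCount", "16")
    # Assumed to log progress after each interval of documents is written.
    .option("spark.marklogic.write.logProgress", "10000")
    # Any documents that fail to be written are added to a zip archive at this path.
    .option("spark.marklogic.write.archivePathForFailedDocuments", "/tmp/failed-documents")
    .mode("append")
    .save()
)
```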
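A sketch of constructing XML documents from arbitrary rows with the new root-name options; the two-column DataFrame is invented for illustration.

```python
rows = spark.createDataFrame([(1, "hello")], ["id", "text"])
(
    rows.write.format("marklogic")
    .option("spark.marklogic.client.uri", "my-user:my-password@localhost:8003")
    # Each row is serialized as an XML document with this root element and namespace.
    .option("spark.marklogic.write.xmlRootName", "record")
    .option("spark.marklogic.write.xmlNamespace", "org:example")
    .mode("append")
    .save()
)
```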
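Finally, a sketch of writing triples rows (such as those read in the earlier triples sketch) under a specific graph; the split in semantics between `graph` and `graphOverride` is inferred from the option names, not confirmed by this changelog.

```python
(
    triples_df.write.format("marklogic")
    .option("spark.marklogic.client.uri", "my-user:my-password@localhost:8003")
    # Placeholder graph IRI; graphOverride is assumed to instead replace any
    # graph already associated with each triple.
    .option("spark.marklogic.write.graph", "http://example.org/graph")
    .mode("append")
    .save()
)
```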