Changing "MarkLogic Spark connector"
Depending on the context, changed it to either "MarkLogic connector for Apache Spark", "MarkLogic connector", or just "connector".

The one exception is the config.yml file in the docs, where "MarkLogic connector for Apache Spark" did not fit. So using "MarkLogic Apache Spark connector" there instead.
rjrudin committed Jun 20, 2023
1 parent 63df947 commit 0fc8206
Showing 14 changed files with 26 additions and 27 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -3,7 +3,7 @@ have cloned this repository to your local workstation.

# Do this first!

-In order to develop and/or test the MarkLogic Spark connector, or to try out the PySpark instructions below, you first
+In order to develop and/or test the connector, or to try out the PySpark instructions below, you first
need to deploy the test application in this project to MarkLogic. You can do so either on your own installation of
MarkLogic, or you can use `docker-compose` to install a 3-node MarkLogic cluster with a load balancer in front of it.

@@ -75,7 +75,7 @@ You can then run the tests from within the Docker environment via the following

The documentation for this project
[has instructions on using PySpark](https://marklogic.github.io/marklogic-spark-connector/getting-started-pyspark.html)
-with the MarkLogic Spark connector. The documentation instructs a user to obtain the connector from this repository's
+with the connector. The documentation instructs a user to obtain the connector from this repository's
releases page. For development and testing, you will most likely want to build the connector yourself by running the
following command from the root of this repository:

6 changes: 3 additions & 3 deletions README.md
@@ -1,7 +1,7 @@
-# MarkLogic Spark Connector
+# MarkLogic Connector for Apache Spark

-The MarkLogic Spark connector is an [Apache Spark 3 connector](https://spark.apache.org/docs/latest/) that supports
-reading data from and writing data to MarkLogic.
+The MarkLogic connector for Apache Spark is an [Apache Spark 3 connector](https://spark.apache.org/docs/latest/) that
+supports reading data from and writing data to MarkLogic.

Please see [the User Guide](http://marklogic.github.io/marklogic-spark-connector) for more information.

2 changes: 1 addition & 1 deletion docs/_config.yml
@@ -1,4 +1,4 @@
-title: MarkLogic Spark Connector
+title: MarkLogic Apache Spark Connector
remote_theme: just-the-docs/just-the-docs
plugins:
- jekyll-remote-theme
2 changes: 1 addition & 1 deletion docs/configuration.md
@@ -4,7 +4,7 @@ title: Configuration Reference
nav_order: 5
---

-The MarkLogic Spark connector has 3 sets of configuration options - connection options, reading options, and writing
+The MarkLogic connector has 3 sets of configuration options - connection options, reading options, and writing
options. Each set of options is defined in a separate table below.

## Connection options
2 changes: 1 addition & 1 deletion docs/getting-started/getting-started.md
@@ -6,7 +6,7 @@ has_children: true
permalink: /docs/getting-started
---

-This guide provides instructions on using the MarkLogic Spark connector with multiple popular Spark environments.
+This guide provides instructions on using the MarkLogic connector with multiple popular Spark environments.
Before trying the connector in any of these environments, please [follow the instructions in the Setup guide](setup.md)
to obtain the connector and deploy an example application to MarkLogic.

4 changes: 2 additions & 2 deletions docs/getting-started/java.md
@@ -5,12 +5,12 @@ parent: Getting Started
nav_order: 4
---

-The MarkLogic Spark connector is published to [Maven Central](https://central.sonatype.com/namespace/com.marklogic) and
+The MarkLogic connector is published to [Maven Central](https://central.sonatype.com/namespace/com.marklogic) and
can thus be expressed as a regular dependency of a Java application that also depends on the Spark APIs.

As an example, please see the project configuration in the
[java-dependency example project](https://github.com/marklogic/marklogic-spark-connector/blob/master/examples/java-dependency)
-for how to depend on the MarkLogic Spark connector as a library. The `org.example.App` class in the project demonstrates
+for how to depend on the MarkLogic connector as a library. The `org.example.App` class in the project demonstrates
a very simple Spark Java program for accessing the data in the application deployed via the [Setup guide](setup.md).

Note - if you are using Java 11 or higher, you may run into a `NoClassDefFoundError` for a class in the `javax.xml.bind`
4 changes: 2 additions & 2 deletions docs/getting-started/jupyter.md
@@ -6,7 +6,7 @@ nav_order: 3
---

[Project Jupyter](https://jupyter.org/) provides a set of tools for working with notebooks, code, and data. The
-MarkLogic Spark connector can be easily integrated into these tools to allow users to access and analyze data in
+MarkLogic connector can be easily integrated into these tools to allow users to access and analyze data in
MarkLogic.

Before going further, be sure you've followed the instructions in the [setup guide](setup.md) for
@@ -28,7 +28,7 @@ corner of the Notebook interface and select "Python 3 (ipykernel)" to create a n
## Using the connector

In the first cell in the notebook created above, enter the following to allow Jupyter Notebook to access the MarkLogic
-Spark connector and also to initialize Spark:
+connector and also to initialize Spark:

```
import os
```
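
Based on the notebook change later in this commit, this first cell pairs `import os` with a `PYSPARK_SUBMIT_ARGS` setting before Spark starts. A minimal sketch of such a cell, assuming the 2.0.0 connector jar sits in the notebook's working directory and that a local-mode session is acceptable (both are assumptions, not part of this commit):

```
# Sketch of a complete first cell; the jar filename/version and the
# local-mode Spark settings are assumptions to adjust for your environment.
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars "marklogic-spark-connector-2.0.0.jar" pyspark-shell'

from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("marklogic-jupyter-example") \
    .getOrCreate()
```
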
7 changes: 3 additions & 4 deletions docs/getting-started/pyspark.md
@@ -6,8 +6,7 @@ nav_order: 2
---

[PySpark](https://spark.apache.org/docs/latest/api/python/index.html) is a Python API for Spark and an excellent choice
-for learning how to use Spark. This guide describes how to install PySpark and use it with the MarkLogic Spark
-connector.
+for learning how to use Spark. This guide describes how to install PySpark and use it with the MarkLogic connector.

Before going further, be sure you've followed the instructions in the [Getting Started](getting-started.md) guide for
obtaining the connector and deploying an example application to MarkLogic.
@@ -34,15 +33,15 @@ Run PySpark from the directory that you downloaded the connector to per the [set

The `--jars` command line option is PySpark's method for utilizing Spark connectors. Each Spark environment should have
a similar mechanism for including third party connectors; please see the documentation for your particular Spark
-environment. In the example above, the `--jars` option allows for the MarkLogic Spark connector to be used within
+environment. In the example above, the `--jars` option allows for the connector to be used within
PySpark.

When PySpark starts, you should see information like this on how to configure logging:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

-Setting the default log level to `INFO` or `DEBUG` will show logging from the MarkLogic Spark connector. This will also
+Setting the default log level to `INFO` or `DEBUG` will show logging from the MarkLogic connector. This will also
include potentially significant amounts of log messages from PySpark itself.

### Reading data with the connector
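
For reference alongside this hunk, here is a hedged sketch of what reading data in the PySpark shell can look like. The `com.marklogic.spark` format name and the `spark.marklogic.*` option keys are assumptions drawn from the connector's documented conventions, and the URI, credentials, and Optic view are placeholders:

```
# Hedged read sketch: assumed format name and option keys, placeholder
# connection details. `spark` is the session the PySpark shell provides.
spark.sparkContext.setLogLevel("INFO")  # surface the connector's log messages

df = spark.read.format("com.marklogic.spark") \
    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
    .load()
df.show()
```
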
2 changes: 1 addition & 1 deletion docs/getting-started/setup.md
@@ -10,7 +10,7 @@ environments, as those examples depend on an application being deployed to MarkL

## Obtaining the connector

-The MarkLogic Spark connector can be downloaded from
+The MarkLogic connector can be downloaded from
[this repository's Releases page](https://github.com/marklogic/marklogic-spark-connector/releases). Each Spark
environment should have documentation on how to include third-party connectors; please consult your Spark
environment's documentation on how to achieve this.
2 changes: 1 addition & 1 deletion docs/index.md
@@ -4,7 +4,7 @@ title: Introduction
nav_order: 1
---

-The MarkLogic Spark connector is an [Apache Spark 3 connector](https://spark.apache.org/docs/latest/) that supports
+The MarkLogic connector for Apache Spark is an [Apache Spark 3 connector](https://spark.apache.org/docs/latest/) that supports
reading data from and writing data to MarkLogic. Within any Spark 3 environment, the connector enables users to easily
query for data in MarkLogic, manipulate it using widely-known Spark operations, and then write results back to
MarkLogic or disseminate them to another system. Data can also be easily imported into MarkLogic by first reading it
4 changes: 2 additions & 2 deletions docs/reading.md
@@ -4,7 +4,7 @@ title: Reading Data
nav_order: 3
---

-The MarkLogic Spark connector allows for data to be retrieved from MarkLogic as rows via an
+The MarkLogic connector allows for data to be retrieved from MarkLogic as rows via an
[Optic query](https://docs.marklogic.com/guide/app-dev/OpticAPI#id_46710). The
sections below provide more detail on configuring how data is retrieved and converted into a Spark DataFrame.

@@ -150,7 +150,7 @@ repository.

The Spark connector framework supports pushing down multiple operations to the connector data source. This can
often provide a significant performance boost by allowing the data source to perform the operation, which can result in
-both fewer rows returned to Spark and less work for Spark to perform. The MarkLogic Spark connector supports pushing
+both fewer rows returned to Spark and less work for Spark to perform. The MarkLogic connector supports pushing
down the following operations to MarkLogic:

- `count`
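
To make the pushdown behavior in this hunk concrete, a small hedged example, assuming `df` was loaded through the connector as in the read sketch above:

```
# count() is listed among the operations pushed down to MarkLogic, so the
# aggregation runs in the database and only the resulting number reaches Spark.
total = df.count()
print(total)
```
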
10 changes: 5 additions & 5 deletions docs/writing.md
@@ -4,7 +4,7 @@ title: Writing Data
nav_order: 4
---

-The MarkLogic Spark connector allows for writing rows in a Spark DataFrame to MarkLogic as documents.
+The MarkLogic connector allows for writing rows in a Spark DataFrame to MarkLogic as documents.
The sections below provide more detail about how this process works and how it can be controlled.

## Basic write operation
@@ -104,7 +104,7 @@ temporal collection.

## Streaming support

-The MarkLogic Spark connector supports
+The connector supports
[streaming writes](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) to MarkLogic.
The connector configuration does not change; instead, different Spark APIs are used to read a stream of data and
write that stream to MarkLogic.
@@ -134,7 +134,7 @@ spark.readStream \
.processAllAvailable()
```

-The above example will stream the data in the `./data/csv-files/100-employees.csv` file through the MarkLogic Spark
+The above example will stream the data in the `./data/csv-files/100-employees.csv` file through the
connector and into MarkLogic. This will result 100 new JSON documents in the `streaming-example` collection.

The ability to stream data into MarkLogic can make Spark an effective tool for obtaining data from a variety of data
@@ -158,7 +158,7 @@ assist with debugging the cause of the error.

## Tuning performance

-The MarkLogic Spark connector uses MarkLogic's
+The connector uses MarkLogic's
[Data Movement SDK](https://docs.marklogic.com/guide/java/data-movement) for writing documents to a database. The
following options can be set to adjust how the connector performs when writing data:

@@ -178,7 +178,7 @@ resource consumption and throughput from Spark to MarkLogic.

Spark supports
[several save modes](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#save-modes)
-when writing data. The MarkLogic Spark connector requires the `append` mode to be used. Because Spark defaults to
+when writing data. The MarkLogic connector requires the `append` mode to be used. Because Spark defaults to
the `error` mode, you will need to set this to `append` each time you use the connector to write data.

`append` is the only supported mode due to MarkLogic not having the concept of a single "table" that a document
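
To illustrate the save-mode requirement in the last hunk, a hedged write sketch; it reuses the same assumed format name and option keys as the read sketch above, and the collection name is a placeholder:

```
# Spark defaults to the `error` save mode, so `append` must be set explicitly
# on every write through the connector (option keys shown are assumptions).
df.write.format("com.marklogic.spark") \
    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
    .option("spark.marklogic.write.collections", "write-example") \
    .mode("append") \
    .save()
```
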
@@ -51,7 +51,7 @@
}
],
"source": [
"# Make the MarkLogic Spark connector available to the underlying PySpark application.\n",
"# Make the MarkLogic connector available to the underlying PySpark application.\n",
"import os\n",
"os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars \"marklogic-spark-connector-2.0.0.jar\" pyspark-shell'\n",
"\n",
2 changes: 1 addition & 1 deletion examples/java-dependency/README.md
@@ -1,4 +1,4 @@
-This project is a simple example of creating a Spark application in Java that depends on the MarkLogic Spark
+This project is a simple example of creating a Spark application in Java that depends on the MarkLogic
connector as a normal dependency expressed through Gradle.

Please see the [Java setup guide](https://marklogic.github.io/marklogic-spark-connector/docs/java)
