Skip to content

Commit

Permalink
Merge branch 'apache:dev' into flink-cdc
Browse files Browse the repository at this point in the history
  • Loading branch information
Carl-Zhou-CN authored Aug 14, 2023
2 parents e0cda53 + f188039 commit 08049cf
Show file tree
Hide file tree
Showing 58 changed files with 1,654 additions and 557 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/backend.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,6 @@ on:
branches:
- dev
paths-ignore:
- 'docs/**'
- '**/*.md'
- 'seatunnel-ui/**'

concurrency:
Expand Down
2 changes: 0 additions & 2 deletions config/seatunnel.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,6 @@ seatunnel:
checkpoint:
interval: 10000
timeout: 60000
max-concurrent: 1
tolerable-failure: 2
storage:
type: hdfs
max-retained: 3
Expand Down
47 changes: 47 additions & 0 deletions docs/en/connector-v2/formats/kafka-compatible-kafkaconnect-json.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Kafka source compatible kafka-connect-json

Seatunnel connector kafka supports parsing data extracted through kafka connect source, especially data extracted from kafka connect jdbc and kafka connect debezium

# How to use

## Kafka output to mysql

```bash
env {
execution.parallelism = 1
job.mode = "BATCH"
}

source {
Kafka {
bootstrap.servers = "localhost:9092"
topic = "jdbc_source_record"
result_table_name = "kafka_table"
start_mode = earliest
schema = {
fields {
id = "int"
name = "string"
description = "string"
weight = "string"
}
},
format = COMPATIBLE_KAFKA_CONNECT_JSON
}
}


sink {
Jdbc {
driver = com.mysql.cj.jdbc.Driver
url = "jdbc:mysql://localhost:3306/seatunnel"
user = st_user
password = seatunnel
generate_sink_sql = true
database = seatunnel
table = jdbc_sink
primary_keys = ["id"]
}
}
```

5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Redis.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ Used to write data to Redis.
| mode | string | no | single |
| nodes | list | yes when mode=cluster | - |
| format | string | no | json |
| expire | long | no | -1 |
| common-options | | no | - |

### host [string]
Expand Down Expand Up @@ -120,6 +121,10 @@ Connector will generate data as the following and write it to redis:

```

### expire [long]

Set redis expiration time, the unit is second. The default value is -1, keys do not automatically expire by default.

### common options

Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details
Expand Down
241 changes: 176 additions & 65 deletions docs/en/connector-v2/sink/S3File.md

Large diffs are not rendered by default.

215 changes: 101 additions & 114 deletions docs/en/connector-v2/source/MyHours.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,13 @@

> My Hours source connector
## Description
## Support Those Engines

Used to read data from My Hours.
> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>
## Key features
## Key Features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
Expand All @@ -15,71 +17,103 @@ Used to read data from My Hours.
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|-----------------------------|---------|----------|---------------|
| url | String | Yes | - |
| email | String | Yes | - |
| password | String | Yes | - |
| method | String | No | get |
| schema | Config | No | - |
| schema.fields | Config | No | - |
| format | String | No | json |
| params | Map | No | - |
| body | String | No | - |
| json_field | Config | No | - |
| content_json | String | No | - |
| poll_interval_ms | int | No | - |
| retry | int | No | - |
| retry_backoff_multiplier_ms | int | No | 100 |
| retry_backoff_max_ms | int | No | 10000 |
| enable_multi_lines | boolean | No | false |
| common-options | config | No | - |

### url [String]

http request url

### email [String]

email for login

### password [String]

password for login

### method [String]

http request method, only supports GET, POST method

### params [Map]

http params

### body [String]

http body

### poll_interval_ms [int]
## Description

request http api interval(millis) in stream mode
Used to read data from My Hours.

### retry [int]
## Key features

The max retry times if request http return to `IOException`
- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [column projection](../../concept/connector-v2-features.md)
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

### retry_backoff_multiplier_ms [int]
## Supported DataSource Info

In order to use the My Hours connector, the following dependencies are required.
They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|------------|--------------------|---------------------------------------------------------------------------------------------|
| My Hours | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2) |

## Source Options

| Name | Type | Required | Default | Description |
|-----------------------------|---------|----------|---------|--------------------------------------------------------------------------------------------------------------------------------------|
| url | String | Yes | - | Http request url. |
| email | String | Yes | - | My hours login email address. |
| password | String | Yes | - | My hours login password. |
| schema | Config | No | - | Http and seatunnel data structure mapping |
| schema.fields | Config | No | - | The schema fields of upstream data |
| json_field | Config | No | - | This parameter helps you configure the schema,so this parameter must be used with schema. |
| content_json | String | No | - | This parameter can get some json data.If you only need the data in the 'book' section, configure `content_field = "$.store.book.*"`. |
| format | String | No | json | The format of upstream data, now only support `json` `text`, default `json`. |
| method | String | No | get | Http request method, only supports GET, POST method. |
| headers | Map | No | - | Http headers. |
| params | Map | No | - | Http params. |
| body | String | No | - | Http body. |
| poll_interval_ms | Int | No | - | Request http api interval(millis) in stream mode. |
| retry | Int | No | - | The max retry times if request http return to `IOException`. |
| retry_backoff_multiplier_ms | Int | No | 100 | The retry-backoff times(millis) multiplier if request http failed. |
| retry_backoff_max_ms | Int | No | 10000 | The maximum retry-backoff times(millis) if request http failed |
| enable_multi_lines | Boolean | No | false | |
| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details |

## How to Create a My Hours Data Synchronization Jobs

The retry-backoff times(millis) multiplier if request http failed
```hocon
env {
execution.parallelism = 1
job.mode = "BATCH"
}
### retry_backoff_max_ms [int]
MyHours{
url = "https://api2.myhours.com/api/Projects/getAll"
email = "[email protected]"
password = "seatunnel"
schema {
fields {
name = string
archived = boolean
dateArchived = string
dateCreated = string
clientName = string
budgetAlertPercent = string
budgetType = int
totalTimeLogged = double
budgetValue = double
totalAmount = double
totalExpense = double
laborCost = double
totalCost = double
billableTimeLogged = double
totalBillableAmount = double
billable = boolean
roundType = int
roundInterval = int
budgetSpentPercentage = double
budgetTarget = int
budgetPeriodType = string
budgetSpent = string
id = string
}
}
}
The maximum retry-backoff times(millis) if request http failed
# Console printing of the read data
sink {
Console {
parallelism = 1
}
}
```

### format [String]
## Parameter Interpretation

the format of upstream data, now only support `json` `text`, default `json`.
### format

when you assign format is `json`, you should also assign schema option, for example:

Expand All @@ -98,11 +132,11 @@ you should assign schema as the following:
```hocon
schema {
fields {
code = int
data = string
success = boolean
}
fields {
code = int
data = string
success = boolean
}
}
```
Expand Down Expand Up @@ -131,13 +165,7 @@ connector will generate data as the following:
|----------------------------------------------------------|
| {"code": 200, "data": "get success", "success": true} |

### schema [Config]

#### fields [Config]

the schema fields of upstream data

### content_json [String]
### content_json

This parameter can get some json data.If you only need the data in the 'book' section, configure `content_field = "$.store.book.*"`.

Expand Down Expand Up @@ -212,14 +240,14 @@ Here is an example:
- Test data can be found at this link [mockserver-config.json](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/mockserver-config.json)
- See this link for task configuration [http_contentjson_to_assert.conf](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/http_contentjson_to_assert.conf).

### json_field [Config]
### json_field

This parameter helps you configure the schema,so this parameter must be used with schema.

If your data looks something like this:

```json
{
{
"store": {
"book": [
{
Expand Down Expand Up @@ -273,47 +301,6 @@ source {
- Test data can be found at this link [mockserver-config.json](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/mockserver-config.json)
- See this link for task configuration [http_jsonpath_to_assert.conf](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/http_jsonpath_to_assert.conf).

### common options

Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details

## Example

```hocon
MyHours{
url = "https://api2.myhours.com/api/Projects/getAll"
email = "[email protected]"
password = "seatunnel"
schema {
fields {
name = string
archived = boolean
dateArchived = string
dateCreated = string
clientName = string
budgetAlertPercent = string
budgetType = int
totalTimeLogged = double
budgetValue = double
totalAmount = double
totalExpense = double
laborCost = double
totalCost = double
billableTimeLogged = double
totalBillableAmount = double
billable = boolean
roundType = int
roundInterval = int
budgetSpentPercentage = double
budgetTarget = int
budgetPeriodType = string
budgetSpent = string
id = string
}
}
}
```

## Changelog

### next version
Expand Down
Loading

0 comments on commit 08049cf

Please sign in to comment.