[enhancement](jdbc catalog) Add lowercase column name mapping to Jdbc data source & optimize database and table mapping #27283

Closed
wants to merge 2 commits into from
Expand Up @@ -132,6 +132,13 @@ age number(2),
score number(3,1)
);

CREATE TABLE "DORIS_TEST"."student3"
(
"id" NUMBER(5,0),
"NAME" VARCHAR2(20),
"AGE" NUMBER(2,0),
"SCORE" NUMBER(3,1)
);

create table doris_test.test_all_types (
id int,
2 changes: 2 additions & 0 deletions docker/thirdparties/docker-compose/oracle/init/04-insert.sql
Expand Up @@ -86,6 +86,8 @@ insert into doris_test."student2" values (2, 'bob', 21, 90.5);
insert into doris_test."student2" values (3, 'jerry', 23, 88.0);
insert into doris_test."student2" values (4, 'andy', 21, 93);

insert into doris_test."student3" values(1, 'doris', 3, 1.0);

insert into doris_test.test_all_types values
(1, 111, 123, 7456123.89, 573, 34, 673.43, 34.1264, 56.2, 23.231,
99, 9999, 999999999, 999999999999999999, 999, 99999, 9999999999, 9999999999999999999,
19 changes: 13 additions & 6 deletions docs/en/docs/lakehouse/multi-catalog/jdbc.md
Expand Up @@ -52,7 +52,7 @@ PROPERTIES ("key"="value", ...)
| `driver_url ` | Yes | | JDBC Driver Jar |
| `driver_class ` | Yes | | JDBC Driver Class |
| `only_specified_database` | No | "false" | Whether to synchronize only the specified database. |
| `lower_case_table_names` | No | "false" | Whether to synchronize jdbc external data source table names in lower case. |
| `lower_case_table_names` | No | "false" | Whether to synchronize the database, table, and column names of the JDBC external data source in lowercase. |
| `include_database_list` | No | "" | When `only_specified_database=true`, synchronize only the specified databases, separated by ','. Database names are case sensitive. |
| `exclude_database_list` | No | "" | When `only_specified_database=true`, do not synchronize the specified databases, separated by ','. Database names are case sensitive. |

Expand All @@ -68,7 +68,7 @@ PROPERTIES ("key"="value", ...)

### Lowercase table name synchronization

When `lower_case_table_names` is set to `true`, Doris is able to query non-lowercase databases and tables by maintaining a mapping of lowercase names to actual names on the remote system
When `lower_case_table_names` is set to `true`, Doris can query non-lowercase databases, tables, and columns by maintaining a mapping from lowercase names to the actual names on the remote system.

**Notice:**

Expand All @@ -78,9 +78,9 @@ When `lower_case_table_names` is set to `true`, Doris is able to query non-lower

For other databases, you still need to specify the real database name and table name when querying.

2. In Doris 2.0.3 and later versions, it is valid for all databases. When querying, all library names and table names will be converted into real names and then queried. If you upgrade from an old version to 2.0. 3, `Refresh <catalog_name>` is required to take effect.
2. In Doris 2.0.3 and later versions, it applies to all databases. When querying, all database, table, and column names are converted to their real names before the query is run. If you upgrade from an older version to 2.0.3, `Refresh <catalog_name>` is required for this to take effect.

However, if the database or table names differ only in case, such as `Doris` and `doris`, Doris cannot query them due to ambiguity.
However, if database, table, or column names differ only in case, such as `Doris` and `doris`, Doris cannot query them because of the ambiguity.

3. When the FE parameter's `lower_case_table_names` is set to `1` or `2`, the JDBC Catalog's `lower_case_table_names` parameter must be set to `true`. If the FE parameter's `lower_case_table_names` is set to `0`, the JDBC Catalog parameter can be `true` or `false` and defaults to `false`. This ensures consistency and predictability in how Doris handles internal and external table configurations.
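
The mapping and the ambiguity rule described in the notes above can be illustrated with a minimal sketch. It is not the Doris implementation: `LowerCaseNameMapper` and its methods are hypothetical names, and the lookup lowercases the queried name as an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not Doris code: maps lowercase names to real remote names.
final class LowerCaseNameMapper {
    private final Map<String, String> lowerToReal = new HashMap<>();

    // Builds the mapping and rejects names that collide once lowercased.
    LowerCaseNameMapper(List<String> remoteNames) {
        for (String real : remoteNames) {
            String lower = real.toLowerCase(Locale.ROOT);
            String previous = lowerToReal.put(lower, real);
            if (previous != null && !previous.equals(real)) {
                throw new IllegalArgumentException(
                        "Ambiguous names differ only in case: " + previous + ", " + real);
            }
        }
    }

    // Returns the real remote name, or the input unchanged when no mapping exists.
    String toRealName(String queriedName) {
        return lowerToReal.getOrDefault(queriedName.toLowerCase(Locale.ROOT), queriedName);
    }
}
```

With the columns of the `student3` test table (`id`, `NAME`, `AGE`, `SCORE`), a lookup of `name` would resolve to `NAME`, while a source exposing both `Doris` and `doris` would be rejected when the mapping is built.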

Expand Down Expand Up @@ -113,8 +113,8 @@ In some cases, the keywords in the database might be used as the field names. Fo
### Predicate Pushdown

1. When executing a query like `where dt = '2022-01-01'`, Doris can push these filter conditions down to the external data source, so that data that does not meet the conditions is excluded at the data source level and unnecessary data acquisition and transfer is reduced. This greatly improves query performance while also reducing the load on external data sources.
2. When `enable_func_pushdown` is set to true, the function condition after where will also be pushed down to the external data source. Currently, only MySQL is supported. If you encounter a function that MySQL does not support, you can set this parameter to false, at present, Doris will automatically identify some functions not supported by MySQL to filter the push-down conditions, which can be checked by explain sql.

2. When `enable_func_pushdown` is set to true, function conditions after `where` are also pushed down to the external data source. Currently, only MySQL and ClickHouse are supported. If you encounter a function that MySQL or ClickHouse does not support, you can set this parameter to false. Currently, Doris automatically identifies some functions not supported by MySQL and some functions supported by ClickHouse to filter the push-down conditions, which can be checked with explain sql.

Functions that are currently not pushed down include:

Expand All @@ -123,6 +123,13 @@ Functions that are currently not pushed down include:
| DATE_TRUNC |
| MONEY_FORMAT |

Functions that are currently pushed down include:

| ClickHouse |
|:--------------:|
| FROM_UNIXTIME |
| UNIX_TIMESTAMP |
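
The two tables above suggest two different filtering styles, which the following sketch tries to make concrete. It is illustrative only: `FunctionPushdownFilter` and `canPushDown` are hypothetical names, the deny-list is assumed to apply to MySQL, and treating every other data source as "never push down functions" is an assumption drawn from the statement that only MySQL and ClickHouse are supported.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the Doris implementation.
final class FunctionPushdownFilter {
    // Functions listed as not pushed down (assumed here to apply to MySQL).
    private static final Set<String> MYSQL_UNSUPPORTED =
            new HashSet<>(Arrays.asList("DATE_TRUNC", "MONEY_FORMAT"));
    // Functions listed as pushed down to ClickHouse.
    private static final Set<String> CLICKHOUSE_SUPPORTED =
            new HashSet<>(Arrays.asList("FROM_UNIXTIME", "UNIX_TIMESTAMP"));

    static boolean canPushDown(String sourceType, String functionName, boolean enableFuncPushdown) {
        if (!enableFuncPushdown) {
            return false; // the switch is off: keep function conditions in Doris
        }
        String fn = functionName.toUpperCase(Locale.ROOT);
        switch (sourceType.toUpperCase(Locale.ROOT)) {
            case "MYSQL":
                return !MYSQL_UNSUPPORTED.contains(fn); // deny-list style
            case "CLICKHOUSE":
                return CLICKHOUSE_SUPPORTED.contains(fn); // allow-list style
            default:
                return false; // assumption: other sources get no function pushdown
        }
    }
}
```

The asymmetry is the point of the sketch: for MySQL everything is pushed down except known-unsupported functions, while for ClickHouse only known-supported functions are pushed down.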

### Line Limit

If there is a limit keyword in the query, Doris will translate it into semantics suitable for different data sources.
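
As a rough illustration of what "semantics suitable for different data sources" can mean, the sketch below renders the same row limit in three SQL dialects. The dialect syntax itself (`LIMIT`, `TOP`, `ROWNUM`) is standard for those systems, but `LimitTranslator` is a hypothetical name and the exact SQL that Doris generates is not shown in this change, so treat the output strings as assumptions.

```java
import java.util.Locale;

// Hypothetical sketch: renders a row limit in the syntax of the target dialect.
final class LimitTranslator {
    static String withLimit(String sourceType, String table, long limit) {
        switch (sourceType.toUpperCase(Locale.ROOT)) {
            case "SQLSERVER":
                // SQL Server has no LIMIT clause; TOP goes right after SELECT.
                return "SELECT TOP " + limit + " * FROM " + table;
            case "ORACLE":
                // Classic Oracle style: filter on the ROWNUM pseudo-column.
                return "SELECT * FROM " + table + " WHERE ROWNUM <= " + limit;
            default:
                // MySQL, PostgreSQL, ClickHouse and others accept a trailing LIMIT.
                return "SELECT * FROM " + table + " LIMIT " + limit;
        }
    }
}
```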
35 changes: 21 additions & 14 deletions docs/zh-CN/docs/lakehouse/multi-catalog/jdbc.md
Expand Up @@ -44,16 +44,16 @@ PROPERTIES ("key"="value", ...)

## Parameter Description

| Parameter | Required | Default | Description |
|---------------------------|-----|---------|---------------------------------------------------------------------------------------------|
| `user` | Yes | | Username for the corresponding database |
| `password` | Yes | | Password for the corresponding database |
| `jdbc_url` | Yes | | JDBC connection string |
| `driver_url` | Yes | | JDBC Driver Jar package name |
| `driver_class` | Yes | | JDBC Driver Class name |
| `lower_case_table_names` | No | "false" | Whether to synchronize the database and table names of the JDBC external data source in lowercase |
| `only_specified_database` | No | "false" | Whether to synchronize only the specified database |
| `include_database_list` | No | "" | When only_specified_database=true, the databases to synchronize, separated by ','. Database names are case sensitive. |
| Parameter | Required | Default | Description |
|---------------------------|-----|---------|-----------------------------------------------------------------------|
| `user` | Yes | | Username for the corresponding database |
| `password` | Yes | | Password for the corresponding database |
| `jdbc_url` | Yes | | JDBC connection string |
| `driver_url` | Yes | | JDBC Driver Jar package name |
| `driver_class` | Yes | | JDBC Driver Class name |
| `lower_case_table_names` | No | "false" | Whether to synchronize the database, table, and column names of the JDBC external data source in lowercase |
| `only_specified_database` | No | "false" | Whether to synchronize only the specified database |
| `include_database_list` | No | "" | When only_specified_database=true, the databases to synchronize, separated by ','. Database names are case sensitive. |
| `exclude_database_list` | No | "" | When only_specified_database=true, the databases not to synchronize, separated by ','. Database names are case sensitive. |

### Driver Package Path
Expand All @@ -68,7 +68,7 @@ PROPERTIES ("key"="value", ...)

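### Lowercase table name synchronization

When `lower_case_table_names` is set to `true`, Doris can query non-lowercase databases and tables by maintaining a mapping from lowercase names to the actual names on the remote system
When `lower_case_table_names` is set to `true`, Doris can query non-lowercase databases, tables, and columns by maintaining a mapping from lowercase names to the actual names on the remote system

**Notice:**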

Expand All @@ -78,9 +78,9 @@ PROPERTIES ("key"="value", ...)

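For other databases, you still need to specify the real database name and table name when querying.

2. In Doris 2.0.3 and later versions, it applies to all databases. When querying, all database and table names are converted to their real names before the query is run. If you upgrade from an older version to 2.0.3, `Refresh <catalog_name>` is required for this to take effect.
2. In Doris 2.0.3 and later versions, it applies to all databases. When querying, all database, table, and column names are converted to their real names before the query is run. If you upgrade from an older version to 2.0.3, `Refresh <catalog_name>` is required for this to take effect.

However, if database or table names differ only in case, such as `Doris` and `doris`, Doris cannot query them because of the ambiguity.
However, if database, table, or column names differ only in case, such as `Doris` and `doris`, Doris cannot query them because of the ambiguity.

3. When the FE parameter `lower_case_table_names` is set to `1` or `2`, the JDBC Catalog's `lower_case_table_names` parameter must be set to `true`. If the FE parameter `lower_case_table_names` is set to `0`, the JDBC Catalog parameter can be `true` or `false` and defaults to `false`. This ensures consistency and predictability in how Doris handles internal and external table configurations.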

Expand Down Expand Up @@ -114,7 +114,7 @@ select * from mysql_catalog.mysql_database.mysql_table where k1 > 1000 and k3 ='

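1. When executing a query like `where dt = '2022-01-01'`, Doris can push these filter conditions down to the external data source, so that data that does not meet the conditions is excluded at the data source level and unnecessary data acquisition and transfer is reduced. This greatly improves query performance while also reducing the load on external data sources.

2. When `enable_func_pushdown` is set to true, function conditions after where are also pushed down to the external data source. Currently, only MySQL is supported. If you encounter a function that MySQL does not support, you can set this parameter to false. Currently, Doris automatically identifies some functions not supported by MySQL to filter the push-down conditions, which can be checked with explain sql.
2. When `enable_func_pushdown` is set to true, function conditions after where are also pushed down to the external data source. Currently, only MySQL and ClickHouse are supported. If you encounter a function that MySQL or ClickHouse does not support, you can set this parameter to false. Currently, Doris automatically identifies some functions not supported by MySQL and some functions supported by ClickHouse to filter the push-down conditions, which can be checked with explain sql.

Functions that are currently not pushed down include: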

Expand All @@ -123,6 +123,13 @@ select * from mysql_catalog.mysql_database.mysql_table where k1 > 1000 and k3 ='
| DATE_TRUNC |
| MONEY_FORMAT |

Functions that are currently pushed down include:

| ClickHouse |
|:--------------:|
| FROM_UNIXTIME |
| UNIX_TIMESTAMP |

### Row Limit

If the query contains the limit keyword, Doris translates it into semantics suitable for the different data sources.
45 changes: 33 additions & 12 deletions fe/fe-core/src/main/java/org/apache/doris/catalog/JdbcTable.java
Expand Up @@ -27,6 +27,8 @@
import org.apache.doris.thrift.TTableDescriptor;
import org.apache.doris.thrift.TTableType;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.base.Strings;
import com.google.common.collect.Maps;
import lombok.Setter;
Expand All @@ -47,9 +49,12 @@
public class JdbcTable extends Table {
private static final Logger LOG = LogManager.getLogger(JdbcTable.class);

private static final ObjectMapper objectMapper = new ObjectMapper();

private static final String TABLE = "table";
private static final String REAL_DATABASE = "real_database";
private static final String REAL_TABLE = "real_table";
private static final String REAL_COLUMNS = "real_columns";
private static final String RESOURCE = "resource";
private static final String TABLE_TYPE = "table_type";
private static final String URL = "jdbc_url";
Expand All @@ -65,6 +70,7 @@ public class JdbcTable extends Table {
// real name only for jdbc catalog
private String realDatabaseName;
private String realTableName;
private Map<String, String> realColumnNames;

private String jdbcTypeName;

Expand Down Expand Up @@ -110,7 +116,7 @@ public String getInsertSql(List<String> insertCols) {
sb.append(getProperRealFullTableName(TABLE_TYPE_MAP.get(getTableTypeName())));
sb.append("(");
List<String> transformedInsertCols = insertCols.stream()
.map(col -> databaseProperName(TABLE_TYPE_MAP.get(getTableTypeName()), col))
.map(col -> getProperRealColumnName(TABLE_TYPE_MAP.get(getTableTypeName()), col))
.collect(Collectors.toList());
sb.append(String.join(",", transformedInsertCols));
sb.append(")");
Expand Down Expand Up @@ -200,6 +206,7 @@ public void write(DataOutput out) throws IOException {
serializeMap.put(CHECK_SUM, checkSum);
serializeMap.put(REAL_DATABASE, realDatabaseName);
serializeMap.put(REAL_TABLE, realTableName);
serializeMap.put(REAL_COLUMNS, objectMapper.writeValueAsString(realColumnNames));

int size = (int) serializeMap.values().stream().filter(v -> {
return v != null;
Expand Down Expand Up @@ -236,6 +243,13 @@ public void readFields(DataInput in) throws IOException {
checkSum = serializeMap.get(CHECK_SUM);
realDatabaseName = serializeMap.get(REAL_DATABASE);
realTableName = serializeMap.get(REAL_TABLE);
String realColumnNamesJson = serializeMap.get(REAL_COLUMNS);
if (realColumnNamesJson != null) {
realColumnNames = objectMapper.readValue(realColumnNamesJson, new TypeReference<Map<String, String>>() {
});
} else {
realColumnNames = Maps.newHashMap();
}
}

public String getResourceName() {
Expand Down Expand Up @@ -263,6 +277,14 @@ public String getProperRealFullTableName(TOdbcTableType tableType) {
}
}

public String getProperRealColumnName(TOdbcTableType tableType, String columnName) {
if (realColumnNames == null || realColumnNames.isEmpty() || !realColumnNames.containsKey(columnName)) {
return databaseProperName(tableType, columnName);
} else {
return properNameWithRealName(tableType, realColumnNames.get(columnName));
}
}

public String getTableTypeName() {
return jdbcTypeName;
}
Expand Down Expand Up @@ -358,14 +380,13 @@ private void validate(Map<String, String> properties) throws DdlException {
* @param wrapEnd The character(s) to be added at the end of each name component.
* @param toUpperCase If true, convert the name to upper case.
* @param toLowerCase If true, convert the name to lower case.
* <p>
* Note: If both toUpperCase and toLowerCase are true, the name will ultimately be converted to lower case.
* <p>
* The name is expected to be in the format of 'schemaName.tableName'. If there is no '.',
* the function will treat the entire string as one name component.
* If there is a '.', the function will treat the string before the first '.' as the schema name
* and the string after the '.' as the table name.
*
* <p>
* Note: If both toUpperCase and toLowerCase are true, the name will ultimately be converted to lower case.
* <p>
* The name is expected to be in the format of 'schemaName.tableName'. If there is no '.',
* the function will treat the entire string as one name component.
* If there is a '.', the function will treat the string before the first '.' as the schema name
* and the string after the '.' as the table name.
* @return The formatted name.
*/
public static String formatName(String name, String wrapStart, String wrapEnd, boolean toUpperCase,
Expand All @@ -386,18 +407,18 @@ public static String formatName(String name, String wrapStart, String wrapEnd, b

/**
* Formats a database name according to the database type.
*
* <p>
* Rules:
* - MYSQL, OCEANBASE: Wrap with backticks (`), case unchanged. Example: mySchema.myTable -> `mySchema.myTable`
* - SQLSERVER: Wrap with square brackets ([]), case unchanged. Example: mySchema.myTable -> [mySchema].[myTable]
* - POSTGRESQL, CLICKHOUSE, TRINO, OCEANBASE_ORACLE, SAP_HANA: Wrap with double quotes ("), case unchanged.
* Example: mySchema.myTable -> "mySchema"."myTable"
* Example: mySchema.myTable -> "mySchema"."myTable"
* - ORACLE: Wrap with double quotes ("), convert to upper case. Example: mySchema.myTable -> "MYSCHEMA"."MYTABLE"
* For other types, the name is returned as is.
*
* @param tableType The database type.
* @param name The name to be formatted, expected in 'schemaName.tableName' format. If no '.', treats entire string
* as one name component. If '.', treats string before first '.' as schema name and after as table name.
* as one name component. If '.', treats string before first '.' as schema name and after as table name.
* @return The formatted name.
*/
public static String databaseProperName(TOdbcTableType tableType, String name) {
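
The quoting rules listed in the Javadoc above can be summarized in a small standalone sketch. This is not the `JdbcTable` code: `IdentifierQuoter` is a hypothetical class, database types are passed as plain strings rather than `TOdbcTableType`, and quoting each component of a MySQL name separately is an assumption made here for uniformity.

```java
import java.util.Locale;

// Hypothetical sketch of per-database identifier quoting, not the Doris code.
final class IdentifierQuoter {
    static String quote(String dbType, String name) {
        switch (dbType.toUpperCase(Locale.ROOT)) {
            case "MYSQL":
            case "OCEANBASE":
                return wrap(name, "`", "`", false);
            case "SQLSERVER":
                return wrap(name, "[", "]", false);
            case "POSTGRESQL":
            case "CLICKHOUSE":
            case "TRINO":
            case "OCEANBASE_ORACLE":
            case "SAP_HANA":
                return wrap(name, "\"", "\"", false);
            case "ORACLE":
                return wrap(name, "\"", "\"", true); // Oracle: upper case, then quote
            default:
                return name; // other types: returned as is
        }
    }

    // Splits at the first '.', then wraps the schema and table parts separately.
    private static String wrap(String name, String start, String end, boolean upper) {
        String n = upper ? name.toUpperCase(Locale.ROOT) : name;
        int dot = n.indexOf('.');
        if (dot < 0) {
            return start + n + end;
        }
        return start + n.substring(0, dot) + end + "." + start + n.substring(dot + 1) + end;
    }
}
```

Upper-casing before quoting is what makes a lowercase Oracle column such as `"id"` in `student3` unreachable without a real-name mapping, which appears to be the case that `getProperRealColumnName` is meant to cover.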
Expand Up @@ -91,6 +91,8 @@ private JdbcTable toJdbcTable() {
jdbcTable.setRealDatabaseName(((JdbcExternalCatalog) catalog).getJdbcClient().getRealDatabaseName(this.dbName));
jdbcTable.setRealTableName(
((JdbcExternalCatalog) catalog).getJdbcClient().getRealTableName(this.dbName, this.name));
jdbcTable.setRealColumnNames(((JdbcExternalCatalog) catalog).getJdbcClient().getRealColumnNames(this.dbName,
this.name));
jdbcTable.setJdbcTypeName(jdbcCatalog.getDatabaseTypeName());
jdbcTable.setJdbcUrl(jdbcCatalog.getJdbcUrl());
jdbcTable.setJdbcUser(jdbcCatalog.getJdbcUser());