-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
To support task display_name #1278
base: main
Are you sure you want to change the base?
Changes from 9 commits
8dc46be
889861d
28dc04d
c3a1ee8
9170c3a
49363b0
1ed6e10
946e183
32586dc
218bbc5
fa6b2d9
03076bc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -62,6 +62,8 @@ class RenderConfig: | |
:param dbt_ls_path: Configures the location of an output of ``dbt ls``. Required when using ``load_method=LoadMode.DBT_LS_FILE``. | ||
:param enable_mock_profile: Allows to enable/disable mocking profile. Enabled by default. Mock profiles are useful for parsing Cosmos DAGs in the CI, but should be disabled to benefit from partial parsing (since Cosmos 1.4). | ||
:param source_rendering_behavior: Determines how source nodes are rendered when using cosmos default source node rendering (ALL, NONE, WITH_TESTS_OR_FRESHNESS). Defaults to "NONE" (since Cosmos 1.6). | ||
:param airflow_vars_to_purge_dbt_ls_cache: Specify Airflow variables that will affect the LoadMode.DBT_LS cache. | ||
:param set_task_id_by_node: A callable that takes a dbt node as input and returns the task ID. This allows users to assign a custom node ID separate from the display name. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We received feedback from end-users that Cosmos already has too many configurations. Would it be possible for us to handle non-ASCII following how other libraries handle this, without enforcing users to define a new configuration? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason for not providing users with an option to specify task_id themselves is that automatically generating task_id for names written in non-ASCII characters is a highly challenging task. For example, while slugify (as mentioned in the documentation) can work in some cases, it’s not suitable for use in actual production code. Examples:
In other words, what’s needed is "translation" rather than a mechanical "conversion" like slugify for the sake of automation. To ensure unique task IDs, some form of mapping would likely be necessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand the concern about having too many options for the end users. However, I believe it would be challenging to automatically generate task_id on behalf of the users. May I please propose adding an option for this purpose? If that’s acceptable, I’d like to work on a revision that minimizes code changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @t0momi219, Thanks a lot for explaining these differences. I don't speak Chinese nor Japanese, so your explanation was extremely helpful. I now understand the need Yes, please. It would be amazing if you could make changes to make this customizable without changing the codebase. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, given the explanation I'm inclined to provide & expose it as a configuration. Like that someone who really needs to use it can use. Also +1 to minimising the code change, easier to review & address the scope of a particular issue. Thanks for working on this @t0momi219 |
||
""" | ||
|
||
emit_datasets: bool = True | ||
|
@@ -80,6 +82,7 @@ class RenderConfig: | |
enable_mock_profile: bool = True | ||
source_rendering_behavior: SourceRenderingBehavior = SourceRenderingBehavior.NONE | ||
airflow_vars_to_purge_dbt_ls_cache: list[str] = field(default_factory=list) | ||
set_task_id_by_node: Callable[..., Any] | None = None | ||
|
||
def __post_init__(self, dbt_project_path: str | Path | None) -> None: | ||
if self.env_vars: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
.. _task-display-name: | ||
|
||
Task display name | ||
================ | ||
|
||
.. note:: | ||
This feature is only available for Airflow >= 2.9. | ||
|
||
In Airflow, `task_id` does not support non-ASCII characters. Therefore, if users wish to use non-ASCII characters (such as their native language) as display names while keeping `task_id` in ASCII, they can use the `display_name` parameter. | ||
|
||
To work with projects that use non-ASCII characters in model names, the `set_task_id_by_node` field of `RenderConfig` can be utilized. | ||
|
||
Example: | ||
|
||
You can provide a function to convert the model name to an ASCII-compatible format. The function’s output is used as the TaskID, while the display name on Airflow remains as the original model name. | ||
|
||
.. code-block:: python | ||
|
||
from slugify import slugify | ||
|
||
|
||
def set_task_id_by_node(node): | ||
return slugify(node.name) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not to do this on behalf of the users? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I mentioned in my previous comment. |
||
|
||
|
||
from cosmos import DbtTaskGroup, RenderConfig | ||
|
||
jaffle_shop = DbtTaskGroup( | ||
render_config=RenderConfig(set_task_id_by_node=set_task_id_by_node) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a critical part of the code, and I feel that the bug fix does not justify changing all these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing this over the weekend. I apologize for the lack of clarity in my pull request.
The purpose of these changes is as follows: