
The new 'Airflow Pipeline Editor' feature is not working properly: DAGs created with it are reported as broken by Airflow. #2124

Open
nanaones opened this issue Sep 10, 2021 · 3 comments · May be fixed by #3208
Labels
component:pipeline-editor, component:pipeline-runtime, feedback:user, help wanted, inactive:invalid, platform: pipeline-Airflow

Comments

@nanaones

Describe the issue
Airflow Pipeline Generator generates malformed DAG files.

To Reproduce
Steps to reproduce the behavior:

  1. Open the Airflow Pipeline Editor from the Launcher.
  2. Drag a BashOperator node onto the canvas.
  3. In the node properties, set bash_command to 'echo hello-world'.
  4. Click 'Run pipeline' to run it on Apache Airflow.
  5. Set 'Pipeline Name' to 'hello-world'.

Screenshots or log output
If applicable, add screenshots or log output to help explain your problem.

Broken DAG: [/opt/airflow/dags/repo/hello-world-0909015623.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/decorators.py", line 94, in wrapper
    result = func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/baseoperator.py", line 414, in __init__
    "arguments were:\n**kwargs: {k}".format(c=self.__class__.__name__, k=kwargs, t=task_id),
airflow.exceptions.AirflowException: Invalid arguments were passed to BashOperator (task_id: BashOperator). Invalid arguments were:
**kwargs: {'namespace': 'airflow', 'xcom_push': False, 'inputs': [], 'outputs': [], 'secrets': [Secret(env, AWS_ACCESS_KEY_ID, elyra, AWS_ACCESS_KEY_ID), Secret(env, AWS_SECRET_ACCESS_KEY, elyra, AWS_SECRET_ACCESS_KEY)], 'in_cluster': True, 'config_file': 'None'}

Expected behavior
The hello-world DAG loads without errors and runs 'echo hello-world'.

Deployment information
Describe what you've deployed and how:

  • Elyra version: 3.0.1
  • Operating system: Windows 10
  • Installation source: Pip install
  • Deployment type: EKS (Kubernetes v1.21)

Pipeline runtime environment
If the issue is related to pipeline execution, identify the environment where the pipeline is executed

  • Apache Airflow Version 2.0.2

The same happens with EmailOperator.
I checked, and the arguments passed in the generated DAGs are the ones used by the Kubernetes pod operator (k8s operator), not by BashOperator.
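The kwargs in the traceback (namespace, secrets, in_cluster, config_file, ...) are Kubernetes-operator arguments, and Airflow 2.x's BaseOperator raises an AirflowException for keyword arguments it does not recognize. A minimal sketch of one possible mitigation, assuming the DAG generator can introspect the target operator class: keep only the keyword arguments declared by some __init__ in the class hierarchy. BaseOperatorStub, BashOperatorStub, accepted_parameters and filter_operator_kwargs are hypothetical names; the stubs only mimic the rejecting behavior and are not Airflow or Elyra code.

```python
import inspect
from typing import Any, Dict, Optional, Set


class BaseOperatorStub:
    """Hypothetical stand-in for an operator base class that, like the one in
    the traceback, rejects keyword arguments it does not recognize."""

    def __init__(self, *, task_id: str, **kwargs: Any) -> None:
        if kwargs:
            raise TypeError(f"Invalid arguments were passed: {kwargs}")
        self.task_id = task_id


class BashOperatorStub(BaseOperatorStub):
    """Hypothetical stand-in for BashOperator: declares its own parameters and
    forwards everything else to the base class."""

    def __init__(self, *, bash_command: str, cwd: Optional[str] = None, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.bash_command = bash_command
        self.cwd = cwd


def accepted_parameters(operator_cls: type) -> Set[str]:
    """Collect the named parameters declared by every __init__ in the MRO."""
    names: Set[str] = set()
    for klass in operator_cls.__mro__:
        init = klass.__dict__.get("__init__")
        if klass is object or init is None:
            continue  # skip object and classes that inherit __init__
        for param in inspect.signature(init).parameters.values():
            if param.name != "self" and param.kind not in (
                inspect.Parameter.VAR_POSITIONAL,
                inspect.Parameter.VAR_KEYWORD,
            ):
                names.add(param.name)
    return names


def filter_operator_kwargs(operator_cls: type, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keyword arguments that no __init__ in the hierarchy declares."""
    allowed = accepted_parameters(operator_cls)
    return {key: value for key, value in kwargs.items() if key in allowed}


# Arguments similar to those in the traceback above.
generated_kwargs = {
    "task_id": "BashOperator",
    "bash_command": "echo hello-world",
    "namespace": "airflow",
    "in_cluster": True,
    "config_file": "None",
}

clean = filter_operator_kwargs(BashOperatorStub, generated_kwargs)
op = BashOperatorStub(**clean)  # constructing with the unfiltered dict would raise
print(sorted(clean))  # ['bash_command', 'task_id']
```

Filtering silently drops arguments, so a real generator would probably want to log what it removed; not emitting Kubernetes-specific arguments for non-Kubernetes operators in the first place is the more direct fix.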

@nanaones added the component:pipeline-editor, component:pipeline-runtime, feedback:user and status:Needs Triage labels Sep 10, 2021
@lresende added this to the 3.1.1 milestone Sep 10, 2021
@ptitzler
Member

Unfortunately, Airflow 2.x is not yet supported. https://elyra.readthedocs.io/en/stable/recipes/configure-airflow-as-a-runtime.html

@shalberd
Contributor

shalberd commented Jan 12, 2024

@nanaones I'm not sure whether the Airflow package catalog connector already existed as a feature in 2021; it does now. Still, even during the package catalog connector's initial setup and import, here from

https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl

Elyra logs the following:

[I 2024-01-05 10:52:38.508 ElyraApp] Analysis of 'https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl' completed. Located 9 operator classes in 4 Python scripts.
[W 2024-01-05 10:52:38.521 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...

The package catalog connector code needs fixes. I will make a PR based on @ianonavy's fork of the package catalog connector, so that BashOperator, and the base Airflow operators in general, work.

That work was done in a fork outside the Elyra community and has not been tested or discussed here so far.

It should mostly work, but BashOperator does not show up.

[Screenshot, 2024-01-12 17:48:41]

@lresende @romeokienzler Importing the Airflow wheel file through the package catalog connector produces an error message, and as a result two operators (one of them BashOperator) are not imported. @nanaones I will investigate and make an additional PR for https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/package_catalog_connector/airflow_package_catalog_connector.py#L191 to bring in that change from @ianonavy's fork. I'm quite sure that will fix this issue: the BaseOperator reference needs to be compatible with Airflow 2.x.

https://airflow.apache.org/docs/apache-airflow/2.6.2/operators-and-hooks-ref.html
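Recognizing operators in the wheel comes down to deciding which classes in a scanned module derive from BaseOperator. A minimal sketch, assuming the connector parses each Python file with the standard ast module (an assumption, not a description of Elyra's code at the linked line): a class counts as an operator if it derives from BaseOperator directly or through another operator class defined in the same module. find_operator_classes and the sample module are hypothetical.

```python
import ast
from typing import Dict, List, Set

# Simplified, hypothetical module; it is only parsed, never imported.
SAMPLE_MODULE = '''
from airflow.models.baseoperator import BaseOperator
from airflow.models.skipmixin import SkipMixin

class PythonOperator(BaseOperator):
    def __init__(self, *, python_callable, **kwargs):
        super().__init__(**kwargs)

class BranchPythonOperator(PythonOperator, SkipMixin):
    pass

class Helper:
    pass
'''


def base_names(node: ast.ClassDef) -> List[str]:
    """Return base-class names, handling both `Base` and `module.Base` forms."""
    names = []
    for base in node.bases:
        if isinstance(base, ast.Name):
            names.append(base.id)
        elif isinstance(base, ast.Attribute):
            names.append(base.attr)
    return names


def find_operator_classes(source: str, root: str = "BaseOperator") -> List[str]:
    """Names of classes deriving from `root`, directly or via other classes
    defined in the same module."""
    classes: Dict[str, List[str]] = {
        node.name: base_names(node)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ClassDef)
    }
    operators: Set[str] = set()
    changed = True
    while changed:  # repeat until no new subclass is discovered
        changed = False
        for name, bases in classes.items():
            if name not in operators and any(b == root or b in operators for b in bases):
                operators.add(name)
                changed = True
    return sorted(operators)


print(find_operator_classes(SAMPLE_MODULE))
# ['BranchPythonOperator', 'PythonOperator']
```

The fixed-point loop makes the result independent of the order in which classes appear in the file.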

After making the changes, I get a different log when Elyra starts and evaluates the wheel file. It looks much better: 16 operator classes are detected instead of 9.

[I 2024-01-12 22:25:00.524 ElyraApp] Analysis of ''https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl'' completed. Located 16 operator classes in 11 Python scripts.
[W 2024-01-12 22:25:00.568 ServerApp] Operator 'BaseBranchOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/branch.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.571 ServerApp] Operator 'EmptyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/empty.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.587 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.592 ServerApp] Operator 'LatestOnlyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/latest_only.py'}' does not have an __init__ function. Skipping...
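The remaining "does not have an __init__ function" warnings are presumably emitted for classes that inherit their constructor instead of defining one (EmptyOperator, for instance, defines execute but no __init__ of its own). A minimal sketch, again assuming ast-based parsing, of resolving which class's __init__ applies by walking up the base classes defined in the same module. resolve_init_owner and the sample module are hypothetical, and the sketch follows only the first base class rather than Python's full MRO.

```python
import ast
from typing import Dict, Optional, Set

# Simplified, hypothetical module; it is only parsed, never imported.
SAMPLE_MODULE = '''
class BaseOperator:
    def __init__(self, *, task_id, **kwargs):
        self.task_id = task_id

class PythonOperator(BaseOperator):
    def __init__(self, *, python_callable, op_args=None, **kwargs):
        super().__init__(**kwargs)

class BranchPythonOperator(PythonOperator):
    pass

class EmptyOperator(BaseOperator):
    def execute(self, context):
        pass
'''


def resolve_init_owner(source: str, class_name: str) -> Optional[str]:
    """Return the name of the class whose __init__ `class_name` would use,
    searching only classes defined in `source`; None if the chain leaves it."""
    classes: Dict[str, ast.ClassDef] = {
        node.name: node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ClassDef)
    }
    seen: Set[str] = set()
    current = class_name
    while current in classes and current not in seen:
        seen.add(current)
        node = classes[current]
        defines_init = any(
            isinstance(item, ast.FunctionDef) and item.name == "__init__"
            for item in node.body
        )
        if defines_init:
            return current
        bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
        if not bases:
            return None
        current = bases[0]  # simplification: follow only the first base
    return None


for name in ("PythonOperator", "BranchPythonOperator", "EmptyOperator"):
    print(name, "->", resolve_init_owner(SAMPLE_MODULE, name))
# PythonOperator -> PythonOperator
# BranchPythonOperator -> PythonOperator
# EmptyOperator -> BaseOperator
```

In the real wheel, BaseOperator lives in airflow/models/baseoperator.py rather than in the operator module, so a parser would have to follow the chain across files; the sketch returns None once the chain leaves the module.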

Let's check the GUI:

I can now for example see and use the BashOperator, with Airflow 2.x.

[Screenshot, 2024-01-12 23:31:57]

Looking good. Next I need to look into sensors (airflow.hooks.base, airflow.sensors.base). However, the palette on the left lists only operators, not sensors, so this should be fine.

@shalberd
Contributor

shalberd commented Jan 17, 2024

@MR-GOYAL @lresende @ianonavy @thesuperzapper For the Airflow 2.x package catalog connector / wheel file, it is not enough to change the BaseOperator class so that operators are imported correctly, as the WIP PR linked here does.
The parsing of Airflow component properties and their __init__ fields needs to change as well, at https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/component_parser_airflow.py#L203 among other places.

Why? Because the operator definitions changed, BashOperator's for example, between

Airflow 1.10:

https://github.com/apache/airflow/blob/v1-10-stable/airflow/operators/bash_operator.py#L43
https://github.com/apache/airflow/blob/v1-10-stable/airflow/operators/bash_operator.py#L93

and current Airflow (main branch):

https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L48
https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L138

This affects things like None handling and which operator properties are required. I'll make sure to cover and test this in my PR.
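One concrete difference is how constructor parameters are declared. In the Airflow 1.10 BashOperator they are positional, with *args, **kwargs at the end; in current Airflow they are keyword-only after a bare *, with annotations such as str | None. A minimal sketch, assuming ast-based parsing, that reads both styles and reports each parameter's default and whether it is required. The two signatures are abbreviated illustrations modeled on BashOperator, not verbatim copies, and init_parameters is a hypothetical helper.

```python
import ast
from typing import Any, Dict, List

# Abbreviated, illustrative signatures modeled on BashOperator (not verbatim).
AIRFLOW_1_10_STYLE = '''
class BashOperator(BaseOperator):
    def __init__(self, bash_command, xcom_push=False, env=None,
                 output_encoding='utf-8', *args, **kwargs):
        pass
'''

AIRFLOW_2_STYLE = '''
class BashOperator(BaseOperator):
    def __init__(self, *, bash_command: str, env: dict[str, str] | None = None,
                 cwd: str | None = None, **kwargs) -> None:
        pass
'''


def _literal_or_marker(node: ast.expr) -> Any:
    """Evaluate a literal default; mark anything else (e.g. a function call)."""
    try:
        return ast.literal_eval(node)
    except ValueError:
        return "<non-literal>"


def _describe(arg: ast.arg, default: Any) -> Dict[str, Any]:
    """`default` is an AST node, or None when the parameter has no default."""
    return {
        "name": arg.arg,
        "required": default is None,
        "default": None if default is None else _literal_or_marker(default),
    }


def init_parameters(source: str) -> List[Dict[str, Any]]:
    """Describe the named __init__ parameters of the first class in `source`."""
    tree = ast.parse(source)
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    init = next(n for n in cls.body if isinstance(n, ast.FunctionDef) and n.name == "__init__")
    args = init.args
    params = []

    # Positional parameters (1.10 style): defaults line up with the *last* ones.
    positional = args.args[1:]  # skip `self`
    offset = len(positional) - len(args.defaults)
    for i, arg in enumerate(positional):
        params.append(_describe(arg, args.defaults[i - offset] if i >= offset else None))

    # Keyword-only parameters after `*` (2.x style): kw_defaults has None for required ones.
    for arg, default in zip(args.kwonlyargs, args.kw_defaults):
        params.append(_describe(arg, default))
    return params  # *args / **kwargs are deliberately ignored


for label, src in (("1.10", AIRFLOW_1_10_STYLE), ("2.x", AIRFLOW_2_STYLE)):
    for p in init_parameters(src):
        status = "required" if p["required"] else f"default={p['default']!r}"
        print(label, p["name"], status)
# 1.10 bash_command required
# 1.10 xcom_push default=False
# 1.10 env default=None
# 1.10 output_encoding default='utf-8'
# 2.x bash_command required
# 2.x env default=None
# 2.x cwd default=None
```

A parser that only walks args.args sees no parameters at all in the 2.x-style signature, which is one way property parsing can silently go wrong after the upgrade.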
I found, for example, that when the cwd property of BashOperator is left empty in the Elyra node editor, the generated DAG currently passes an empty string, when in fact it should pass None:

raise AirflowException(f"Can not find the cwd: {self.cwd}")
airflow.exceptions.AirflowException: Can not find the cwd:

https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L216
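In the linked BashOperator code, the directory existence check runs whenever cwd is not None, so an empty string reaches it and fails with the message above. A minimal sketch of one way to avoid this when generating the DAG, assuming the node properties arrive as a dict of values: map empty strings of optional properties to None, and omit None values when rendering the operator call so the operator's own default applies. normalize_optional_properties and render_operator_call are hypothetical helpers, not Elyra functions.

```python
from typing import Any, Dict, Iterable


def normalize_optional_properties(properties: Dict[str, Any], optional: Iterable[str]) -> Dict[str, Any]:
    """Map empty or whitespace-only strings to None for optional properties."""
    optional_keys = set(optional)
    return {
        key: None if key in optional_keys and isinstance(value, str) and not value.strip() else value
        for key, value in properties.items()
    }


def render_operator_call(operator: str, properties: Dict[str, Any]) -> str:
    """Render an operator instantiation for a generated DAG file.
    Properties whose value is None are omitted so the operator default applies."""
    args = ", ".join(f"{key}={value!r}" for key, value in properties.items() if value is not None)
    return f"{operator}({args})"


# Node properties as they might come from the editor, with an empty cwd box.
node_properties = {"task_id": "hello", "bash_command": "echo hello-world", "cwd": ""}
props = normalize_optional_properties(node_properties, optional=["cwd", "env"])
print(render_operator_call("BashOperator", props))
# BashOperator(task_id='hello', bash_command='echo hello-world')
```

Omitting None values relies on the operator's default being None, which is the case for BashOperator's cwd; rendering cwd=None explicitly would work equally well.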

The general direction is taking shape (still WIP): with the changes from PR 3167 and the operator recognition fix already in the PR linked to this issue, BashOperator executes correctly. Property assembly and parsing still need more work, though, as the cwd property of BashOperator shows.

[Screenshot, 2024-01-16 19:18:22]

[Screenshot, 2024-01-16 19:21:25]

@shalberd added the help wanted label Dec 18, 2024
5 participants