[WIP] Airflow package catalog connector: support the Airflow 2.x wheel so all core operators can be imported (parsing changes) #3208
fixes #2124
@nanaones I am not sure whether the Airflow package catalog connector already existed as a feature in 2021; it does now. Even so, when the connector is first set up and imports the Airflow 2.x wheel, e.g. from
https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl
the import of the core operators it contains is incomplete. For example, the BashOperator and EmailOperator are missing. The detection logic looked like it needed fixing, which @ianonavy had already done in a fork.
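To check which core operator modules a wheel actually ships, the modules under `airflow/operators/` can be listed with Python's standard `zipfile` module. This is a minimal sketch, assuming the core operators live in that package of the wheel; `list_operator_modules` is a hypothetical helper, not part of Elyra's connector.

```python
import zipfile
from pathlib import PurePosixPath


def list_operator_modules(wheel):
    """Return the sorted paths of Python modules under airflow/operators/.

    Hypothetical helper: `wheel` may be a path or a file-like object, and the
    core operators are assumed to live in the airflow/operators/ package.
    """
    modules = []
    with zipfile.ZipFile(wheel) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Keep .py files inside airflow/operators/, skipping package __init__ files.
            if (
                path.parts[:2] == ("airflow", "operators")
                and path.suffix == ".py"
                and path.name != "__init__.py"
            ):
                modules.append(name)
    return sorted(modules)
```

For example, `list_operator_modules("apache_airflow-2.6.2-py3-none-any.whl")` returns the operator module paths found in a locally downloaded copy of that wheel.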
The Elyra container logs the following message:
The package catalog connector code needs fixes. This PR is based on @ianonavy's work on the package catalog connector in a fork, so that the BashOperator and, in general, the base operators shipped with Airflow work.
Concept behind the Airflow package catalog connector in the Elyra main source: https://github.com/elyra-ai/elyra/tree/main/elyra/pipeline/airflow/package_catalog_connector
That work was done in a fork outside community Elyra and has not been tested or discussed here so far.
I am including this change here so that it makes it into community Elyra.
As it stands, the Airflow package catalog connector makes only an incomplete subset of the operators in the Airflow 2.x wheel file available.
The fix in this PR's commit makes the BaseOperator reference location compatible with Airflow 2.x:
https://airflow.apache.org/docs/apache-airflow/2.6.2/_api/airflow/models/baseoperator/index.html#
BaseOperator has been at this location all the way back to 2.0.0:
https://airflow.apache.org/docs/apache-airflow/2.0.0/_api/airflow/models/baseoperator/index.html
With the change in this PR, all operators are finally available in the Elyra pipeline editor:
https://airflow.apache.org/docs/apache-airflow/2.6.2/operators-and-hooks-ref.html
After making the changes, I get a different log when Elyra starts and evaluates the wheel file. It looks much better: more operator classes are detected (16 instead of 9).
The 3 operators that are still not imported and usable are pipeline-related ones that do not work through the mechanisms offered by KubernetesPodOperator and IBM pipelines, so they should be skipped from a use-case perspective; they also cannot be initialized on their own. Besides that, the EmptyOperator is essentially a placeholder (dummy) operator with no functionality.
Let's check the GUI after importing the wheel file with the change in this PR.
I can now, for example, see and use the BashOperator with Airflow 2.x.
What changes were proposed in this pull request?
No code refactoring, just a small change to the operator detection logic for Airflow 2.0.0 and higher.
Add package connector support for Airflow 2.x. The check for subclasses of BaseOperator used an outdated package name; this commit adds the new one from Airflow 2.
How was this pull request tested?
No existing Elyra unit tests were changed.
Instead, I tested the flow of importing the Airflow 2.x core operators described above, comparing the results before and after the change in this PR.
The import worked, and the operators are visible in the left-hand palette of the Elyra pipeline editor.
Tested with: Airflow 2.6.2
Developer's Certificate of Origin 1.1