v0.13.1
david-leifker released this on 02 Apr 19:40
DataHub Release Notes
User Experience
- Capture and Manage Common Joins between Datasets: Users can now view and manage common join relationships between datasets, making it easier than ever to capture best practices and bespoke join logic. Watch the walkthrough here! 8325
  - Heads up: you'll need to enable the ER_MODEL_RELATIONSHIP_FEATURE_ENABLED env variable to use this feature!
- Enhanced UI Interactions: Users can now enjoy an improved markdown editor and filter policies by active/inactive statuses, resulting in a more intuitive and manageable interface. 9949, 9958
- Visual Context for Groups: You can now include picture links for groups in the UI, adding a richer visual context and enhancing the navigational experience. 9882
- Improved Error Visibility: The UI now displays error messages related to data size limitations, allowing for better troubleshooting and user experience. 10038
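The joins feature above is gated behind the ER_MODEL_RELATIONSHIP_FEATURE_ENABLED env variable. A minimal pre-flight sketch for checking that the variable is set in the environment a service will start from might look like the following. The helper name `er_model_relationships_enabled` is hypothetical, and treating the string "true" (any case) as enabled is an assumption; DataHub's own parsing of the flag may differ.

```python
import os

# Variable name taken from the release note above.
FLAG = "ER_MODEL_RELATIONSHIP_FEATURE_ENABLED"


def er_model_relationships_enabled(env=None):
    """Return True if the flag looks enabled in the given environment mapping.

    Assumption: the value "true" (case-insensitive, surrounding whitespace
    ignored) means enabled. This is an illustrative check, not DataHub code.
    """
    env = os.environ if env is None else env
    return env.get(FLAG, "").strip().lower() == "true"


# Example: report the flag state for the current process environment.
state = "enabled" if er_model_relationships_enabled() else "not enabled"
print(f"{FLAG} is {state}")
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in a deployment script or test.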
Developer Experience
- Enhanced Kafka Compatibility: Updated client version for Kafka setup ensures better compatibility and functionality for developers. 9962
- Optimized Docker Build: Docker setups now respect pip mirrors, optimizing the build process especially in restricted network environments. 9963
- Advanced Error Handling: New error handling for duplicate class names and improved fspath lint error management enhance code reliability and quality. 9960, 9976
- Latest OpenSearch Image: Incorporation of OpenSearch image version 2.11.0 aligns with the latest stable releases, boosting performance and security. 9984
Metadata Ingestion
- NEW: Dagster Integration: You can now seamlessly ingest your Dagster Pipelines, Jobs, Ops, and lineage into DataHub. 10071
- Expanded Field Classification Support: This release introduces support for field-level classification during ingestion for Redshift, BigQuery, DynamoDB, and SQL Sources. 10013, 10031
- Enhanced Ingestion Capabilities: DataHub now offers stateful ingestion by default, optimizing routines for REST sinks and improving metadata accuracy across diverse sources like dbt and BigQuery. 9934, 10158, 10080
- Better Data Lineage: This release introduces support for OpenLineage in service of the Spark Lineage Beta Plugin; additionally, we now support incremental Column-Level Lineage, improving the accuracy of detecting column-level relationships during ingestion. 9870, 9967, 10090
- Schema Clarity: New descriptions support for JSON schema arrays and a mechanism to escape special characters in BigQuery table descriptions aid in clearer schema validation and ingestion processes. Databricks ingestion now supports Hive Metastore schemas with special characters. 9757, 9932, 10049
Version Upgrades
- The Kafka client version used for Kafka setup was bumped, and the OpenSearch image was updated to version 2.11.0. 9962, 9984
Breaking Changes
This release enables stateful ingestion by default and changes how dbt ingestion is handled. For details on all breaking changes, view the full documentation here.
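To make the stateful ingestion default concrete, here is a rough sketch of the rule as the changelog entries describe it: stateful ingestion is enabled by default for the DataHub rest sink (#9934), but only auto-enabled if a pipeline name is set (#10075). The function `stateful_ingestion_auto_enabled` is hypothetical and is not DataHub's implementation. The key names (`pipeline_name`, `sink.type: datahub-rest`, `source.config.stateful_ingestion.enabled`) follow the usual recipe layout and should be checked against the ingestion docs, and the sketch assumes an explicit setting always takes precedence.

```python
def stateful_ingestion_auto_enabled(recipe):
    """Illustrate when stateful ingestion would be on for a recipe dict.

    Assumptions (not confirmed by the release notes):
    - an explicit `stateful_ingestion.enabled` value always wins;
    - a recipe without a `sink` section is treated as not using the rest sink.
    """
    source_config = recipe.get("source", {}).get("config", {})
    explicit = source_config.get("stateful_ingestion", {})
    if "enabled" in explicit:
        return bool(explicit["enabled"])

    uses_rest_sink = recipe.get("sink", {}).get("type") == "datahub-rest"
    has_pipeline_name = bool(recipe.get("pipeline_name"))
    return uses_rest_sink and has_pipeline_name


# A recipe with a pipeline name and the rest sink picks up the new default.
named_recipe = {
    "pipeline_name": "nightly_bigquery",  # hypothetical name
    "source": {"type": "bigquery", "config": {}},
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}

# The same recipe without a pipeline name is left unchanged.
unnamed_recipe = {k: v for k, v in named_recipe.items() if k != "pipeline_name"}

print(stateful_ingestion_auto_enabled(named_recipe))
print(stateful_ingestion_auto_enabled(unnamed_recipe))
```

If the new default is not wanted for a given pipeline, setting `stateful_ingestion.enabled` to false explicitly in the source config is the usual way to opt out; the breaking-changes documentation referenced above remains the authority on the exact behavior.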
Contributors
MASSIVE shoutout to our contributors!
First-Time Contributors
akarsh991, alexs-101, AvaniSiddhapuraAPT, diegmonti, dushayntAW, filipe-caetano-ovo, HuanjieGuo, jayacryl, k7ragav, kopax-polyconseil, LePuppy, Nelvin73, pinakipb2, poorvi767, rae89, trialiya, valeral.
Repeat Contributors
ANich, shubhamjagtap639, sgomezvillamor, siladitya2, skrydal, sumitappt, Masterchen09, mayurinehate, ngamanda, gaurav2733, githendrik, jayasimhankv.
DataHub Maintainers
anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, pedro93, RyanHolstien, treff7es, yoonhyejin.
What's Changed
- bump(kafka-setup): client version bump by @david-leifker in #9962
- feat(ingest): throw codegen error on duplicate class names by @hsheth2 in #9960
- feat(docker): respect pip mirrors with uv by @hsheth2 in #9963
- Openlineage endpoint and Spark Lineage Beta Plugin by @treff7es in #9870
- fix(ingest/json-schema): adding support descriptions for array by @AvaniSiddhapuraAPT in #9757
- fix(ingest/redshift): fix bug in lineage v2 table renames by @hsheth2 in #9967
- feat(ingest): speed up to_obj() and validate() by @hsheth2 in #9969
- feat(ingest): fix fspath lint error by @hsheth2 in #9976
- docs: archive old version before 0.12.0 & fix broken links by @yoonhyejin in #9957
- fix(ui/markdown-editor): arrows change field when editing description… by @gaurav2733 in #9949
- feat(ui/policies): add filter for Active/Inactive/All on policy page by @gaurav2733 in #9958
- feat(ui): add option to add picture link for groups by @akarsh991 in #9882
- feat(ingest): add Looks subtype + stop reemitting browsePathV2 by @hsheth2 in #9978
- fix(ingest/bigquery): escape special characters for table descriptions by @AvaniSiddhapuraAPT in #9932
- feat(ui): add loading spin to access management table by @filipe-caetano-ovo in #9974
- fix(ingestion/fivetran): Fix fivetran get connector jobs bug by @shubhamjagtap639 in #9975
- feat(ingest/dbt): generate CLL for all node types by @hsheth2 in #9964
- chore(search): bump OpenSearch image version to 2.11.0 by @darnaut in #9984
- feat(ingest): enable stateful_ingestion by default for DataHub rest sink by @shubhamjagtap639 in #9934
- feat(ingestion/cli): Adding check option to validate allow/deny and path_specs by @treff7es in #9983
- fix(ingest): only import PathSpec when necessary by @hsheth2 in #9989
- feat(config): add configuration to reprocess UI sourced events by @RyanHolstien in #9988
- feat(pluginRegistry): add configuration to reduce runnable frequency by @RyanHolstien in #9990
- build(react): Fix typescript errors in test files by @sumitappt in #9982
- feat(docs): disable last update timestamps by @hsheth2 in #9987
- feat: add versioned content for 0.12.1 by @yoonhyejin in #9944
- doc: add version 0.13.0 by @yoonhyejin in #9991
- fix: fix mobile view and subtitles on slack/calendar page by @yoonhyejin in #9822
- fix(ingest/redshift): fix stl scan lineage for lineage v2 by @hsheth2 in #9986
- fix(ingest/delta-lake): support parsing nested types correctly by @dushayntAW in #9862
- fix(test): nested domains by @david-leifker in #9993
- fix(ci): refactor build-and-test command by @hsheth2 in #9999
- feat(ingest/snowflake): generate query nodes for snowflake by @mayurinehate in #9966
- fix(ingest/unity): creating group urn in case of group by @dushayntAW in #9951
- fix(ui/left-side-bar): hide data products option in left side bar by @gaurav2733 in #10001
- feat(ingest/redshift): make query generation configurable by @hsheth2 in #10000
- fix(opensearch): Rollover usage events at a file size rather than time-based manner by @darnaut in #10006
- chore(java): bump java dependency versions by @david-leifker in #10009
- ci(react): Update package.json to enable lint check by @sumitappt in #10011
- fix(ui/ingest): trim leading and trailing whitespaces from the text f… by @gaurav2733 in #10012
- fix(policy-backfull): fix policy backfill job by @david-leifker in #10016
- feat(opensearch): support for updating ISM policy used for usage events by @darnaut in #10018
- refactor(react): Provide option to skip importing theme in CustomThemeProvider; rearrange toplevel components by @asikowitz in #9940
- fix(openapi): fix openapi openlineage endpoint by @david-leifker in #10019
- feat(ingest): update sqlglot fork by @hsheth2 in #10022
- feat(ingest/superset): map awsathena platform name to athena by @LePuppy in #10005
- fix(ingest/redshift): patch instead of replace redshift custom properties by @ethan-cartwright in #9293
- fix(ingest/slack): tweak docs for slack source by @hsheth2 in #10007
- fix(ingest): use contextvar for cooperative timeout by @hsheth2 in #10021
- feat(ingest): improve custom package metadata by @hsheth2 in #9985
- feat(docs): build website using swc-loader instead of babel by @hsheth2 in #9977
- feat(ingest): add query formatting to sql aggregator by @hsheth2 in #10025
- feat(ingest): add DataHubGraph.emit_all method by @hsheth2 in #10002
- feat(ingestion): Support for Server-less Redshift by @skrydal in #9998
- fix(ingest/teradata): small teradata improvements by @treff7es in #9953
- feat(ingest): add classification for sql sources by @mayurinehate in #10013
- docs(monitoring): add health check endpoint by @kopax-polyconseil in #10033
- feat(ingest/dbt): capture both raw and compiled code by @hsheth2 in #10026
- fix(ingest/redshift): Temp table lineage fix by @treff7es in #10008
- feat(ingest): utilities for query logs by @hsheth2 in #10036
- docs: add missing api sample docs by @yoonhyejin in #9869
- feat(gms): add aspect name to siblings hook log by @hsheth2 in #10044
- feat(ingest): add classification to bigquery, redshift by @mayurinehate in #10031
- fix(ui/lineage): show data is too large error when limitation exceeds by @gaurav2733 in #10038
- feat(ci): exempt more names from community by @mayurinehate in #10039
- docs: improve versiondropdown design & set docs main to /features by @yoonhyejin in #9994
- fix(ingest/redshift): tweak lineage v2 queries by @hsheth2 in #10045
- chore(aws-msk-iam-auth): bump dependency version by @darnaut in #10063
- feat(lineage): add priority to via node by @RyanHolstien in #10034
- docs(acryl-cloud): notes for 0.2.16 by @anshbansal in #10069
- fix(ingest/unity-catalog): generate sibling and lineage by @dushayntAW in #9894
- fix(ingest): only auto-enable stateful ingestion if pipeline name is set by @hsheth2 in #10075
- feat(ingest/s3): set default spark version by @hsheth2 in #10057
- feat(ingest): better rest emitter error message by @hsheth2 in #10073
- docs(sdk): Update API guide with example for Acryl by @gabe-lyons in #10072
- feat(ingest): check for private import path usages by @hsheth2 in #10059
- feat(ingest): add sql formatter utility by @hsheth2 in #10064
- feat(ingest): refactor LineageConfig class by @hsheth2 in #10074
- feat(ingest/dbt): point dbt assertions at dbt nodes by @hsheth2 in #10055
- feat(dbt): show source and compiled code in the UI by @hsheth2 in #10028
- feat(ui/ingest): ingestion form for Okta and AzureAD by @gaurav2733 in #9829
- Update domains docs to include nested domains by @eboneil in #9890
- fix(ingestion): Handle Redshift string length limit in Serverless mode by @skrydal in #10051
- build(deps): bump follow-redirects from 1.15.4 to 1.15.6 in /docs-website by @dependabot in #10060
- build(deps): bump es5-ext from 0.10.62 to 0.10.63 in /docs-website by @dependabot in #9927
- fix(lineage): fix array out of bounds error by @david-leifker in #10081
- Add owners, tags, glossary terms to dataset yaml loader by @eboneil in #9859
- Add rate limiting to slack source by @eboneil in #10082
- fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent for a Table by @siladitya2 in #10052
- feat(redshift): adds flag to skip all external tables by @sgomezvillamor in #10040
- feat(models) : Joins (Datasets) schema, resolvers and UI by @poorvi767 in #8325
- feat(properties) Add upsertStructuredProperties graphql endpoint for assets by @chriscollins3456 in #9906
- Clean up logic for dataset.py yaml loader by @eboneil in #10089
- feat(ingest/dbt): add option to skip sources by @hsheth2 in #10077
- feat(ingest): support incremental column-level lineage by @hsheth2 in #10090
- feat(ingest/powerbi): add chart subtypes by @hsheth2 in #10076
- fix(ingest/metabase): Use connect_uri instead of display_uri to query Metabase API by @diegmonti in #9996
- feat(tableau): ability to force extraction of table/column level linage from SQL queries by @alexs-101 in #9838
- feat(ingest/datahub-gc): gc source to cleanup things by @anshbansal in #10085
- docs(acryl-cloud): fix year in notes from 2023 to 2024 by @anshbansal in #10095
- feeat(openapi): add batch endpoint to v2 using requestbody by @RyanHolstien in #10100
- fix(ingest/dbt): fix config validator for skip_sources_in_lineage by @hsheth2 in #10098
- docs: add gtm tag by @yoonhyejin in #10083
- docs: add doc for assertions & data contracts by @yoonhyejin in #10029
- test(ingest/mssql): use non-ephemeral mapping port by @hsheth2 in #10104
- fix(ingestion/unity-catalog): patch owners and properties by @dushayntAW in #10086
- fix(ingestion/transformer): added new transformer to cleanup suffix/prefix in owner URN by @dushayntAW in #10067
- fix(ui/user-group): add non existent entity page for user by @gaurav2733 in #10004
- fix(resolver): Allow users to add/remove related terms for children glossary terms by @pinakipb2 in #9895
- Increase role member count in listRoles query to 20 from 10 by @jayasimhankv in #10020
- fix(frontend): exclude plugins/frontend/auth/user.props config does not exist warnings from log by @Masterchen09 in #10043
- fix(ui): show dataset display name in browse paths v2 by @Masterchen09 in #10054
- fix(metrics): get fieldName for GraphQL Mutation queries by @trialiya in #9972
- feat(UI): disable access management ui when no roles are linked to entity by @githendrik in #9610
- ci(filters): add graphql code to backend trigger by @david-leifker in #10113
- test(urn): add test case by @david-leifker in #10112
- fix(ui) Add min width to the usage stats component by @chriscollins3456 in #10056
- log(system-update): Update DataHubStartupStep.java by @david-leifker in #9971
- fix(usage-stats): usage-stats error handling and filter by @david-leifker in #10105
- fix(elasticsearch logging): log how long bulk execution took by @darnaut in #10116
- feat(auth): view authorization by @david-leifker in #10066
- fix(searchContext): fix search flag immutability by @david-leifker in #10117
- fix(ingest/looker): use external_base_url for explore url generation by @k7ragav in #10093
- feat(ingest/dagster): Dagster source by @treff7es in #10071
- fix(forms) Fix a couple of small inconsistencies with forms by @chriscollins3456 in #9928
- fix: exclude Elasticsearch ignore_throttled warnings from log by @Masterchen09 in #10042
- Update build-and-test.yml by @david-leifker in #10127
- fix(mae-consumer): fix aspect retriever injections mae-consumer by @david-leifker in #10125
- fix(docs): fix docs build by @RyanHolstien in #10129
- fix(search): respect the search flags term bucket size by @david-leifker in #10130
- fix(ingestProposal): fix/handle no-op ingestion by @david-leifker in #10126
- fix(ci): simplify python release process by @hsheth2 in #10133
- feat(lineage): add a parameter to allow limiting the per hop exploration of lineage search by @RyanHolstien in #10062
- feat(ingest/bigquery): Respect dataset and table patterns when ingesting lineage via catalog api by @ANich in #10080
- feat(ingest): emit platform for query entities by @hsheth2 in #10103
- feat(ingest): loosen pyarrow dep by @hsheth2 in #10141
- fix(ingest/dbt): respect convert_column_urns_to_lowercase in mapping CLL by @hsheth2 in #10132
- chore(ingestion-base): update base requirements by @david-leifker in #10142
- feat(ingest/dbt): dbt model performance by @hsheth2 in #9992
- fix(ingest/databricks): support hive metastore schemas with special char by @mayurinehate in #10049
- feat(ui): sort partition keys to the top of the table for better visibility by @ngamanda in #9959
- fix: OBS-729 | Filters: Fix alignment on nested dropdown by @sumitappt in #10140
- feat(ingest/dynamodb): add support for classification by @mayurinehate in #10138
- feat(incidents) incident resolution note more clearly displayed by @jayacryl in #10151
- fix(entity-client): fix entity client cache and test by @david-leifker in #10149
- chore(ingest): update doc & log detail by @HuanjieGuo in #10139
- feat(ingest): loosen airflow plugin dependencies requirements by @hsheth2 in #10106
- feat(ingest): fix validators by @hsheth2 in #10115
- feat(ingest/bigquery): improve debug logs by @hsheth2 in #10101
- fix(graphQL): Ignore soft-deleted assertions in UI calls by @pedro93 in #10148
- fix(openapi): fix system-metadata response by @david-leifker in #10155
- docs: update markprompt project key by @yoonhyejin in #10134
- add row type for athena types by @rae89 in #10131
- fix(setup): fix postgres setup to create temp table with no data by @trialiya in #10154
- feat(ingest/looker): update browse paths to align with looker UI by @mayurinehate in #10147
- feat(ingest/airflow): allow plugin to load on listener exception by @hsheth2 in #10152
- feat(ingestion/bigquery): BigQuery Owner Label to Datahub Ownership by @shubhamjagtap639 in #10047
- feat(ingest): bump sqlglot dep by @hsheth2 in #10144
- docs(website): tweak eyebrow copy by @hsheth2 in #10143
- docs: upgrade markprompt version by @yoonhyejin in #10159
- fix(openapi): fix index out of bounds for sort order by @RyanHolstien in #10168
- fix(search): fix field name in api by @RyanHolstien in #10170
- build(docker): prefix pr on pr sha tags by @david-leifker in #10171
- Revert docker helper changes by @david-leifker in #10172
- feat(metadata-jobs): improve consumer logging by @darnaut in #10173
- test(graph): refactor graph test by @david-leifker in #10175
- fix(ingest/tableau) Fix Tableau lineage ingestion from Clickhouse by @valeral in #10167
- [oracle ingestion]: get database name when using service by @Nelvin73 in #10158
- fix(docker): fix versioning for compose file post release by @RyanHolstien in #10176
- fix(restoreIndices): batchSize vs limit by @david-leifker in #10178
- feat(ui): show classification in test connection by @hsheth2 in #10156
- fix(ingest): add classification dep for dynamodb by @hsheth2 in #10162
- feat(ingest/dbt): enable model performance and compiled code by default by @hsheth2 in #10164
- refactor(docker): move to acryldata repo for all images by @david-leifker in #9459
- fix(github): fix docker publish by @david-leifker in #10186
- feat(lineage): mark nodes as explored by @RyanHolstien in #10180
- feat(ingest/gc): add index truncation logic by @anshbansal in #10099
- fix(entity-service): fix findFirst when already present by @david-leifker in #10187
- fix(ingestion/salesforce): fixed the issue by escaping the markdown string by @dushayntAW in #10157
New Contributors
- @AvaniSiddhapuraAPT made their first contribution in #9757
- @akarsh991 made their first contribution in #9882
- @filipe-caetano-ovo made their first contribution in #9974
- @dushayntAW made their first contribution in #9862
- @LePuppy made their first contribution in #10005
- @kopax-polyconseil made their first contribution in #10033
- @poorvi767 made their first contribution in #8325
- @diegmonti made their first contribution in #9996
- @alexs-101 made their first contribution in #9838
- @pinakipb2 made their first contribution in #9895
- @trialiya made their first contribution in #9972
- @k7ragav made their first contribution in #10093
- @jayacryl made their first contribution in #10151
- @HuanjieGuo made their first contribution in #10139
- @rae89 made their first contribution in #10131
- @valeral made their first contribution in #10167
- @Nelvin73 made their first contribution in #10158
Full Changelog: v0.13.0...v0.13.1