DataHub v0.10.2
Known Issues
- PostgreSQL: In release v0.10.1 the default value for max_threads was increased in the CLI from 1 to 15. This creates an issue with PostgreSQL transactions. The recommended workaround is to decrease max_threads in your ingestion recipes to 1 if you are running PostgreSQL for the GMS backend.
- BigQuery: The BigQuery connector depends on a bad version of SQLParse, which manifests as a "str object is not callable" error. This has since been fixed in CLI release version v0.10.2.2.
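The max_threads workaround above could be applied programmatically along these lines. This is a minimal sketch, not an official utility: it assumes the recipe is held as a Python dict and that max_threads lives under the sink's config block (as it does for the datahub-rest sink). The helper name pin_sink_max_threads, the demo-data source, and the server URL are illustrative placeholders.

```python
import copy
from typing import Any, Dict


def pin_sink_max_threads(recipe: Dict[str, Any], max_threads: int = 1) -> Dict[str, Any]:
    """Return a copy of an ingestion recipe with the sink's max_threads pinned.

    Assumption: max_threads is read from sink.config, as for the datahub-rest sink.
    """
    patched = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    # If the recipe has no sink section, assume the datahub-rest sink is intended.
    sink = patched.setdefault("sink", {"type": "datahub-rest"})
    sink.setdefault("config", {})["max_threads"] = max_threads
    return patched


# Hypothetical recipe; the source and server URL are placeholders.
recipe = {
    "source": {"type": "demo-data", "config": {}},
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}
patched = pin_sink_max_threads(recipe)
print(patched["sink"]["config"])
```

In a YAML recipe the equivalent change is a single max_threads: 1 entry under the sink's config section.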
Release Highlights
Metadata Ingestion
New
- [redshift] Redshift Combining Usage and Metadata Extraction
- [bigquery] Cross-Project Usage Support (using File System)
- [snowflake] Push down Lineage Extraction to Snowflake Access History API
- [azure-ad] Support stateful ingestion - Automatically remove groups and users when they are removed in Azure.
- [okta] Support stateful ingestion - Automatically remove groups and users when they are removed in Okta.
- [tableau] Extract lineage from CSQL queries in Tableau ingestion
- [snowflake] Better error message on key pair authentication
- [sdk] Support executing GraphQL Queries via DataHubGraph
- [unity] Support extracting ownership
- [postgres] Support extracting metadata from all databases in a single recipe
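As an illustration of the [sdk] item above, the following sketch runs a GraphQL search through a DataHubGraph-like client. It assumes the client exposes execute_graphql(query, variables) and returns the "data" portion of the response as a dict; the query text, the search_dataset_urns helper, and the SupportsGraphQL protocol are hypothetical names introduced here for the example.

```python
from typing import Any, Dict, List, Optional, Protocol


class SupportsGraphQL(Protocol):
    """Anything exposing execute_graphql, e.g. a DataHubGraph client (assumed signature)."""

    def execute_graphql(
        self, query: str, variables: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        ...


# Illustrative query against the search endpoint of the GraphQL API.
SEARCH_QUERY = """
query searchDatasets($query: String!, $count: Int!) {
  search(input: {type: DATASET, query: $query, start: 0, count: $count}) {
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""


def search_dataset_urns(graph: SupportsGraphQL, text: str, count: int = 10) -> List[str]:
    """Return the URNs of datasets matching `text`."""
    data = graph.execute_graphql(SEARCH_QUERY, variables={"query": text, "count": count})
    # Assumption: `data` is the "data" payload of the GraphQL response.
    results = data.get("search", {}).get("searchResults", [])
    return [hit["entity"]["urn"] for hit in results]


# With the SDK installed, a client could be built roughly like this:
#   from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
#   graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
#   print(search_dataset_urns(graph, "*"))
```

Typing the client as a protocol keeps the helper importable and testable without a running DataHub instance; the server URL in the comment is a placeholder.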
Bug Fixes
- [bigquery] Capture all operation types when ingesting operational stats
- [bigquery] Fix and refactor exported audit logs query
- [redshift] Fix SQL for extracting lineage from insert queries
User Experience
New
- Auto-Complete UX Refresh - Quickly filter search results inside autocomplete experience
- View: Support views on the Auto-Complete Search Bar
Bug Fixes
- Fix bug where Tag names do not render properly in search previews
- Fix bug where Tag color does not render properly in search autocomplete
- Fix bug when adding Tags and Glossary Terms to nested schema fields
- Fix bug where DataHub would redirect you when clicking to navigate back home
- Fix bug where Metadata Tests results did not show if they were all passing
Documentation
- Redshift Ingestion Quickstart Guide: https://datahubproject.io/docs/quick-ingestion-guides/redshift/overview
- Tableau Ingestion Quickstart Guide: https://datahubproject.io/docs/quick-ingestion-guides/tableau/overview
- PowerBI Ingestion Quickstart Guide: https://datahubproject.io/docs/quick-ingestion-guides/powerbi/overview
- Add docs on creating users and groups: https://datahubproject.io/docs/api/tutorials/creating-users-and-groups/
- Add docs for our Python SDK: https://datahubproject.io/docs/python-sdk/builder
- Add docs on Windows compatibility: https://datahubproject.io/docs/developers/#windows-compatibility
Developer Experience
- Add performance testing framework for BigQuery usage
What's Changed
- fix(cli): allow usage without kafka by @hsheth2 in #7677
- test(elasticsearch): Add unit test for timestamp-based lineage feature by @iprentic in #7661
- feat(docs-website): add docs on creating users and groups by @yoonhyejin in #7574
- chore(ci): add coverage code for python by @anshbansal in #7681
- doc(release): managed datahub v0.2.4 release notes by @anshbansal in #7679
- refactor(ingest/bigquery): add inline comments + refactor in table name parsing by @mayurinehate in #7609
- fix(ingest/looker): skip empty user ids for usage by @hsheth2 in #7686
- fix(ingest/dbt): enable incremental lineage by default by @hsheth2 in #7674
- fix(ingest/bigquery): Fix BigQueryTableType enum accesses by @asikowitz in #7685
- fix(ingest/looker): correct looker/lookml capability reports by @hsheth2 in #7683
- feat(ingest/looker): enable looker usage ingestion by default by @hsheth2 in #7684
- doc(freshness): add faq for dataset freshness by @anshbansal in #7693
- chore(lint): fix lint in looker by @anshbansal in #7695
- fix(ingest/bigquery): quote string constants in query by @mayurinehate in #7694
- feat(ui) Update auto-complete functionality and design by @chriscollins3456 in #7515
- fix(ui) Update Looker/Lookml forms to set client id and deploy key as Secrets by @chriscollins3456 in #7479
- perf(ingest): Improve FileBackedDict iteration performance; minor refactoring by @asikowitz in #7689
- feat(quickstart): move quickstart back to master by @hsheth2 in #7697
- test(ingest/dbt): add test for column meta match by @hsheth2 in #7673
- feat(ingest/postgres): support extracting metadata from all databases in single recipe by @mayurinehate in #7581
- docs(): generate docs for our Python SDK by @hsheth2 in #7612
- fix(ingest/redshift): Lineage query fix to work with the latest redshift by @treff7es in #7698
- feat(ingestion): azure-ad stateful ingestion by @mohdsiddique in #7701
- chore(ingest): formatting + cleanup MCPW usages by @hsheth2 in #7706
- test(ingest/bigquery): Add performance testing framework for bigquery usage by @asikowitz in #7690
- fix(docs): Fixing timeseries delete doc until code path is fixed by @jjoyce0510 in #7711
- docs: add concept section by @yoonhyejin in #7655
- JWT authenticator with asymmetric PublicKey verification for JWT token. by @syedzoherer in #6495
- fix(ingestion): fix AssertionError in base_transformer by @sgomezvillamor in #7702
- feat(docs): support inlining code snippets from files by @hsheth2 in #7712
- feat(ingestion) Allow for ingestion to read files remotely by @xiphl in #7552
- feat: add pre-commit by @yoonhyejin in #7680
- docs(okta): add how to use email in urns by @anshbansal in #7708
- feat(ingest/snowflake): hide host_port from snowflake docs by @hsheth2 in #7717
- feat(ingest/bigquery): Capture all operation types when ingesting operational stats by @asikowitz in #7723
- doc(redshift) - Adding Redshift ingestion quickstart guide by @treff7es in #7700
- refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources by @asikowitz in #7718
- feat(ingest/lookml): support views with derived_table.explore_source by @hsheth2 in #7704
- fix(ci): Fixing broken Domains Test by @jjoyce0510 in #7746
- feat(ingest/dbt): include dbt unique_id in properties by @hsheth2 in #7737
- docs(airflow): update with information for new plugin by @anshbansal in #7732
- chore(ingest): change kafka connect mapped ports by @hsheth2 in #7728
- feat(docs): clear up source configs by @hsheth2 in #7720
- feat(ingest): emit state payloads as soft-deleted by @hsheth2 in #7714
- fix(sdk): remove rest emitter to graph cache in CorpGroup by @bossenti in #7743
- refactor(ingest): Use sqlite.Row row_factory for FileBackedCollections by @asikowitz in #7739
- refactor(ingest/bigquery): Standardize audit log parsing and make TopKDict a DefaultDict by @asikowitz in #7738
- doc(ingestion): tableau quick ingestion guide by @mohdsiddique in #7682
- docs(search): Add example search for finding tables without the name field by @iprentic in #7647
- feat(ingest/dbt): update subtypes for dbt by @hsheth2 in #7750
- feat(snowflake): better error message on key pair authentication by @anshbansal in #7734
- feat(sdk): fix ownership emission for groups by @hsheth2 in #7751
- fix(TestResults UI):show non-failing TestResult by @blankon123 in #7747
- fix(ingest/bigquery): fix and refractor exported audit logs query by @mayurinehate in #7699
- fix(ingest/demo-data): fix bug in path type by @hsheth2 in #7749
- fix(auto-complete) Pass in views to auto-complete endpoint for filtering by @chriscollins3456 in #7754
- fix(ingest/dbt-cloud): use correct dbt cloud IDE urls by @hsheth2 in #7755
- docs(ingest/lookml): update error message for Looker connection fetch by @hsheth2 in #7756
- docs(docker): fix typo in README.md by @PedroMiguelFigueiredo in #7729
- fix(search) Increase weight on fieldPath field for searching by @iprentic in #7725
- feat: add docs on windows compatibility by @yoonhyejin in #7713
- feat: make demo site accessible directly from navbar by @yoonhyejin in #7715
- fix(ingest/bigquery) - Lineage edges use datetime with timezone by @treff7es in #7762
- fix(ingest/redshift): Fixing adding back db name in redshift urn by @treff7es in #7765
- fix(ingest/redshift): fixing sql which extracts lineage from insert queries by @treff7es in #7770
- ci: limit qodana concurrency to group by @hsheth2 in #7764
- fix(ingest/bigquery): Raise report_failure threshold; add robustness around table parsing by @asikowitz in #7772
- feat(config): allow hooks to be enabled/disabled by @RyanHolstien in #7761
- feat: add quickstart snippet on main page by @yoonhyejin in #7716
- feat(snowflake): improve snowflake lineage perf and memory, push down to snowflake by @mayurinehate in #7710
- docs(): add styles for sphinx generated python docs by @jeffmerrick in #7773
- fix(ingest/bigquery): Support cross project usage using FileBackedDict by @asikowitz in #7663
- fix(ingest/snowflake): fix incorrect tag urn case, improve tag display name by @mayurinehate in #7758
- feat(ingestion/okta): okta stateful ingestion by @mohdsiddique in #7736
- feat(docs): refactor guide on graphql by @yoonhyejin in #7745
- fix(docs): replaced outdated docs command for building breaking changes of the metadata model by @ArneCJacobs in #7759
- doc(cli): fix get cli example by @anshbansal in #7742
- fix(ingest/snowflake): fix tags without lineage query, remove comma by @mayurinehate in #7779
- chore(ingest): enable flake8 bugbear linting by @hsheth2 in #7763
- fix(ui): Fix tags display name + color in UI for autocomplete, search preview, entity profile by @jjoyce0510 in #7785
- fix(ui) Fix tags and terms columns on nested schema fields by @chriscollins3456 in #7782
- chore(ingest): cleanup unused fields in bigquery/snowflake by @hsheth2 in #7787
- docs(managed datahub): release notes for v0.2.5 by @anshbansal in #7780
- fix(ingest/snowflake): fix to not emit upstream external lineage for … by @mayurinehate in #7778
- (build) Upgrade json-smart dependency to 2.4.9 by @iprentic in #7788
- feat(ingest/tableau): extract lineage from csql queries by @maaaikoool in #7561
- build: Use external dependency to set jsonSmart version in frontend build file by @iprentic in #7793
- build: Upgrade jettison dependency to 1.5.4 by @iprentic in #7794
- fix(snakeyaml): cve-2022-1471 upgrade by @meyerkev in #7795
- test(ingest/snowflake): fix tests around host_port by @asikowitz in #7791
- fix(ingest/bigquery): Fix lineage / usage table ref checks by @asikowitz in #7800
- config(ingest/bigquery): Default lineage_use_sql_parser to true; update description by @asikowitz in #7797
- fix(dep): add sqllineage dependency for tableau by @mayurinehate in #7803
- feat(ingest): redshift - Redshift rework by @treff7es in #6906
- test(ingest/bigquery): Add sql parser xfail test to fix late by @asikowitz in #7792
- feat(sdk): support executing graphql via DataHubGraph by @hsheth2 in #7753
- doc(ingestion): powerbi quick ingestion guide by @mohdsiddique in #7670
- feat(search): allow longer customProperties by @david-leifker in #7804
- fix(docs): Timeline API examples: should be owner, not ownership by @jeremypharo in #7744
- feat(upgrade): add gms protocol variable by @felipeac in #7752
- feat(ingest/unity): support extracting ownership by @hsheth2 in #7801
- chore(security): removing unmaintained es7 upgrade proces by @david-leifker in #7790
- feat: set generateParameterizedFieldsResolvers to false to have parameterized queries be generated by @TonyOuyangGit in #7806
- chore(snakeyaml): upgrade to snakeyaml 2 by @RyanHolstien in #7786
- servlet(config): add search configuration endpoint by @david-leifker in #7641
- feat(ingest/lookml): correctly handle include directives from imported projects by @hsheth2 in #7798
- feat(patch): patch support for flow info and job info and refactor patchbuilders for java sdk by @RyanHolstien in #7495
- fix(ui): Fix subtle initial redirect bug by @jjoyce0510 in #7796
- feat(ingest): Track disk usage in report by @asikowitz in #7812
- fix(ingest/redshift) - Remove pg_user table from metadata queries by @treff7es in #7815
New Contributors
- @PedroMiguelFigueiredo made their first contribution in #7729
- @ArneCJacobs made their first contribution in #7759
- @meyerkev made their first contribution in #7795
- @jeremypharo made their first contribution in #7744
- @felipeac made their first contribution in #7752
Full Changelog: v0.10.1...v0.10.2