DataHub v0.10.2

@iprentic iprentic released this 13 Apr 23:26
4ec280e

Known Issues

  • PostgreSQL: Release v0.10.1 raised the CLI's default value for max_threads from 1 to 15, which causes problems with PostgreSQL transactions. If you run PostgreSQL as the GMS backend, the recommended workaround is to set max_threads to 1 in your ingestion recipes.
  • BigQuery: The BigQuery connector depends on a bad version of SQLParse, which manifests as a "str object is not callable" error. This has since been fixed in CLI release v0.10.2.2.
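
As a minimal sketch of the PostgreSQL workaround, the snippet below patches a recipe, represented as a plain Python dict, so that its sink runs with a single thread. It assumes that max_threads belongs under the config of a datahub-rest sink; the release notes only say "in your ingestion recipes", so check the placement against your own setup. The helper name with_single_threaded_sink, the host_port value, and the server URL are illustrative placeholders.

```python
import copy
from typing import Any, Dict


def with_single_threaded_sink(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the recipe whose sink is limited to one thread.

    Assumption: max_threads is read from sink.config (datahub-rest sink).
    """
    patched = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    sink = patched.setdefault("sink", {"type": "datahub-rest"})
    sink.setdefault("config", {})["max_threads"] = 1
    return patched


# Hypothetical recipe; source and server values are placeholders.
recipe = {
    "source": {"type": "postgres", "config": {"host_port": "localhost:5432"}},
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}

patched = with_single_threaded_sink(recipe)
print(patched["sink"]["config"])
```

The same change in a YAML recipe would be a single max_threads: 1 entry under the sink's config, under the same assumption about where the setting lives.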

Release Highlights

Metadata Ingestion

New

  • [redshift] Combined usage and metadata extraction
  • [bigquery] Cross-Project Usage Support (using File System)
  • [snowflake] Push down Lineage Extraction to Snowflake Access History API
  • [azure-ad] Support stateful ingestion - Automatically remove groups and users when they are removed in Azure.
  • [okta] Support stateful ingestion - Automatically remove groups and users when they are removed in Okta.
  • [tableau] Extract lineage from CSQL queries in Tableau ingestion
  • [snowflake] Better error message on key pair authentication
  • [sdk] Support executing GraphQL Queries via DataHubGraph
  • [unity] Support extracting ownership
  • [postgres] Support extracting metadata from all databases in a single recipe
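
To illustrate the [sdk] item above, here is a hedged sketch of running a GraphQL search through a client object that exposes execute_graphql(query, variables). The function is written against a small Protocol so that it runs without a DataHub server. It assumes that DataHubGraph.execute_graphql returns the contents of the response's "data" field as a dict; the helper name search_dataset_urns and the query text are illustrative, not taken from the release.

```python
from typing import Any, Dict, List, Optional, Protocol


class GraphQLClient(Protocol):
    """Anything with an execute_graphql method, e.g. DataHubGraph (assumed)."""

    def execute_graphql(
        self, query: str, variables: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]: ...


# Illustrative search query against the DataHub GraphQL API.
SEARCH_QUERY = """
query search($text: String!, $count: Int!) {
  search(input: {type: DATASET, query: $text, start: 0, count: $count}) {
    searchResults { entity { urn } }
  }
}
"""


def search_dataset_urns(client: GraphQLClient, text: str, count: int = 10) -> List[str]:
    """Return the URNs of datasets matching the search text."""
    data = client.execute_graphql(
        SEARCH_QUERY, variables={"text": text, "count": count}
    )
    # Assumption: the client returns the "data" payload directly.
    return [hit["entity"]["urn"] for hit in data["search"]["searchResults"]]


# Usage against a real instance (requires the acryl-datahub package and a
# reachable GMS; the server URL is a placeholder):
#   from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
#   graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
#   print(search_dataset_urns(graph, "orders"))
```

Typing the client as a Protocol keeps the helper testable with a stub and avoids importing the SDK where it is not installed.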

Bug Fixes

  • [bigquery] Capture all operation types when ingesting operational stats
  • [bigquery] Fix and refactor exported audit logs query
  • [redshift] Fix SQL for extracting lineage from insert queries

User Experience

New

  • Auto-Complete UX Refresh - Quickly filter search results inside the autocomplete experience
  • View: Support views on the Auto-Complete Search Bar

Bug Fixes

  • Fix bug where Tag names do not render properly in search previews
  • Fix bug where Tag color does not render properly in search autocomplete
  • Fix bug when adding Tags and Glossary Terms to nested schema fields
  • Fix bug where DataHub would redirect you when clicking to navigate back home
  • Fix bug where Metadata Tests results did not show if they were all passing

Documentation

Developer Experience

  • Add performance testing framework for BigQuery usage

What's Changed

  • fix(cli): allow usage without kafka by @hsheth2 in #7677
  • test(elasticsearch): Add unit test for timestamp-based lineage feature by @iprentic in #7661
  • feat(docs-website): add docs on creating users and groups by @yoonhyejin in #7574
  • chore(ci): add coverage code for python by @anshbansal in #7681
  • doc(release): managed datahub v0.2.4 release notes by @anshbansal in #7679
  • refactor(ingest/bigquery): add inline comments + refactor in table name parsing by @mayurinehate in #7609
  • fix(ingest/looker): skip empty user ids for usage by @hsheth2 in #7686
  • fix(ingest/dbt): enable incremental lineage by default by @hsheth2 in #7674
  • fix(ingest/bigquery): Fix BigQueryTableType enum accesses by @asikowitz in #7685
  • fix(ingest/looker): correct looker/lookml capability reports by @hsheth2 in #7683
  • feat(ingest/looker): enable looker usage ingestion by default by @hsheth2 in #7684
  • doc(freshness): add faq for dataset freshness by @anshbansal in #7693
  • chore(lint): fix lint in looker by @anshbansal in #7695
  • fix(ingest/bigquery): quote string constants in query by @mayurinehate in #7694
  • feat(ui) Update auto-complete functionality and design by @chriscollins3456 in #7515
  • fix(ui) Update Looker/Lookml forms to set client id and deploy key as Secrets by @chriscollins3456 in #7479
  • perf(ingest): Improve FileBackedDict iteration performance; minor refactoring by @asikowitz in #7689
  • feat(quickstart): move quickstart back to master by @hsheth2 in #7697
  • test(ingest/dbt): add test for column meta match by @hsheth2 in #7673
  • feat(ingest/postgres): support extracting metadata from all databases in single recipe by @mayurinehate in #7581
  • docs(): generate docs for our Python SDK by @hsheth2 in #7612
  • fix(ingest/redshift): Lineage query fix to work with the latest redshift by @treff7es in #7698
  • feat(ingestion): azure-ad stateful ingestion by @mohdsiddique in #7701
  • chore(ingest): formatting + cleanup MCPW usages by @hsheth2 in #7706
  • test(ingest/bigquery): Add performance testing framework for bigquery usage by @asikowitz in #7690
  • fix(docs): Fixing timeseries delete doc until code path is fixed by @jjoyce0510 in #7711
  • docs: add concept section by @yoonhyejin in #7655
  • JWT authenticator with asymmetric PublicKey verification for JWT token. by @syedzoherer in #6495
  • fix(ingestion): fix AssertionError in base_transformer by @sgomezvillamor in #7702
  • feat(docs): support inlining code snippets from files by @hsheth2 in #7712
  • feat(ingestion) Allow for ingestion to read files remotely by @xiphl in #7552
  • feat: add pre-commit by @yoonhyejin in #7680
  • docs(okta): add how to use email in urns by @anshbansal in #7708
  • feat(ingest/snowflake): hide host_port from snowflake docs by @hsheth2 in #7717
  • feat(ingest/bigquery): Capture all operation types when ingesting operational stats by @asikowitz in #7723
  • doc(redshift) - Adding Redshift ingestion quickstart guide by @treff7es in #7700
  • refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources by @asikowitz in #7718
  • feat(ingest/lookml): support views with derived_table.explore_source by @hsheth2 in #7704
  • fix(ci): Fixing broken Domains Test by @jjoyce0510 in #7746
  • feat(ingest/dbt): include dbt unique_id in properties by @hsheth2 in #7737
  • docs(airflow): update with information for new plugin by @anshbansal in #7732
  • chore(ingest): change kafka connect mapped ports by @hsheth2 in #7728
  • feat(docs): clear up source configs by @hsheth2 in #7720
  • feat(ingest): emit state payloads as soft-deleted by @hsheth2 in #7714
  • fix(sdk): remove rest emitter to graph cache in CorpGroup by @bossenti in #7743
  • refactor(ingest): Use sqlite.Row row_factory for FileBackedCollections by @asikowitz in #7739
  • refactor(ingest/bigquery): Standardize audit log parsing and make TopKDict a DefaultDict by @asikowitz in #7738
  • doc(ingestion): tableau quick ingestion guide by @mohdsiddique in #7682
  • docs(search): Add example search for finding tables without the name field by @iprentic in #7647
  • feat(ingest/dbt): update subtypes for dbt by @hsheth2 in #7750
  • feat(snowflake): better error message on key pair authentication by @anshbansal in #7734
  • feat(sdk): fix ownership emission for groups by @hsheth2 in #7751
  • fix(TestResults UI):show non-failing TestResult by @blankon123 in #7747
  • fix(ingest/bigquery): fix and refractor exported audit logs query by @mayurinehate in #7699
  • fix(ingest/demo-data): fix bug in path type by @hsheth2 in #7749
  • fix(auto-complete) Pass in views to auto-complete endpoint for filtering by @chriscollins3456 in #7754
  • fix(ingest/dbt-cloud): use correct dbt cloud IDE urls by @hsheth2 in #7755
  • docs(ingest/lookml): update error message for Looker connection fetch by @hsheth2 in #7756
  • docs(docker): fix typo in README.md by @PedroMiguelFigueiredo in #7729
  • fix(search) Increase weight on fieldPath field for searching by @iprentic in #7725
  • feat: add docs on windows compatibility by @yoonhyejin in #7713
  • feat: make demo site accessible directly from navbar by @yoonhyejin in #7715
  • fix(ingest/bigquery) - Lineage edges use datetime with timezone by @treff7es in #7762
  • fix(ingest/redshift): Fixing adding back db name in redshift urn by @treff7es in #7765
  • fix(ingest/redshift): fixing sql which extracts lineage from insert queries by @treff7es in #7770
  • ci: limit qodana concurrency to group by @hsheth2 in #7764
  • fix(ingest/bigquery): Raise report_failure threshold; add robustness around table parsing by @asikowitz in #7772
  • feat(config): allow hooks to be enabled/disabled by @RyanHolstien in #7761
  • feat: add quickstart snippet on main page by @yoonhyejin in #7716
  • feat(snowflake): improve snowflake lineage perf and memory, push down to snowflake by @mayurinehate in #7710
  • docs(): add styles for sphinx generated python docs by @jeffmerrick in #7773
  • fix(ingest/bigquery): Support cross project usage using FileBackedDict by @asikowitz in #7663
  • fix(ingest/snowflake): fix incorrect tag urn case, improve tag display name by @mayurinehate in #7758
  • feat(ingestion/okta): okta stateful ingestion by @mohdsiddique in #7736
  • feat(docs): refactor guide on graphql by @yoonhyejin in #7745
  • fix(docs): replaced outdated docs command for building breaking changes of the metadata model by @ArneCJacobs in #7759
  • doc(cli): fix get cli example by @anshbansal in #7742
  • fix(ingest/snowflake): fix tags without lineage query, remove comma by @mayurinehate in #7779
  • chore(ingest): enable flake8 bugbear linting by @hsheth2 in #7763
  • fix(ui): Fix tags display name + color in UI for autocomplete, search preview, entity profile by @jjoyce0510 in #7785
  • fix(ui) Fix tags and terms columns on nested schema fields by @chriscollins3456 in #7782
  • chore(ingest): cleanup unused fields in bigquery/snowflake by @hsheth2 in #7787
  • docs(managed datahub): release notes for v0.2.5 by @anshbansal in #7780
  • fix(ingest/snowflake): fix to not emit upstream external lineage for … by @mayurinehate in #7778
  • (build) Upgrade json-smart dependency to 2.4.9 by @iprentic in #7788
  • feat(ingest/tableau): extract lineage from csql queries by @maaaikoool in #7561
  • build: Use external dependency to set jsonSmart version in frontend build file by @iprentic in #7793
  • build: Upgrade jettison dependency to 1.5.4 by @iprentic in #7794
  • fix(snakeyaml): cve-2022-1471 upgrade by @meyerkev in #7795
  • test(ingest/snowflake): fix tests around host_port by @asikowitz in #7791
  • fix(ingest/bigquery): Fix lineage / usage table ref checks by @asikowitz in #7800
  • config(ingest/bigquery): Default lineage_use_sql_parser to true; update description by @asikowitz in #7797
  • fix(dep): add sqllineage dependency for tableau by @mayurinehate in #7803
  • feat(ingest): redshift - Redshift rework by @treff7es in #6906
  • test(ingest/bigquery): Add sql parser xfail test to fix late by @asikowitz in #7792
  • feat(sdk): support executing graphql via DataHubGraph by @hsheth2 in #7753
  • doc(ingestion): powerbi quick ingestion guide by @mohdsiddique in #7670
  • feat(search): allow longer customProperties by @david-leifker in #7804
  • fix(docs): Timeline API examples: should be owner, not ownership by @jeremypharo in #7744
  • feat(upgrade): add gms protocol variable by @felipeac in #7752
  • feat(ingest/unity): support extracting ownership by @hsheth2 in #7801
  • chore(security): removing unmaintained es7 upgrade proces by @david-leifker in #7790
  • feat: set generateParameterizedFieldsResolvers to false to have parameterized queries be generated by @TonyOuyangGit in #7806
  • chore(snakeyaml): upgrade to snakeyaml 2 by @RyanHolstien in #7786
  • servlet(config): add search configuration endpoint by @david-leifker in #7641
  • feat(ingest/lookml): correctly handle include directives from imported projects by @hsheth2 in #7798
  • feat(patch): patch support for flow info and job info and refactor patchbuilders for java sdk by @RyanHolstien in #7495
  • fix(ui): Fix subtle initial redirect bug by @jjoyce0510 in #7796
  • feat(ingest): Track disk usage in report by @asikowitz in #7812
  • fix(ingest/redshift) - Remove pg_user table from metadata queries by @treff7es in #7815

New Contributors

Full Changelog: v0.10.1...v0.10.2