DataHub v0.8.7
Pre-release
Release Stability
- Several bugs reported in this release are fixed in 0.8.8. Users are strongly encouraged to skip this release.
Release Highlights
- Dataset Profiling and support for time-series metadata
- UI for ML Models, Features; support for AWS SageMaker and Feast
- CLI: support for rollback operations after ingestion
- Integration fixes for Looker, dbt, and many more.
- Demos for all these features are available in our July Townhall video
ChangeLog
- #3021 @kevinhu feat(ingest): extract dbt versions into custom properties
- #3020 @gabe-lyons fix(caching): refetch query on update
- #3019 @kevinhu fix(ingest): don't assume Glue job description always exists
- #3000 @topwebtek7 fix(react): fix weird 0 rendering possible bugs
- #3018 @dexter-mh-lee feat(ingest): add kafka emitters for MetadataChangeProposal format
- #2999 @jjoyce0510 fix(gms): Adding Rest.li Write-Time Model Validation
- #3009 @jjoyce0510 fix(quickstart): Bumping Default Memory for GMS and Frontend
- #3007 @jjoyce0510 fix(gms): better logging on failed MCL / MAE
- #3008 @gabe-lyons fix(blank pages): removing apollo caching
- #3006 @jjoyce0510 fix(ci): using AspectExtractor instead of removed SnapshotToAspectMap
- #2998 @gabe-lyons fix(graphql): fetching data platforms using standard procedure
- #2944 @EnricoMi refactor(test): Refactor GraphService tests
- #2972 @jameslamb fix(ingest): map all LookML dimension types to corresponding avro types
- #3005 @dexter-mh-lee fix(ingestion): Safeguard against empty values for profile ingestion
- #3002 @dexter-mh-lee fix(datahub-upgrade) add config registry to datahub upgrade container
- #3003 @jjoyce0510 fix(dataset stats): Fix checks for existence of row and column counts
- #2997 @topwebtek7 feat(react): update dataset documents tab with a merged document column
- #2991 @topwebtek7 feat(react): update search result has result counts for each entities that has result
- #2983 @jjoyce0510 Introducing TimeSeries Aspects + Dataset Profile (Stats) Aspect
- #2984 @dexter-mh-lee fix(browse): Fix browse pagination and multi-browse path issue
- #2995 @aseembansal-gogo docs(ingest): Add instructions to install required dependency
- #2960 @gabe-lyons feat(deletes): add run commands (list, show, rollback) to datahub ingest
- #2994 @chinmay-bhat docs(ingest): fixed Snowflake recipe to escape dollar-sign
- #2981 @hsheth2 docs: remove a few outdated docs
- #2988 @jjoyce0510 docs: add docs on extracting container logs
- #2963 @hsheth2 test(ingestion): run full tests on both python versions
- #2967 @jameslamb fix(ingest): add more debug logging to LookML metadata ingestion
- #2966 @jameslamb fix(ingest): ensure that LookML files are always parsed in the same order
- #2965 @jameslamb fix(ingest): ensure workunits are created for all LookML views
- #2982 @gabe-lyons fix(tags): fixing tag applied to module for tags w/ colons in the name
- #2961 @gabe-lyons feat(ml-model): adding ml models and ml model groups
- #2975 @kevinhu feat(ingest): type stubs for boto3
- #2979 @jameslamb perf(ingest): remove unused variable in Looker ingestion
- #2980 @hsheth2 fix(ingest): infer bigquery project identifier
- #2978 @chinmay-bhat fix(ingest): fix hive ingestion to respect database configuration
- #2976 @hsheth2 feat(ingest): stricter deserialization for MCE JSONs
- #2959 @kevinhu feat(docs): tutorial for writing a custom transformer
- #2977 @hsheth2 fix(ingestion): isolate dependency requirements of airflow hooks
- #2962 @hsheth2 feat(ingest): add timezone validation to bigquery usage
- #2974 @dexter-mh-lee fix(elasticsearch-setup): fix elasticsearch setup for aws
- #2952 @hsheth2 test(ingestion): test multiple python versions in CI
- #2958 @hsheth2 feat(ingest): add Airflow TaskFlow example
- #2950 @kevinhu fix(ingest): patch lookml types and refactor ingestion sources layout
- #2957 @jameslamb fix(ingest): match nested LookML files mentioned in 'include' statements
- #2956 @gabe-lyons Revert "fix(gql): removing data platform caching in gql (#2947)"
- #2955 @kevinhu feat(ingest): ingest descriptions from dbt models
- #2948 @hsheth2 fix(ingestion): add more mypy annotations
- #2946 @hsheth2 feat(ingestion): test GMS connections before ingestion
- #2947 @gabe-lyons fix(gql): removing data platform caching in gql
- #2949 @hsheth2 test(ingestion): fix flaky package discovery test
- #2951 @kevinhu feat(docs): update videos and integration logos
- #2953 @hsheth2 fix(ingestion): resolve test bugs for 3.6
- #2943 @kevinhu feat(ingest): add logo and platform entry for Glue
- #2940 @hsheth2 fix(ingest): handle quotes in lookml properly
- #2938 @kevinhu feat(models): remove versions from metrics and hyperparams
- #2942 @hsheth2 fix(ingestion): make snowflake database names lowercase
- #2939 @hsheth2 feat(ingest): use urn builders in looker and validate data platforms
- #2941 @aseembansal-gogo refactor(ingest): make code pythonic
- #2937 @kevinhu fix(ingest): allow custom Glue scripts
- #2921 @kafkahw refactor(datahub-web): removing frontend Ember app (i.e. datahub-web folder)
- #2913 @hsheth2 fix(ingest): refactor + fix recursion in lookml file loading logic
- #2925 @hsheth2 feat(ingest): improve bigquery-usage robustness and docs
- #2931 @aseembansal-gogo fix(ingest): fix workunit name to be consistent with other sources
- #2935 @kevinhu fix(ingest): fix browsepaths and ownership urns
- #2930 @aseembansal-gogo fix(ingest): glue add support for mapping varchar, decimal types
- #2929 @kevinhu feat(ingest): refactor mlModel grouping and add browsepaths
- #2934 @hsheth2 docs(ingest): update looker + docker script docs
- #2926 @hsheth2 feat(ingest): add make_data_platform_urn method to builder
- #2932 @topwebtek7 feat(react): surface edited descriptions on search preview for dataset, datajob, dataflow, chart, dashboard
- #2911 @hsheth2 fix(ingest): add quotes to secured kafka yaml config example
- #2927 @kevinhu feat(ingest): dbt aliases
- #2806 @saxo-lalrishav fix(react): enable relation between glossary term and datasets searchable
- #2910 @kevinhu feat(ingest): extract SageMaker metrics, hyperparameters, and external URLs
- #2915 @aseembansal-gogo docs: update docs for consistency in naming
- #2922 @kevinhu feat(ingest): test dbt ingestion with and without schemas
- #2924 @hsheth2 fix(ingest): note that views are not supported for Athena
- #2920 @hsheth2 feat(ingestion): support multiple project IDs in bigquery usage stats
- #2923 @hsheth2 fix(ingest): pin snowflake sqlalchemy connector
- #2909 @hsheth2 feat(ingest): add support for Oracle spatial types
- #2917 @kevinhu docs(ingest): update sample recipe and test input for dbt
- #2887 @topwebtek7 feat(mlFeatureTable): add graphql, ui/ux for mlFeatureTable, mlFeature, mlPrimaryKey entities
- #2916 @kevinhu fix(ingest): stringify all dbt custom props
- #2898 @aseembansal-gogo feat(ingest): Add option to change name of database for postgres
- #2912 @hsheth2 fix(ingest): issue a warning if the column list is empty
- #2894 @kevinhu feat(ingest): lineage for SageMaker model endpoints and groups
- #2905 @hsheth2 feat(ingest): add can_add_aspect method for MCEs
- #2906 @hsheth2 test(ingest): update tox test configurations and test airflow 2.x by default
- #2904 @jjoyce0510 fix(frontend): Don't use Apollo Cache for IsAnalyticsEnabled query.
- #2877 @remisalmon feat(ingest): use node comment as description if existing else default to key
- #2889 @hsheth2 fix(react): avoid displaying "0" for ignored timestamps
- #2890 @gabe-lyons fix(search): fixing case where someone issues a null query
- #2893 @hsheth2 fix(ingest): use logger.warning instead of logger.warn
- #2888 @jameslamb fix(ingest): change LookMLSource._get_upsteam_lineage() to _get_upstream_lineage()
- #2901 @topwebtek7 feat(react): update schema history visualizing, truncate long type, original desc bug
- #2891 @hsheth2 fix(ingest): correct globs in lookml model discovery
- #2902 @kevinhu feat(ingest): add connectivity check for Looker
- #2597 @wan54 feat(react): configure Cypress + MirageJS + GraphQL mock for functional testing plus a couple of example tests
- #2903 @shirshanka docs: update docs for July townhall
- #2900 @kevinhu fix(ingest): string-ify dbt custom props
- #2899 @jjoyce0510 fix(docs): fixing miscellaneous docs
- #2788 @saxo-lalrishav fix(glossary):default browse path for glossary term
- #2868 @kevinhu feat(ingest): extract lineage between SageMaker jobs and models
- #2884 @dexter-mh-lee fix(search): Fix index builder
- #2883 @hsheth2 docs: revamp adoption section
- #2882 @hsheth2 fix(ingest): fix druid misconfiguration bug
- #2881 @hsheth2 fix(ingest): default to unlimited query log delay in bigquery-usage
- #2790 @saxo-lalrishav fix(search): enable search on business glossary terms
- #2872 @hsheth2 build(ingest): reduce dependencies for dev install
- #2874 @topwebtek7 fix(react): fix bug in description update modal
- #2866 @hsheth2 build(ingestion): add version prompt to release script
- #2869 @kevinhu feat(ingest): update golden files only when diff fails
- #2876 @kevinhu feat(ingest): extract dbt meta fields
- #2875 @hsheth2 docs(quickstart): add default password to quickstart
- #2873 @hsheth2 fix(quickstart): update compose spec version
- #2862 @hsheth2 build(ingest): separate metadata-ingestion build workflow fully
- #2867 @hsheth2 fix(build): increase retries for dependency fetches
- #2849 @kevinhu feat(ingest): add browse paths + dataplatform for Feast features
- #2859 @kevinhu feat(docs): swap Medium and videos sections
- #2858 @hsheth2 feat(ingest): support dynamic imports for transformer methods