master merge for 0.4.9 release #1278

Merged
rudolfix merged 43 commits into master on Apr 25, 2024
Conversation

rudolfix
Collaborator

Description

master merge for 0.4.9 release

dat-a-man and others added 30 commits April 1, 2024 05:30
* feat(transform): implement columns pivot map function

* add str test

* support JSON paths

* enumerate columns

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* format examples

* add core functionality for scd2 merge strategy

* make scd2 validity column names configurable

* make alias descriptive

* add validity column name conflict checking

* extend write disposition with dictionary configuration option

* add default delete-insert merge strategy

* update write_disposition type hints

* extend tested destinations

* 2nd time setup (#1202)

* remove obsolete deepcopy

* add scd2 docs

* add write_disposition existence condition

* add nullability hints to validity columns

* cache functions to limit schema lookups

* add row_hash_column_name config option

* default to default merge strategy

* replace hardcoded column name with variable to fix test

* fix doc snippets

* compares records without regard to order and with timestamp precision taken from destination caps in scd2 tests

* defines create load id, stores package state typed, allows package state to be passed on, uses load_id as created_at if possible

* creates new package to normalize from extracted package so state is carried on

* bans direct pendulum import

* uses timestamps with properly reduced precision in scd2

* selects newest state by load_id, not created_at. this will not affect execution as long as packages are processed in order

* adds formatting of datetime literals to escape

* renames x-row-hash to x-row-version

* corrects json and pendulum imports

* uses unique column in scd2 sql generation

* renames arrow items literal

* adds limitations to docs

* passes only complete columns to arrow normalize

* renames mode to disposition

* saves parquet with timestamp precision corresponding to the destination and updates schema in the normalizer

* adds transform that computes hashes of tables

* tests arrow/pandas + scd2

* allows scd2 columns to be added to arrow items

* various renames

* uses generic caps when writing parquet if no destination context

* disables coercing timestamps in parquet arrow writer

---------

Co-authored-by: Jorrit Sandbrink <[email protected]>
Co-authored-by: adrianbr <[email protected]>
Co-authored-by: rudolfix <[email protected]>
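
For readers unfamiliar with the scd2 merge strategy these commits introduce, here is a minimal, self-contained sketch of the idea. It is not dlt's implementation. Each record gets a content hash (the "row version"), active rows whose hash is absent from the incoming snapshot are retired by closing their validity window, and hashes not yet active are inserted as new rows. The function names, the column names `valid_from`, `valid_to` and `row_version`, and the in-memory list representation are assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

Row = Dict[str, Any]


def row_version(record: Row) -> str:
    """Hash the record content; key order does not affect the result."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def scd2_merge(
    table: List[Row],
    snapshot: List[Row],
    load_ts: datetime,
    valid_from: str = "valid_from",  # hypothetical column names
    valid_to: str = "valid_to",
    version_col: str = "row_version",
) -> List[Row]:
    """Retire active rows missing from the snapshot and insert new versions."""
    incoming = {row_version(r): r for r in snapshot}
    result: List[Row] = []
    still_active = set()
    for row in table:
        row = dict(row)  # do not mutate the caller's rows
        if row[valid_to] is None:
            if row[version_col] in incoming:
                still_active.add(row[version_col])
            else:
                row[valid_to] = load_ts  # this version is no longer in the source
        result.append(row)
    for version, record in incoming.items():
        if version not in still_active:
            result.append(
                {**record, version_col: version, valid_from: load_ts, valid_to: None}
            )
    return result


t1 = datetime(2024, 4, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 4, 2, tzinfo=timezone.utc)
table = scd2_merge([], [{"id": 1, "city": "Berlin"}], t1)
table = scd2_merge(table, [{"id": 1, "city": "Paris"}], t2)
```

After the second load, the Berlin row is closed at `t2` and the Paris row is the only active one. Loading the same snapshot again leaves the table unchanged, because the active hash is already present.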
Setting hide password property to `True`
* Added docs for 'deploy dlt with Prefect'.

* Updated doc

* Update deploy-with-prefect.md

---------

Co-authored-by: Zaeem Athar <[email protected]>
* Introduce new config fields to filesystem destination configuration

* Merge layout.py with path_utils.py

* Adjust tests

* Fix linting issues and extract common types

* Use pendulum.now if current_datetime is not defined

* Add more layout tests

* Fix failing tests

* Cleanup tests

* Adjust tests

* Enable auto_mkdir for local filesystem

* Format code

* Extract re-usable functions and fix test

* Add invalid layout

* Accept load_package timestamp in create_path

* Collect all files and directories and then delete files first then directories

* Recursively descend and collect files and directories to truncate for filesystem destination

* Mock load_package timestamp and add more test layouts

* Fix linting issues

* Use better variable name to avoid shadowing builtin name

* Fix dummy tests

* Revert changes to filesystem impl

* Use timestamp if it is specified

* Cleanup path_utils and remove redundant code

* Revert factory

* Refactor path_utils and simplify things

* Remove custom datetime format parameter

* Pass load_package_timestamp

* Remove custom datetime format and current_datetime parameter

* Cleanup imports

* Fix path_utils tests

* Make load_package_timestamp optional

* Revert some changes in tests

* Uncomment layout test item

* Revert fs client argument changes

* Fix mypy issue

* Fix linting issues

* Use all aggregated placeholder when checking layout

* Enable auto_mkdir for filesystem config tests

* Enable auto_mkdir for filesystem config tests

* Enable auto_mkdir only for local filesystem

* Add more placeholders

* Remove extra layout options

* Accepts current_datetime

* Adjust type names

* Pass current datetime

* Resolve current_datetime if it is callable

* Fix linting issues

* Parametrize layout check test

* Fix mypy issues

* Fix mypy issues

* Add more tests

* Add more test checks for create_path flow

* Add test flow comment

* Add more tests

* Add test to check callable extra placeholders

* Test if unused layout placeholders are printed

* Adjust timestamp selection logic

* Fix linting issues

* Extend default placeholders and remove redundant ones

* Add quarter placeholder

* Add test case for layout with quarter of the year

* Adjust type alias for placeholder callback

* Simplify code

* Adjust tests

* Validate placeholders in on_resolve

* Avoid assigning current_datetime callback value during validation

* Fix mypy warnings

* Fix linting issues

* Log warning message with yellow foreground

* Remove useless test

* Adjust error message for placeholder callback functions

* Lowercase formatted datetime values

* Adjust comments

* Re-import load_storage

* Adjust logic around timestamp and current datetime

* Fix mypy warnings

* Add test for configuration override when values are passed via filesystem factory

* Better logic to handle current timestamp and current datetime

* Add more test checks

* Introduce new InvalidPlaceholderCallback exception

* Raise InvalidPlaceholderCallback instead of plain TypeError

* Fix import ban error

* Add more test cases for path utils layout check

* Adjust text color

* Small cleanup

* Verify path parts and layout parts are equal

* Remove unnecessary log

* Add test with actual pipeline run and checks for callback calls

* Revert conftest changes

* Cleanup and move current_datetime calling inside path_utils

* Adjust test

* Add clarification comment

* Use logger instead of printing out

* Make InvalidPlaceholderCallback of transient kind

* Move tests to new place

* Cleanup

* Add load_package_timestamp placeholder

* Fix mypy warning

* Add pytest-mock

* Adjust tests

* Adjust logic

* Fix mypy issue

* Use spy on logger for a test

* Add test layout example with load_package_timestamp

* Add test layout example with load_package_timestamp in pipeline tests

* Check created paths and assert validity of placeholders

* Rename variable to better fit the context

* Assert arguments in extra placeholder callbacks

* Make invalid placeholders exception more useful

* Assert created path with spy values

* Make error messages better for InvalidFilesystemLayout exception

* Fix mypy errors

* Also check created path

* Run pipeline then create path and check if files exist

* Fix mypy errors

* Check all load packages

* Add more layout samples using custom placeholders

* Add more layout samples with callable extra placeholder

* Add more layout samples with callable extra placeholder

* Remove redundant import

* Check expected paths created by create_path

* Fix mypy issues

* Add explanation comment to ALL_LAYOUTS

* Re-use frozen datetime

* Use dlt.common.pendulum

* Use ensure_pendulum_datetime instead of pendulum.parse

* Fix mypy issues

* Add invalid layout with extra placeholder before table_name

* Adjust exception message from invalid to missing placeholders
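
The commit list above revolves around rendering a filesystem layout string from placeholders, including date parts, a quarter placeholder, and callable extra placeholders, and rejecting layouts that reference placeholders nothing provides. The following is a rough sketch of that flow under stated assumptions and not the code from this PR: `render_layout`, `MissingPlaceholders`, the callback signature (a single context dict), and the exact placeholder set are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Union

# Hypothetical: an extra placeholder is a fixed string or a callback over the context.
ExtraPlaceholder = Union[str, Callable[[Dict[str, str]], str]]


class MissingPlaceholders(ValueError):
    """Raised when a layout references placeholders that nothing provides."""


def render_layout(
    layout: str,
    table_name: str,
    load_id: str,
    file_id: str,
    ext: str,
    now: datetime,
    extra: Optional[Dict[str, ExtraPlaceholder]] = None,
) -> str:
    values: Dict[str, str] = {
        "table_name": table_name,
        "load_id": load_id,
        "file_id": file_id,
        "ext": ext,
        "YYYY": now.strftime("%Y"),
        "MM": now.strftime("%m"),
        "DD": now.strftime("%d"),
        "Q": str((now.month - 1) // 3 + 1),  # quarter of the year, 1 to 4
    }
    context = dict(values)  # callbacks see only the built-in values
    for name, value in (extra or {}).items():
        values[name] = value(context) if callable(value) else value
    used = set(re.findall(r"\{(\w+)\}", layout))
    missing = sorted(used - set(values))
    if missing:
        raise MissingPlaceholders(f"layout uses unknown placeholders: {missing}")
    return layout.format(**values)


path = render_layout(
    "{table_name}/{YYYY}/q{Q}/{owner}/{load_id}.{file_id}.{ext}",
    table_name="events",
    load_id="1714000000.0",
    file_id="abc",
    ext="jsonl",
    now=datetime(2024, 4, 25, tzinfo=timezone.utc),
    extra={"owner": lambda ctx: ctx["table_name"].upper()},
)
print(path)
```

Validating the placeholder names before formatting gives a single descriptive error instead of the bare `KeyError` that `str.format` would raise.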
* Update tzdata to 2024.1

* Update lock hash
* Pass options to parse iso like strings

* Update testcase for iso detection
…1220)

* Add section about new placeholders

* Add basic information about additional placeholders

* Add more examples of layout configuration

* Add code snippet examples

* Remove typing info

* Add note

* Add note about auto_mkdir

* Try concurrent snippet linting

* Try concurrent snippet linting

* Adjust wording and format check_embedded_snippets.py

* Uncomment examples and submit task to pool properly

* Submit snippets to workers

* Revert parallelization stuff

* Comment out unused layouts

* Fix mypy issues

* Add a section about the recommended layout

* Adjust text

* Better text

* Adjust section titles

* Adjust code section language identifier

* Fix mypy errors

* More cosmetic changes for the doc

---------

Co-authored-by: Violetta Mishechkina <[email protected]>
* clean some stuff

* first messy version of filesystem state sync

* clean up a bit

* fix bug in state sync

* enable state tests for all bucket providers

* do not store state to uninitialized dataset folders

* fix linter errors

* get current pipeline from pipeline context

* fix bug in filesystem table init

* update testing pipe

* move away from "current" file, rather iterate over bucket path contents

* store pipeline state in load package state and send to filesystem destination from there

* fix tests for changed number of files in filesystem destination

* remove dev code

* create init file also to mark datasets

* fix tests to respect new init file
change filesystem to fall back to old state loading when used as staging destination

* update filesystem docs

* fix incoming tests of placeholders

* small fixes

* adds some tests for filesystem state
also fixes table count loading to work for all bucket destinations

* fix test helper

* save schema with timestamp instead of load_id

* pr fixes and move pipeline state saving to committing of extracted packages

* ensure pipeline state is only saved to load package if it has changed

* adds missing state injection into state package

* fix athena iceberg locations

* fix google drive filesystem with missing argument
* remove staging-optimized replace strategy for synapse

* fix athena iceberg locations

---------

Co-authored-by: Jorrit Sandbrink <[email protected]>
Co-authored-by: Dave <[email protected]>
* Revert tzdata update and update lock

* Add guide for contributors about dependency updates

* Adjust section title

* Revert black update

* Adjust section title

* Revert lockfile

* Update lock hash

* Remove example
…ks` (#1247)

* add bigquery datetime literal formatting

* refactor not exists to not in for bigquery and databricks compatibility

* mark main scd2 test as essential

---------

Co-authored-by: Jorrit Sandbrink <[email protected]>
* Check for default schema and schema name in streamlit session

* Do not show resource state if it is not available

* Fix mypy errors

* Remove the message if there is no schema in state

* Simplify code
* fix test_dbt_commands profile

* update dbt core tests
#1260)

* Add seconds to filesystem date placeholders

* Update docs

* Fix formatting

* Add milliseconds timestamps and placeholders
zem360 and others added 12 commits April 23, 2024 19:29
* Adding dlthub_telemetry_endpoint to RunConfiguration.

* Adding dlthub_telemetry_endpoint to test_configuration.

* Segment Changes:

1. In init_segment() adding checks for env RUNTIME__TELEMETRY_ENDPOINT.
2. Update _SEGMENT_ENDPOINT based on env variable. Set default value if None provided with default write key.
3. Adjusting header based on endpoint.

* Accessing values through config.

* fix minor things and add new endpoint to common tests

* add new endpoint url to local destinations

* Adding new endpoint url to all destinations.

* Adding test for init_segment.

* formatting tests.

---------

Co-authored-by: Dave <[email protected]>
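
The three numbered steps in the "Segment Changes" commit can be sketched roughly as follows. Only the environment variable name `RUNTIME__TELEMETRY_ENDPOINT` comes from the commit message. The default endpoint URL, the write key, the function name, and the rule that the authorization header is sent only to the default endpoint are assumptions for illustration, since the commit does not say how the header is adjusted.

```python
import base64
import os
from typing import Dict, Mapping, Tuple

# Hypothetical defaults; the real values are not given in the commit messages.
DEFAULT_ENDPOINT = "https://telemetry.example.invalid/v1/track"
DEFAULT_WRITE_KEY = "default-write-key"


def resolve_telemetry(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Pick the telemetry endpoint from the environment and build matching headers."""
    endpoint = env.get("RUNTIME__TELEMETRY_ENDPOINT") or DEFAULT_ENDPOINT
    headers = {"Content-Type": "application/json"}
    if endpoint == DEFAULT_ENDPOINT:
        # Assumption: only the default endpoint expects the write key as basic auth.
        token = base64.b64encode(f"{DEFAULT_WRITE_KEY}:".encode("ascii")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return endpoint, headers


default_endpoint, default_headers = resolve_telemetry({})
custom_endpoint, custom_headers = resolve_telemetry(
    {"RUNTIME__TELEMETRY_ENDPOINT": "https://collector.example.invalid/track"}
)
```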
…ge keys are specified (#1225)

* add sanity check to prevent missing config setup

* fall back to append for merge without merge keys

* add test for checking behavior of hard_delete without key

* add schema warning

* fix athena iceberg locations

* add note in docs about merge fallback behavior

* fix merge switching tests

* fix one additional test with fallback
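
The fallback these commits describe, merge degrading to append with a warning when no keys are configured, can be illustrated with a small sketch. The function name is hypothetical, and reading "no keys" as "neither a primary key nor a merge key" is an assumption, because the PR title is truncated on this page.

```python
import warnings
from typing import Sequence


def effective_disposition(
    write_disposition: str,
    primary_key: Sequence[str] = (),
    merge_key: Sequence[str] = (),
) -> str:
    """Return the disposition that would actually be applied to a table."""
    if write_disposition == "merge" and not primary_key and not merge_key:
        warnings.warn(
            "merge requested without primary or merge keys; falling back to append"
        )
        return "append"
    return write_disposition
```

With this rule, `effective_disposition("merge")` yields `"append"`, while `effective_disposition("merge", primary_key=["id"])` stays `"merge"`.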
* add pydantic contracts implementation tests

* add tests for removal of normalizer section in schema

* add tests for contracts on nested dicts

* start working on pyarrow tests

* start adding tests of pyarrow normalizer

* add pyarrow normalizer tests

* add basic arrow tests

* merge fixes

* update tests

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* adding images and wordsmithing

* changing image location

* fixing image name
* Add max_table_nesting to resource decorator

* Handle max_table_nesting in normalizer

* Use dict.get to retrieve table from schema

* Use schema.get_table and format code

* Fix bugs and parametrize test

* Add one more test case

* Get table from schema.tables

* Add comments and cleanup code

* Add more test cases when max_table_nesting is overridden via the property setter

* Assert property setter sets the value and add a test with source and resource with max_table_nesting set

* Clarify test scenario description

* Add checks if max_nesting set in x-normalizer hints

* Add another case when resource does not define but source has defined max_table_nesting

* Check max_table_nesting property accessor

* Update resource.md
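
To make the effect of a nesting limit concrete, here is a simplified sketch of how a normalizer might honor it: nested lists become child tables until the limit is reached, and deeper lists are kept as JSON strings in the parent row. This is an illustration under assumptions, not the normalizer changed in this PR. The `normalize` function, the `__` table-name separator, and the `value` column for scalar list items are chosen for the example, and the parent/child linking keys a real normalizer needs are omitted.

```python
import json
from typing import Any, Dict, List, Optional

Tables = Dict[str, List[Dict[str, Any]]]


def normalize(
    table: str,
    row: Dict[str, Any],
    max_nesting: int,
    level: int = 0,
    out: Optional[Tables] = None,
) -> Tables:
    """Split nested lists into child tables, at most max_nesting levels deep."""
    out = {} if out is None else out
    flat: Dict[str, Any] = {}
    for key, value in row.items():
        if isinstance(value, list):
            if level < max_nesting:
                for item in value:
                    child = item if isinstance(item, dict) else {"value": item}
                    normalize(f"{table}__{key}", child, max_nesting, level + 1, out)
            else:
                flat[key] = json.dumps(value)  # limit reached: keep as JSON text
        else:
            flat[key] = value
    out.setdefault(table, []).append(flat)
    return out


record = {"id": 1, "orders": [{"sku": "a", "tags": ["x", "y"]}]}
no_children = normalize("customers", record, max_nesting=0)
one_level = normalize("customers", record, max_nesting=1)
two_levels = normalize("customers", record, max_nesting=2)
```

With `max_nesting=0` everything stays in `customers`. With `1` an additional `customers__orders` table appears and `tags` is stored as JSON. With `2` the tags get their own `customers__orders__tags` table.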
* Automatically create folders for local filesystem

* Use simple equals check for protocol==file
* Add snowflake application parameter to configuration

* Set default application parameter if it is not specified

* Adjust tests for connection params

* Use empty string to skip setting the application parameter

* Set default value for application parameter

* Fix if check bug

* Uppercase SNOWFLAKE_APPLICATION_ID and re-use in tests

* Add note in docs about application parameter for snowflake

* Update text for snowflake's application connection parameter

* Fix typo

* Update docs/website/docs/dlt-ecosystem/destinations/snowflake.md

Co-authored-by: VioletM <[email protected]>

* Update snowflake.md

* Update doc

---------

Co-authored-by: VioletM <[email protected]>
* clean up pipeline utils a bit

* add fs client base interface and use it in tests

* make truncating code easier and fix bug in list table files

* create truncate method

* add some filesystem tests

* adds two bug fixes

* create dirs in loadjob

* make pipeline fs_client function private for now

---------

Co-authored-by: rudolfix <[email protected]>

netlify bot commented Apr 24, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit a529924
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6629633f95f9910008db37ca
😎 Deploy Preview https://deploy-preview-1278--dlt-hub-docs.netlify.app

rudolfix added the "ci full" label (run the full load tests on pr) Apr 24, 2024
Collaborator

sh-rp commented Apr 24, 2024

@rudolfix you need to add the ci-full label when creating the PR; if you add it later, it will unfortunately only be applied after another push

rudolfix merged commit efaedc2 into master Apr 25, 2024
59 of 69 checks passed