master merge for 0.4.9 release #1278

rudolfix · 2024-04-24T19:52:55Z

Description

master merge for 0.4.9 release

* feat(transform): implement columns pivot map function
* add str test
* support JSON paths
* enumerate columns

Co-authored-by: Marcin Rudolf <[email protected]>

…#1211)

* format examples
* add core functionality for scd2 merge strategy
* make scd2 validity column names configurable
* make alias descriptive
* add validity column name conflict checking
* extend write disposition with dictionary configuration option
* add default delete-insert merge strategy
* update write_disposition type hints
* extend tested destinations
* 2nd time setup (#1202)
* remove obsolete deepcopy
* add scd2 docs
* add write_disposition existence condition
* add nullability hints to validity columns
* cache functions to limit schema lookups
* add row_hash_column_name config option
* default to default merge strategy
* replace hardcoded column name with variable to fix test
* fix doc snippets
* compares records without order and with caps timestamps precision in scd2 tests
* defines create load id, stores package state typed, allows package state to be passed on, uses load_id as created_at if possible
* creates new package to normalize from extracted package so state is carried on
* bans direct pendulum import
* uses timestamps with properly reduced precision in scd2
* selects newest state by load_id, not created_at. this will not affect execution as long as packages are processed in order
* adds formatting datetime literal to escape
* renames x-row-hash to x-row-version
* corrects json and pendulum imports
* uses unique column in scd2 sql generation
* renames arrow items literal
* adds limitations to docs
* passes only complete columns to arrow normalize
* renames mode to disposition
* saves parquet with timestamp precision corresponding to the destination and updates schema in the normalizer
* adds transform that computes hashes of tables
* tests arrow/pandas + scd2
* allows scd2 columns to be added to arrow items
* various renames
* uses generic caps when writing parquet if no destination context
* disables coercing timestamps in parquet arrow writer

Co-authored-by: Jorrit Sandbrink <[email protected]>
Co-authored-by: adrianbr <[email protected]>
Co-authored-by: rudolfix <[email protected]>

Setting hide password property to `True`

docs/updated synapse documentation

* Added docs for 'deploy dlt with Prefect'.
* Updated doc
* Update deploy-with-prefect.md

Co-authored-by: Zaeem Athar <[email protected]>

* Introduce new config fields to filesystem destination configuration
* Merge layout.py with path_utils.py
* Adjust tests
* Fix linting issues and extract common types
* Use pendulum.now if current_datetime is not defined
* Add more layout tests
* Fix failing tests
* Cleanup tests
* Adjust tests
* Enable auto_mkdir for local filesystem
* Format code
* Extract re-usable functions and fix test
* Add invalid layout
* Accept load_package timestamp in create_path
* Collect all files and directories and then delete files first then directories
* Recursively descend and collect files and directories to truncate for filesystem destination
* Mock load_package timestamp and add more test layouts
* Fix linting issues
* Use better variable name to avoid shadowing builtin name
* Fix dummy tests
* Revert changes to filesystem impl
* Use timestamp if it is specified
* Cleanup path_utils and remove redundant code
* Revert factory
* Refactor path_utils and simplify things
* Remove custom datetime format parameter
* Pass load_package_timestamp
* Remove custom datetime format and current_datetime parameter
* Cleanup imports
* Fix path_utils tests
* Make load_package_timestamp optional
* Revert some changes in tests
* Uncomment layout test item
* Revert fs client argument changes
* Fix mypy issue
* Fix linting issues
* Use all aggregated placeholder when checking layout
* Enable auto_mkdir for filesystem config tests
* Enable auto_mkdir for filesystem config tests
* Enable auto_mkdir only for local filesystem
* Add more placeholders
* Remove extra layout options
* Accepts current_datetime
* Adjust type names
* Pass current datetime
* Resolve current_datetime if it is callable
* Fix linting issues
* Parametrize layout check test
* Fix mypy issues
* Fix mypy issues
* Add more tests
* Add more test checks for create_path flow
* Add test flow comment
* Add more tests
* Add test to check callable extra placeholders
* Test if unused layout placeholders are printed
* Adjust timestamp selection logic
* Fix linting issues
* Extend default placeholders and remove redundant ones
* Add quarter placeholder
* Add test case for layout with quarter of the year
* Adjust type alias for placeholder callback
* Simplify code
* Adjust tests
* Validate placeholders in on_resolve
* Avoid assigning current_datetime callback value during validation
* Fix mypy warnings
* Fix linting issues
* Log warning message with yellow foreground
* Remove useless test
* Adjust error message for placeholder callback functions
* Lowercase formatted datetime values
* Adjust comments
* Re-import load_storage
* Adjust logic around timestamp and current datetime
* Fix mypy warnings
* Add test configuration override when values passed via filesystem factory
* Better logic to handle current timestamp and current datetime
* Add more test checks
* Introduce new InvalidPlaceholderCallback exception
* Raise InvalidPlaceholderCallback instead of plain TypeError
* Fix import ban error
* Add more test cases for path utils layout check
* Adjust text color
* Small cleanup
* Verify path parts and layout parts are equal
* Remove unnecessary log
* Add test with actual pipeline run and checks for callback calls
* Revert conftest changes
* Cleanup and move current_datetime calling inside path_utils
* Adjust test
* Add clarification comment
* Use logger instead of printing out
* Make InvalidPlaceholderCallback of transient kind
* Move tests to new place
* Cleanup
* Add load_package_timestamp placeholder
* Fix mypy warning
* Add pytest-mock
* Adjust tests
* Adjust logic
* Fix mypy issue
* Use spy on logger for a test
* Add test layout example with load_package_timestamp
* Add test layout example with load_package_timestamp in pipeline tests
* Check created paths and assert validity of placeholders
* Rename variable to better fit the context
* Assert arguments in extra placeholder callbacks
* Make invalid placeholders exception more useful
* Assert created path with spy values
* Make error messages better for InvalidFilesystemLayout exception
* Fix mypy errors
* Also check created path
* Run pipeline then create path and check if files exist
* Fix mypy errors
* Check all load packages
* Add more layout samples using custom placeholders
* Add more layout samples with callable extra placeholder
* Add more layout samples with callable extra placeholder
* Remove redundant import
* Check expected paths created by create_path
* Fix mypy issues
* Add explanation comment to ALL_LAYOUTS
* Re-use frozen datetime
* Use dlt.common.pendulum
* Use ensure_pendulum_datetime instead of pendulum.parse
* Fix mypy issues
* Add invalid layout with extra placeholder before table_name
* Adjust exception message from invalid to missing placeholders
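The layout commits above describe resolving placeholders in a path template, including a `current_datetime` that may be a callable, extra placeholders that may be callables, and an error for missing placeholders. A minimal standalone sketch of that idea follows. It is not dlt's implementation: `render_layout`, `MissingLayoutPlaceholders`, the placeholder names `YYYY`, `MM`, `DD`, `Q`, and the two-argument callback signature are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from string import Formatter
from typing import Callable, Dict, Optional, Union

# Hypothetical callback shape: receives table name and load id, returns a string.
PlaceholderCallback = Callable[[str, str], str]
DatetimeSource = Union[datetime, Callable[[], datetime]]


class MissingLayoutPlaceholders(ValueError):
    """Hypothetical error raised when the layout uses unknown placeholders."""


def render_layout(
    layout: str,
    table_name: str,
    load_id: str,
    file_id: str,
    ext: str,
    current_datetime: Optional[DatetimeSource] = None,
    extra_placeholders: Optional[Dict[str, Union[str, PlaceholderCallback]]] = None,
) -> str:
    # Resolve current_datetime if it is callable, else fall back to "now".
    now = current_datetime() if callable(current_datetime) else current_datetime
    if now is None:
        now = datetime.now(timezone.utc)

    values = {
        "table_name": table_name,
        "load_id": load_id,
        "file_id": file_id,
        "ext": ext,
        "YYYY": f"{now.year:04d}",
        "MM": f"{now.month:02d}",
        "DD": f"{now.day:02d}",
        "Q": str((now.month - 1) // 3 + 1),  # quarter of the year, 1 to 4
    }

    # Extra placeholders are either plain strings or callbacks.
    for name, value in (extra_placeholders or {}).items():
        values[name] = value(table_name, load_id) if callable(value) else value

    # Report placeholders used in the layout that nothing can resolve.
    used = {field for _, field, _, _ in Formatter().parse(layout) if field}
    missing = sorted(used - set(values))
    if missing:
        raise MissingLayoutPlaceholders(f"missing placeholders: {missing}")

    return layout.format(**values)
```

With a frozen datetime of 2024-04-24 and an extra `owner` callback that upper-cases the table name, the layout `{table_name}/{YYYY}/q{Q}/{owner}/{load_id}.{file_id}.{ext}` resolves to `events/2024/q2/EVENTS/1713988375.0.jsonl` for table `events` and load id `1713988375`.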

* Update tzdata to 2024.1
* Update lock hash

Co-authored-by: roman peresypkin <[email protected]>

* Pass options to parse iso like strings
* Update testcase for iso detection

…1220)

* Add section about new placeholders
* Add basic information about additional placeholders
* Add more examples of layout configuration
* Add code snippet examples
* Remove typing info
* Add note
* Add note about auto_mkdir
* Try concurrent snippet linting
* Try concurrent snippet linting
* Adjust wording and format check_embedded_snippets.py
* Uncomment examples and submit task to pool properly
* Submit snippets to workers
* Revert parallelization stuff
* Comment out unused layouts
* Fix mypy issues
* Add a section about the recommended layout
* Adjust text
* Better text
* Adjust section titles
* Adjust code section language identifier
* Fix mypy errors
* More cosmetic changes for the doc

Co-authored-by: Violetta Mishechkina <[email protected]>

* clean some stuff
* first messy version of filesystem state sync
* clean up a bit
* fix bug in state sync
* enable state tests for all bucket providers
* do not store state to uninitialized dataset folders
* fix linter errors
* get current pipeline from pipeline context
* fix bug in filesystem table init
* update testing pipe
* move away from "current" file, rather iterator bucket path contents
* store pipeline state in load package state and send to filesystem destination from there
* fix tests for changed number of files in filesystem destination
* remove dev code
* create init file also to mark datasets
* fix tests to respect new init file change filesystem to fallback, to old state loading when used as staging destination
* update filesystem docs
* fix incoming tests of placeholders
* small fixes
* adds some tests for filesystem state also fixes table count loading to work for all bucket destinations
* fix test helper
* save schema with timestamp instead of load_id
* pr fixes and move pipeline state saving to committing of extracted packages
* ensure pipeline state is only saved to load package if it has changed
* adds missing state injection into state package
* fix athena iceberg locations
* fix google drive filesystem with missing argument

* remove staging-optimized replace strategy for synapse
* fix athena iceberg locations

Co-authored-by: Jorrit Sandbrink <[email protected]>
Co-authored-by: Dave <[email protected]>

* Revert tzdata update and update lock
* Add guide for contributors about dependency updates
* Adjust section title
* Revert black update
* Adjust section title
* Revert lockfile
* Update lock hash
* Remove example

…ks` (#1247)

* add bigquery datetime literal formatting
* refactor not exists to not in for bigquery and databricks compatibility
* mark main scd2 test as essential

Co-authored-by: Jorrit Sandbrink <[email protected]>

* Check for default schema and schema name in streamlit session
* Do not show resource state if it is not available
* Fix mypy errors
* Remove the message if there is no schema in state
* Simplify code

* fix test_dbt_commands profile
* update dbt core tests

#1260)

* Add seconds to filesystem date placeholders
* Update docs
* Fix formatting
* Add milliseconds timestamps and placeholders

Co-authored-by: Jorrit Sandbrink <[email protected]>

* Adding dlthub_telemetry_endpoint to RunConfiguration.
* Adding dlthub_telemetry_endpoint to test_configuration.
* Segment Changes:
  1. In init_segment() adding checks for env RUNTIME__TELEMETRY_ENDPOINT.
  2. Update _SEGMENT_ENDPOINT based on env variable. Set default value if None provided with default write key.
  3. Adjusting header based on endpoint.
* Accessing values through config.
* fix minor things and add new endpoint to common tests
* add new endpoint url to local destinations
* Adding new endpoint url to all destinations.
* Adding test for init_segment.
* formatting tests.

Co-authored-by: Dave <[email protected]>

…ge keys are specified (#1225)

* add sanity check to prevent missing config setup
* fall back to append for merge without merge keys
* add test for checking behavior of hard_delete without key
* add schema warning
* fix athena iceberg locations
* add note in docs about merge fallback behavior
* fix merge switching tests
* fix one additional test with fallback
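The fallback named in these commits (merge without keys falls back to append, with a warning) can be pictured with a small standalone sketch. This is not dlt's code: `effective_write_disposition` is a hypothetical helper, and treating either a primary key or a merge key as sufficient to keep `merge` is an assumption.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def effective_write_disposition(
    requested: str,
    primary_key: Sequence[str] = (),
    merge_key: Sequence[str] = (),
) -> str:
    """Return the disposition to use; hypothetical helper, not part of dlt."""
    # Assumption: merge needs at least one key column to match rows on.
    if requested == "merge" and not primary_key and not merge_key:
        logger.warning("merge requested without keys, falling back to append")
        return "append"
    return requested
```

Other dispositions pass through unchanged, so only a keyless `merge` is affected.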

* add pydantic contracts implementation tests
* add tests for removal of normalizer section in schema
* add tests for contracts on nested dicts
* start working on pyarrow tests
* start adding tests of pyarrow normalizer
* add pyarrow normalizer tests
* add basic arrow tests
* merge fixes
* update tests

Co-authored-by: Marcin Rudolf <[email protected]>

* adding images and wordsmithing
* changing image location
* fixing image name

* Add max_table_nesting to resource decorator
* Handle max_table_nesting in normalizer
* Use dict.get to retrieve table from schema
* Use schema.get_table and format code
* Fix bugs and parametrize test
* Add one more test case
* Get table from schema.tables
* Add comments and cleanup code
* Add more test cases when max_table_nesting is overridden via the property setter
* Assert property setter sets the value and a test with source and resource with max_table_nesting set
* Clarify test scenario description
* Add checks if max_nesting set in x-normalizer hints
* Add another case when resource does not define but source has defined max_table_nesting
* Check max_table_nesting property accessor
* Update resource.md

* Automatically create folders for local filesystem
* Use simple equals check for protocol==file

* Add snowflake to application parameter to configuration
* Set default application parameter if it is not specified
* Adjust tests and for connection params
* Use empty string to skip setting the application parameter
* Set default value for application parameter
* Fix if check bug
* Uppercase SNOWFLAKE_APPLICATION_ID and re-use in tests
* Add note in docs about application parameter for snowflake
* Update text for snowflake's application connection parameter
* Fix typo
* Update docs/website/docs/dlt-ecosystem/destinations/snowflake.md Co-authored-by: VioletM <[email protected]>
* Update snowflake.md
* Update doc

Co-authored-by: VioletM <[email protected]>

* clean up pipeline utils a bit
* add fs client base interface and use it in tests
* make truncating code easier and fix bug in list table files
* create truncate method
* add some filesystem tests
* adds two bug fixes
* create dirs in loadjob
* make pipeline fs_client function private for now

Co-authored-by: rudolfix <[email protected]>

netlify · 2024-04-24T19:53:13Z

✅ Deploy Preview for dlt-hub-docs ready!

Name	Link
🔨 Latest commit	`a529924`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/6629633f95f9910008db37ca
😎 Deploy Preview	https://deploy-preview-1278--dlt-hub-docs.netlify.app

sh-rp · 2024-04-24T21:42:02Z

@rudolfix you need to add the ci-full label when creating the PR. If you add it later, it is only applied once there is another push, unfortunately.

dat-a-man and others added 30 commits April 1, 2024 05:30

Updated synapse documentation

47c657a

feat(transform): implement columns pivot map function (#1152)

26fed1d

* feat(transform): implement columns pivot map function
* add str test
* support JSON paths
* enumerate columns

Co-authored-by: Marcin Rudolf <[email protected]>

Added code snippets

d53606c

Fix formatting (#1206)

aabc320

Import Request and Response directly from requests (#1210)

d48942f

Import Request and Response directly from requests in client.py (…

1c01821

…#1211)

Corrected a snippet for naming = "direct" (#1215)

f5a6dbd

Update synapse.md

e04221e

Setting hide password property to `True`

Merge pull request #1167 from dlt-hub/docs/adding-info-to-synapse-docs

cdb8d5e

docs/updated synapse documentation

Added docs for deploying dlt with Prefect. (#1138)

11b0c68

* Added docs for 'deploy dlt with Prefect'.
* Updated doc
* Update deploy-with-prefect.md

Co-authored-by: Zaeem Athar <[email protected]>

picks file format matching item format (#1222)

0abad12

Update tzdata to 2024.1 (#1223)

a8d721b

* Update tzdata to 2024.1
* Update lock hash

fix athena iceberg's trailing location (#1230)

c1f2b8f

Co-authored-by: roman peresypkin <[email protected]>

Pass options to parse iso like strings (#1219)

95e11f3

* Pass options to parse iso like strings
* Update testcase for iso detection

Remove staging-optimized replace strategy for synapse (#1231)

77e2499

* remove staging-optimized replace strategy for synapse
* fix athena iceberg locations

Co-authored-by: Jorrit Sandbrink <[email protected]>
Co-authored-by: Dave <[email protected]>

fixes bug where configs were not injected for async functions (#1241)

89b8da5

adds options to write headers, change delimiter (#1239)

cc9685f

Revert tzdata update and update lock (#1238)

4750f62

* Revert tzdata update and update lock
* Add guide for contributors about dependency updates
* Adjust section title
* Revert black update
* Adjust section title
* Revert lockfile
* Update lock hash
* Remove example

bumps to pre-release 0.4.9a2

902963c

enable all tests for bigquery always (#1245)

10d9e20

Check for default schema and schema name in streamlit session (#1155)

f6295f9

* Check for default schema and schema name in streamlit session
* Do not show resource state if it is not available
* Fix mypy errors
* Remove the message if there is no schema in state
* Simplify code

adds quoting style option to csv writer config (#1262)

88ac111

fix dbt tests (#1256)

39a1bd8

* fix test_dbt_commands profile
* update dbt core tests

Add seconds and millisecond timestamps to filesystem date placeholders (

a799ec1

#1260)

* Add seconds to filesystem date placeholders
* Update docs
* Fix formatting
* Add milliseconds timestamps and placeholders

mark scd2 child table test essential (#1265)

a225a98

Co-authored-by: Jorrit Sandbrink <[email protected]>

zem360 and others added 12 commits April 23, 2024 19:29

a note on scd2 incoming high ts change (#1273)

6432ed7

Updated SQL documentation for windows authentication. (#1251)

c57fe0e

Updated autodetectors (#1253)

e0a7fe0

adding images and wordsmithing to Prefect walkthrough (#1276)

43f2e8f

* adding images and wordsmithing
* changing image location
* fixing image name

Create additional folders when copying files on local filesystem (#1263)

4763496

* Automatically create folders for local filesystem
* Use simple equals check for protocol==file

bumps to version 0.4.9

10b9b47

Merge branch 'master' into devel

a529924

rudolfix added the `ci full` label (run the full load tests on pr) Apr 24, 2024

rudolfix merged commit efaedc2 into master Apr 25, 2024
59 of 69 checks passed
