Support uploading more files from the target directory to remote_target_path #1293

Open
pankajkoti opened this issue Oct 30, 2024 · 1 comment
Labels
area:config Related to configuration, like YAML files, environment variables, or executer configuration area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc triage-needed Items need to be reviewed / assigned to milestone

@pankajkoti
Contributor

pankajkoti commented Oct 30, 2024

Currently, the remote_target_path configuration, added in PR #1224, only uploads files from the compiled directory inside the dbt project's target directory, and only when ExecutionMode.AIRFLOW_ASYNC is used. However, the target directory contains several other files and folders that would also be useful to users if they were uploaded to remote_target_path.

Beyond the compiled directory, the target directory typically includes:

  1. run/ folder
  2. graph.gpickle
  3. graph_summary.json
  4. manifest.json
  5. partial_parse.msgpack
  6. run_results.json
  7. semantic_manifest.json

A specific request was made in a Slack conversation to have run_results.json uploaded and accessible in remote_target_path, highlighting its value to users.

We should evaluate the potential benefits of supporting uploads for these additional files and folders and explore enabling this feature across all execution modes, not just ExecutionMode.AIRFLOW_ASYNC. Additionally, it may be worthwhile to consider uploading files from the compiled directory in other execution modes if it proves beneficial.

We could create sub-tasks for each of these files and folders, both to evaluate the benefit of uploading it and to implement support for uploading it to remote_target_path.
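As a rough illustration of what a broader upload step could look like, here is a minimal Python sketch. The helper `upload_target_artifacts` and the `DEFAULT_ARTIFACTS` allow-list are hypothetical (the list simply mirrors the items above), and `remote_dir` is a local directory standing in for remote_target_path, so this deliberately does not assume anything about Cosmos's actual storage or upload API. Entries that are missing are skipped, since not every dbt command necessarily produces every artifact.

```python
import shutil
from pathlib import Path

# Hypothetical allow-list of target/ artifacts beyond compiled/ (mirrors the list above).
DEFAULT_ARTIFACTS = (
    "run",
    "graph.gpickle",
    "graph_summary.json",
    "manifest.json",
    "partial_parse.msgpack",
    "run_results.json",
    "semantic_manifest.json",
)


def upload_target_artifacts(target_dir, remote_dir, artifacts=DEFAULT_ARTIFACTS):
    """Copy the selected artifacts from a dbt target/ directory into remote_dir.

    remote_dir is a local stand-in for remote_target_path; a real implementation
    would write through a remote storage backend instead of shutil.
    Returns the names of the artifacts that were actually copied.
    """
    target = Path(target_dir)
    remote = Path(remote_dir)
    uploaded = []
    for name in artifacts:
        src = target / name
        if not src.exists():
            # Skip artifacts the last dbt command did not produce.
            continue
        dest = remote / name
        if src.is_dir():
            # Folders such as run/ are copied recursively; parents are created as needed.
            shutil.copytree(src, dest, dirs_exist_ok=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
        uploaded.append(name)
    return uploaded
```

Passing a narrower `artifacts` tuple (for example just `("run_results.json",)`) would cover the specific Slack request without uploading everything.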

dosubot (bot) added the area:config and area:execution labels Oct 30, 2024
pankajkoti added the triage-needed label Oct 30, 2024
@joppedh

joppedh commented Oct 31, 2024

@pankajkoti partial_parse would speed up the runtime. Currently, each task run still needs to parse the project, even when a manifest.json is provided:

[2024-10-31, 06:03:48 UTC] {logging_mixin.py:188} INFO - 06:03:48  Unable to do partial parsing because saved manifest not found. Starting full parse.
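A minimal sketch of the consuming side, assuming partial_parse.msgpack had already been uploaded: before dbt is invoked, copy the file back into the project's target directory, which is where dbt looks for it. `restore_partial_parse` is a hypothetical helper, and `remote_dir` again stands in for remote_target_path. Even with the file in place, dbt may still fall back to a full parse if the saved state does not match the current run (for example, a different dbt version or changed project inputs).

```python
import shutil
from pathlib import Path


def restore_partial_parse(remote_dir, target_dir):
    """Copy partial_parse.msgpack from remote_dir into the dbt target directory.

    Hypothetical helper: remote_dir is a local stand-in for remote_target_path.
    Returns True if the file was restored, False if no saved copy exists
    (in which case dbt will do a full parse, as in the log above).
    """
    src = Path(remote_dir) / "partial_parse.msgpack"
    if not src.is_file():
        return False
    dest_dir = Path(target_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest_dir / "partial_parse.msgpack")
    return True
```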
