This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Commit

Merge pull request #15 from google/kw-args
Update Readme and update Entity Read Files for new Hook
oczos authored Dec 9, 2019
2 parents 841d30d + b92dad0 commit 864c444
Showing 2 changed files with 8 additions and 7 deletions.
README.md (10 changes: 5 additions & 5 deletions)
```diff
@@ -24,13 +24,14 @@ It is recommended that you install this solution through the Google Cloud Platfo

 We recommend familiarising yourself with Composer [here](https://cloud.google.com/composer/docs/).

-**_Orchestra_** is an open source project, built on top of Composer, for managing common Display and Video 360 ETL tasks such as downloading Entity Read Files and uploading them to BigQuery.
+**_Orchestra_** is an open source project, built on top of Composer, that provides custom operators for Airflow designed to solve the needs of Advertisers.

-It is available on [github](https://github.com/google/orchestra).
+Orchestra lets Enterprise Clients build their Advertising Data Lake out of the box and customize it to their needs.

-Below we will explain how to set up an environment for Composer, which files to use from Orchestra and how to grant access to your DV360 account to your Cloud Project.
+Orchestra lets sophisticated clients automate workflows at scale for huge efficiency gains.
+
+Orchestra is a fully open sourced Solution Toolkit for building enterprise data solutions on Airflow.

-This will create a fully managed workflow that - in our example - will import your required Entity Read Files to BigQuery.
```
# Setting up your Orchestra environment in GCP
```diff
@@ -55,7 +56,6 @@ In you GCP Project menu (or directly through [this link](https://console.cloud.g

 ### Create a Composer environment
 [Follow these steps to create a Composer environment](https://cloud.google.com/composer/docs/how-to/managing/creating) in Google Cloud Platform - please note that it can take up to 20/30 minutes.

-For your installation you **must** set your Python version to 2, and we are assuming you are using the default service account.

 Environment Variables, Tags and Configuration Properties (airflow.cfg) can all be left as standard and you can use the default values for number of nodes, machine types and disk size (you can use a smaller disk size if you want to save some costs).
```
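The environment-creation step described in the README context above can also be scripted. The sketch below is an illustrative, unofficial example of the `gcloud composer environments create` command; the environment name, location, and machine settings are placeholder assumptions, not values from this commit.

```shell
# Hypothetical sketch: create a Composer environment with explicit defaults.
# All names and sizes below are placeholders; adjust for your own project.
# (This commit removes the old Python 2 requirement from the README; on
# Composer 1 the interpreter could be chosen with --python-version.)
gcloud composer environments create orchestra-env \
    --location us-central1 \
    --node-count 3 \
    --machine-type n1-standard-1 \
    --disk-size 20GB
```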
```diff
@@ -319,8 +319,9 @@ def execute(self, context):

         bq_base_cursor = self.bq_hook.get_conn().cursor()
         bq_base_cursor.run_load(
-            self.bq_table,
-            self.schema, [entity_read_file_ndj],
+            destination_project_dataset_table=self.bq_table,
+            schema_fields=self.schema,
+            source_uris=[entity_read_file_ndj],
             source_format='NEWLINE_DELIMITED_JSON',
             write_disposition=self.write_disposition)
         self.gcs_hook.delete(self.gcs_bucket, filename)
```
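The change above (the `kw-args` branch) replaces positional arguments to `run_load` with keyword arguments. The following self-contained sketch is not the real Airflow BigQuery hook; it uses hypothetical `run_load_v1`/`run_load_v2` stand-ins to show why keyword arguments matter when a hook upgrade reorders its parameters.

```python
# Simplified illustration (NOT the actual Airflow hook): a positional call
# written against an old signature silently binds values to the wrong
# parameters after a reorder, while a keyword call keeps working.

def run_load_v2(destination_project_dataset_table, source_uris, schema_fields=None,
                source_format='CSV', write_disposition='WRITE_EMPTY'):
    """Hypothetical newer signature: source_uris moved ahead of schema_fields."""
    return {'table': destination_project_dataset_table,
            'schema': schema_fields,
            'uris': source_uris}

table = 'project.dataset.entity_read'          # placeholder values
schema = [{'name': 'id', 'type': 'STRING'}]
uris = ['gs://bucket/entity_read.ndjson']

# Old-style positional call (table, schema, uris) still runs, but the
# schema now lands in source_uris and vice versa -- a silent mix-up:
broken = run_load_v2(table, schema, uris)
assert broken['uris'] == schema and broken['schema'] == uris

# Keyword arguments, as introduced by this commit, are immune to reordering:
ok = run_load_v2(destination_project_dataset_table=table,
                 schema_fields=schema,
                 source_uris=uris)
assert ok['schema'] == schema and ok['uris'] == uris
```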
