
Commit

Move erf and sdf documentation to their own markdown. Remove reporting documentation

Change-Id: I973e7e699f627b529812300f2d9e7774e4efde4d
KingsleyKelly committed Oct 31, 2019
1 parent 8af2977 commit 2bf53c1
Showing 3 changed files with 276 additions and 267 deletions.
268 changes: 1 addition & 267 deletions README.md
@@ -101,25 +101,6 @@ You need to have partner-level access to your DV360 account to be able to add a…
* Save!


### DV360 Entity Read File Authorization
Entity Read Files are large JSON files showing the state of an account. They are held in Google Cloud Storage, and access is granted via a **Google Group**.

This is found in

**Settings > Basic Details > Entity Read Files Configuration > Entity Read Files Read Google Group**

You should add the service account to the Entity **Read** Files Read Google Group.


![alt_text](docs/images/erf.png "image_tooltip")

Add the **service account** email to this Google Group to allow it to read private entity read files.

You can find more info on Entity Read Files access here: https://developers.google.com/bid-manager/guides/entity-read/overview.
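
If you want to double-check that access has propagated, you can list a few objects in the partner's Entity Read Files bucket using the service account's credentials. The sketch below is only a quick sanity check and makes two assumptions: that the private ERF bucket follows the usual gdbm-<partner_id> naming, and that 123456 stands in for your real partner ID.

```
# Quick sanity check (assumptions: the ERF bucket is named gdbm-<partner_id>
# and 123456 is a placeholder partner ID). Run it with the service account's
# credentials active, e.g. GOOGLE_APPLICATION_CREDENTIALS pointing to its key.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("gdbm-123456", prefix="entity/", max_results=5):
    print(blob.name)
```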


### Multiple Partners
If you intend to use many Google Groups, it is also possible to set up a single Google Group containing all the other Google Groups. You can then add the service account to this single Google Group to grant access to all accounts at once.


# Configuring Orchestra
@@ -303,23 +284,13 @@ Image2:
### Adding Workflows
As with any other Airflow deployment, you will need DAG files describing your Workflows to schedule and run your tasks; plus, you'll need hooks, operators and other libraries to help build those tasks.

You can find the core files for Orchestra in [our GitHub repository](https://github.com/google/orchestra): clone the repo (or download the files directly) and you will obtain the following folders:



* **dags:** includes a sample DAG file to upload multiple partners' ERF files from the Cloud Storage Bucket to BigQuery
* **hooks:** includes the hooks needed to connect to the reporting APIs of GMP platforms (CM and DV360)
* **operators:** includes two subfolders with basic operators for the CM and DV360 APIs, respectively
* **schema:** includes files describing the structure of most CM and DV360 entities (useful when creating a new report or providing the schema to create a BQ table)
* **utils:** a general-purpose folder for utility files

You can then design the DAGs you wish to run (see the minimal sketch below) and add them to the **dags** folder.

Upload all the DAGs and other required files to the DAGs Storage Folder that you can access from the Airflow UI.
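
For reference, here's a minimal sketch of what a DAG file dropped into the **dags** folder can look like; the DAG id, start date and schedule are placeholders, and the dummy task simply stands in for the Orchestra operators you would actually chain together.

```
# Minimal DAG sketch (placeholder names and schedule, not an Orchestra sample).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2019, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_orchestra_dag",  # placeholder DAG id
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Replace this placeholder with Orchestra hooks/operators, chained with
    # the >> operator to express task dependencies.
    start = DummyOperator(task_id="start")
```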





![alt_text](docs/images/buckets.png "image_tooltip")
@@ -329,243 +300,6 @@ This will automatically generate the DAGs and schedule them to run (you will be…

From now on, you can use (the Composer-managed instance of) Airflow as you normally would - including the different available functionalities for scheduling, troubleshooting, …

With the sample DAG provided, if all proper accesses have been granted to the Service Account, you will be able to see the results directly in BigQuery: in the Dataset you've selected in the corresponding Variable, you will find different tables for all the Entity Read Files entities that you've chosen to import.

Congratulations!


# GMP Reporting

The example workflow we've just set up (importing Entity Read Files from Cloud Storage to BigQuery) doesn't require access to DV360 (or, more generally, GMP) reports, but that's a task you might need in other workflows. For instance, rather than (or in addition to) Entity Read Files data, you might want to add aggregated performance data to BigQuery.

In order to do this, you'll need to **set up a Connection to GMP reporting** (i.e. specify the Service Account credentials to be used to leverage the GMP APIs) and then **create a report** (and collect its results).


### Create the Airflow Connection to GMP reporting

An [Airflow Connection](https://airflow.apache.org/howto/manage-connections.html) to the GMP Reporting API is needed for the tasks which will collect DV360 (or CM) reports.

First of all, you will need to enable your Service Account to access the GMP Reporting APIs (and the DV360 Reporting API in particular):



1. From the _API & Services > Library_ menu in the GCP console, **look for and enable the DoubleClick Bid Manager API** (DoubleClick Bid Manager is the former name of DV360).
1. If necessary, also enable the _DCM/DFA Reporting And Trafficking API_ and/or the _DoubleClick Search API_ for CM and SA360 reporting respectively.
1. From the _IAM & admin > Service Accounts_ menu in the GCP console, **look for the Compute Engine default service account** (or your custom Service Account if you aren't using the default one) and click on the three-dots button under "Action" to **_Create a key_**. Pick the JSON option and store the file securely.
1. **Upload the JSON keyfile** you've just downloaded to the Storage Bucket linked to your Composer environment (the same bucket where you're uploading DAG and other Python files, but in another subfolder, e.g. "data").

You are now ready to access the Connections list in the Airflow UI (_Admin > Connections_) and click on _Create_.

Use the following values (please note that the list of fields changes depending on the "Connection Type" you select, so don't worry if you don't see these exact fields initially). A quick way to check the keyfile and scopes is sketched right after the table:


<table>
<tr>
<td><strong>Field</strong>
</td>
<td><strong>Value</strong>
</td>
</tr>
<tr>
<td><strong>Conn Id</strong>
</td>
<td>gmp_reporting
</td>
</tr>
<tr>
<td><strong>Conn Type</strong>
</td>
<td>Google Cloud Platform
</td>
</tr>
<tr>
<td><strong>Project Id</strong>
</td>
<td>[Your Cloud Project ID]
</td>
</tr>
<tr>
<td><strong>Keyfile Path</strong>
</td>
<td>The path to the JSON file you've uploaded in the Storage Bucket during the previous steps. In particular, if you have uploaded your keyfile in a <em>data</em> folder, enter:
<p>
"/home/airflow/gcs/data/[keyfile_name].json"
</td>
</tr>
<tr>
<td><strong>Keyfile JSON </strong>
</td>
<td>[empty]
</td>
</tr>
<tr>
<td><strong>Scopes (comma separated)</strong>
</td>
<td>https://www.googleapis.com/auth/doubleclickbidmanager
<p>
Or, if necessary, also add other scopes such as:
<p>
https://www.googleapis.com/auth/dfareporting
<p>
https://www.googleapis.com/auth/doubleclicksearch
</td>
</tr>
</table>
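
To confirm that the keyfile and scopes work before wiring them into a DAG, you can run a quick check with the same keyfile outside of Airflow. The sketch below (not part of Orchestra, and using the placeholder keyfile path from the table) simply lists the DV360 queries visible to the service account.

```
# Quick sanity check (not part of Orchestra): build DBM API credentials from
# the same keyfile referenced by the gmp_reporting connection.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/doubleclickbidmanager"]
KEYFILE = "/home/airflow/gcs/data/keyfile.json"  # placeholder keyfile path

credentials = service_account.Credentials.from_service_account_file(
    KEYFILE, scopes=SCOPES)
service = build("doubleclickbidmanager", "v1", credentials=credentials)

# Lists the queries (reports) the service account can see.
print(service.queries().listqueries().execute())
```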



### Creating a DV360 report
You can follow these **simple steps to have your Service Account create a DV360 report**, so that a subsequent task can, following our example, collect the report result and push it to BigQuery.

**_It's important that the Service Account creates the report: if you create it directly in the DV360 UI, the Service Account won't be able to access the resulting files!_**

The Service Account needs to have read access to the Partners/Advertisers you're running reports for.

In the example below we provide a DAG file (_dv360_create_report_dag.py_) which lets you manually launch the corresponding **DV360_Create_Query** DAG and create a new report; before doing so, you must first configure which kind of report you want to create.

To do this, you will need to add and populate a specific Variable in the Airflow UI, called **dv360_report_body**, which corresponds to the "body" of the DV360 query to be created.

Comprehensive documentation on how this object can be populated with Filters, Dimensions (GroupBys), Metrics can be found [here](https://developers.google.com/bid-manager/v1/queries#resource), and we suggest you first test your request through the API Explorer: https://developers.google.com/apis-explorer/#p/doubleclickbidmanager/v1/doubleclickbidmanager.queries.createquery

(Clicking on the small down arrow on the right, you can switch from "Structured editor" to "Freeform editor", which allows you to directly copy-and-paste the JSON structure.)

Here's an example of a basic request body:


```
{
  "kind": "doubleclickbidmanager#query",
  "metadata": {
    "title": "myTest",
    "dataRange": "LAST_30_DAYS",
    "format": "CSV",
    "sendNotification": false
  },
  "params": {
    "type": "TYPE_GENERAL",
    "groupBys": [
      "FILTER_ADVERTISER",
      "FILTER_INSERTION_ORDER"
    ],
    "filters": [
      {
        "type": "FILTER_PARTNER",
        "value": "12345678"
      }
    ],
    "metrics": [
      "METRIC_IMPRESSIONS",
      "METRIC_CLICKS"
    ],
    "includeInviteData": true
  },
  "schedule": {
    "frequency": "DAILY",
    "nextRunMinuteOfDay": 0,
    "nextRunTimezoneCode": "Europe/London"
  }
}
```


You can then (manually) launch your DAG from the Airflow UI: identify the DAG named "DV360_Create_Query" in the list and launch it by clicking on the "Trigger Dag" link (the play-button-like icon).

Once the DAG has completed successfully, you will find a new variable called **dv360_latest_report_id** in the list of Variables, populated with the ID of the generated report that you can use in the following steps of your pipeline.
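
For reference, here is a rough sketch of what happens behind the scenes: the body is read from the dv360_report_body Variable, the DBM API's queries.createquery method is called, and the resulting query ID is stored. This is an assumption based on the behaviour described above, not Orchestra's actual operator code, and the keyfile path is a placeholder.

```
# Rough sketch (an assumption, not the actual operator code) of the API call
# behind the DV360_Create_Query DAG.
import json

from airflow.models import Variable
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "/home/airflow/gcs/data/keyfile.json",  # placeholder keyfile path
    scopes=["https://www.googleapis.com/auth/doubleclickbidmanager"])
service = build("doubleclickbidmanager", "v1", credentials=credentials)

report_body = json.loads(Variable.get("dv360_report_body"))
response = service.queries().createquery(body=report_body).execute()

# The returned queryId is what ends up in the dv360_latest_report_id Variable.
Variable.set("dv360_latest_report_id", response["queryId"])
```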

### Add Dependencies

We use the [requests](http://docs.python-requests.org/en/master/) library to handle larger report files.

You can add this to the project via the Environment page.

Full details are covered [here](https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies#install-package).

Simply follow the instructions and add _requests_ as the name of the package (no version required).

### More DV360 Operators
We're providing other DV360 operators, to be used in your DAGs, so that you're able to run reports, check their status and read their results (a rough sketch of the file-path and download steps follows the table):


<table>
<tr>
<td><strong>Operator</strong>
</td>
<td><strong>Function</strong>
</td>
</tr>
<tr>
<td><strong>dv360_run_query_operator</strong>
</td>
<td>Takes a query ID and runs the report (useful when you haven't set up the report to run on a schedule)
</td>
</tr>
<tr>
<td><strong>dv360_get_report_file_path_operator</strong>
</td>
<td>Given a query ID, collects the latest file path of the resulting report file and stores it in an XCom variable.
</td>
</tr>
<tr>
<td><strong>dv360_download_report_by_file_path</strong>
</td>
<td>Reads the report file path from an XCom variable and downloads the file to a Cloud Storage bucket.
</td>
</tr>
<tr>
<td><strong>dv360_upload_bq_operator</strong>
</td>
<td>Loads a report CSV file from Cloud Storage, infers the schema and uploads the data to a BigQuery table. Note: the BigQuery dataset needs to exist.
</td>
</tr>
</table>
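
As an illustration of what the file-path and download steps roughly involve (an assumption based on the descriptions above, not the operators' actual code): the DBM API exposes the latest report file path in the query metadata, and the potentially large CSV can then be streamed to disk with requests.

```
# Hypothetical sketch of fetching the latest report file path and streaming
# the report with requests (an assumption, not Orchestra's operator code).
import requests

def get_latest_report_path(service, query_id):
    """Return the Cloud Storage path of the latest report file for a query."""
    query = service.queries().getquery(queryId=query_id).execute()
    return query["metadata"]["googleCloudStoragePathForLatestReport"]

def download_report(report_path, local_path):
    """Stream a (potentially large) report file to disk."""
    with requests.get(report_path, stream=True) as response:
        response.raise_for_status()
        with open(local_path, "wb") as output:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                output.write(chunk)
```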


# Structured Data Files

Below we will explain how to set up a workflow which will import your [DV360 Structured Data Files (SDFs)](https://developers.google.com/bid-manager/guides/structured-data-file/format) to BigQuery.

### Create a new Airflow Connection or update an existing one

If you haven’t created an Airflow Connection for the GMP APIs, follow the [Create the Airflow Connection to GMP reporting](#create-the-airflow-connection-to-gmp-reporting) step to create one. Make sure that, in the last step, the following scopes are added:

https://www.googleapis.com/auth/doubleclickbidmanager,
https://www.googleapis.com/auth/devstorage.full_control,
https://www.googleapis.com/auth/bigquery

### Create an SDF advertisers report

This report will contain all active advertiser IDs, along with their Partner IDs, which will be used to retrieve SDFs via the API. To create a new SDF advertisers report, manually run the following DAG in Airflow:

_dv360_create_sdf_advertisers_report_dag_

The above DAG will create a scheduled DV360 report which will run daily. After it’s successfully completed, you should see that the _dv360_sdf_advertisers_report_id_ Airflow variable has been updated with the newly created report ID.

Note: These scheduled reports expire on the 1st of January 2029.

### Run the SDF advertisers report

Once you’ve created the SDF advertisers report, please manually run the following DAG:

_dv360_run_sdf_advertisers_report_dag_

After it’s completed, manually run:

_dv360_get_sdf_advertisers_from_report_dag_

After it’s completed, you should be able to verify that the _dv360_sdf_advertisers_ Airflow variable now contains the relevant partner and advertiser IDs which will be used to retrieve SDFs. The above DAG will be automatically configured to run daily.

### Upload SDFs to BigQuery

To upload SDFs to BigQuery, please manually run the following DAG:

_dv360_sdf_uploader_to_bq_dag_

The process will use the dictionary stored in the _dv360_sdf_advertisers_ Airflow variable to make API requests and store the responses in your BigQuery dataset.

Once the DAG has completed successfully, you will find new tables in your BigQuery dataset. Tables will correspond to SDF types you’ve configured to retrieve in the _sdf_file_types_ Airflow variable (e.g. if you’ve configured “LINE_ITEMS”, you should see a table called “SDFLineItem”). The above DAG will be automatically configured to run daily.

To sum up, we’ve scheduled two DAGs which run daily and independently of each other. The first DAG downloads a report and updates an Airflow variable with your partner and advertiser IDs. The second DAG fetches Structured Data Files using the partner and advertiser IDs from that variable and uploads them to BigQuery.
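
For reference, here is a rough sketch of the sdf.download request that the uploader DAG makes for each advertiser stored in the dv360_sdf_advertisers variable. This is an assumption based on the behaviour described above, not the DAG's actual code; the keyfile path, file type, SDF version and advertiser ID are all placeholders.

```
# Rough sketch (an assumption, not the DAG's actual code) of one SDF download
# request; keyfile path, file type, version and advertiser ID are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "/home/airflow/gcs/data/keyfile.json",  # placeholder keyfile path
    scopes=["https://www.googleapis.com/auth/doubleclickbidmanager"])
service = build("doubleclickbidmanager", "v1", credentials=credentials)

request_body = {
    "fileTypes": ["LINE_ITEM"],  # mirrors the sdf_file_types variable
    "filterType": "ADVERTISER_ID",
    "filterIds": ["1234567"],    # placeholder advertiser ID
    "version": "5.1",            # placeholder SDF version
}
response = service.sdf().download(body=request_body).execute()
print(response.get("lineItems", "")[:200])  # CSV content, keyed by SDF type
```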


# Additional info
