Marketing Data Engine Installation Guide

Overview

Marketing Data Engine consists of several components: the marketing data store (MDS), the feature store, the ML pipelines, the activation pipeline, and the dashboards. This document describes the order in which to install these components.

Prerequisites

Marketing Analytics Data Sources

  • Set up the Google Analytics 4 export to BigQuery by following the set-up documentation. The current version of MDS doesn't use streaming export tables.
  • Set up the BigQuery Data Transfer Service to export Google Ads data to BigQuery. Follow these instructions.

Make sure these exports use the same BigQuery location, whether regional or multi-regional. You can export the data into the same project or into different projects; MDS can read the data from multiple projects.
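
One way to confirm that the export datasets share a location is to compare the location field reported by the bq command-line tool. The following is a minimal sketch, assuming bq is installed and authenticated. The helper names dataset_location and locations_match, and the dataset names in the usage comment, are hypothetical placeholders.

```shell
# Print the location (for example US or europe-west1) of a BigQuery dataset.
# Argument: dataset reference in the form project_id:dataset_id.
dataset_location() {
  bq show --format=prettyjson "$1" |
    sed -n 's/.*"location":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed only if both locations are non-empty and equal, ignoring case.
locations_match() {
  a=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  b=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage with placeholder dataset names:
#   ga4=$(dataset_location "my-ga4-project:analytics_123456789")
#   ads=$(dataset_location "my-ads-project:google_ads")
#   locations_match "$ga4" "$ads" && echo "locations match" || echo "locations differ"
```

The comparison is kept separate from the bq call so that it can be reused with locations obtained by other means, for example from the Cloud console.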

Destination Projects

The Terraform scripts that create the infrastructure don’t create the Google Cloud projects themselves. Create these projects before running the scripts; their IDs are passed to the scripts via Terraform variables. It is possible to install the whole solution in a single project if the projected BigQuery data volume is small (megabytes or low single-digit gigabytes of additional data per day). For larger installations, or when more granular access control is desired, multiple projects can be used:

  • MDS data storage project for all the data curated by the solution.
  • MDS data processing project for hosting the Dataform scripts and running BigQuery curation jobs.
  • ML pipeline project for feature engineering, model training, model inference and the activation application.
  • Dashboard query processing project. In case of high-volume dashboard usage, this project can enable BigQuery BI Engine to accelerate the queries originating from the dashboards.
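
Because the project IDs are typed into Terraform variables by hand, a quick offline sanity check can catch typos before a run. The sketch below assumes the documented format for newly created project IDs (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen). The function name valid_project_id and the four project IDs are placeholders, not names used by the solution.

```shell
# Succeed if the argument looks like a valid project ID:
# 6 to 30 characters, lowercase letters, digits and hyphens only,
# starting with a letter and not ending with a hyphen.
valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Check a list of placeholder project IDs, one per solution component.
for id in my-mds-data my-mds-processing my-ml-pipelines my-dashboards; do
  if valid_project_id "$id"; then
    echo "ok: $id"
  else
    echo "invalid: $id"
  fi
done
```

This only checks the format. To confirm that a project exists and is visible to your account, gcloud projects describe PROJECT_ID can be run for each ID.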

Permissions to create infrastructure and access source data

There are multiple ways to configure Google Cloud authentication for the Terraform installation. Terraform's Google Provider documentation lists all the available authentication options. This installation guide assumes that you will be using Application Default Credentials. You can change this by, for example, creating a dedicated service account and setting the GOOGLE_IMPERSONATE_SERVICE_ACCOUNT environment variable before you run the Terraform scripts. For brevity, we will refer to the identity used by the Terraform scripts (your email or the dedicated service account email) as the "Terraform principal".
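
The two options mentioned above can be sketched as follows. This is an illustration under assumptions, not the only valid setup: the service account email is a placeholder, and impersonation additionally requires that your own account is allowed to create tokens for that service account (the Service Account Token Creator role).

```shell
# Option 1: Application Default Credentials for your own user account.
# Run once in an interactive shell (opens a browser window for sign-in):
#   gcloud auth application-default login

# Option 2: impersonate a dedicated service account.
# The email below is a placeholder; replace it with your own service account.
TERRAFORM_SA="terraform-installer@my-project.iam.gserviceaccount.com"
export GOOGLE_IMPERSONATE_SERVICE_ACCOUNT="$TERRAFORM_SA"

# Show which identity Terraform will act as in this shell session.
echo "Terraform principal: $GOOGLE_IMPERSONATE_SERVICE_ACCOUNT"
```

The environment variable only affects the current shell session, so it needs to be set again in every new terminal before running Terraform.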

The Terraform principal will need to be granted certain permissions in different projects:

  • the Owner role in all projects where the solution is to be installed. Required to install products related to the solution.
  • the BigQuery Data Owner role on the datasets containing the GA4 and Ads data exports. Required to grant read access to the data to a service account which will be created by the Terraform scripts. Follow the BigQuery documentation on how to grant this role at the dataset level.
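
As one possible way to grant the dataset-level role, BigQuery accepts a GRANT statement on a dataset (SCHEMA). The sketch below only builds the statement; the helper name grant_data_owner_sql and all project, dataset and principal values are placeholders. It assumes the statement is run by someone who already administers the source dataset.

```shell
# Build a BigQuery DCL statement granting BigQuery Data Owner on one dataset.
# Arguments: project ID, dataset ID, principal
# (for example user:me@example.com or serviceAccount:sa@my-project.iam.gserviceaccount.com).
grant_data_owner_sql() {
  printf 'GRANT `roles/bigquery.dataOwner` ON SCHEMA `%s.%s` TO "%s";\n' \
    "$1" "$2" "$3"
}

# Usage with placeholder values (requires the bq command-line tool):
#   bq query --use_legacy_sql=false \
#     "$(grant_data_owner_sql my-ga4-project analytics_123456789 user:me@example.com)"
```

Repeat the statement for each dataset that holds GA4 or Ads export data.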

Dataform Git Repository

MDS uses Dataform as the tool to run the data transformation. Dataform uses a private GitHub or GitLab repository to store SQL transformation scripts. Customers will need to create a repository and copy the SQL scripts from a companion GitHub repo before running the Terraform scripts.

  1. Create a private empty repository in your GitHub or GitLab account.
  2. On your computer, clone the empty GitHub or GitLab repository. The instructions below assume that the repository is hosted on GitHub.
  3. On your computer or in Cloud Shell, clone the GitHub repository which contains the MDS Dataform scripts:
    git clone https://github.com/googlecloudplatform/marketing-data-engine-dataform.git
    
  4. Push the contents of the source repository to your private repository:
    cd marketing-data-engine-dataform
    git remote add copy https://github.com/<your-account>/<repo>.git
    git branch -M main
    git push -u copy main
    
  5. Remove the local checkout directory:
    cd ..
    rm -rf marketing-data-engine-dataform
  6. Generate a GitHub personal access token. It will be used by Dataform to access the repository. For details and additional guidance regarding token type, security and required permissions, see the Dataform documentation. You don't need to create a Cloud Secret; the Terraform scripts will do that. You will need to provide the Git URL and the access token to the Terraform scripts using Terraform variables.
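
Terraform reads environment variables named TF_VAR_<name> as values for input variables, which is one way to pass the Git URL and token without writing the token to a file. The sketch below assumes the HTTPS URL form used above. The variable names dataform_git_url and dataform_git_token and the helper export_dataform_vars are hypothetical; check terraform/README.md for the names the scripts actually define.

```shell
# Export the Dataform Git URL and access token as Terraform input variables.
# The variable names dataform_git_url and dataform_git_token are hypothetical;
# use the names defined by the Terraform scripts.
export_dataform_vars() {
  case "$1" in
    https://*.git) ;;
    *) echo "expected an https URL ending in .git" >&2; return 1 ;;
  esac
  if [ -z "$2" ]; then
    echo "access token must not be empty" >&2
    return 1
  fi
  export TF_VAR_dataform_git_url="$1"
  export TF_VAR_dataform_git_token="$2"
}

# Usage, with the token held in a shell variable of your choosing:
#   export_dataform_vars "https://github.com/<your-account>/<repo>.git" "$GITHUB_TOKEN"
```

Keeping the token in the environment rather than in a variables file makes it less likely to be committed to version control by accident.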

GA4 Measurement ID and Secrets

You will need a Measurement ID and an API secret generated in the Google Analytics UI. To create a new secret, navigate to Admin > Data Streams > choose your stream > Measurement Protocol > Create.

Installing the MDS, ML pipelines, the feature store, and the activation pipeline

Once all the prerequisites are met, you can install these components using the Terraform scripts.

Follow the instructions in terraform/README.md.

Installing Dashboards

Looker Studio dashboards can be installed by following the instructions in ../python/lookerstudio/README.md.