Read this article: https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset
- Sign in to BigQuery https://bigquery.cloud.google.com/
- Create new datasets called
crypto_ethereum
,crypto_ethereum_raw
,crypto_ethereum_temp
- Create a new Google Storage bucket to store exported files https://console.cloud.google.com/storage/browser
Create a new Cloud Composer environment:
export ENVIRONMENT_NAME=ethereum-etl-0
gcloud composer environments create $ENVIRONMENT_NAME --location=us-central1 --zone=us-central1-a \
--disk-size=100GB --machine-type=n1-standard-4 --node-count=3 --python-version=3 --image-version=composer-1.8.3-airflow-1.10.3 \
--network=default --subnetwork=default
gcloud composer environments update $ENVIRONMENT_NAME --location=us-central1 --update-pypi-package=ethereum-etl==1.7.0
Create variables in Airflow (Admin > Variables in the UI):
Variable | Description |
---|---|
ethereum_output_bucket | GCS bucket to store exported files |
ethereum_provider_uris | Comma separated URIs of Ethereum nodes |
ethereum_destination_dataset_project_id | Project ID of BigQuery datasets |
notification_emails | email for notifications |
Check other variables in dags/ethereumetl_airflow/variables.py
.
> ./upload_dags.sh <airflow_bucket>
pip install -r requirements.txt
export PYTHONPATH='dags'
pytest -vv -s
Read this article: https://medium.com/@medvedev1088/query-ens-and-0x-events-with-sql-in-google-bigquery-4d197206e644