This service provides forecasted daily delivery metrics (impressions and reach) based on historical OpenRTB request data and targeting configurations.
Our system uses a combination of technologies to efficiently process, store, and analyze large volumes of OpenRTB request data:
- Data Generation: A Python script generates mock data using OpenAI APIs, which is then imported into Supabase tables (`ad_data`); a sketch of this step follows the list.
- OLTP Database: Supabase (PostgreSQL-based) serves as our transactional database, handling real-time inserts and updates.
- OLAP Database: ClickHouse is used for historical analysis and efficient aggregation of large datasets.
- Data Integration: Supabase uses a Foreign Data Wrapper to query ClickHouse, combining OLTP and OLAP capabilities.
- API Layer: A Golang microservice built with the Echo framework handles API requests (`/api/v1/forcast`).
- Forecasting Model: An XGBoost model, trained on historical data, predicts future impressions and reach.
- Model Serving: A Python Flask server exposes the XGBoost model via an API (`/forcast`).
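The data-generation step can be sketched roughly as below. This is a minimal illustration rather than the project's exact script: the prompt, model name, environment variables, and column set are assumptions, and parsing the LLM output as JSON is simplified.

```python
# Minimal sketch of the mock-data step: ask an LLM for OpenRTB-style rows and
# load them into the Supabase `ad_data` table. Prompt, model name, environment
# variables, and columns are illustrative assumptions, not the project's script.
import json
import os

from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def generate_mock_rows(n: int = 50) -> list[dict]:
    """Ask the model for n mock OpenRTB request rows as a JSON array."""
    prompt = (
        f"Generate {n} mock OpenRTB bid request records as a JSON array. "
        "Each record needs: age, device, country, impressions, clicks."
    )
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-completion model works here
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns a clean JSON array; a real script should validate.
    return json.loads(resp.choices[0].message.content)

rows = generate_mock_rows()
supabase.table("ad_data").insert(rows).execute()  # insert into the OLTP table
```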
- Raw OpenRTB request logs (CSV format) are generated by the Python script and imported into Supabase.
- Data is continuously synced between Supabase and ClickHouse using the Foreign Data Wrapper.
- ClickHouse performs efficient aggregations on the large dataset (an example query is sketched after this list).
- The forecasting model is trained periodically (every X days) using data from ClickHouse.
- Supabase provides near real-time transactional capabilities for fresh data.
- ClickHouse's column-based storage enables fast aggregations and queries on millions of rows.
- The Foreign Data Wrapper allows seamless querying between Supabase and ClickHouse, optimizing for both transactional and analytical workloads.
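For illustration, the kind of daily aggregation ClickHouse serves might look like the sketch below. It uses the `clickhouse-connect` Python client with assumed column names; in the running system such queries are typically reached through the Foreign Data Wrapper rather than a direct client.

```python
# Minimal sketch of a daily aggregation against ClickHouse, assuming the
# clickhouse-connect client and illustrative column names on the ad_data table.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

result = client.query(
    """
    SELECT
        toDate(event_time)   AS day,
        count()              AS impressions,
        uniqExact(user_id)   AS reach
    FROM ad_data
    WHERE device = %(device)s AND country = %(country)s
    GROUP BY day
    ORDER BY day
    """,
    parameters={"device": "mobile", "country": "IN"},
)

for day, impressions, reach in result.result_rows:
    print(day, impressions, reach)
```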
While the current implementation uses a single table, future optimizations could include:
- Normalized tables in Supabase for efficient OLTP operations
- Materialized views in ClickHouse for common aggregations
- Consideration of Elasticsearch for improved querying capabilities
Our system leverages the power of both OLTP and OLAP databases through an innovative use of Foreign Data Wrappers:
- We utilize ClickHouse's columnar storage for complex analytics while maintaining Supabase as our primary OLTP database.
- A Foreign Data Wrapper is implemented in Supabase to seamlessly query ClickHouse data.
- Custom views are created in ClickHouse to pre-aggregate data and optimize complex queries.
- These ClickHouse views are then exposed in Supabase through the Foreign Data Wrapper.
- Data Segregation: Analytical data resides in ClickHouse (OLAP), keeping our OLTP system (Supabase) lean and fast for transactional operations.
- Efficient Querying: ClickHouse's column-based storage allows for rapid aggregations and sorting on millions of rows.
- Flexible Analytics: Complex aggregations and `ORDER BY` operations on large datasets are performed efficiently in ClickHouse.
- Seamless Integration: Users can query analytical data through Supabase as if it were local, thanks to the Foreign Data Wrapper (a query sketch follows this list).
- Scalability: This architecture allows independent scaling of OLTP and OLAP workloads.
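Because the ClickHouse views are exposed through the Foreign Data Wrapper, the application can query them over an ordinary Postgres connection as if they were local tables. A minimal sketch, assuming a hypothetical foreign table named `daily_ad_metrics` and the psycopg2 driver:

```python
# Minimal sketch: querying a ClickHouse-backed foreign table through Supabase's
# Postgres interface. The table name and columns are illustrative assumptions.
import os

import psycopg2

conn = psycopg2.connect(os.environ["SUPABASE_DB_URL"])  # direct Postgres connection string

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT day, impressions, reach
        FROM daily_ad_metrics      -- foreign table backed by a ClickHouse view
        WHERE device = %s
        ORDER BY day DESC
        LIMIT 30
        """,
        ("mobile",),
    )
    for day, impressions, reach in cur.fetchall():
        print(day, impressions, reach)
```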
- Utilizes XGBoost, trained on historical data including attributes like age, device, etc. (a training sketch follows this list)
- Model weights are stored in `xgboost_model.json`
- A Flask server hosts the model, separating ML operations from the main Golang service
- Periodic retraining via cron job ensures the model improves with new data
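A minimal sketch of the training and serialization step, assuming feature columns like age and device pulled from the historical data; the exact features and hyperparameters in `model.py` may differ:

```python
# Minimal training sketch: fit an XGBoost regressor on historical rows and
# serialize it to xgboost_model.json. Feature and target columns are illustrative.
import pandas as pd
import xgboost as xgb

df = pd.read_csv("ad_click_data.csv")  # or rows pulled from ClickHouse

features = pd.get_dummies(df[["age", "device", "country"]])  # one-hot encode categoricals
target = df["impressions"]

model = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(features, target)

model.save_model("xgboost_model.json")  # picked up later by the Flask server
```

Retraining then amounts to re-running a script like this on a schedule, which is what the cron job does.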
I have used the following open-source projects:
- Golang: Our primary microservice architecture.
- Python: Used for data scripts, model training, and the Flask API.
- Supabase: Manages OLTP transactions.
- ClickHouse: Handles OLAP queries and analytics.
- `README.md`: Project documentation
- `ad_click_data.csv`: Sample data file
- `assets/`: Contains project-related images
- `ducky/`: Python package for the Ducky module
- `forcasting/`: Contains forecasting-related files
  - `model.py`: Forecasting model implementation
  - `model_rest.py`: REST API for the forecasting model
  - `scripts/`: Data generation and import scripts
  - `xgboost_model.json`: Serialized XGBoost model
- `microservices/`: Golang microservice implementation
  - `core/`: Core components of the microservice
  - `src/`: Source code for the microservice
    - `entity/`: Data models
    - `handler/`: Request handlers
    - `input/`: Input validation structures
    - `repository/`: Data access layer
    - `service/`: Business logic layer
  - `main.go`: Entry point for the microservice
- The Golang microservice receives targeting configurations via `/api/v1/forcast`
- It retrieves necessary data from Supabase/ClickHouse
- The data is sent to the Python Flask server (`/forcast`)
- The Flask server uses the XGBoost model to generate predictions (a serving sketch follows this list)
- Results (daily impressions, reach, and predictions) are returned to the Golang service
- The Golang service sends the final forecast to the client
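A minimal serving sketch in the style of `model_rest.py`: the payload shape and feature names are assumptions, and the real endpoint may return more fields.

```python
# Minimal serving sketch: load the serialized model once and expose /forcast.
# Payload shape and feature names are illustrative assumptions.
import pandas as pd
import xgboost as xgb
from flask import Flask, jsonify, request

app = Flask(__name__)

model = xgb.XGBRegressor()
model.load_model("xgboost_model.json")  # weights produced by the training step

@app.route("/forcast", methods=["POST"])
def forcast():
    payload = request.get_json()
    # Columns must line up with the features used at training time.
    features = pd.DataFrame([payload["features"]])
    prediction = model.predict(features)
    return jsonify({"daily_impressions": float(prediction[0])})

if __name__ == "__main__":
    app.run(port=5000)
```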
- ClickHouse enables efficient querying and aggregation of billions of daily requests
- The separation of OLTP and OLAP concerns allows for independent scaling of transactional and analytical workloads
- Caching layer (implemented through ClickHouse's efficient querying) reduces load on the primary database for frequently accessed data
To secure calls to third-party APIs (i.e., our ML model), we use a JWT signed with the key `adster`. This ensures that the model endpoint is not accessible without authorization; a verification sketch follows the example token below.
Example token:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGFpbXMiOnsiaWRlbnRpdHkiOiJzeXN0ZW1AYWRzdGVyIn0sImlkZW50aXR5Ijoic3lzdGVtQGFkc3RlciIsImlhdCI6MTY2Njg3OTc2MiwibmJmIjoxNjY2ODc5NzYyLCJleHAiOjE3NzY4Nzk3NjJ9.enuo0-fc_c4tvLeGBCaimMqd_7ArsRU3_pFEZo3gQfc
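On the receiving side, the token can be checked with PyJWT before a prediction is served. A minimal sketch, using the shared secret `adster` from above and the HS256 algorithm indicated by the token header:

```python
# Minimal sketch of verifying the service-to-service JWT with PyJWT.
# The shared secret "adster" comes from the docs above; HS256 matches the token header.
import jwt

SECRET = "adster"

def verify_token(token: str) -> bool:
    """Return True only if the token was signed with the shared secret."""
    try:
        jwt.decode(token, SECRET, algorithms=["HS256"])
        return True
    except jwt.InvalidTokenError:
        return False
```

Typically the Golang service would attach this token to each call to the Flask API (for example, in an `Authorization` header).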
Since we use a combination of Golang and Python, both services need to run simultaneously.
Open your favorite terminal and run these commands:
First tab:
`python model_rest.py`
Second tab:
`go run main.go`
- Further normalization of Supabase tables for optimized OLTP performance
- Implementation of more sophisticated caching strategies
- Exploration of real-time model updating techniques to improve forecast accuracy
Since your team gave a freebie that AI tools (ChatGPT / Claude / Gemini) are free to use for help, I used a custom-made tool called Ducky (inspired by rubber-duck debugging). Because I can't dump all my code into GPT every time just to have it tell me I know how to code, I decided to build something for quick iteration on the CLI. All. Local.