This service provides forecasted daily delivery metrics (impressions and reach) based on historical OpenRTB request data and targeting configurations.
Our system uses a combination of technologies to efficiently process, store, and analyze large volumes of OpenRTB request data:
- Data Generation: A Python script generates mock data using OpenAI APIs, which is then imported into Supabase tables (`ad_data`); a sketch of this step follows the list.
- OLTP Database: Supabase (PostgreSQL-based) serves as our transactional database, handling real-time inserts and updates.
- OLAP Database: ClickHouse is used for historical analysis and efficient aggregation of large datasets.
- Data Integration: Supabase uses a Foreign Data Wrapper to query ClickHouse, combining OLTP and OLAP capabilities.
- API Layer: A Golang microservice built with the Echo framework handles API requests (`/api/v1/forcast`).
- Forecasting Model: An XGBoost model, trained on historical data, predicts future impressions and reach.
- Model Serving: A Python Flask server exposes the XGBoost model via an API (`/forcast`).
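The data-generation step can be sketched roughly as below. This is a minimal illustration rather than the project's exact script: the prompt, model name, environment variables, and column set are assumptions, and parsing the LLM output as JSON is simplified.

```python
# Minimal sketch of the mock-data step: ask an LLM for OpenRTB-style rows and
# load them into the Supabase `ad_data` table. Prompt, model name, environment
# variables, and columns are illustrative assumptions, not the project's script.
import json
import os

from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def generate_mock_rows(n: int = 50) -> list[dict]:
    """Ask the model for n mock OpenRTB request rows as a JSON array."""
    prompt = (
        f"Generate {n} mock OpenRTB bid request records as a JSON array. "
        "Each record needs: age, device, country, impressions, clicks."
    )
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-completion model works here
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns a clean JSON array; a real script should validate.
    return json.loads(resp.choices[0].message.content)

rows = generate_mock_rows()
supabase.table("ad_data").insert(rows).execute()  # insert into the OLTP table
```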
- Raw OpenRTB request logs (CSV format) are generated by the Python script and imported into Supabase.
- Data is continuously synced between Supabase and ClickHouse using the Foreign Data Wrapper.
- ClickHouse performs efficient aggregations on the large dataset (an example query is sketched after this list).
- The forecasting model is trained periodically (every X days) using data from ClickHouse.
- Supabase provides near real-time transactional capabilities for fresh data.
- ClickHouse's column-based storage enables fast aggregations and queries on millions of rows.
- The Foreign Data Wrapper allows seamless querying between Supabase and ClickHouse, optimizing for both transactional and analytical workloads.
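For illustration, the kind of daily aggregation ClickHouse serves might look like the sketch below. It uses the `clickhouse-connect` Python client with assumed column names; in the running system such queries are typically reached through the Foreign Data Wrapper rather than a direct client.

```python
# Minimal sketch of a daily aggregation against ClickHouse, assuming the
# clickhouse-connect client and illustrative column names on the ad_data table.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

result = client.query(
    """
    SELECT
        toDate(event_time)   AS day,
        count()              AS impressions,
        uniqExact(user_id)   AS reach
    FROM ad_data
    WHERE device = %(device)s AND country = %(country)s
    GROUP BY day
    ORDER BY day
    """,
    parameters={"device": "mobile", "country": "IN"},
)

for day, impressions, reach in result.result_rows:
    print(day, impressions, reach)
```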
While the current implementation uses a single table, future optimizations could include:
- Normalized tables in Supabase for efficient OLTP operations
- Materialized views in ClickHouse for common aggregations
- Consideration of Elasticsearch for improved querying capabilities
Our system leverages the power of both OLTP and OLAP databases through an innovative use of Foreign Data Wrappers:
- We utilize ClickHouse's columnar storage for complex analytics while maintaining Supabase as our primary OLTP database.
- A Foreign Data Wrapper is implemented in Supabase to seamlessly query ClickHouse data.
- Custom views are created in ClickHouse to pre-aggregate data and optimize complex queries.
- These ClickHouse views are then exposed in Supabase through the Foreign Data Wrapper.
- Data Segregation: Analytical data resides in ClickHouse (OLAP), keeping our OLTP system (Supabase) lean and fast for transactional operations.
- Efficient Querying: ClickHouse's column-based storage allows for rapid aggregations and sorting on millions of rows.
- Flexible Analytics: Complex aggregations and `ORDER BY` operations on large datasets are performed efficiently in ClickHouse.
- Seamless Integration: Users can query analytical data through Supabase as if it were local, thanks to the Foreign Data Wrapper (a query sketch follows this list).
- Scalability: This architecture allows independent scaling of OLTP and OLAP workloads.
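Because the ClickHouse views are exposed through the Foreign Data Wrapper, the application can query them over an ordinary Postgres connection as if they were local tables. A minimal sketch, assuming a hypothetical foreign table named `daily_ad_metrics` and the psycopg2 driver:

```python
# Minimal sketch: querying a ClickHouse-backed foreign table through Supabase's
# Postgres interface. The table name and columns are illustrative assumptions.
import os

import psycopg2

conn = psycopg2.connect(os.environ["SUPABASE_DB_URL"])  # direct Postgres connection string

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT day, impressions, reach
        FROM daily_ad_metrics      -- foreign table backed by a ClickHouse view
        WHERE device = %s
        ORDER BY day DESC
        LIMIT 30
        """,
        ("mobile",),
    )
    for day, impressions, reach in cur.fetchall():
        print(day, impressions, reach)
```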
- Utilizes XGBoost, trained on historical data including attributes like age, device, etc. (a training sketch follows this list)
- Model weights are stored in `xgboost_model.json`
- A Flask server hosts the model, separating ML operations from the main Golang service
- Periodic retraining via cron job ensures the model improves with new data
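A minimal sketch of the training and serialization step, assuming feature columns like age and device pulled from the historical data; the exact features and hyperparameters in `model.py` may differ:

```python
# Minimal training sketch: fit an XGBoost regressor on historical rows and
# serialize it to xgboost_model.json. Feature and target columns are illustrative.
import pandas as pd
import xgboost as xgb

df = pd.read_csv("ad_click_data.csv")  # or rows pulled from ClickHouse

features = pd.get_dummies(df[["age", "device", "country"]])  # one-hot encode categoricals
target = df["impressions"]

model = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(features, target)

model.save_model("xgboost_model.json")  # picked up later by the Flask server
```

Retraining then amounts to re-running a script like this on a schedule, which is what the cron job does.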
I have used the following open-source projects:
- Golang: Our primary microservice architecture.
- Python: Used for data scripts, model training, and the Flask API.
- Supabase: Manages OLTP transactions.
- ClickHouse: Handles OLAP queries and analytics.
- `README.md`: Project documentation
- `ad_click_data.csv`: Sample data file
- `assets/`: Contains project-related images
- `ducky/`: Python package for the Ducky module
- `forcasting/`: Contains forecasting-related files
  - `model.py`: Forecasting model implementation
  - `model_rest.py`: REST API for the forecasting model
  - `scripts/`: Data generation and import scripts
  - `xgboost_model.json`: Serialized XGBoost model
- `microservices/`: Golang microservice implementation
  - `core/`: Core components of the microservice
  - `src/`: Source code for the microservice
    - `entity/`: Data models
    - `handler/`: Request handlers
    - `input/`: Input validation structures
    - `repository/`: Data access layer
    - `service/`: Business logic layer
  - `main.go`: Entry point for the microservice
- The Golang microservice receives targeting configurations via `/api/v1/forcast`
- It retrieves necessary data from Supabase/ClickHouse
- The data is sent to the Python Flask server (`/forcast`)
- The Flask server uses the XGBoost model to generate predictions (a serving sketch follows this list)
- Results (daily impressions, reach, and predictions) are returned to the Golang service
- The Golang service sends the final forecast to the client
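A minimal serving sketch in the style of `model_rest.py`: the payload shape and feature names are assumptions, and the real endpoint may return more fields.

```python
# Minimal serving sketch: load the serialized model once and expose /forcast.
# Payload shape and feature names are illustrative assumptions.
import pandas as pd
import xgboost as xgb
from flask import Flask, jsonify, request

app = Flask(__name__)

model = xgb.XGBRegressor()
model.load_model("xgboost_model.json")  # weights produced by the training step

@app.route("/forcast", methods=["POST"])
def forcast():
    payload = request.get_json()
    # Columns must line up with the features used at training time.
    features = pd.DataFrame([payload["features"]])
    prediction = model.predict(features)
    return jsonify({"daily_impressions": float(prediction[0])})

if __name__ == "__main__":
    app.run(port=5000)
```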
- ClickHouse enables efficient querying and aggregation of billions of daily requests
- The separation of OLTP and OLAP concerns allows for independent scaling of transactional and analytical workloads
- Caching layer (implemented through ClickHouse's efficient querying) reduces load on the primary database for frequently accessed data
To secure calls to third-party APIs (i.e., our ML model), we use a JWT signed with the key `adster`. This ensures that the model endpoint is not accessible without authorization; a verification sketch follows the example token below.
Example token:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGFpbXMiOnsiaWRlbnRpdHkiOiJzeXN0ZW1AYWRzdGVyIn0sImlkZW50aXR5Ijoic3lzdGVtQGFkc3RlciIsImlhdCI6MTY2Njg3OTc2MiwibmJmIjoxNjY2ODc5NzYyLCJleHAiOjE3NzY4Nzk3NjJ9.enuo0-fc_c4tvLeGBCaimMqd_7ArsRU3_pFEZo3gQfc
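On the receiving side, the token can be checked with PyJWT before a prediction is served. A minimal sketch, using the shared secret `adster` from above and the HS256 algorithm indicated by the token header:

```python
# Minimal sketch of verifying the service-to-service JWT with PyJWT.
# The shared secret "adster" comes from the docs above; HS256 matches the token header.
import jwt

SECRET = "adster"

def verify_token(token: str) -> bool:
    """Return True only if the token was signed with the shared secret."""
    try:
        jwt.decode(token, SECRET, algorithms=["HS256"])
        return True
    except jwt.InvalidTokenError:
        return False
```

Typically the Golang service would attach this token to each call to the Flask API (for example, in an `Authorization` header).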
Since we use a combination of Golang and Python, both services need to run simultaneously.
Open your favorite terminal and run these commands:
First tab:
`python model_rest.py`
Second tab:
`go run main.go`
- Further normalization of Supabase tables for optimized OLTP performance
- Implementation of more sophisticated caching strategies
- Exploration of real-time model updating techniques to improve forecast accuracy
Since your team gave a freebie that AI tools (ChatGPT / Claude / Gemini) are free to use for help, I used a custom-made tool called Ducky (inspired by rubber-duck debugging). Because I can't dump all my code into GPT every time just to have it tell me I know how to code, I decided to build something for quick iteration on the CLI. All. Local.