Airline project based on the XGBoost algorithm to estimate competitors' load factors from advance purchase 15 to 0.

fabiot21/machine-learning-competitor-load-factor-estimator

Competitor Load Factor Estimator

Project based on the XGBoost algorithm to estimate competitors' load factors from advance purchase (AP) 15 to 0.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development.

Prerequisites

The application authenticates with the credentials file referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable.
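Google client libraries read GOOGLE_APPLICATION_CREDENTIALS on their own, but a missing or mistyped path tends to surface late, at the first query. The following is a minimal sketch of an early check, not part of the project; the function name `credentials_path` is hypothetical.

```python
import os
from pathlib import Path


def credentials_path(env=None):
    """Return the key-file path from GOOGLE_APPLICATION_CREDENTIALS, or raise.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"credentials file not found: {path}")
    return path
```

Calling `credentials_path()` at the top of a script would fail fast with a readable message before any BigQuery request is made.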

  • Python 3.6 packages
google-cloud-bigquery==1.10.0
python-dateutil==2.8.0
numpy==1.16.3
pandas==0.24.2
pyod==0.6.8
pyarrow==0.13.0
scikit-learn==0.20.3
xgboost==0.82
  • BigQuery tables
desa-cli-aa360.IORM_MINUTA_DISPO.MINUTA_DISPO_BI
desa-cli-aa360.IORM_INFARE_RM.FT_INFARE_AR
desa-cli-aa360.IORM_INFARE_RM.FT_INFARE_CL
desa-cli-aa360.IORM_INFARE_RM.FT_INFARE_CO
desa-cli-aa360.IORM_INFARE_RM.FT_INFARE_EC
desa-cli-aa360.IORM_INFARE_RM.FT_INFARE_PE

Installing

git clone [email protected]:revenue-latam/competitor-load-factor-estimator.git
cd competitor-load-factor-estimator
virtualenv env -p python3.6
source env/bin/activate
pip3 install -r requirements.txt

Running

python3 main.py [GBQ_DATASET] [GBQ_TABLE] [COUNTRY_CODE]

Example

python3 main.py IORM_MODELS competitors_load_factors CL
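For readers who want to see how the three positional arguments could be read, here is a hedged sketch using `argparse`. How main.py actually parses its arguments is not shown in this README; the argument names `gbq_dataset`, `gbq_table`, and `country_code` are assumptions taken from the usage line.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments from the usage line."""
    parser = argparse.ArgumentParser(
        description="Estimate competitor load factors and upload them to BigQuery."
    )
    parser.add_argument("gbq_dataset", help="destination BigQuery dataset")
    parser.add_argument("gbq_table", help="destination BigQuery table")
    # Normalise the country code so "cl" and "CL" behave the same (assumption).
    parser.add_argument("country_code", type=str.upper, help="country code, e.g. CL")
    return parser.parse_args(argv)


# Mirrors the example invocation above.
args = parse_args(["IORM_MODELS", "competitors_load_factors", "CL"])
```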

Project Info

This project aims to estimate the load factor of competitors on domestic routes using machine learning techniques. To achieve this, historical data from LATAM flights was collected.

The data was preprocessed to remove canceled flights (those with no AP 0 record), fill missing price values, and perform feature engineering: encoding dummy variables and building time steps of prices and their changes over time.
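The price steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in preprocess.py: forward fill is assumed as the gap-filling rule, the helper names `fill_forward` and `price_steps` are hypothetical, and the project itself most likely does this with pandas.

```python
def fill_forward(prices):
    """Replace None with the most recent known price (leading gaps stay None)."""
    filled, last = [], None
    for price in prices:
        if price is not None:
            last = price
        filled.append(last)
    return filled


def price_steps(prices, steps):
    """Build lagged prices and price deltas for each observation.

    `prices` is ordered from the oldest to the newest observation.
    Observations without `steps` earlier known prices are skipped.
    """
    rows = []
    for i in range(steps, len(prices)):
        # lags[0] is the current price, lags[k] the price k steps earlier.
        lags = [prices[i - k] for k in range(steps + 1)]
        if None in lags:
            continue
        # deltas[k] is the change between step k+1 and step k.
        deltas = [lags[k] - lags[k + 1] for k in range(steps)]
        rows.append({"lags": lags, "deltas": deltas})
    return rows


observed = [100.0, None, 120.0, 110.0]  # made-up prices with one gap
features = price_steps(fill_forward(observed), steps=2)
```

With these made-up prices the gap is filled with 100.0, and the last observation gets lags `[110.0, 120.0, 100.0]` and deltas `[-10.0, 20.0]`.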

A machine learning model is then trained with the XGBoost algorithm, using a separate set of hyperparameters for each country.

With the trained model, the process iterates over the competitor airlines, processes the data downloaded from the Infare database, and predicts their load factors.

About Training process

Files: train.py preprocess.py query_train.py

The training process generates an XGBoost model based on LATAM data and uses the following variables:

  • Advance purchase
  • Route
  • Month
  • Day of week
  • Hour of departure
  • Price with N time steps
  • Price delta based on time steps
  • Load factor
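Several of the variables listed above can be derived from a departure timestamp and a route identifier. The sketch below shows one way to do that; the helper names, the route codes, and the feature names are placeholders, not taken from train.py. Load factor is left out because it is presumably the training target rather than an input.

```python
from datetime import datetime


def calendar_features(departure):
    """Month, day of week (Monday = 0), and hour of a departure timestamp."""
    return {
        "month": departure.month,
        "day_of_week": departure.weekday(),
        "hour": departure.hour,
    }


def one_hot(value, categories, prefix):
    """Encode a categorical value as 0/1 dummy variables."""
    return {f"{prefix}_{category}": int(category == value) for category in categories}


routes = ["SCLANF", "SCLCCP"]  # placeholder route identifiers
departure = datetime(2019, 5, 17, 8, 30)
row = {
    "ap": 7,  # advance purchase of this observation
    **calendar_features(departure),
    **one_hot("SCLANF", routes, "route"),
}
```

Price steps and price deltas would be added to the same row before training.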

About Prediction process

Files: predict.py preprocess.py query_competitor.py

The prediction process uses the model generated by the training process and uploads the resulting dataframe to Google BigQuery.

  • Output
observation_date
carrier
origin
destination
flight
departure_date
departure_time
predicted_load_factor
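The output columns listed above can be treated as a fixed schema. The following sketch assembles one output record in that column order; `to_output_row` and the sample flight are hypothetical, and the upload to BigQuery is not shown.

```python
from datetime import date, time

OUTPUT_COLUMNS = [
    "observation_date",
    "carrier",
    "origin",
    "destination",
    "flight",
    "departure_date",
    "departure_time",
    "predicted_load_factor",
]


def to_output_row(observation_date, flight, predicted_load_factor):
    """Combine flight attributes and a prediction into one output record.

    Raises KeyError if `flight` lacks any of the required columns.
    """
    values = dict(
        flight,
        observation_date=observation_date,
        predicted_load_factor=float(predicted_load_factor),
    )
    return {column: values[column] for column in OUTPUT_COLUMNS}


sample_flight = {  # made-up flight, for illustration only
    "carrier": "XX",
    "origin": "SCL",
    "destination": "ANF",
    "flight": "123",
    "departure_date": date(2019, 5, 17),
    "departure_time": time(8, 30),
}
output_row = to_output_row(date(2019, 5, 10), sample_flight, 0.82)
```

Building every record through one function keeps the column order stable regardless of how the flight data was queried.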

Configuration

The config.py file is used to configure the parameters and initial settings of the process.

Parameters

  • MAX_AP: Maximum advance purchase for training and prediction.
  • STEPS: Number of time steps for price value.
  • YEARS_TRAIN: Used to get the time window for the training dataset.
  • MONTH_TRAIN: Used to get the time window for the training dataset.
  • AIRLINES: Dictionary that maps each country code to a list of target airlines.
  • XGB_PARAMS: Dictionary that maps each country code to its XGBoost hyperparameters.
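To make the shape of these parameters concrete, here is a hypothetical config.py layout. Only `MAX_AP = 15` follows from the project description; every other value, the carrier codes "XX" and "YY", and the helper `settings_for` are placeholders, and the exact meaning of YEARS_TRAIN and MONTH_TRAIN is assumed.

```python
MAX_AP = 15       # advance purchase range is 15 to 0
STEPS = 3         # placeholder: number of price time steps
YEARS_TRAIN = 1   # placeholder: training window setting
MONTH_TRAIN = 0   # placeholder: training window setting

# Placeholder carrier codes, keyed by country code.
AIRLINES = {
    "CL": ["XX", "YY"],
    "PE": ["XX"],
}

# Placeholder values for common XGBoost hyperparameters, keyed by country code.
XGB_PARAMS = {
    "CL": {"max_depth": 6, "learning_rate": 0.1, "n_estimators": 200},
    "PE": {"max_depth": 4, "learning_rate": 0.05, "n_estimators": 300},
}


def settings_for(country_code):
    """Collect the settings that apply to one country, or raise ValueError."""
    code = country_code.upper()
    if code not in AIRLINES or code not in XGB_PARAMS:
        raise ValueError(f"no configuration for country code {code!r}")
    return {
        "max_ap": MAX_AP,
        "steps": STEPS,
        "airlines": AIRLINES[code],
        "xgb_params": XGB_PARAMS[code],
    }
```

Keying both dictionaries by country code means adding a country is a matter of adding one entry to each.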
