Training

In the first part of this project, you will create the training pipeline for a categorization model.

Specifically, you will train a model that receives product data and returns the best-matching categories for each product.

More info about the data can be found here.

Training Pipeline

Your training pipeline should be composed of the following steps:

Data extraction
Loads a dataset with product data from a specified path available in the environment variable DATASET_PATH.
Data formatting
Processes the dataset to use it for training and validation.
Modeling
Specifies a model to handle the categorization problem.
Model validation
Generates metrics on model quality (precision, recall, F1, etc.) for each category and exports them to a specified path available in the environment variable METRICS_PATH.
Model exportation
Exports a candidate model to a specified path available in the environment variable MODEL_PATH.
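
The validation step above can be illustrated with a minimal sketch that uses only the Python standard library. It is not a reference implementation: the helper names (require_env, per_category_metrics, run_validation) are hypothetical, and writing the metrics as JSON is an assumption, since this README does not prescribe a file format. Only the variable name METRICS_PATH comes from the steps above.

```python
import json
import os
from collections import Counter
from pathlib import Path


def require_env(name):
    """Return the path stored in an environment variable, or fail clearly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return Path(value)


def per_category_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for each category (one-vs-rest)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for expected, predicted in zip(y_true, y_pred):
        if expected == predicted:
            tp[expected] += 1
        else:
            fp[predicted] += 1  # the predicted category got a wrong item
            fn[expected] += 1   # the expected category missed an item

    metrics = {}
    for category in sorted(set(y_true) | set(y_pred)):
        predicted_total = tp[category] + fp[category]
        actual_total = tp[category] + fn[category]
        precision = tp[category] / predicted_total if predicted_total else 0.0
        recall = tp[category] / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[category] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual_total,
        }
    return metrics


def run_validation(y_true, y_pred):
    """Compute the metrics and export them to METRICS_PATH.

    JSON is an assumed output format, chosen only for illustration.
    """
    metrics = per_category_metrics(y_true, y_pred)
    metrics_path = require_env("METRICS_PATH")
    metrics_path.parent.mkdir(parents=True, exist_ok=True)
    metrics_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return metrics
```

In trainer.ipynb, run_validation would be called with the validation labels and the model predictions. The same require_env helper could also read DATASET_PATH and MODEL_PATH, so that a missing variable fails early with a clear message.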

Implementation

The training pipeline should be implemented using JupyterLab in a file named trainer.ipynb.

Use Markdown cells to document relevant details about your implementation. Remember that good documentation should focus on the why (e.g., why a specific type of model was chosen), since clean code should be enough to understand the how (e.g., how you selected a specific type of model).

Infrastructure

In this directory, we provide a containerized environment that uses docker and docker-compose to run JupyterLab. This should standardize the development environment and avoid compatibility problems.

To install docker and docker-compose, check their official documentation here and here. Both tools can be installed on Linux, macOS, and Windows.

To execute JupyterLab, just run the following command:

docker-compose up --build

Then open the link shown at the end of the command output.

To install an OS package (Debian-based), add the name of the package to the file packages.txt. To install a Python package (Pip-based), add the name and version of the package to the file requirements.txt.

Evaluation

The evaluation will be based on four criteria:

Correctness
If the solution runs without unexpected errors.
Compliance
If the solution respects all specified behaviors, in particular concerning inputs and outputs.
Code Quality
If the solution follows the principles of clean code and general good practices discussed in class.
Documentation
If the solution documents relevant decisions at an appropriate level of detail.
