Training

In the first part of this project, you will create the training pipeline for a categorization model.

More specifically, you will train a model that should receive data related to products and return the best categories for them.

More info about the data can be found here.

Training Pipeline

Your training pipeline should be composed of the following steps:

  1. Data extraction
    Loads a dataset with product data from a specified path available in the environment variable DATASET_PATH.

  2. Data formatting
    Processes the dataset to use it for training and validation.

  3. Modeling
    Specifies a model to handle the categorization problem.

  4. Model validation
    Generates metrics about the model's predictive quality (precision, recall, F1, etc.) for each category and exports them to a specified path available in the environment variable METRICS_PATH.

  5. Model exportation
    Exports a candidate model to a specified path available in the environment variable MODEL_PATH.
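
As an illustration of step 4, the sketch below computes per-category precision, recall, and F1 from two lists of labels and writes them to the path in METRICS_PATH. It is only a minimal example that uses the standard library. The function names `per_category_metrics` and `export_metrics` are hypothetical, and the JSON output format is an assumption, since this document does not prescribe a file format for the metrics.

```python
import json
import os
from collections import Counter


def per_category_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for each category (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            # A wrong prediction is a false positive for the predicted
            # category and a false negative for the true category.
            fp[pred] += 1
            fn[truth] += 1

    metrics = {}
    for category in sorted(set(y_true) | set(y_pred)):
        predicted = tp[category] + fp[category]
        actual = tp[category] + fn[category]
        precision = tp[category] / predicted if predicted else 0.0
        recall = tp[category] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[category] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def export_metrics(metrics, path=None):
    """Write the metrics as JSON (assumed format) to `path` or METRICS_PATH."""
    path = path or os.environ["METRICS_PATH"]
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metrics, handle, indent=2)
    return path
```

In the notebook, `y_true` would be the validation labels and `y_pred` the model's predictions for the same rows. A library such as scikit-learn offers equivalent reports, so this hand-written version is mainly useful for making the definitions explicit.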

Implementation

The training pipeline should be implemented using JupyterLab in a file named trainer.ipynb.

Use Markdown cells to document relevant details about your implementation. Remember that good documentation should focus on the why (e.g., why a specific type of model was chosen), since clean code should be enough to understand the how (e.g., how you selected a specific type of model).

Infrastructure

In this directory, we provide a containerized environment that uses docker and docker-compose to run JupyterLab. This should standardize the development environment and avoid compatibility problems.

To install docker and docker-compose, check their official documentation here and here. Both tools should be installable on Linux, macOS, and Windows.

To execute JupyterLab, just run the following command:

docker-compose up --build

Then open the link shown at the end of the command's output.

To install an OS package (Debian-based), add the name of the package to the file packages.txt. To install a Python package (Pip-based), add the name and version of the package to the file requirements.txt.

Evaluation

The evaluation will be based on four criteria:

  1. Correctness
    If the solution runs without unexpected errors.

  2. Compliance
    If the solution respects all specified behaviors, in particular concerning inputs and outputs.

  3. Code Quality
    If the solution follows the principles of clean code and general good practices discussed in class.

  4. Documentation
    If the solution documents relevant decisions in the right measure.