diff --git a/notebooks/MLOps/index.ipynb b/notebooks/MLOps/index.ipynb
new file mode 100644
index 0000000..06f279d
--- /dev/null
+++ b/notebooks/MLOps/index.ipynb
@@ -0,0 +1,480 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LDerBS_foQl2"
+ },
+ "source": [
+ "# MLOps\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a5z34ayPNA13"
+ },
+ "source": [
+ "# Table of Contents\n",
+ "1. [Introduction](#introduction)\n",
+ "2. [Machine Learning Lifecycle](#mll)\n",
+ "3. [MLOps Tools](#tools)\n",
+ " * [Data Management](#data)\n",
+ " * [Modeling](#model)\n",
+ " * [Operationalization](#operation)\n",
+ "4. [Example](#code_example)\n",
+ "5. [Conclusion](#conclusion)\n",
+ "6. [References](#references)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2OotVuBgfqME"
+ },
+ "source": [
+ "## Introduction "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZeHq0bAh4c2u"
+ },
+ "source": [
+ "MLOps, also known as Machine Learning Operations for Production, is a set of standardized practices that can be utilized to build, deploy, and govern the lifecycle of ML models. This setup helps to ease the interaction among cross-functional teams and provides an automated platform to keep track of everything required for the complete cycle of ML models. MLOps practices also result in increased scalability, security, and reliability of the ML systems, leading to shorter development cycles and escalated profits from the ML projects. \n",
+ "\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "
\n", + " \n", + "
\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gYV4Z8O_fqMF" + }, + "source": [ + "## Machine Learning Lifecycle " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-thmGRK44qD6" + }, + "source": [ + "MLOps lifecycle has seven different stages. All the processes happen iteratively, and the success of the entire machine learning system comes with the successful execution of each of these stages.\n", + "\n", + "The machine learning lifecycle is the process of developing, deploying, and managing a machine learning model for a specific application. The lifecycle typically consists of:\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "ML Development: This is the basic step that involves creating a complete pipeline beginning from data processing to model training and evaluation codes. \n", + "\n", + "Model Training: Once the setup is ready, the next logical step is to train the model. Here, continuous training functionality is also needed to adapt to new data or address specific changes. \n", + "\n", + "Model Evaluation: Performing inference over the trained model and checking the accuracy/correctness of the output results. \n", + "\n", + "Model Deployment: When the proof of concept stage is accomplished, the other part is to deploy the model according to the industry requirements to face the real-life data. \n", + "\n", + "Prediction Serving: After deployment, the model is now ready to serve predictions over the incoming data. \n", + "\n", + "Model Monitoring: Over time, problems such as concept drift can make the results inaccurate hence continuous monitoring of the model is essential to ensure proper functioning. \n", + "\n", + "Data and Model Management: It is a part of the central system that manages the data and models. It includes maintaining storage, keeping track of different versions, ease of accessibility, security, and configuration across various cross-functional teams. \n", + "\n", + "\n", + "Models are deployed across the organization and in various systems without a consistent way to monitor them. Models have been in production for a long time and never refreshed." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iEn8GUyrfqMO" + }, + "source": [ + "## MLOps Tools " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CjLahM_Vbqz2" + }, + "source": [ + "One of the challenges in ML lifecycle management is manual labor. Every step and the transition between steps are manual. It means data scientists need to collect, analyze, and process data for each application manually. They need to examine their older models to develop new ones and manually fine-tune each time. A large amount of time is allocated to model monitoring to prevent performance degradation. A successful deployment of machine learning models at scale requires automation of steps of the lifecycle. Automation decreases the time allocated to resource-consuming steps such as feature engineering, model training, monitoring, and retraining. It frees up time to rapidly experiment with new models.\n", + "\n", + "The MLOps tools help organizations apply DevOps practices to the process of creating and using AI and machine learning models. These tools are typically used by machine learning engineers, data scientists, and DevOps engineers. MLOps tools can be divided into three major areas.\n", + "\n", + "\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_4ZBEZCDfqMO" + }, + "source": [ + "### Data Management " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UcGXgwZ_fqMP" + }, + "source": [ + "MLOps Tools for data management consist of data labeling tools which are used to label large volumes of data such as texts, images, or audios and data versioning tools which enable managing different versions of datasets and storing them in an accessible and well-organized way.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zoqWqW24fqMQ" + }, + "source": [ + "### Modeling " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4L4mL1nt5d99" + }, + "source": [ + "MLOps Tools for modeling consist of feature engineering tools that automate the process of extracting useful features from raw datasets to create better training data for machine learning models like [Feast](https://github.com/feast-dev/feast). Another tool is for experiment tracking which save all the necessary information about different experiments like [MLFlow](https://mlflow.org) and the last tool is for Hyperparameter Optimization that automate the process of searching and selecting hyperparameters that give optimal performance for machine learning models." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UstdBiu75eKS" + }, + "source": [ + "### Operationalization " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pxixV6SG6ldE" + }, + "source": [ + "MLOps Tools for operationalization consist of model deployment tools which facilitate integrating ML models into a production environment to make predictions like [Kubeflow](https://www.kubeflow.org). the other tool concerning operationalization is for model monitoring which detect data drifts and anomalies over time and allow setting up alerts in case of performance issues." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29Hx95TFfqMU" + }, + "source": [ + "## Example \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CdJiNHvH6nWn" + }, + "source": [ + "In this section we see an example of ml lifrcycle using MLFlow. MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It is designed to work with any machine learning library, determine most things about your code by convention, and require minimal changes to integrate into an existing codebase.\n", + "First, we install and import nessecary packages." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we define our metric for evaluation." 
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "29Hx95TFfqMU"
+ },
+ "source": [
+ "## Example <a name=\"code_example\"></a>\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CdJiNHvH6nWn"
+ },
+ "source": [
+ "In this section we walk through an example of the ML lifecycle using MLflow. MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It is designed to work with any machine learning library, determine most things about your code by convention, and require minimal changes to integrate into an existing codebase.\n",
+ "First, we install and import the necessary packages (the install command is included, commented out, at the top of the training script below)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, we define our metrics for evaluation; a standalone sketch follows (the same function also appears inside train.py below)."
+ ]
+ },
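+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Standalone sketch of the evaluation metrics used in this example;\n",
+ "# the same function is defined inside train.py below.\n",
+ "import numpy as np\n",
+ "from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\n",
+ "\n",
+ "def eval_metrics(actual, pred):\n",
+ "    rmse = np.sqrt(mean_squared_error(actual, pred))  # root mean squared error\n",
+ "    mae = mean_absolute_error(actual, pred)           # mean absolute error\n",
+ "    r2 = r2_score(actual, pred)                       # coefficient of determination\n",
+ "    return rmse, mae, r2"
+ ]
+ },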
Error: %s\", e)\n", + "\n", + "train, test = train_test_split(data)\n", + "\n", + "train_x = train.drop([\"quality\"], axis=1)\n", + "test_x = test.drop([\"quality\"], axis=1)\n", + "train_y = train[[\"quality\"]]\n", + "test_y = test[[\"quality\"]]\n", + "\n", + "alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5\n", + "l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5\n", + "\n", + "with mlflow.start_run():\n", + " lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)\n", + " lr.fit(train_x, train_y)\n", + "\n", + " predicted_qualities = lr.predict(test_x)\n", + "\n", + " (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)\n", + "\n", + " print(\"Elasticnet model (alpha=%f, l1_ratio=%f):\" % (alpha, l1_ratio))\n", + " print(\" RMSE: %s\" % rmse)\n", + " print(\" MAE: %s\" % mae)\n", + " print(\" R2: %s\" % r2)\n", + "\n", + " mlflow.log_param(\"alpha\", alpha)\n", + " mlflow.log_param(\"l1_ratio\", l1_ratio)\n", + " mlflow.log_metric(\"rmse\", rmse)\n", + " mlflow.log_metric(\"r2\", r2)\n", + " mlflow.log_metric(\"mae\", mae)\n", + "\n", + " tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme\n", + "\n", + " if tracking_url_type_store != \"file\":\n", + " mlflow.sklearn.log_model(lr, \"model\", registered_model_name=\"ElasticnetWineModel\")\n", + " else:\n", + " mlflow.sklearn.log_model(lr, \"model\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Elasticnet model (alpha=0.600000, l1_ratio=0.800000):\n", + " RMSE: 0.8326325509502465\n", + " MAE: 0.6676500690618903\n", + " R2: 0.0177082428508879\n" + ] + } + ], + "source": [ + "!python train.py 0.6 0.8" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!mlflow ui" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, we serve our model which is to host machine-learning models (on the cloud or on premises) and to make their functions available via API so that applications can incorporate AI into their systems. Model serving is crucial, as a business cannot offer AI products to a large user base without making its product accessible." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!mlflow models serve -m \"/Users/model\" --no-conda -p 1234" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!curl -X POST -H \"Content-Type:application/json; format=pandas-split\" --data '{\"columns\":[\"alcohol\", \"chlorides\", \"citric acid\", \"density\", \"fixed acidity\", \"free sulfur dioxide\", \"pH\", \"residual sugar\", \"sulphates\", \"total sulfur dioxide\", \"volatile acidity\"],\"data\":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to deploy our model using ducker. First we build the image and then deploy it to our cluster. 
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The next step is to deploy our model using Docker. First we build the image, and then we deploy it to our cluster by applying the respective Kubernetes manifest through the kubectl CLI (the manifest itself is written out in the following cell)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!mlflow models build-docker \\\n",
+ "    -m ./mlruns/0/d1a8010b10f84f5a9b0a51e2b420efb2/artifacts/model \\\n",
+ "    -n my-docker-image \\\n",
+ "    --enable-mlserver\n",
+ "\n",
+ "!kubectl apply -f my-manifest.yaml"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%writefile my-manifest.yaml\n",
+ "\n",
+ "apiVersion: serving.kserve.io/v1beta1\n",
+ "kind: InferenceService\n",
+ "metadata:\n",
+ "  name: mlflow-model\n",
+ "spec:\n",
+ "  predictor:\n",
+ "    containers:\n",
+ "    - name: mlflow-model\n",
+ "      image: my-docker-image\n",
+ "      ports:\n",
+ "      - containerPort: 8080\n",
+ "        protocol: TCP\n",
+ "      env:\n",
+ "      - name: PROTOCOL\n",
+ "        value: v2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8BDIzkJhfqMV"
+ },
+ "source": [
+ "## Conclusion <a name=\"conclusion\"></a>\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TAfujAPs6n1B"
+ },
+ "source": [
+ "An MLOps solution provides data scientists with an easier and more efficient way to maintain and monitor models. By getting models into production and bridging the gap between stakeholder teams, it lets them focus on data science. With the help of MLOps, deployment can be done on any platform.\n",
+ "\n",
+ "In this notebook we discussed MLOps, its lifecycle, and the necessity of using it, and at the end we saw a simple example of developing and deploying a model using MLflow, a Python library used for MLOps. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## References <a name=\"references\"></a>\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "[MLOps concepts for busy engineers: model serving](https://spell.ml/blog/mlops-concepts-model-serving-X385lREAACcAAGzS)\n"
+ ]
+ },
\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gYV4Z8O_fqMF" + }, + "source": [ + "## Training " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-thmGRK44qD6" + }, + "source": [ + "We can think about RNNs in two ways. one is this concept of having a hidden state that feeds back at itself recurrently. The other one is to think about unrolling this computational graph for multiple time steps. This would help understanding the recurrent network easier.\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + " $x_t$ is the input at time step t. To keep things simple we assume that $x_t$ is a scalar value with a single feature. You can extend this idea to a d-dimensional feature vector.\n", + " \n", + " $o_t$ is the output of the network at time step t. We can produce multiple outputs in the network but for this example we assume that there is one output.\n", + " \n", + " $h_t$ vector stores the values of the hidden states at time t. This is also called the current context. $h_0$ vector is initialized to zero.\n", + " \n", + " $w_t$ is weight matrix.\n", + "\n", + " At every time step we can unfold the network for k time steps to get the output at time step k+1. The unfolded network is very similar to the feedforward neural network.\n", + " \n", + " Now that we are seeing recurrent neural network as an feedforward neural network with k step, we can easily compute the outputs.\n", + "\n", + "\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zoqWqW24fqMQ" + }, + "source": [ + "### Many to One " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4L4mL1nt5d99" + }, + "source": [ + "This RNN takes a sequence of inputs that could be variably sized like a text and generates a single output. Sentiment analysis is a good example of this kind of network where a given sentence can be classified as expressing positive or negative sentiments or in a computer vision contex, you might imagine taking as input, a video which might have variable number of frames and we want to read this entire video of potentioally variable lenght and at the end, make a classification decision about the kind of activity that is going on in that video.\n", + "\n", + "\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UstdBiu75eKS" + }, + "source": [ + "### Many to Many " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pxixV6SG6ldE" + }, + "source": [ + "This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one of the examples where our input might be some sentence in English, which could have a variable lenght and our output is the same sentence but in French, which also could have a variable length and crucially the length of the English sentence might be diffrent from the lenght of the French sentence so we need some models that have the capacity to accept both variable length sequences on the input and the output.\n", + "\n", + "We might also consider problems in computer vision contex, where our input is variably length like a video sequence with variable number of frames and we want to make a decision for each element of that input sequence. which in the context of video, is making a classification decision along every frame of that video.\n", + "\n", + "\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we saw above, RNNs are like a general paradigm for handling variable sized sequenced data that allow us to capture all of these diffrent types of setups in our models." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29Hx95TFfqMU" + }, + "source": [ + "## Example \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CdJiNHvH6nWn" + }, + "source": [ + "In this example we will be implementing a simple RNN character model with PyTorch to familiarize ourselves with the PyTorch library and get started with RNNs. \n", + "In this implementation, we will be building a model that can complete your sentence based on a few characters or a word used as input.\n", + "\n", + "We will start off by installing and importing the main packages that we will use." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "#!pip3 install torch\n", + "# !pip3 install numpy\n", + "import torch\n", + "from torch import nn\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have to set our device first. we would use gpu if available and cpu if not." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GPU not available, CPU used\n" + ] + } + ], + "source": [ + "if torch.cuda.is_available():\n", + " device = torch.device(\"cuda\")\n", + " print(\"GPU is available\")\n", + "else:\n", + " device = torch.device(\"cpu\")\n", + " print(\"GPU not available, CPU used\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, we will define the sentences that we want our model to output when fed with the first word or the first few characters and create a dictionary out of all the characters that we have in the sentences and map them to an integer." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "text = ['hey how are you','good i am fine','have a nice day']\n", + "chars = set(''.join(text))\n", + "int2char = dict(enumerate(chars))\n", + "char2int = {char: ind for ind, char in int2char.items()}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will be padding our input sentences to ensure that all the sentences are of the sample length. While RNNs are typically able to take in variably sized inputs, we will usually want to feed training data in batches to speed up the training process. In order to used batches to train on our data, we'll need to ensure that each sequence within the input data are of equal size.\n", + "\n", + "Therefore, in most cases, padding can be done by filling up sequences that are too short with 0 values and trimming sequences that are too long. In our case, we'll be finding the length of the longest sequence and padding the rest of the sentences with blank spaces to match that length." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "maxlen = len(max(text, key=len))\n", + "\n", + "for i in range(len(text)):\n", + " while len(text[i])