diff --git a/notebooks/MLOps/index.ipynb b/notebooks/MLOps/index.ipynb new file mode 100644 index 0000000..06f279d --- /dev/null +++ b/notebooks/MLOps/index.ipynb @@ -0,0 +1,480 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "LDerBS_foQl2" + }, + "source": [ + "# MLOps\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a5z34ayPNA13" + }, + "source": [ + "# Table of Contents\n", + "1. [Introduction](#introduction)\n", + "2. [Machine Learning Lifecycle](#mll)\n", + "3. [MLOps Tools](#tools)\n", + " * [Data Management](#data)\n", + " * [Modeling](#model)\n", + " * [Operationalization](#operation)\n", + "4. [Example](#code_example)\n", + "5. [Conclusion](#conclusion)\n", + "6. [References](#references)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2OotVuBgfqME" + }, + "source": [ + "## Introduction " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZeHq0bAh4c2u" + }, + "source": [ + "MLOps, also known as Machine Learning Operations for Production, is a set of standardized practices that can be utilized to build, deploy, and govern the lifecycle of ML models. This setup helps to ease the interaction among cross-functional teams and provides an automated platform to keep track of everything required for the complete cycle of ML models. MLOps practices also result in increased scalability, security, and reliability of the ML systems, leading to shorter development cycles and escalated profits from the ML projects. \n", + "\n", + "
\n", + "
\n", + "\n", + "

\n", + " \n", + "

\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gYV4Z8O_fqMF" + }, + "source": [ + "## Machine Learning Lifecycle " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-thmGRK44qD6" + }, + "source": [ + "MLOps lifecycle has seven different stages. All the processes happen iteratively, and the success of the entire machine learning system comes with the successful execution of each of these stages.\n", + "\n", + "The machine learning lifecycle is the process of developing, deploying, and managing a machine learning model for a specific application. The lifecycle typically consists of:\n", + "\n", + "

\n", + " \n", + "

\n", + "\n", + "ML Development: This is the basic step that involves creating a complete pipeline beginning from data processing to model training and evaluation codes. \n", + "\n", + "Model Training: Once the setup is ready, the next logical step is to train the model. Here, continuous training functionality is also needed to adapt to new data or address specific changes. \n", + "\n", + "Model Evaluation: Performing inference over the trained model and checking the accuracy/correctness of the output results. \n", + "\n", + "Model Deployment: When the proof of concept stage is accomplished, the other part is to deploy the model according to the industry requirements to face the real-life data. \n", + "\n", + "Prediction Serving: After deployment, the model is now ready to serve predictions over the incoming data. \n", + "\n", + "Model Monitoring: Over time, problems such as concept drift can make the results inaccurate hence continuous monitoring of the model is essential to ensure proper functioning. \n", + "\n", + "Data and Model Management: It is a part of the central system that manages the data and models. It includes maintaining storage, keeping track of different versions, ease of accessibility, security, and configuration across various cross-functional teams. \n", + "\n", + "\n", + "Models are deployed across the organization and in various systems without a consistent way to monitor them. Models have been in production for a long time and never refreshed." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iEn8GUyrfqMO" + }, + "source": [ + "## MLOps Tools " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CjLahM_Vbqz2" + }, + "source": [ + "One of the challenges in ML lifecycle management is manual labor. Every step and the transition between steps are manual. It means data scientists need to collect, analyze, and process data for each application manually. They need to examine their older models to develop new ones and manually fine-tune each time. A large amount of time is allocated to model monitoring to prevent performance degradation. A successful deployment of machine learning models at scale requires automation of steps of the lifecycle. Automation decreases the time allocated to resource-consuming steps such as feature engineering, model training, monitoring, and retraining. It frees up time to rapidly experiment with new models.\n", + "\n", + "The MLOps tools help organizations apply DevOps practices to the process of creating and using AI and machine learning models. These tools are typically used by machine learning engineers, data scientists, and DevOps engineers. MLOps tools can be divided into three major areas.\n", + "\n", + "

\n", + " \n", + "

\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_4ZBEZCDfqMO" + }, + "source": [ + "### Data Management " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UcGXgwZ_fqMP" + }, + "source": [ + "MLOps Tools for data management consist of data labeling tools which are used to label large volumes of data such as texts, images, or audios and data versioning tools which enable managing different versions of datasets and storing them in an accessible and well-organized way.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zoqWqW24fqMQ" + }, + "source": [ + "### Modeling " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4L4mL1nt5d99" + }, + "source": [ + "MLOps Tools for modeling consist of feature engineering tools that automate the process of extracting useful features from raw datasets to create better training data for machine learning models like [Feast](https://github.com/feast-dev/feast). Another tool is for experiment tracking which save all the necessary information about different experiments like [MLFlow](https://mlflow.org) and the last tool is for Hyperparameter Optimization that automate the process of searching and selecting hyperparameters that give optimal performance for machine learning models." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UstdBiu75eKS" + }, + "source": [ + "### Operationalization " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pxixV6SG6ldE" + }, + "source": [ + "MLOps Tools for operationalization consist of model deployment tools which facilitate integrating ML models into a production environment to make predictions like [Kubeflow](https://www.kubeflow.org). the other tool concerning operationalization is for model monitoring which detect data drifts and anomalies over time and allow setting up alerts in case of performance issues." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29Hx95TFfqMU" + }, + "source": [ + "## Example \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CdJiNHvH6nWn" + }, + "source": [ + "In this section we see an example of ml lifrcycle using MLFlow. MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It is designed to work with any machine learning library, determine most things about your code by convention, and require minimal changes to integrate into an existing codebase.\n", + "First, we install and import nessecary packages." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we define our metric for evaluation." 
+ ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next cell, we first read the wine-quality CSV file from the URL and split the data into training and test sets.\n", + "Then we separate the target, which is the quality column, from the rest of the data set, and at the end we register our model.\n", + "\n", + "The mlflow.start_run function starts a new MLflow run, setting it as the active run under which metrics and parameters will be logged; mlflow.log_metric logs a single key-value metric; mlflow.log_param logs a single key-value parameter in the currently active run; mlflow.log_artifact logs a local file or directory as an artifact; and mlflow.set_tracking_uri sets the tracking store URI.\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting train.py\n" ] } ], "source": [ "%%writefile train.py\n", + "# !pip install mlflow\n", + "import os\n", + "import warnings\n", + "import sys\n", + "\n", + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.linear_model import ElasticNet\n", + "from urllib.parse import urlparse\n", + "import mlflow\n", + "import mlflow.sklearn\n", + "\n", + "import logging\n", + "\n", + "logging.basicConfig(level=logging.WARN)\n", + "logger = logging.getLogger(__name__)\n", + "\n", + "\n", + "def eval_metrics(actual, pred):\n", + "    rmse = np.sqrt(mean_squared_error(actual, pred))\n", + "    mae = mean_absolute_error(actual, pred)\n", + "    r2 = r2_score(actual, pred)\n", + "    return rmse, mae, r2\n", + "\n", + "warnings.filterwarnings(\"ignore\")\n", + "np.random.seed(40)\n", + "\n", + "csv_url = (\"http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv\")\n", + "try:\n", + "    data = pd.read_csv(csv_url, sep=\";\")\n", + "except Exception as e:\n", + "    logger.exception(\"Unable to download training & test CSV, check your internet connection. Error: %s\", e)\n", + "\n", + "train, test = train_test_split(data)\n", + "\n", + "train_x = train.drop([\"quality\"], axis=1)\n", + "test_x = test.drop([\"quality\"], axis=1)\n", + "train_y = train[[\"quality\"]]\n", + "test_y = test[[\"quality\"]]\n", + "\n", + "alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5\n", + "l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5\n", + "\n", + "with mlflow.start_run():\n", + "    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)\n", + "    lr.fit(train_x, train_y)\n", + "\n", + "    predicted_qualities = lr.predict(test_x)\n", + "\n", + "    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)\n", + "\n", + "    print(\"Elasticnet model (alpha=%f, l1_ratio=%f):\" % (alpha, l1_ratio))\n", + "    print(\"  RMSE: %s\" % rmse)\n", + "    print(\"  MAE: %s\" % mae)\n", + "    print(\"  R2: %s\" % r2)\n", + "\n", + "    mlflow.log_param(\"alpha\", alpha)\n", + "    mlflow.log_param(\"l1_ratio\", l1_ratio)\n", + "    mlflow.log_metric(\"rmse\", rmse)\n", + "    mlflow.log_metric(\"r2\", r2)\n", + "    mlflow.log_metric(\"mae\", mae)\n", + "\n", + "    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme\n", + "\n", + "    if tracking_url_type_store != \"file\":\n", + "        mlflow.sklearn.log_model(lr, \"model\", registered_model_name=\"ElasticnetWineModel\")\n", + "    else:\n", + "        mlflow.sklearn.log_model(lr, \"model\")" ] },
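{ "cell_type": "markdown", "metadata": {}, "source": [ "The script above exercises log_param and log_metric but not the other two functions we described. As a minimal illustrative sketch (not part of the original tutorial), here is how mlflow.set_tracking_uri and mlflow.log_artifact could be used; the tracking URI and the metrics.txt file are hypothetical examples.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mlflow\n", + "\n", + "# Assumption: a tracking server is running locally, e.g. started with `mlflow ui`.\n", + "mlflow.set_tracking_uri(\"http://127.0.0.1:5000\")\n", + "\n", + "with mlflow.start_run():\n", + "    # Write a hypothetical local file and log it as an artifact of this run.\n", + "    with open(\"metrics.txt\", \"w\") as f:\n", + "        f.write(\"rmse: 0.83\")\n", + "    mlflow.log_artifact(\"metrics.txt\")" ] },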
Error: %s\", e)\n", + "\n", + "train, test = train_test_split(data)\n", + "\n", + "train_x = train.drop([\"quality\"], axis=1)\n", + "test_x = test.drop([\"quality\"], axis=1)\n", + "train_y = train[[\"quality\"]]\n", + "test_y = test[[\"quality\"]]\n", + "\n", + "alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5\n", + "l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5\n", + "\n", + "with mlflow.start_run():\n", + " lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)\n", + " lr.fit(train_x, train_y)\n", + "\n", + " predicted_qualities = lr.predict(test_x)\n", + "\n", + " (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)\n", + "\n", + " print(\"Elasticnet model (alpha=%f, l1_ratio=%f):\" % (alpha, l1_ratio))\n", + " print(\" RMSE: %s\" % rmse)\n", + " print(\" MAE: %s\" % mae)\n", + " print(\" R2: %s\" % r2)\n", + "\n", + " mlflow.log_param(\"alpha\", alpha)\n", + " mlflow.log_param(\"l1_ratio\", l1_ratio)\n", + " mlflow.log_metric(\"rmse\", rmse)\n", + " mlflow.log_metric(\"r2\", r2)\n", + " mlflow.log_metric(\"mae\", mae)\n", + "\n", + " tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme\n", + "\n", + " if tracking_url_type_store != \"file\":\n", + " mlflow.sklearn.log_model(lr, \"model\", registered_model_name=\"ElasticnetWineModel\")\n", + " else:\n", + " mlflow.sklearn.log_model(lr, \"model\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Elasticnet model (alpha=0.600000, l1_ratio=0.800000):\n", + " RMSE: 0.8326325509502465\n", + " MAE: 0.6676500690618903\n", + " R2: 0.0177082428508879\n" + ] + } + ], + "source": [ + "!python train.py 0.6 0.8" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!mlflow ui" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, we serve our model which is to host machine-learning models (on the cloud or on premises) and to make their functions available via API so that applications can incorporate AI into their systems. Model serving is crucial, as a business cannot offer AI products to a large user base without making its product accessible." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!mlflow models serve -m \"/Users/model\" --no-conda -p 1234" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!curl -X POST -H \"Content-Type:application/json; format=pandas-split\" --data '{\"columns\":[\"alcohol\", \"chlorides\", \"citric acid\", \"density\", \"fixed acidity\", \"free sulfur dioxide\", \"pH\", \"residual sugar\", \"sulphates\", \"total sulfur dioxide\", \"volatile acidity\"],\"data\":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to deploy our model using ducker. First we build the image and then deploy it to our cluster. 
One way to do this is by applying the respective Kubernetes manifests through the kubectl CLI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!mlflow models build-docker \\\n", + " -m ./mlruns/0/d1a8010b10f84f5a9b0a51e2b420efb2/artifacts/model \\\n", + " -n my-docker-image \\\n", + " --enable-mlserver\n", + "\n", + "!kubectl apply -f my-config.yaml" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile my-manifest.yaml\n", + "\n", + "apiVersion: serving.kserve.io/v1beta1\n", + "kind: InferenceService\n", + "metadata:\n", + " name: mlflow-model\n", + "spec:\n", + " predictor:\n", + " containers:\n", + " - name: mlflow-model\n", + " image: my-docker-image\n", + " ports:\n", + " - containerPort: 8080\n", + " protocol: TCP\n", + " env:\n", + " - name: PROTOCOL\n", + " value: v2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8BDIzkJhfqMV" + }, + "source": [ + "## Conclusion \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TAfujAPs6n1B" + }, + "source": [ + "MLOps solution provides data scientists with an easier and efficient way to maintain monitor models. By getting models into production and bridging the gap between the stakeholder teams, they can focus on data science. With the help of MLOps, deployment can be done on any platform.\n", + "\n", + "In this nootboke we talk about MLOps and its lifecycle and the nessecity of using it. and at the end we saw an simple example of developing and deploying a model using MLFlow which is a library used for MLOps in python. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[MLOps concepts for busy engineers: model serving](https://spell.ml/blog/mlops-concepts-model-serving-X385lREAACcAAGzS)\n", + "
\n", + "[MLOps Principles](https://ml-ops.org/content/mlops-principles)\n", + "
\n", + "[MLOps Python Tutorial for Beginners -Get Started with MLOps](https://www.projectpro.io/data-science-in-python-tutorial/mlops-python-tutorial-for-beginners#mcetoc_1fglt18dug)\n", + "
\n", + "[The MLOps–A Complete Guide and tutorial](https://www.devopsschool.com/blog/the-mlops-a-complete-guide-and-tutorial/)\n", + "
\n", + "[Machine Learning, Pipelines, Deployment and MLOps Tutorial](https://www.datacamp.com/tutorial/tutorial-machine-learning-pipelines-mlops-deployment#why-mlops-)\n", + "
\n", + "[Introduction to MLOps](https://www.youtube.com/watch?v=Kvxaj6pHeVA)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "index.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/MLOps/metadata.yml b/notebooks/MLOps/metadata.yml new file mode 100644 index 0000000..64c9968 --- /dev/null +++ b/notebooks/MLOps/metadata.yml @@ -0,0 +1,29 @@ +title: MLOps + +meta: + - name: keywords + content: Artificial Intelligence, MLOps + +header: + title: MLOps + description: | + In this notebook we talk about MLOps with a simple example of model deployment. + +authors: + label: + position: top + text: Authors + kind: people + content: + - name: matinamehdizadeh + role: Author + contact: + - link: https://github.com/matinamehdizadeh + icon: fab fa-github + - link: mailto://matinamehdizadeh@gmail.com + icon: fas fa-envelope + +comments: + label: false + kind: comments + diff --git a/notebooks/Recurrent Neural Networks/index.ipynb b/notebooks/Recurrent Neural Networks/index.ipynb new file mode 100644 index 0000000..0456f32 --- /dev/null +++ b/notebooks/Recurrent Neural Networks/index.ipynb @@ -0,0 +1,663 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "LDerBS_foQl2" + }, + "source": [ + "# Recurrent Neural Networks\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a5z34ayPNA13" + }, + "source": [ + "# Table of Contents\n", + "1. [Introduction](#introduction)\n", + "2. [Training](#train)\n", + "3. [Architectures](#architectures)\n", + " * [One to Many](#otm)\n", + " * [Many to One](#mto)\n", + " * [Many to Many](#mtm)\n", + "6. [Example](#code_example)\n", + "6. [Conclusion](#conclusion)\n", + "7. [References](#references)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2OotVuBgfqME" + }, + "source": [ + "## Introduction " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZeHq0bAh4c2u" + }, + "source": [ + "Traditional feed-forward neural networks take in a fixed amount of input data all at the same time and produce a fixed amount of output each time. However, in some context in machine learning we want to have more flexibility in the types of data that our model can process. therefore, we move to this idea of recurrent neural networks (RNN). A recurrent neural network is a special type of an artificial neural network adapted to work for time series data or data that involves sequences; Meaning, RNNs do not consume all the input data at once. Instead, they take them in one at a time and in a sequence. At each step, the RNN does a series of calculations before producing an output. The output, known as the hidden state, is then combined with the next input in the sequence to produce another output. This process continues until the model is programmed to finish or the input sequence ends. To sum up, RNNs have the concept of memory that helps them store the states or information of previous inputs to generate the next output of the sequence.\n", + "\n", + "
\n", + "\n", + "

\n", + " \n", + "

\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gYV4Z8O_fqMF" + }, + "source": [ + "## Training " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-thmGRK44qD6" + }, + "source": [ + "We can think about RNNs in two ways. one is this concept of having a hidden state that feeds back at itself recurrently. The other one is to think about unrolling this computational graph for multiple time steps. This would help understanding the recurrent network easier.\n", + "\n", + "

\n", + " \n", + "

\n", + "\n", + " $x_t$ is the input at time step t. To keep things simple we assume that $x_t$ is a scalar value with a single feature. You can extend this idea to a d-dimensional feature vector.\n", + "
\n", + " $o_t$ is the output of the network at time step t. We can produce multiple outputs in the network but for this example we assume that there is one output.\n", + "
\n", + " $h_t$ vector stores the values of the hidden states at time t. This is also called the current context. $h_0$ vector is initialized to zero.\n", + "
\n", + " $w_t$ is weight matrix.\n", + "
\n", + " At every time step we can unfold the network for k time steps to get the output at time step k+1. The unfolded network is very similar to the feedforward neural network.\n", + "
\n", + " Now that we are seeing recurrent neural network as an feedforward neural network with k step, we can easily compute the outputs.\n", + "\n", + "
\n", + " $h_t = f_w(h_{t-1}, x_t) = tanh(w_{hh}h_{t-1} + w_{xh}x_t)$\n", + "
\n", + " $y_t = w_{yh}h_t$\n", + "
\n", + " \n", + " During training, for each piece of training data we will have a corresponding ground-truth label that we want the model to output. After receiving these outputs, we will calculate the loss of that process, which measures how far off, the model’s output is from the correct answer. Using this loss, we can calculate the gradient of the loss function for back-propagation.\n", + "With the gradient that we just obtained, we can update the weights in the model accordingly. Combined with the forward pass, back-propagation is looped over and again, allowing the model to become more accurate with its outputs each time as the weight matrices values are modified to pick out the patterns of the data.\n", + "\n", + "Although it may look as if each RNN cell is using a different weight as shown in the graphics, all of the weights are actually the same as that RNN cell is essentially being re-used throughout the process. This may lead to one of RNNs disadvantages which is the vanishing gradient problem, where the gradients used to compute the weight update may get very close to zero due to multiplication of the same matrix over and over again which prevents the network from learning new weights. The deeper the network, the more pronounced is this problem.\n", + "\n", + "The pseudo-code for training is given below. The value of k which is the recursion factor can be selected by the user for training. In the pseudo-code below $p_t$ is the target value at time step t:\n", + "\n", + "Repeat till stopping criterion is met:\n", + "
\n", + "Set all h to zero.\n", + "
\n", + "Repeat for t = 0 to k\n", + "
\n", + "Forward propagate the network over the unfolded network for k time steps to compute all h and y.\n", + "
\n", + "Compute the error as: $error = y_{k} - p_{k}$\n", + "
\n", + "Backpropagate the error across the unfolded network and update the weights.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iEn8GUyrfqMO" + }, + "source": [ + "## Architectures " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CjLahM_Vbqz2" + }, + "source": [ + "RNNs are really flexible and can adapt to your needs. As you will see in the images below, your input and output size can come in different forms, yet they can still be fed and extracted from the RNN model. There are different types of recurrent neural networks with varying architectures that are shown below." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_4ZBEZCDfqMO" + }, + "source": [ + "### One to Many " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UcGXgwZ_fqMP" + }, + "source": [ + "This type of neural network has a input which is an object of fixed size like an image and the output is a sequence of variable lenght, such as a caption where diffrent captions might have diffrent number of words, so our output needs to be variable at lenght. \n", + "
\n", + "\n", + "

\n", + " \n", + "

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zoqWqW24fqMQ" + }, + "source": [ + "### Many to One " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4L4mL1nt5d99" + }, + "source": [ + "This RNN takes a sequence of inputs that could be variably sized like a text and generates a single output. Sentiment analysis is a good example of this kind of network where a given sentence can be classified as expressing positive or negative sentiments or in a computer vision contex, you might imagine taking as input, a video which might have variable number of frames and we want to read this entire video of potentioally variable lenght and at the end, make a classification decision about the kind of activity that is going on in that video.\n", + "\n", + "

\n", + " \n", + "

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UstdBiu75eKS" + }, + "source": [ + "### Many to Many " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pxixV6SG6ldE" + }, + "source": [ + "This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one of the examples where our input might be some sentence in English, which could have a variable lenght and our output is the same sentence but in French, which also could have a variable length and crucially the length of the English sentence might be diffrent from the lenght of the French sentence so we need some models that have the capacity to accept both variable length sequences on the input and the output.\n", + "\n", + "We might also consider problems in computer vision contex, where our input is variably length like a video sequence with variable number of frames and we want to make a decision for each element of that input sequence. which in the context of video, is making a classification decision along every frame of that video.\n", + "\n", + "

\n", + " \n", + "

\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we saw above, RNNs are like a general paradigm for handling variable sized sequenced data that allow us to capture all of these diffrent types of setups in our models." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29Hx95TFfqMU" + }, + "source": [ + "## Example \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CdJiNHvH6nWn" + }, + "source": [ + "In this example we will be implementing a simple RNN character model with PyTorch to familiarize ourselves with the PyTorch library and get started with RNNs. \n", + "In this implementation, we will be building a model that can complete your sentence based on a few characters or a word used as input.\n", + "\n", + "We will start off by installing and importing the main packages that we will use." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "#!pip3 install torch\n", + "# !pip3 install numpy\n", + "import torch\n", + "from torch import nn\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have to set our device first. we would use gpu if available and cpu if not." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GPU not available, CPU used\n" + ] + } + ], + "source": [ + "if torch.cuda.is_available():\n", + " device = torch.device(\"cuda\")\n", + " print(\"GPU is available\")\n", + "else:\n", + " device = torch.device(\"cpu\")\n", + " print(\"GPU not available, CPU used\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, we will define the sentences that we want our model to output when fed with the first word or the first few characters and create a dictionary out of all the characters that we have in the sentences and map them to an integer." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "text = ['hey how are you','good i am fine','have a nice day']\n", + "chars = set(''.join(text))\n", + "int2char = dict(enumerate(chars))\n", + "char2int = {char: ind for ind, char in int2char.items()}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will be padding our input sentences to ensure that all the sentences are of the sample length. While RNNs are typically able to take in variably sized inputs, we will usually want to feed training data in batches to speed up the training process. In order to used batches to train on our data, we'll need to ensure that each sequence within the input data are of equal size.\n", + "\n", + "Therefore, in most cases, padding can be done by filling up sequences that are too short with 0 values and trimming sequences that are too long. In our case, we'll be finding the length of the longest sequence and padding the rest of the sentences with blank spaces to match that length." 
{ "cell_type": "markdown", "metadata": {}, "source": [ "As we saw above, RNNs are a general paradigm for handling variable-sized sequence data that allows us to capture all of these different types of setups in our models." ] }, { "cell_type": "markdown", "metadata": { "id": "29Hx95TFfqMU" }, "source": [ "## Example \n" ] }, { "cell_type": "markdown", "metadata": { "id": "CdJiNHvH6nWn" }, "source": [ "In this example we will implement a simple RNN character model with PyTorch to familiarize ourselves with the PyTorch library and get started with RNNs. \n", + "In this implementation, we will build a model that can complete your sentence based on a few characters or a word used as input.\n", + "\n", + "We will start off by installing and importing the main packages that we will use." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#!pip3 install torch\n", + "# !pip3 install numpy\n", + "import torch\n", + "from torch import nn\n", + "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have to set our device first: we use the GPU if it is available and the CPU if not." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GPU not available, CPU used\n" ] } ], "source": [ "if torch.cuda.is_available():\n", + "    device = torch.device(\"cuda\")\n", + "    print(\"GPU is available\")\n", + "else:\n", + "    device = torch.device(\"cpu\")\n", + "    print(\"GPU not available, CPU used\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we define the sentences that we want our model to output when fed the first word or the first few characters, create a dictionary out of all the characters that we have in the sentences, and map each character to an integer." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "text = ['hey how are you','good i am fine','have a nice day']\n", + "chars = set(''.join(text))\n", + "int2char = dict(enumerate(chars))\n", + "char2int = {char: ind for ind, char in int2char.items()}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we pad our input sentences to ensure that all the sentences are of the same length. While RNNs are typically able to take in variably sized inputs, we will usually want to feed training data in batches to speed up the training process. In order to use batches to train on our data, we'll need to ensure that each sequence within the input data is of equal size.\n", + "\n", + "Therefore, in most cases, padding can be done by filling up sequences that are too short with 0 values and trimming sequences that are too long. In our case, we'll find the length of the longest sequence and pad the rest of the sentences with blank spaces to match that length." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "maxlen = len(max(text, key=len))\n", + "\n", + "for i in range(len(text)):\n", + "    while len(text[i]) < maxlen:\n", + "        text[i] += ' '" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we predict the next character at each time step, we divide each sentence into an input sequence (all characters except the last) and a target sequence (all characters except the first, i.e., one time step ahead of the input), and then map both to integers using the dictionary defined above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "input_seq = []\n", + "target_seq = []\n", + "\n", + "for i in range(len(text)):\n", + "    input_seq.append(text[i][:-1])   # all characters but the last\n", + "    target_seq.append(text[i][1:])   # all characters but the first\n", + "\n", + "for i in range(len(text)):\n", + "    input_seq[i] = [char2int[character] for character in input_seq[i]]\n", + "    target_seq[i] = [char2int[character] for character in target_seq[i]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we one-hot encode the input sequences: each character index becomes a vector of zeros with a single 1 at the position of that index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Input shape: (3, 14, 17) --> (Batch Size, Sequence Length, One-Hot Encoding Size)\n" ] } ], "source": [ "dict_size = len(char2int)\n", + "seq_len = maxlen - 1\n", + "batch_size = len(text)\n", + "\n", + "def one_hot_encode(sequence, dict_size, seq_len, batch_size):\n", + "    features = np.zeros((batch_size, seq_len, dict_size), dtype=np.float32)\n", + "    \n", + "    for i in range(batch_size):\n", + "        for u in range(seq_len):\n", + "            features[i, u, sequence[i][u]] = 1\n", + "    return features\n", + "\n", + "input_seq = one_hot_encode(input_seq, dict_size, seq_len, batch_size)\n", + "print(\"Input shape: {} --> (Batch Size, Sequence Length, One-Hot Encoding Size)\".format(input_seq.shape))\n", + "\n", + "input_seq = torch.from_numpy(input_seq)\n", + "target_seq = torch.Tensor(target_seq)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start building our own neural network model, we can define a class that inherits from PyTorch's base class (nn.Module) for all neural network modules. After doing so, we can start defining some variables and also the layers of our model under the constructor. For this model, we will only be using one layer of RNN followed by a fully connected layer, which is in charge of converting the RNN output to our desired output shape.\n", + "\n", + "We also have to define the forward pass function under forward() as a class method. It is executed sequentially, so we have to pass the inputs and the zero-initialized hidden state through the RNN layer first, before passing the RNN outputs to the fully connected layer.\n", + "\n", + "The last method that we have to define is init_hidden(), which forward() calls to initialize the hidden state; it basically creates a tensor of zeros in the shape of our hidden states.\n", + "\n", + "Then we create an instance of our model, initialize the hyperparameters, and start the training process."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "class Model(nn.Module):\n", + " def __init__(self, input_size, output_size, hidden_dim, n_layers):\n", + " super(Model, self).__init__()\n", + "\n", + " self.hidden_dim = hidden_dim\n", + " self.n_layers = n_layers\n", + " self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True) \n", + " self.fc = nn.Linear(hidden_dim, output_size)\n", + " \n", + " def forward(self, x):\n", + " \n", + " batch_size = x.size(0)\n", + " hidden = self.init_hidden(batch_size)\n", + " out, hidden = self.rnn(x, hidden)\n", + " out = out.contiguous().view(-1, self.hidden_dim)\n", + " out = self.fc(out)\n", + " \n", + " return out, hidden\n", + " \n", + " def init_hidden(self, batch_size):\n", + " hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)\n", + " return hidden" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "model = Model(input_size=dict_size, output_size=dict_size, hidden_dim=12, n_layers=1)\n", + "model = model.to(device)\n", + "\n", + "n_epochs = 100\n", + "lr=0.01\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = torch.optim.Adam(model.parameters(), lr=lr)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch: 10/100............. Loss: 2.5228\n", + "Epoch: 20/100............. Loss: 2.1118\n", + "Epoch: 30/100............. Loss: 1.7116\n", + "Epoch: 40/100............. Loss: 1.3229\n", + "Epoch: 50/100............. Loss: 0.9832\n", + "Epoch: 60/100............. Loss: 0.7112\n", + "Epoch: 70/100............. Loss: 0.5081\n", + "Epoch: 80/100............. Loss: 0.3617\n", + "Epoch: 90/100............. Loss: 0.2649\n", + "Epoch: 100/100............. Loss: 0.2016\n" + ] + } + ], + "source": [ + "input_seq = input_seq.to(device)\n", + "for epoch in range(1, n_epochs + 1):\n", + " optimizer.zero_grad() # Clears existing gradients from previous epoch\n", + " #input_seq = input_seq.to(device)\n", + " output, hidden = model(input_seq)\n", + " output = output.to(device)\n", + " target_seq = target_seq.to(device)\n", + " loss = criterion(output, target_seq.view(-1).long())\n", + " loss.backward() # Does backpropagation and calculates gradients\n", + " optimizer.step() # Updates the weights accordingly\n", + " \n", + " if epoch%10 == 0:\n", + " print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')\n", + " print(\"Loss: {:.4f}\".format(loss.item()))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we have to test our model." 
+ ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def predict(model, character):\n", + "    # Encode the characters seen so far and run them through the model.\n", + "    character = np.array([[char2int[c] for c in character]])\n", + "    character = one_hot_encode(character, dict_size, character.shape[1], 1)\n", + "    character = torch.from_numpy(character)\n", + "    character = character.to(device)\n", + "    \n", + "    out, hidden = model(character)\n", + "\n", + "    # Take the most probable next character from the last output step.\n", + "    prob = nn.functional.softmax(out[-1], dim=0).data\n", + "    char_ind = torch.max(prob, dim=0)[1].item()\n", + "\n", + "    return int2char[char_ind], hidden" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def sample(model, out_len, start='hey'):\n", + "    model.eval() \n", + "    start = start.lower()\n", + "    chars = [ch for ch in start]\n", + "    size = out_len - len(chars)\n", + "    # Repeatedly predict the next character and append it to the sequence.\n", + "    for ii in range(size):\n", + "        char, h = predict(model, chars)\n", + "        chars.append(char)\n", + "\n", + "    return ''.join(chars)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'good i am fine '" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample(model, 15, 'good')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, the model is able to come up with the sentence 'good i am fine ' when we feed it the word 'good', achieving what we intended it to do.\n", + "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we discussed:\n", + "
\n", + "- How a recurrent neural network handles sequential data\n", + "
\n", + "- Unfolding a recurrent neural network\n", + "
\n", + "- Training and back propagation in time\n", + "
\n", + "- Various architectures and variants of RNN\n", + "
\n", + "- Simple example of a vanilla RNN\n", + "\n", + "This is just the tip of the iceberg when it comes to Recurrent Neural Networks. While the vanilla RNN is rarely used in solving NLP or sequential problems, having a good grasp of the basic concepts of RNNs will definitely aid in your understanding as you move towards the more popular GRUs and LSTMs." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8BDIzkJhfqMV" + }, + "source": [ + "## References \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TAfujAPs6n1B" + }, + "source": [ + "[Recurrent Neural Networks Lecture, Stanford University School of Engineering](https://www.youtube.com/c/stanfordengineering)\n", + "
\n", + "[Recurrent Neural Network (RNN) Tutorial: Types, Examples, LSTM and More](https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn)\n", + "
\n", + "[RNN walkthrough](https://github.com/gabrielloye/RNN-walkthrough)\n", + "
\n", + "[An Introduction To Recurrent Neural Networks And The Math That Powers Them](https://machinelearningmastery.com/an-introduction-to-recurrent-neural-networks-and-the-math-that-powers-them/)\n", + "
\n", + "[A Tour of Recurrent Neural Network Algorithms for Deep Learning](https://machinelearningmastery.com/recurrent-neural-network-algorithms-for-deep-learning/)\n", + "
\n", + "[A Beginner’s Guide on Recurrent Neural Networks with PyTorch](https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "index.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Recurrent Neural Networks/metadata.yml b/notebooks/Recurrent Neural Networks/metadata.yml new file mode 100644 index 0000000..7b8f304 --- /dev/null +++ b/notebooks/Recurrent Neural Networks/metadata.yml @@ -0,0 +1,29 @@ +title: Recurrent Neural Networks + +meta: + - name: keywords + content: Artificial Intelligence, Recurrent Neural Networks + +header: + title: Recurrent Neural Networks + description: | + In this notebook we talk about Recurrent Neural Networks. + +authors: + label: + position: top + text: Authors + kind: people + content: + - name: matinamehdizadeh + role: Author + contact: + - link: https://github.com/matinamehdizadeh + icon: fab fa-github + - link: mailto://matinamehdizadeh@gmail.com + icon: fas fa-envelope + +comments: + label: false + kind: comments +