index.html


<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta name="keywords" content="neural network, remark.js, slides" />
    <meta name="description" content="Neural network presentation" />
    <title>Neural network</title>
    <style>
      @import url(https://fonts.googleapis.com/css?family=Droid+Serif);
      @import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);

      body {
        font-family: 'Droid Serif';
      }
      h1, h2, h3 {
        font-family: 'Yanone Kaffeesatz';
        font-weight: 400;
        margin-bottom: 0;
        opacity: 1.0;
      }
      .remark-slide-content h1 { font-size: 3em; }
      .remark-slide-content h2 { font-size: 2em; }
      .remark-slide-content h3 { font-size: 1.6em; }
      .footnote {
        position: absolute;
        bottom: 3em;
      }
      li p { line-height: 1.25em; }
      .red { color: #fa0000; }
      .large { font-size: 2em; }
      a, a > code {
        color: rgb(249, 38, 114);
        text-decoration: none;
      }
      code {
        background: #e7e8e2;
        border-radius: 5px;
      }
      .remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
      .remark-code-line-highlighted     { background-color: #373832; }
      .pull-left {
        float: left;
        width: 47%;
      }
      .pull-right {
        float: right;
        width: 47%;
      }
      .pull-right ~ p {
        clear: both;
      }
      #slideshow .slide .content code {
        font-size: 0.8em;
      }
      #slideshow .slide .content pre code {
        font-size: 0.9em;
        padding: 15px;
      }
      .inverse {
        background: #272822;
        color: #777872;
        text-shadow: 0 0 20px #333;
      }
      .inverse h1, .inverse h2 {
        color: #f3f3f3;
        line-height: 0.8em;
      }

      /* Slide-specific styling */
      #slide-inverse .footnote {
        bottom: 12px;
        left: 20px;
      }
      #slide-how .slides {
        font-size: 0.9em;
        position: absolute;
        top:  151px;
        right: 140px;
      }
      #slide-how .slides h3 {
        margin-top: 0.2em;
      }
      #slide-how .slides .first, #slide-how .slides .second {
        padding: 1px 20px;
        height: 90px;
        width: 120px;
        -moz-box-shadow: 0 0 10px #777;
        -webkit-box-shadow: 0 0 10px #777;
        box-shadow: 0 0 10px #777;
      }
      #slide-how .slides .first {
        background: #fff;
        position: absolute;
        top: 20%;
        left: 20%;
        z-index: 1;
      }
      #slide-how .slides .second {
        position: relative;
        background: #fff;
        z-index: 0;
      }

      /* Two-column layout */
      .left-column {
        color: #777;
        width: 20%;
        height: 92%;
        float: left;
      }
        .left-column h2:last-of-type, .left-column h3:last-child {
          color: #000;
        }
      .right-column {
        width: 75%;
        float: right;
        padding-top: 1em;
      }
      /* Custom css classes for background images */
      .neuralnet3 {
        background-image: url(nnet7.jpg);
        opacity:0.75;
      }
      .full-width {
        background-size: auto 100%;
      }
      .full-height {
        background-size: auto 100%;
      }
      .neuralnet4 {
        background-image: url(cvbg.jpg);
        opacity:0.75;
      }
      .neuralnet5 {
        background-image: url(nlpbg.jpg);
        opacity:0.75;
      }
      .h1 {
        color: #f4fefe;
      }
      .neuralnet6 {
        background-image: url(monalisa-gif.gif);
        opacity:0.75;
      }

    </style>
  </head>
  <body>
    <textarea id="source">
name: inverse
layout: true
class: center, middle, inverse
---
class: neuralnet3, full-width, full-height
# Neural Nets: from Perceptron to Deep Learning
[or mimicking brain]
.footnote[or skip it to [Tensorflow Playground](http://playground.tensorflow.org/)]
---
## What are deep artificial neural nets?
---
layout: false
.left-column[
  ## What is it?
]
.right-column[
  Another bio-inspired, crazy-AI-maths unindentified object a.k.a:

- the *most popular ML algorithm* after linear models

- the reason for bi-weekly Breaking News from Google

- the stuff that *already* tags maps, photos, unlocks your IPhone, translate your texts, recognize your speech,...

- the stuff that *has been* reading bankchecks, detecting frauds, identifying threats...for more than 20 years

- the stuff that *will* help drive your car, diagnose a cancer, transform a panda, generate people...

<img src="panda.gif" style="width: 40%">
<img src="gen.gif" style="width: 40%">


]
---
.left-column[
  ## For real ?
]
.right-column[
  Artificial Neural Networks are as old as Artificial Intelligence, initial goal was to create a **machine**
  that would reason the same way as the humain brain, except neurons would be central computing units connected
  by wires

  - historical **hardware** implementation of ANN before transistors became code with transistors and high level languages

  - with DL it has essentially become a piece of specific code running on GPU or TPU

  - but AI chip dream still around

  <img src="mark1.png" style="width: 40%">
  <img src="AIchip.png" style="width: 43%">

]
---
.left-column[
  ## Historically
]
.right-column[
  ANN started in 1943, complex timeline

  - Cybernetics era, followed by a 'crazy-expectations' era

  - Massively funded era (defence, IT, finance) sounded by an SVM era

  - **Deep Learning era** (since 2006) is actually the third wave, most succesfull period


<img src="Google_Scholar.png" width="600">

]
---
.left-column[
  ## Second wave
]
.right-column[
  Starting in the 80's, essentially at AT&T, MIT, Microsoft and IBM

  - backpropagation, Convolutionnal Neural Networks, Long Short Term Memory

  - Hinton (Univ. Toronto), LeCun (Bell Labs), Bengio (Univ. Montreal)

  - training was hard but already in production for inference



<img src="asamples.gif" width="400">


> " *At some point in the late 1990s, one of these systems was reading 10 to 20% of all the checks in the US.* " Y. LeCun (2014)
]
---
.left-column[
  ## Second wave
]
.right-column[
  (Shallow) Neural networks already in common use with crafted features engineering

  - *much workload* was concerned by this *feature engineering* and *data preprocessing*

  - once good features have been found, Neural Nets was just another ML technique (vs. SVM)

  - more dedicated solutions (CNN, LSTM) were *hard to train*

<img src="class.png" width="600">

]
---
.left-column[
  ## Third wave
]
.right-column[
  Deep Neural networks learn the features and make inference easier

  - no *more complex preprocessing* and automatically learns the right features

  - much workload consists in training made possible with GPU and very large learning set

  - inference pipeline simpler but requires *computing power*



<img src="dll.png" width="600">

]
---
# How do you train deep neural nets?

<img src="mlp2.jpg" width="600">
---
.left-column[
  ## Jargon
]
.right-column[
  As any domain, Neural Nets has its own Domain Specific Language, in addition to traditional ML issues

  Comics view .red[*]

<img src="google_comics.png" width="600">

.footnote[.red[*] source : https://codelabs.developers.google.com/]
]
---
.left-column[
  ## Maths
]
.right-column[
  An ANN output is a linear combination of nonlinear outputs taken over linear combination of the inputs

  - let's call the inputs `\(x=(x_1,\ldots,x_n)\)` and the weights `\(w\)` and the bias `\(b\)` of one neuron

  - let's call `\(f\)` the **activation function** of one given neuron

  - the output of one neuron is

  $$ f(w_1 x_1 + \ldots + w_n x_n + b) $$


<img src="formula2.png" width="400">

  - now, with m neurons, the final output is

  $$ g(x) = \sum_{i=1}^m \alpha_i f^i(w_1^i x_1 + \ldots + w_n^i x_n + b^i) + b_0$$
]
---
.left-column[
  ## MLP
]
.right-column[
  What we have just described is the celebrated Multi-Layer-Perceptron architecture

  - ancestors of all modern architectures (fully connected part)

  - created by Rosenblatt in 1963

  - part of the feedforward class of neural nets (no loop back)



<img src="mlp.gif" width="600">


]
---
.left-column[
  ## Activation Functions
]
.right-column[
  Activation function is the unit process function that every input transformation goes through. Initially, this function purpose was to mimick ON/OFF process of axons. Before deep learning, ANN practitioners favored differentiable activation function.
  But since deep learning upraisal, this situation has changed and now it can be

  - a smooth saturated function e.g. sigmoid, hyperbolic tangent

  - piecewise linear (Rectified Linear Unit)

  - anything (cosinus, identity)



<img src="modern_activation.png" width="600">


]
---
.left-column[
  ## Does it work?
]
.right-column[
  Hornik's universal approximation theorem (a.k.a Cybenko's theorem)



<img src="hornik.png" width="600">


]
---
.left-column[
  ## Training
]
.right-column[
  Once architecture (number of layers, number of neurons, activation functions) is set, need to get the weights:

  - Find the optimal values of weights s.t the loss function `\(L(w)\)` is minimum over the training set of `\(N\)` examples where

  $$ \min L(w)=\frac{1}{N}\sum_{i=1}^N(g(x_i)-y_i)^2 $$

  - Unconstrained non convex minimization problem over `\(w\)` variables

  - With many optimization variables



<img src="optimization.gif" width="600">


]
---
.left-column[
  ## Training
]
.right-column[
  The only viable option for such a training is gradient descent

  - iterative algorithm where you start from an initial guess of the values of the `\(w^0\)` and update them

  $$ w^{i+1}=w^{i}-\alpha \nabla_w L(w^{i}) $$

  - `\(\alpha\)` is known as the gradient step (in optimization) and the learning rate (in NN)

  - quite simple...except you need `\(\nabla_w L(w^{i})\)`  **backpropagation algorithm**



<img src="back.gif" width="350">


]
---
.left-column[
  ## Training
]
.right-column[
  Training an MLP (and it is even more true for Deep Neural Nets) then requires

  - scientific computing skills, computing power and **craft art**

  - dedicated libraries essentially for distributed backpropagation and optimization (all are in C++ but sklearn and DL4J)

  - solving an **optimization problem**



<img src="cnn.png" width="350">

]
---
.left-column[
  ## Training
]
.right-column[
  In addition to traditional ML parameters (regularization), training an DNN requires setting the right hyperparameters for the
  optimization

  - optimization solvers: Stochastic Gradient Descent, Momentum, Adam, RMSProp, Nesterov...

  - SGD is the most used solver, it is a simple Gradient Descent where you compute  `\(\nabla_w L(w^{i})\)` over a very small subset of the training basis (called *Minibatch*) and hence the Stochastic gradient







<img src="opt1.gif" width="350">

]
---
.left-column[
  ## Training
]
.right-column[
  Regularization is ensured through a ridge penalty term a.k.a *weight decay*, but
  - *learning rate*: most important term, increases model capacity indirectly through its impact on optimization. More advanced strategies update
  the learning rate (typically with TensorBoard)
  - *number of neurons*: too many neurons leads to overfitting




<img src="learningrates.jpeg" width="350">

]
---
## What do they work for?

<img src="googlernncnn.png" width="500">
---
.left-column[
  ## Automatic Processing Tasks
]
.right-column[
   Deep Learning upraisal was motivated by reaching human-like errors rate for traditional challenges in Robotics, IA... by taking advantage of large amount of data collected
   by american and chinese tech giants almost for free. This include

  - **Computer Vision** (pictures from Mobile Phones)

  - **Natural Language Processing** (Text and Comments from the World Wide Web)

  - **Speech** (excerpts of talks from Mobile Phones and Collaborative Platforms)


<img src="IAtasks.png" width="500">

]
---
class: neuralnet4, full-width, full-height
# Computer Vision

---
.left-column[
  ## Computer Vision - Tasks
]
.right-column[
   In Image Processing (or Computer Vision typically for Robotics, Detection...), traditional challenges and tasks are

  - image classification (threat or not ?)

  - object detection (where is the threat in the picture ?)

  - segmentation (background removal), semantic (tumor detection) or instance segmentation (count people)

  - scene understanding, tracking, location, mapping (SLAM), 3d reconstruction... (defence, autonomous transportation)


<img src="cvtasks.png" width="500">

]
---
.left-column[
  ## Computer Vision - Traditional Methods
]
.right-column[
   In Image Processing, decades of research created many different methods, like

  - template matching (similarity with fixed pattern)

  - low-level dedicated methods (Canny edge detection, watershed or CRF for image segmentation...)

  - Descriptors extractions and Machine Learning (face detection on Mobile phone, Viola-Jones)


<img src="cvtrad.png" width="500">

]
---
.left-column[
  ## Computer Vision - New architectures for DNN
]
.right-column[
   Convolutionnal Neural Networks (created by LeCun in 1989) first apply Convolution Layers on top of inputs (pixel values) to reduce the number
   of weights

  - achieve state-of-the-art performance for most CV challenges since 2012

  - add to the MLP architecture layers of convolution adapted from traditional CV

  - add newcomers such as *pooling* (a.k.a subsampling) and *dropout* to regularize and lower the number of parameters


<img src="cnn_2.png" width="500">

]
---
.left-column[
  ## Computer Vision - Convolution
]
.right-column[
   Convolutionnal Layers specific to pictures

  - Fully Connected layers would involve too many parameters for one picture (think
  of a `\(1000 \mbox{px}  \times  1000 \mbox{px}\)` picture, first layer for one neuron would involve `\(1,000,000\)` weights)

  - Convolutions help reduce the number of parameters

  - Weights of the convolution become trainable parameters

<img src="convol.gif" width="500">

]
---
.left-column[
  ## Computer Vision - Dropout and Pooling
]
.right-column[
   Dropout

  - Some connections between neurons and layers are switched off during training. Typical amount is 20 %
  of neurons randomly selected during training. Dropout helps regularization

<img src="dropout.gif" width="300">

  Pooling

  - Subsampling helps reduce the number of parameters

<img src="pooling.gif" width="300">

]
---
.left-column[
  ## Computer Vision Architectures
]
.right-column[
   Many sophisticated architectures

  - Combines many layers ('deep') and in general several fully connected layers at the end

  - Usually ReLu-like activation functions and TanH at the end, but the devil is in the sequel of pooling, batch normalization weights sharing...

  - Dropout only for training ! Not inference


<img src="comp_archi.png" width="500">

<img src="resnet.png" width="500">

]
---
.left-column[
  ## Computer Vision Architectures
]
.right-column[
   Main challenges and architectures focus on image classification, but other CV tasks require more complex architectures

  - Typically, object detection outputs one or several bounding boxes containing most probable objects

  - Traditional CV uses pyramid of images and uses a classifier many times over many different regions and outputs region with gighest scores

  - Object recognition recent DL architectures rely on efficient subdivision and probability map to speed up object detection


<img src="yolo.png" width="450">

]
---
.left-column[
  ## Computer Vision Architectures
]
.right-column[
   A difficult CV task is semantic segmentation (determines which class each pixel belongs to)

  - Outputs a map (same size) of the original images

  - Apply convolution and then deconvolution

  - Labelling is tedious


<img src="cat_segmentation.png" width="500">

]
---
.left-column[
  ## Computer Vision: best practices
]
.right-column[
   Use a classical architecture (unless you're Google or Facebook) or even better transfer learning

  - Dataset: same aspect ratio, same resolution, same colormap

  - Size: as many pictures as possibles, don't overlook Labelling

  - Scaling: usually mean substraction (per pixel, per channel)

  - Augment: train and test (crops, shift, rotation...)

  - [DL's book](https://www.deeplearningbook.org/) Rule-Of-Thumb : need at list `\5,000\` examples per category and `\10,000,000\` examples to reach human error rate

  - Many practical advices [here](https://jeffmacaluso.github.io/post/DeepLearningRulesOfThumb/)


<img src="faces.png" width="500">

]
---
class: neuralnet5, h1, full-width, full-height
# NLP

---
.left-column[
  ## Natural Language Processing Tasks
]
.right-column[
   AI initial goal was to make machine able to understand and generate texts and speech to interact with humans (Turing test). Such a goal relies on easy to very hard tasks

  - **Easy** Spelling correction, Part-of-Speech tagging...

  - **Medium** Sentiment Analysis, Text Classification...

  - **Hard** Summarization, Translation, Question Answering...


<img src="nlptasks.png" width="500">

]
---
.left-column[
  ## Natural Language Processing Issues
]
.right-column[
   NLP is hard because

  - Hard pre-processing (segmentation) and sequential by nature

  - Sparsity of examples (many words not used very often, different sizes...)

  - Idioms, Languages diversity (colloquial, formal...), Domain Specific Languages

  - Word and sense human ambiguity, grammar parsing not enough... **the astronomer saw the star**

  - Linguistics issues (rationalist vs. empiricist)


<img src="tay.jpg" width="500">

]
---
.left-column[
  ## Natural Language Processing Traditional Methods
]
.right-column[
   NLP has a longer history than AI, many different methods

  - Rule-based (think of `\(regexp\)`)

  - Language Models based on n-grams (Markov models) and dictionary

  - Graph-based methods (Viterbi algorithm)

  - Manual feature extraction (dictionary, bag of words, tf-idf) and ML classifier


<img src="ngrams.png" width="500">

]
---
.left-column[
  ## Natural Language Processing Neural Networks Approach
]
.right-column[
   Similar to CV, NLP has experienced some real advances with the help of Artificial Neural Networks

  - Long Short Term Memory unit in Recurrent Neural Network for Language Models, allow to retain information

  - Word Embeddings : dense representation of words, sentences... **word2vec** approach

  - Sequence-to-Sequence learning and attention mechanisms (Neural Machine Translation, Question Answering, chatbot...)

  - Transformers (BERT, T-NLG..) and deeper architectures (17 billions parameters)


<img src="word2vec.png" width="500">

]
---
.left-column[
  ## Natural Language Processing LSTM
]
.right-column[
   Long Short Term Memory networks are part of Recurrent Neural Networks (created in 1982 by Hopfield)

  - unlike standard Feedforward they process inputs sequentially

  - they retain information (*contextual*) in a memory

  - works well for Language Modelling, Sequence-to-Sequence models, translation...


<img src="rnn.gif" width="500">

]
---
class: neuralnet6, h1, full-width, full-height

---
.left-column[
  ## Generative Adversarial Networks (GANs)
]
.right-column[
   Best idea in Machine Learning over 10 years (Yann LeCun)

  - Generative VS Discriminative (Classification, Regression)

  - Two neural networks : the Generator and the Discriminator that fights in a minimax game

  - Quite hard training... recent methods take inspiration from [Optimal Transport](https://weave.eu/le-transport-optimal-un-couteau-suisse-pour-la-data-science/) theory (Wasserstein distance)

     <img src="https://camo.githubusercontent.com/7443d2adadc104b885cac75a1894567c053c987f/687474703a2f2f7777772e6b646e7567676574732e636f6d2f77702d636f6e74656e742f75706c6f6164732f67656e657261746976652d616476657273617269616c2d6e6574776f726b2e706e67" width="500">


]
---
.left-column[
  ## Generative Adversarial Networks (GANs)
]
.right-column[
   Applications

  - Generate realistic images

  - Augment resolution...

  - Text-to-image, image-to-image ...

<img src="https://junyanz.github.io/CycleGAN/images/teaser.jpg" width="600">

]
---
.left-column[
  ## Autoencoders
]
.right-column[
   Autoencoders are a deep extension of MLP to automatically find the best reduction and coding/decoding at once

  - part of unsupervised learning

  - conceptually, it is a nonlinear dimension reduction technique decodable (unlike t-SNE)

  - often used to denoise signals, also for data synthesis


<img src="enco.png" width="500">

]
---
.left-column[
  ## Auto-encoders
]
.right-column[
   Applications :

   - find the concept of cat spawning Web images and videos

   - help unlock your IPhone 10

   - used as a compression for speech, image...


<img src="nn_cat.jpeg" width="500">

]
---
.left-column[
  ## Adversarial Attacks
]
.right-column[
   Basic idea  :

   - Find with optimzation amount of pixels to add to an image to change prediction of CNN

   - DNN are very sensititive to small changes of pixel values

   - Funny but very worrying for security and industrial applications of CNN


<img src="patch.gif" width="500">

]
---
## Small focus on hardware

<img src="gpu.jpg" width="500">
---
.left-column[
  ## Training
]
.right-column[
   In addition to learning rate and algorithmic improvements, modern Neural Nets work because

  - more data (millions time more), better backprop algorithm (automatic differentiation)

  - progress in the secret sauce (initialization of weights, dropout, activation function...)

  - **clever use of GPU** (in lieue of Moore's law): allows to distribute many easy-computations (kind of vectorization) with NVIDIA setting the standards (with CUDA language)


<img src="dl.png" width="550">

]
---
.left-column[
  ## Training
]
.right-column[
  In addition, chip  war has already started with old and new players :

  - NVIDIA (with CUDA)  vs. AMD (with OpenCL)

  - most DL libraries support only CUDA except Caffe (Apple) but would like to get rid of NVIDIA-dependency

  - Microsoft/Intel (with FPGA) vs. Amazon/Xilinx/Baidu (with XPU)

  - Google (Tensor Processing Unit)

  - Cloud-based services



<img src="NVIDIA.png" width="450">

]
---
.left-column[
  ## Training
]
.right-column[
   Big AI/fonders currently deploy strategic plans where training NN is different from infering NN :

   - Microsoft massively deploys Field Programmable Gate Array with Intel Nervana for Cloud Deep Learning

   - TPU/XPU (mix of CPU/GPU/FPGA) would go for inference !


<img src="chips.jpg" width="600">

]



---
.left-column[
  ## On the future of deep learning
]
.right-column[
   Future, trends and challenges :

   - Much ado about **Generative Adversarial Networks** and **Autoencoders**

   - Since several years ago, DNN was successfully integrated in Reinforcement Learning (Deep Q-learning)

   - Safety-critical ML issues (adversarial, uncertainty...), power-efficient version...

   - Prove something about DNN (data coverage with TDA, formal methods for robustness evaluation...)

   - Conciliate physics-based, expert knowledge reasoning and predictive power of Deep Learning **Bayesian Deep Learning**



]




---
.left-column[
  ## Some applications for aerospace engineering
]
.right-column[
   Every field is experiencing a Deep Learning hype :

   - Aerodynamics: automatic tuning of turbulence  parameters, calibration, predict flow for new geometry...

<img src="flow.gif" width="180">

   - Material design: constitution-to-properties prediction, material discovery

<img src="predictive.jpg" width="180">

   - Inverse design: calibration, uncertainty propagation (probabilistic programming)

<img src="inverse.png" width="180">
]




---
## Frameworks, Tools and resources

<img src="tools.jpg" width="500">

---
.left-column[
  ## Deep Learning at home
]
.right-column[



   ### Technical solutions :

   - **Python** : Theano (Montreal), Tensorflow (Google), Keras (Google), Caffe and Pytorch (Facebook), CNTK (Microsoft), MXNET (Amazon), Paddle (Baidu)

   - **R** : Keras (Google)

   - **Javascript** : deeplearning.js, ex : [Teachable machines](https://teachablemachine.withgoogle.com/)


]


---
.left-column[
  ## DL Frameworks
]
.right-column[



   ### Deep Learning frameworks :

   - Allows to build complicated architecture neural networks

   - They are all optimized for performance (low-level libraries written in C/C++ wrapped to be called by a high-level API)

   - Community support (continuous development)

   - GPU-acceleration and massive parallelizations

   - **Automatic differentation**

   <img src="dlframeworks.jpg" width="500">


]


---
.left-column[
  ## DL Frameworks
]
.right-column[



   ### Computational Graphs and AD :

   - User define a computational graph as a composition of transformations (the flow) of the data

   - This graph will be optimized for execution and for gradient computation (automatic differentiation), usually many more parameters
   than output hence reverse mode (backward) is implemented

   - Define-and-run (static graph) vs. more general define-by-run (dynamic graph)

   <img src="staticgraph.png" width="250">


]


---
.left-column[
  ## DL Frameworks
]
.right-column[



   ### Tensorflow :

   - Static computational graph "build graph once and run it many times over distributed machines"

   - Developped by Google (for dev and production)

   - Massive community and side projects (mobile, tensorflow-probability)


   <img src="tensorflow.png" width="500">


]


---
.left-column[
  ## DL Frameworks
]
.right-column[



   ### PyTorch :

   - Dynamic computational graph "build graph once at each forward pass"

   - Developped by Facebook (PyTorch+Caffe2 for production)

   - Massive community and side projects (pyro)

   - More flexible, research oriented

   <img src="pytorch.png" width="500">


]


---
.left-column[
  ## Deep Learning at home
]
.right-column[


   ### Resources :

   - Introduction videos [Part 1](https://www.youtube.com/watch?v=aircAruvnKk) and [Part 2](https://www.youtube.com/watch?v=IHZwWFHWa-w)

   - Benchmark different libraries to recognize MNIST [github link](https://github.com/TheoLvs/machine-learning/blob/master/1.%20Computer%20Vision/2.%20MNIST%20Recognition.ipynb)

   - Stanford course CS231n : [Convolutional Neural Networks for Visual Recognition](http://cs231n.github.io/)

   - [Fast.ai course](http://course.fast.ai/part2.html)

   - [Deeplearning.ai course on Coursera](https://www.coursera.org/specializations/deep-learning)

   - [Google course](https://classroom.udacity.com/courses/ud730)



]







---
## That's all folks !

<img src="giphy.gif" width="500">
---







    </textarea>
    <script src="https://remarkjs.com/downloads/remark-latest.min.js">
    </script>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_HTML&delayStartupUntil=configured" type="text/javascript"></script>
    <script type="text/javascript">
      var slideshow = remark.create();

      // Setup MathJax
      MathJax.Hub.Config({
          tex2jax: {
          skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
          }
      });

      MathJax.Hub.Configured();
    </script>
    <script>
      var slideshow = remark.create();
    </script>
  </body>
</html>