Covers topics spanning the theoretical mathematical underpinnings of ML and their practical application through PySpark, Pandas, scikit-learn, Matplotlib and TensorFlow.
Deep dive across four main sections:
- Python for Data Science
- Statistics and Probability
- Machine Learning Fundamentals
- Big Data and Spark
Covers foundational Python concepts used for data analysis, including:
- Jupyter Notebooks
- NumPy
- Pandas (Series & DataFrames)
- Matplotlib
- scikit-learn
- NLTK
- Plus some Unix for data analysis
Reviewing the statistics and probability concepts that are foundational for machine learning, covering:
- Set Theory
- Combinatorics
- Probability (Axiomatic Formulations)
- Conditional Probability
- Random Variables, Cumulative Distribution, Expectation, Variance and Covariance
- Discrete Distribution Families
  - Bernoulli
  - Binomial
  - Poisson
  - Geometric
- Continuous Distribution Families
  - Uniform
  - Exponential
  - Gaussian
- Inequalities (Markov, Chebyshev, MGF & Law of Large Numbers) and Limits (Chernoff Bound and Central Limit Theorem)
- Parameter Estimation and Confidence Intervals
- Regression (Linear and Polynomial) and Principal Component Analysis (PCA)
- Hypothesis Testing (P-Values, Z and T Tests)
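
As a small illustration of the hypothesis-testing topic, the sketch below computes a one-sample Z statistic and a two-sided p-value using only the Python standard library. The function name `one_sample_z_test`, the sample values and the assumed known standard deviation `sigma` are all hypothetical and chosen purely for demonstration.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(sample)
    standard_error = sigma / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    # Two-sided p-value: chance of a statistic at least this extreme under H0.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.3, 5.7]
z, p_value = one_sample_z_test(sample, mu0=5.0, sigma=0.3)
print(f"z = {z:.3f}, p = {p_value:.5f}")
```

When `sigma` is unknown and has to be estimated from the sample, a T test would usually be the more appropriate choice.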
Reviewing foundational ML concepts from a mathematical perspective, followed by practical applications in Python, covering:
- K Nearest Neighbour (K-NN) and Distance Functions
- Generative Modelling; Probability Spaces and Bivariate Gaussians
- Further Generative Modelling; Multivariate Gaussians
- Regression; Linear/Logistic and Regularisation
- Optimisation; Unconstrained, Convexity, Positive Semidefinite Matrices
- Linear Classification; Support Vector Machines (SVM) and Multiclass Linear Prediction
- Further Classifiers; Kernels, Kernel SVM, Decision Trees, Boosting and Random Forests
- Representational Learning; Clustering with K-Means, Gaussians and Hierarchical Clustering
- Further Representational Learning; Linear Projections, Principal Component Analysis (PCA), Eigenvalues/Eigenvectors and Spectral Decomposition
- Deep Learning; Autoencoders, Distributed Representation, Feedforward Neural Networks and Training
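
To make the first topic in this list concrete, here is a minimal sketch of K-NN classification with a pluggable distance function, written in plain Python. The names `knn_predict` and `manhattan`, and the toy data points, are assumptions made for illustration and are not taken from the course material.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train_points, train_labels, query, k=3, distance=dist):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair each training point's distance to the query with its label, nearest first.
    ranked = sorted(
        (distance(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def manhattan(a, b):
    """L1 distance, shown as an alternative to the Euclidean default."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical two-cluster toy data.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (4.0, 4.0), distance=manhattan))
```

Passing the distance as a parameter keeps the voting logic unchanged while the notion of "nearest" varies, which is the main point of studying distance functions alongside K-NN.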
Reviewing Machine Learning concepts and their practical application through Spark using PySpark, RDDs, Spark SQL, DataFrames and TensorFlow.
- Spark and PySpark infrastructure setup
- MapReduce concepts and RDDs
- Spark SQL and DataFrames
- Covariance and Principal Component Analysis (PCA) with Python
- PCA Coefficients and PCA Residuals with Python
- K-Means Clustering and Intrinsic Dimensions
- Decision Trees, Ensembles and Boosting
- Neural Networks (NN) review and TensorFlow (Base and Estimator API)
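
The MapReduce idea listed above can be sketched without a cluster. The example below is a plain-Python word count that separates the map, shuffle and reduce phases; it is a teaching sketch and not PySpark code, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the input lines are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values with an associative operation (addition)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


# Hypothetical input, invented for illustration only.
lines = ["spark makes big data simple", "big data needs big tools"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(word_counts["big"], word_counts["data"], word_counts["spark"])
```

On an RDD, the same pipeline is commonly expressed by chaining `flatMap`, `map` and `reduceByKey`, with Spark handling the shuffle between workers.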