Here are the essential skills that a Machine Learning Engineer needs, as mentioned in the first video of this lesson. Within each group are topics that you should be familiar with.
Study Tip: Copy and paste this list into a document and save it to your computer for easy reference.
Topics:
- Data structures: Lists, stacks, queues, strings, hash maps, vectors, matrices, classes & objects, trees, graphs, etc.
- Algorithms: Recursion, searching, sorting, optimization, dynamic programming, etc.
- Computability and complexity: P vs. NP, NP-complete problems, big-O notation, approximation algorithms, etc.
- Computer architecture: Memory, cache, bandwidth, threads & processes, deadlocks, etc.
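As a small illustration of how the recursion, dynamic programming, and big-O topics above connect, here is a minimal sketch in Python. The Fibonacci example and the function names (`fib_naive`, `fib_memo`, `fib_table`) are illustrative choices, not part of the lesson itself.

```python
from functools import lru_cache


def fib_naive(n: int) -> int:
    """Plain recursion: subproblems are recomputed, so time grows exponentially in n."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Top-down dynamic programming: each subproblem is cached, giving O(n) time."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def fib_table(n: int) -> int:
    """Bottom-up dynamic programming: O(n) time and O(1) extra memory."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous
```

All three functions return the same values; they differ only in how much repeated work they do, which is the kind of tradeoff big-O notation describes.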
Topics:
- Basic probability: Conditional probability, Bayes' rule, likelihood, independence, etc.
- Probabilistic models: Bayes Nets, Markov Decision Processes, Hidden Markov Models, etc.
- Statistical measures: Mean, median, mode, variance, population parameters vs. sample statistics, etc.
- Proximity and error metrics: Cosine similarity, mean-squared error, Manhattan and Euclidean distance, log-loss, etc.
- Distributions and random sampling: Uniform, normal, binomial, Poisson, etc.
- Analysis methods: ANOVA, hypothesis testing, factor analysis, etc.
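The proximity and error metrics named above can be written out in a few lines each. The following is a rough sketch using only the Python standard library; the function names and the binary form of log-loss (with probabilities clipped by a hypothetical `eps` parameter) are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

For example, `manhattan_distance([0, 0], [3, 4])` is 7 while `euclidean_distance([0, 0], [3, 4])` is 5.0, which shows how the two distances can rank the same pair of points differently.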
Topics:
- Data preprocessing: Munging/wrangling, transforming, aggregating, etc.
- Pattern recognition: Correlations, clusters, trends, outliers & anomalies, etc.
- Dimensionality reduction: Eigenvectors, Principal Component Analysis, etc.
- Prediction: Classification, regression, sequence prediction, etc.; suitable error/accuracy metrics.
- Evaluation: Training-testing split, sequential vs. randomized cross-validation, etc.
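To make the difference between sequential and randomized evaluation concrete, here is a minimal sketch. The helper names (`randomized_split`, `sequential_split`, `expanding_window_folds`) and the expanding-window scheme are assumptions chosen for illustration; other fold layouts are equally valid.

```python
import random


def randomized_split(items, test_fraction=0.2, seed=0):
    """Shuffle first, then split: suitable when samples are independent of order."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sequential_split(items, test_fraction=0.2):
    """Keep the original order: the test set is the final portion of the data."""
    ordered = list(items)
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


def expanding_window_folds(n_samples, n_folds):
    """Sequential cross-validation: every test block comes after its training data."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))
```

A sequential scheme is the usual choice for time-ordered data, because shuffling would let the model train on samples that come after the ones it is tested on.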
Topics:
- Models: Parametric vs. nonparametric, decision tree, nearest neighbor, neural net, support vector machine, ensemble of multiple models, etc.
- Learning procedure: Linear regression, gradient descent, genetic algorithms, bagging, boosting, and other model-specific methods; regularization, hyperparameter tuning, etc.
- Tradeoffs and gotchas: Relative advantages and disadvantages, bias and variance, overfitting and underfitting, vanishing/exploding gradients, missing data, data leakage, etc.
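Several of the learning-procedure topics above (linear regression, gradient descent, regularization, hyperparameter tuning) fit together in one short example. The sketch below is an assumption-laden illustration in plain Python: the function name `fit_linear_gd`, the default learning rate and epoch count, and the L2 penalty applied to the slope only are all illustrative choices.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, epochs=2000, l2=0.0):
    """Fit y ~ w * x + b by batch gradient descent on mean-squared error.

    l2 is the strength of an L2 (ridge) penalty on w; the intercept is not penalized.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n + 2 * l2 * w
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = fit_linear_gd(xs, ys)
w_ridge, b_ridge = fit_linear_gd(xs, ys, l2=1.0)
```

Without regularization the fit recovers a slope close to 2 and an intercept close to 1; with `l2=1.0` the slope is pulled toward zero. The learning rate, epoch count, and `l2` are hyperparameters: a learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow.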
Topics:
- Software interface: Library calls, REST APIs, data collection endpoints, database queries, etc.
- User interface: Capturing user inputs & application events, displaying results & visualization, etc.
- Scalability: Map-reduce, distributed processing, etc.
- Deployment: Cloud hosting, containers & instances, microservices, etc.
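The map-reduce pattern listed under scalability can be sketched on a single machine to show its three phases. This is a toy illustration, not a distributed implementation; the function names and the word-count task are assumptions chosen because they are a common way to introduce the idea.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a collection of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))
```

In a real distributed system, the map calls would run in parallel on different machines and the shuffle would move data across the network; the structure of the computation stays the same.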