- Linear regression of an indicator matrix
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
- Regularized Discriminant Analysis [Source: ESLR Page-112]
- Logistic Regression
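As a quick illustration of how a few of the methods listed above are used in practice, here is a minimal scikit-learn sketch; the toy dataset from make_classification and the train/test split are illustrative choices, not from the sources cited here.

```python
# Sketch: fit LDA, QDA, and logistic regression on a toy dataset (scikit-learn assumed available).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis()),
                  ("Logistic", LogisticRegression())]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))   # test-set accuracy
```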
The odds of an event A are the probability of success divided by the probability of failure. As an equation, that is P(A)/P(-A), where P(A) is the probability of A and P(-A) is the probability of 'not A' (i.e. the complement of A).
Taking the logarithm of the odds gives us the log odds of A, which can be written as
log odds of A = log(P(A)/P(-A)). Since P(-A), the probability of the event not happening, is equal to 1 - P(A), we can write the log odds as
log odds = log(p / (1 - p)),
where p = the probability of the event happening and 1 - p = the probability of the event not happening.
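A small numeric check of these definitions (the value p = 0.8 is just an example):

```python
import numpy as np

p = 0.8                      # P(A): probability of the event happening
odds = p / (1 - p)           # P(A) / P(-A) = 0.8 / 0.2 = 4.0
log_odds = np.log(odds)      # log(p / (1 - p)) ~= 1.386
print(odds, log_odds)
```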
Although we could use (non-linear) least squares to fit the logistic model, the more general method of maximum likelihood is preferred, since it has better statistical properties. In logistic regression, we model the class probability with the logistic function,
p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)),   (1)
so that
1 - p(X) = 1 / (1 + e^(β0 + β1X)).   (2)
Dividing (1) by (2) and taking the log of both sides gives the log odds as a linear function of X:
log(p(X) / (1 - p(X))) = β0 + β1X.
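The derivation can be verified numerically; in this sketch the coefficients β0 = -1, β1 = 2 and the input x = 0.7 are arbitrary example values:

```python
import numpy as np

beta0, beta1, x = -1.0, 2.0, 0.7                                   # example coefficients and input
p = np.exp(beta0 + beta1 * x) / (1 + np.exp(beta0 + beta1 * x))    # logistic function, eq. (1)
one_minus_p = 1 / (1 + np.exp(beta0 + beta1 * x))                  # its complement, eq. (2)
logit = np.log(p / one_minus_p)                                    # dividing (1) by (2), then log
print(logit, beta0 + beta1 * x)                                    # both equal 0.4
```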
Let us make the following parametric assumption:
P(Y = 1 | x; w) = 1 / (1 + e^(-w·x)),
i.e. the conditional probability of the positive class is the logistic (sigmoid) function of the linear score w·x.
MLE finds the model parameters by maximizing
P(observed data | model parameters).
For logistic regression, we need to find the model parameter w that maximizes the conditional probability of the observed labels given the inputs,
∏_i P(y_i | x_i; w),
or equivalently its logarithm, ∑_i log P(y_i | x_i; w).
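A hedged sketch of that maximization using plain gradient ascent on the conditional log-likelihood; the synthetic data, learning rate, and iteration count are illustrative choices, not taken from the sources below.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -2.0])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)   # labels drawn from the assumed model

w = np.zeros(2)
for _ in range(2000):
    p = sigmoid(X @ w)            # P(Y=1 | x; w) under the parametric assumption
    grad = X.T @ (y - p)          # gradient of the conditional log-likelihood
    w += 0.01 * grad              # ascend: increase sum_i log P(y_i | x_i; w)

print(w)                          # should land close to w_true
```

Each update moves w in the direction that increases ∑_i log P(y_i | x_i; w); there is no closed-form solution, which is why iterative methods (gradient ascent here, Newton/IRLS in most software) are used.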
- lecture05.pdf (zstevenwu.com)
- Logit.dvi (rutgers.edu)
- ADAfaEPoV (cmu.edu)
- A Gentle Introduction to Logistic Regression With Maximum Likelihood Estimation (machinelearningmastery.com)
- Logistic Regression and Maximum Likelihood Estimation Function | by Puja P. Pathak | CodeX | Medium
[Source: ISLR Page-151]
[Source: ESLR Page-109; lecture9-stanford]
When the classes are well separated, the parameter estimates for the logistic regression model are surprisingly unstable; linear discriminant analysis does not suffer from this problem. [Source: ESLR, Page-128] If the data in a two-class logistic regression model can be perfectly separated by a hyperplane, the maximum likelihood estimates of the parameters are undefined (i.e., infinite; see Exercise 4.5). The LDA coefficients for the same data will be well defined, since the marginal likelihood will not permit these degeneracies.
- https://stats.stackexchange.com/questions/224863/understanding-complete-separation-for-logistic-regression
- https://stats.stackexchange.com/questions/239928/is-there-any-intuitive-explanation-of-why-logistic-regression-will-not-work-for
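The complete-separation behaviour is easy to reproduce. In this sketch (scikit-learn assumed; the tiny 1-D dataset and the large C used to approximate unregularized maximum likelihood are illustrative choices), the logistic coefficient is driven to a large value while the LDA coefficient stays well defined:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Two perfectly separated 1-D classes.
X = np.array([[0.0], [1.0], [2.0], [2.5], [3.5], [4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# With essentially no regularization (very large C), the MLE is unbounded:
# the solver pushes the coefficient to a large value, and a looser tolerance
# or more iterations would push it further still.
logit = LogisticRegression(C=1e10, max_iter=10_000).fit(X, y)
print("logistic coef:", logit.coef_.ravel())

# LDA's estimates come from class means and a pooled covariance,
# so they remain moderate and well defined on the same data.
lda = LinearDiscriminantAnalysis().fit(X, y)
print("LDA coef:", lda.coef_.ravel())
```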
[Source: ISLR Page-357] The SVM loss function is exactly zero for observations that lie on the correct side of the margin. In contrast, the loss function for logistic regression is not exactly zero anywhere, but it is very small for observations that are far from the decision boundary. Because of the similarities between their loss functions, logistic regression and the support vector classifier often give very similar results. When the classes are well separated, SVMs tend to behave better than logistic regression; in more overlapping regimes, logistic regression is often preferred.
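A short sketch of that loss-function comparison, evaluating the hinge loss max(0, 1 - y·f(x)) and the logistic loss log(1 + e^(-y·f(x))) at a few example margins:

```python
import numpy as np

margins = np.array([-1.0, 0.0, 0.5, 1.0, 2.0, 5.0])   # y * f(x), example values

hinge = np.maximum(0.0, 1.0 - margins)       # SVM loss: exactly zero once the margin reaches 1
logistic = np.log(1.0 + np.exp(-margins))    # logistic loss: small far from the boundary, never zero

for m, h, l in zip(margins, hinge, logistic):
    print(f"margin={m:5.1f}  hinge={h:.4f}  logistic={l:.4f}")
```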