- General Concepts
- Odds and Log-odds
- The Sigmoid
- Truly Understanding Logistic Regression
- The Logit Function and Entropy
Q. True or False: For a fixed number of observations in a data set, introducing more variables normally generates a model that has a better fit to the data. What may be the drawback of such a model-fitting strategy?
Answer
True. However, if the added features carry little information and act as redundant predictors, it does not make sense to add them to the model. They unnecessarily increase the model's complexity and may cause overfitting.
Q. Define the term odds of success both qualitatively and formally. Give a numerical example that stresses the relation between probability and odds of an event occurring.
Answer
The term "odds of success" expresses the chance of a favorable outcome relative to the chance of an unfavorable one in a given situation or experiment. It tells you how much more likely success is compared to failure. Formally, it is the ratio of the probability of success to the probability of failure:
Odds of Success = $\frac{p}{1-p}$, where $p$ is the probability of success.
Let's consider a simple numerical example to illustrate the relationship between probability and the odds of an event occurring:
Suppose you are flipping a fair coin. The probability of getting heads ($p$) is $0.5$, so:
Odds of Success = $\frac{0.5}{1 - 0.5} = 1$
In this case, the odds of success are 1. This means that the chances of getting heads and the chances of getting tails are equal.
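The relation between probability and odds can be sketched in a few lines of Python. This is a minimal illustration; the function names `prob_to_odds` and `odds_to_prob` are made up for this example and do not come from any library.

```python
def prob_to_odds(p: float) -> float:
    """Return the odds p / (1 - p) for a probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Return the probability odds / (1 + odds) for non-negative odds."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Fair coin: probability 0.5 corresponds to odds of 1 (success and failure equally likely).
print(prob_to_odds(0.5))
```

Going the other way, `odds_to_prob` recovers the probability from the odds, so the two functions are inverses of each other on their valid ranges.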
Q. Answer the following:
- Define what is meant by the term "interaction", in the context of a logistic regression predictor variable.
- What is the simplest form of an interaction? Write its formulae.
- What statistical tests can be used to attest to the significance of an interaction term?
Answer
- An interaction is the product of two single predictor variables, implying a non-additive effect.
- Suppose your model has two predictors, $X$ and $Y$. A model with an interaction term can be expressed as:
  $$\beta_0 + \beta_1X + \beta_2Y + \beta_3XY$$
  The last term, $\beta_3XY$, represents the interaction between these two predictors.
- To test the contribution of an interaction, two principal methods are commonly employed: the Wald chi-squared test, or a likelihood ratio test between the models with and without the interaction term.
Q. True or False: In machine learning terminology, unsupervised learning refers to the mapping of input covariates to a target response variable that is attempted to be predicted when the labels are known.
Answer
False. In unsupervised learning there are no targets or labels to guide the model's predictions; it is mostly used to gain insights from the data. The definition above describes a different machine learning paradigm, namely supervised learning.
Q. Complete the following sentence: In the case of logistic regression, the response variable is the log of the odds of being classified in [...].
Answer
In the case of logistic regression, the response variable is the log of the odds of being classified in a group of binary or multi-class responses. This definition essentially demonstrates that odds can take the form of a vector, allowing for a linear relationship.
Q. Describe how in a logistic regression model, a transformation to the response variable is applied to yield a probability distribution. Why is it considered a more informative representation of the response?
Answer
There are different techniques that are widely used to model probability distribution over output classes, which is bounded between 0 and 1.
We can use the following functions to achieve that:
It will map
Q. Complete the following sentence: Minimizing the negative log-likelihood also means maximizing the [...] of selecting the [...] class.
Answer
Minimizing the negative log-likelihood also means maximizing the probability/likelihood of selecting the correct class.
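A small numerical sketch can make this concrete. The labels and predicted probabilities below are invented for illustration, and `negative_log_likelihood` is a hypothetical helper, not a library function. Each term of the sum is the negative log of the probability the model assigns to the true class of that observation.

```python
import math


def negative_log_likelihood(y_true, p_pred, eps=1e-12):
    """Sum of -log(probability assigned to the true class) over all observations.

    y_true: iterable of 0/1 labels.
    p_pred: iterable of predicted probabilities P(y = 1).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        p_true_class = p if y == 1 else 1.0 - p
        total -= math.log(p_true_class)
    return total


labels = [1, 0, 1]               # made-up labels
confident = [0.9, 0.1, 0.8]      # high probability on the true class
hesitant = [0.6, 0.4, 0.5]       # lower probability on the true class

print(negative_log_likelihood(labels, confident))
print(negative_log_likelihood(labels, hesitant))
```

The predictions that put more probability on the true class of each observation yield the smaller negative log-likelihood.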
Q. Assume the probability of an event occurring is $p = 0.1$.
- What are the odds of the event occurring?
- What are the log odds of the event occurring?
- Construct the probability of the event as a ratio that equals 0.1.
Answer
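- The odds of an event with probability $p$ are $\frac{p}{1-p}$. For the given $p=0.1$:
  $$\text{odds of the event} = \frac{0.1}{1-0.1} = \frac{1}{9} \approx 0.11$$
- The log odds of the event are:
  $$\log(\text{odds of the event}) = \log_e\frac{1}{9} \approx -2.20$$
- The probability of the event in terms of the odds can be written as follows:
  $$p = \frac{\text{odds}}{1+\text{odds}} = \frac{1/9}{1+1/9} = \frac{1}{10} = 0.1$$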
Q. True or False: If the odds of success in a binary response is $4$, the corresponding probability of success is $0.8$.
Answer
True
Let's calculate the probability of success ($p$). We can define the probability in terms of the odds of success as follows:
$$p = \frac{\text{odds}}{1+\text{odds}}$$
Since the odds of success are $4$:
$$p = \frac{4}{1+4} = 0.8$$
Q. Draw a graph of odds to probabilities, mapping the entire range of probabilities to their respective odds.
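One way to produce such a graph is sketched below. The helper names `odds_curve` and `plot_odds_curve` are illustrative only, and the plotting step assumes matplotlib is installed; the data itself needs nothing beyond the standard library. The endpoints $p = 0$ and $p = 1$ are excluded from the grid because the odds are $0$ at $p = 0$ and grow without bound as $p \to 1$.

```python
def odds_curve(n_points: int = 99):
    """Return evenly spaced probabilities in (0, 1) and their odds p / (1 - p)."""
    probs = [i / (n_points + 1) for i in range(1, n_points + 1)]
    odds = [p / (1.0 - p) for p in probs]
    return probs, odds


def plot_odds_curve():
    """Plot odds against probability (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt  # imported lazily: optional dependency

    probs, odds = odds_curve()
    plt.plot(probs, odds)
    plt.xlabel("probability p")
    plt.ylabel("odds p / (1 - p)")
    plt.yscale("log")  # the odds span several orders of magnitude
    plt.show()


probs, odds = odds_curve()
print(probs[49], odds[49])  # the midpoint p = 0.5 maps to odds of 1
```

The curve is monotonically increasing: probabilities below 0.5 map to odds below 1, and probabilities above 0.5 map to odds above 1.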
Q. The logistic regression model is a subset of a broader range of machine learning models known as generalized linear models (GLMs), which also include analysis of variance (ANOVA), vanilla linear regression, etc. There are three components to a GLM; identify these three components for binary logistic regression.
Answer
A binary logistic regression GLM consists of three components:
- Random component: refers to the probability distribution of the response variable ($Y$), e.g., the binomial distribution for $Y$ in binary logistic regression, which takes on the values $Y = 0$ or $Y = 1$.
- Systematic component: describes the explanatory variables $(X_1, X_2, \ldots)$ as a linear combination, the linear predictor. The binary case does not constrain these variables to any degree.
- Link function: specifies the link between the random and systematic components. It says how the expected value of the response relates to the linear predictor of the explanatory variables. For binary logistic regression, the link is the logit.

Note: Assume that $Y$ denotes whether a human voice activity was detected $(Y = 1)$ or not $(Y = 0)$ in a given time frame.
Q. Let us consider the logit transformation, i.e., log-odds. Assume a scenario in which the logit forms the linear decision boundary, for a given vector of systematic components X and predictor variables θ. Write the mathematical expression for the hyperplane that describes the decision boundary.
Answer
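At the decision boundary we will have $P(Y = 1 \mid X) = 0.5$.
If we put $0.5$ into the log-odds expression we get,
$$\log\frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)} = \log\frac{0.5}{0.5} = 0 = \theta^T X$$
So, the decision boundary is the hyperplane
$$\theta^T X = 0$$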
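The role of the hyperplane $\theta^T X = 0$ can be illustrated with a short sketch. The parameter vector and the points below are hypothetical values chosen for the example, and the helper names are not taken from any library. Points on the hyperplane get probability exactly 0.5; the sign of $\theta^T X$ decides the predicted class.

```python
import math


def linear_predictor(theta, x):
    """Dot product theta^T x, i.e. the log-odds of the positive class."""
    return sum(t * xi for t, xi in zip(theta, x))


def predict_proba(theta, x):
    """P(Y = 1 | x) obtained by applying the sigmoid to the log-odds."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(theta, x)))


def predict_class(theta, x):
    """Class 1 on or above the hyperplane theta^T x = 0, class 0 below it."""
    return 1 if linear_predictor(theta, x) >= 0.0 else 0


theta = [-1.0, 2.0]          # hypothetical parameters; x[0] = 1 is the intercept term
on_boundary = [1.0, 0.5]     # -1 + 2 * 0.5 = 0, so the point lies on the boundary
above = [1.0, 2.0]
below = [1.0, 0.0]

print(predict_proba(theta, on_boundary))
print(predict_class(theta, above), predict_class(theta, below))
```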
Q. True or False: The logit function and the natural logistic (sigmoid) function are inverses of each other.
Answer
True
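Logit expression:
$$z = \text{logit}(p) = \log\frac{p}{1-p}$$
A simple set of algebraic equations yields the inverse relation:
$$e^{z} = \frac{p}{1-p} \;\Rightarrow\; p = \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}$$
The above equation represents the sigmoid function.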
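The inverse relation can also be checked numerically. The following is a minimal sketch with two small helpers written for this example; it verifies the round trip in both directions on a few arbitrary values.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Natural logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Round trips: sigmoid undoes logit, and logit undoes sigmoid.
for p in (0.1, 0.5, 0.9):
    print(p, sigmoid(logit(p)))
for z in (-3.0, 0.0, 2.5):
    print(z, logit(sigmoid(z)))
```

Up to floating-point error, each pair of printed values agrees.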
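Q. Compute the derivative of the natural sigmoid function:
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
Answer
Direct derivative:
We have,
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
We can apply the formula for the derivative of a reciprocal, $\frac{d}{dx}\frac{1}{f(x)} = -\frac{f'(x)}{f(x)^2}$, with $f(x) = 1 + e^{-x}$ and $f'(x) = -e^{-x}$:
$$\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,(1-\sigma(x))$$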
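The closed form of the sigmoid derivative, $\sigma(x)(1-\sigma(x))$, can be sanity-checked against a central finite difference. This is only a numerical sketch; the step size `h` and the test points are arbitrary choices.

```python
import math


def sigmoid(x: float) -> float:
    """Natural logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Analytic derivative sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 1.5):
    print(x, sigmoid_grad(x), central_difference(sigmoid, x))
```

The derivative peaks at $x = 0$, where it equals $0.25$, and decays toward zero in both tails.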
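Q. Characterize the sigmoid function when its argument approaches $0$, $\infty$ and $-\infty$.
Answer
We have:
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
For $x \to 0$: $e^{-x} \to 1$, so $\sigma(x) \to \frac{1}{2}$.
For $x \to \infty$: $e^{-x} \to 0$, so $\sigma(x) \to 1$.
For $x \to -\infty$: $e^{-x} \to \infty$, so $\sigma(x) \to 0$.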
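These limiting cases matter in practice, because a naive implementation of $\frac{1}{1+e^{-x}}$ overflows for large negative arguments. A minimal sketch of a numerically stable variant follows; `stable_sigmoid` is a name invented for this example, and branching on the sign of `x` is one common way to avoid the overflow, not the only one.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Sigmoid that never exponentiates a large positive number."""
    if x >= 0.0:
        # exp(-x) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x)
    e = math.exp(x)
    return e / (1.0 + e)


print(stable_sigmoid(0.0))      # middle of the curve
print(stable_sigmoid(1000.0))   # saturates toward 1
print(stable_sigmoid(-1000.0))  # saturates toward 0
```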
Q. Remember that in logistic regression, the hypothesis function for some parameter vector
where y holds the hypothesis value. Suppose the coefficients of a logistic regression model with independent variables are as follows:
- What is the value of the logit for this observation?
- What is the value of the odds for this observation?
- What is the value of $P(y = 1)$ for this observation?
Answer
- Logit value can be obtained by substituting independent variables and model's coefficients as follows:
Substitute the given values:
- We know the log-odds can be written in terms of logit as follows:
We know the logit value for given parameters and observations.
On taking
- We can write the odds of getting $y=1$ in terms of $P(y = 1)$ as follows:
We know the odds i.e
On simplifying above expression, we get
Q. Proton therapy (PT) is a widely adopted form of treatment for many types of cancer including breast and lung cancer (Fig. 2.2).
Pulmonary nodules (left) and breast cancer (right) |
Tumour eradication statistics |
- What is the explanatory variable and what is the response variable?
- Explain the use of relative risk and odds ratio for measuring association.
- Are the two variables positively or negatively associated? Find the direction and strength of the association using both relative risk and odds ratio.
- Compute a 95% confidence interval (CI) for the measure of association.
- Interpret the results and explain their significance.
Answer
-
Explanatory variable : Cancer Type(Breast/Lung)
Response variable : Tumor eradication(Yes/No)
-
Relative risk (RR) is the ratio of risk of an event in one group (e.g., exposed group) versus the risk of the event in the other group (e.g., non-exposed group). The odds ratio (OR) is the ratio of odds of an event in one group versus the odds of the event in the other group.
-
Let's calculate the relative risk and the odds ratio:
For Cancer Type = Lungs:
Odds Ratio Calculations
$$odds(cancer \quad type = lungs) = \frac{number \quad of \quad yes}{number \quad of \quad no}$$ $$odds(cancer \quad type = lungs) = \frac{69}{36} = 1.91$$
For Cancer Type = Breast:
odds ratio
since, odds ratio is greater than
Relative risk calculations
Relative risk(RR) as a measure of association:
- The $95\%$ confidence interval for the odds ratio $\theta$ is computed from the sample confidence interval for the log odds ratio:
Also from above calculations:
Therefore, the
Using the above logits,
- The interval $(0.81, 1.9)$ bounds the odds ratio with $95\%$ confidence, and it contains $1$. An odds ratio of $1$ indicates no relationship between tumor eradication and cancer type, so the data do not show a statistically significant association between the two variables.
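The quantities used in this kind of analysis (relative risk, odds ratio, and a Wald confidence interval built on the log odds ratio) can be computed from any 2x2 table as sketched below. The counts in the example are hypothetical and are not the values from the table in this question; `association_measures` is an illustrative helper, not a library function. The standard error used is the usual large-sample one, $\sqrt{1/a + 1/b + 1/c + 1/d}$.

```python
import math


def association_measures(a, b, c, d, z=1.96):
    """Relative risk, odds ratio and Wald CI for a 2x2 table.

    a, b: successes and failures in group 1.
    c, d: successes and failures in group 2.
    z:    normal quantile (1.96 for a 95% interval).
    """
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return {"relative_risk": relative_risk, "odds_ratio": odds_ratio, "ci": ci}


# Hypothetical counts: group 1 has 30 yes / 10 no, group 2 has 20 yes / 20 no.
result = association_measures(30, 10, 20, 20)
print(result)
```

If the resulting interval contains $1$, the data are compatible with no association between the two variables at the chosen confidence level.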
Q. Consider a system for radiation therapy planning (Fig. 2.3). Given a patient with a malignant tumour, the problem is to select the optimal radiation exposure time for that patient. A key element in this problem is estimating the probability that a given tumour will be eradicated given certain covariates. A data scientist collects information relating to this radiation therapy system.
A multi-detector positron scanner used to locate tumors |
The following covariates are collected;
The data scientist fits a logistic regression model to the dependent measurements and produces these estimated coefficients:
- Estimate the probability that, given a patient who undergoes the treatment for $40$ milliseconds and who is presented with a tumour sized $3.5$ cm, the system eradicates the tumour.
- For how many milliseconds would the patient in part (a) need to be radiated to have exactly a $50\%$ chance of eradicating the tumour?
Answer
- Given:
Also,
Substituting the model's parameters and the observation's values:
- For
$50\%$ chance of eradicating the tumor.
We need to find
Taking the logarithm of both sides, we can solve for the exposure time:
Q. Recent research suggests that heating mercury-containing dental amalgams may cause the release of toxic mercury fumes into the human airways. It is also presumed that drinking hot coffee stimulates the release of mercury vapour from amalgam fillings.
A dental amalgam |
To study factors that affect migraines, and in particular, patients who have at least four dental amalgams in their mouth, a data scientist collects data from
- $X_1 = 1$ if the patient has at least four amalgams; $0$ otherwise.
- $X_2$ = coffee consumption (0 to 100 hot cups per month).
The output from training a logistic regression classifier is as follows:
A dental amalgam |
- Using $X_1$ and $X_2$, express the odds of a patient having a migraine for a second time.
- Calculate the probability of a second migraine for a patient that has at least four amalgams and drank 100 cups per month.
- For users that have at least four amalgams, is high coffee intake associated with an increased probability of a second migraine?
- Is there statistical evidence that having more than four amalgams is directly associated with a reduction in the probability of a second migraine?
Answer
- odds of migraine
$Pr(migraine = 1)$ can be given by:
- The given observation has the following values:
  - drank 100 cups per month, i.e. $X_2 = 100$
  - has at least four amalgams, i.e. $X_1 = 1$
Using above values, Expression for
On putting
- For users that have at least 4 amalgams $(X_1 = 1)$,
If we increase
So, high coffee intake is associated with an increased probability of a second migraine.
Note that we can reach the same conclusion by looking at the coefficient and p-value of the hot-coffee variable.
- No
We can use hypothesis testing to assess the statistical significance. Let's define the null and alternative hypotheses for this case.
- $H_0$: having more than four amalgams is not directly associated with a reduction in the probability of a second migraine.
- $H_1$: having more than four amalgams is directly associated with a reduction in the probability of a second migraine.
From the model we have p-value as
Q. To study factors that affect Alzheimer’s disease using logistic regression, a researcher considers the link between gum (periodontal) disease and Alzheimer as a plausible risk factor. The predictor variable is a count of gum bacteria (Fig. 2.5) in the mouth.
A chain of spherical bacteria. |
The output from training a logistic regression classifier is as follows:
output from training a logistic regression classifier |
- Estimate the probability of improvement when the count of gum bacteria of a patient is 33.
- Find out the gum bacteria count at which the estimated probability of improvement is 0.5.
- Find out the estimated odds ratio of improvement for an increase of 1 in the total gum bacteria count.
- Obtain a 99% confidence interval for the true odds ratio of improvement increase of 1 in the total gum bacteria count. Remember that the most common confidence levels are 90%, 95%, 99%, and 99.9%. Table 9.1 lists the z values for these levels.
Common confidence levels |
Answer
From the given data, we have:
Response variable : Y(remission = 1 or 0)
Independent variable : X(count of gum bacteria)
Model's parameters:
- Given $X_1 = 33$, with the model's parameters we can estimate the probability:
- Let $X$ be the count of gum bacteria for which the probability of improvement is $0.5$.
Taking logs on both sides and solving for $X$:
So for
- A unit increase in the bacteria count multiplies the odds of improvement by $e^{\beta}$, which is the estimated odds ratio:
- A $99\%$ confidence interval for $\beta$ is calculated as follows:
Therefore, a
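The four steps of this answer follow a pattern that applies to any logistic model with a single predictor, $\text{logit}(p) = \alpha + \beta x$. The sketch below uses hypothetical values for $\alpha$, $\beta$ and the standard error of $\beta$ (they are not the fitted values from the output table above), and the helper names are invented for this example.

```python
import math


def probability(alpha: float, beta: float, x: float) -> float:
    """P(y = 1 | x) for a single-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def x_at_half(alpha: float, beta: float) -> float:
    """Predictor value where P = 0.5, i.e. where alpha + beta * x = 0."""
    return -alpha / beta


def odds_ratio_with_ci(beta: float, se: float, z: float):
    """Odds ratio e^beta for a unit increase in x, with a Wald interval."""
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))


# Hypothetical fitted values, for illustration only.
alpha, beta, se_beta = -2.0, 0.1, 0.02
z_99 = 2.576  # two-sided 99% normal quantile

print(probability(alpha, beta, 33))
print(x_at_half(alpha, beta))
print(odds_ratio_with_ci(beta, se_beta, z_99))
```

The interval for the odds ratio is obtained by exponentiating the endpoints of the interval for $\beta$, which is why it is not symmetric around $e^{\beta}$.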
Q. Recent research suggests that cannabis (Fig. 2.6) and cannabinoids administration in particular, may reduce the size of malignant tumours in rats.
Cannabis |
Tumour shrinkage in rats |
For the true odds ratio:
- Find the sample odds ratio.
- Find the sample log-odds ratio.
- Compute a 95% confidence interval $(z_{0.95} = 1.645; z_{0.975} = 1.96)$ for the true log odds ratio and true odds ratio.
Answer
- sample odds ratio:
- sample log-odds ratio:
The estimated standard error for
- The $95\%$ CI for the true log odds ratio is:
Correspondingly, the
Q. The entropy of a single binary outcome with probability $p$ of receiving $1$ is defined as:
$$H(p) = -p\log p - (1-p)\log(1-p)$$
- At what $p$ does $H(p)$ attain its maximum value?
- What is the relationship between the entropy $H(p)$ and the logit function, given $p$?
Answer
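- To get $\arg\max_p H(p)$, we can differentiate $H(p) = -p\log p - (1-p)\log(1-p)$ with respect to $p$ and equate the result to $0$.
Using the derivative rules for logs and products:
$$\frac{dH(p)}{dp} = -\log p - 1 + \log(1-p) + 1$$
On simplifying the above expression:
$$\frac{dH(p)}{dp} = \log\frac{1-p}{p} = 0 \;\Rightarrow\; \frac{1-p}{p} = 1$$
So, $p = 0.5$. The second derivative, $-\frac{1}{p(1-p)}$, is negative on $(0, 1)$, so this stationary point is a maximum.
- If we look at the derivative of the entropy $H(p)$ with respect to $p$, we have
$$\frac{dH(p)}{dp} = -\log\frac{p}{1-p} = -\text{logit}(p)$$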
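The link between the entropy and the logit, namely that the slope of $H(p)$ equals the negative logit of $p$, can be checked numerically. This is a minimal sketch using natural logarithms; the finite-difference step and the test probabilities are arbitrary choices.

```python
import math


def entropy(p: float) -> float:
    """Binary entropy H(p) = -p log p - (1 - p) log(1 - p), for 0 < p < 1."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def logit(p: float) -> float:
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def entropy_slope(p: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dH/dp."""
    return (entropy(p + h) - entropy(p - h)) / (2.0 * h)


for p in (0.2, 0.5, 0.7):
    print(p, entropy_slope(p), -logit(p))
```

At $p = 0.5$ both columns are zero, which is where the entropy reaches its maximum of $\log 2$.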
Q. What is the difference between linear regression and logistic regression?
Answer
Q. What is the logistic function (sigmoid function), and how is it used in logistic regression?
Answer
Q. What is the purpose of the odds ratio in logistic regression?
Answer
Q. What is the cost function in logistic regression, and why is it used?
Answer
Q. What are the assumptions of logistic regression?
Answer
Q. How do you deal with multi-collinearity in logistic regression?
Answer
Q. What is the purpose of regularization in logistic regression, and how does it work?
Answer
Q. What is the ROC curve in the context of logistic regression?
Answer
Q. How do you evaluate the performance of a logistic regression model?
Answer
Q. For logistic regression, why is log loss recommended over MSE (mean squared error)?