ML Series 3: Logistic Regression
A Glimpse Into Generalized Linear Models
I have tried researching online how the logistic regression cost function is derived, how it relates to generalized linear models, and how it differs from linear regression. However, I did not find a satisfying post that balances depth and understandability with clear-cut logic. Therefore, I am writing this blog in the hope of accomplishing this goal!
Framework
Classification is the task of choosing the value of y that maximizes P(Y | X). Because linear models are interpretable and flexible, statisticians developed generalized linear models, which extend linear models to tasks where Y given X follows other distributions. To keep things easy to understand, we will only discuss the binary case (Y = 0 or 1), where Y given X follows a Bernoulli distribution.
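To make the decision rule concrete, here is a minimal sketch in Python (the function name `predict_label` is hypothetical). Given P(Y = 1 | X) for an instance, it returns whichever label has the larger Bernoulli probability; in the binary case this is the same as thresholding P(Y = 1 | X) at 0.5.

```python
def predict_label(p_y1):
    """Return the y in {0, 1} that maximizes the Bernoulli probability P(Y = y | X).

    p_y1 is P(Y = 1 | X); under the Bernoulli model P(Y = 0 | X) = 1 - p_y1.
    """
    p_y0 = 1.0 - p_y1
    # Ties (p_y1 == 0.5) are broken toward class 0 here; this is an arbitrary choice.
    return 1 if p_y1 > p_y0 else 0


print([predict_label(p) for p in (0.9, 0.2, 0.51)])  # [1, 0, 1]
```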
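Below is the notation we will use:
Suppose we have data D = {(x1, y1), (x2, y2), …, (xn, yn)}, where xi is the feature vector of the i-th instance, yi ∈ {0, 1} is its label, and w is the weight vector we want to learn.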
Derivations
Cost Function
Assuming we are given n independent and identically distributed (i.i.d.) training points, the probability of several independent events all happening, given a model, is the product of their individual probabilities. Therefore, the likelihood of D is the product of the probabilities of each (xi, yi).
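In symbols, a reconstruction writing L(w) for the likelihood of D under parameters w (our choice of notation):

```latex
L(w) = P(D \mid w) = \prod_{i=1}^{n} P(y_i \mid x_i; w)
```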
Now, because we want to combine the features linearly and yi follows a Bernoulli distribution, the probability of every single instance takes the Bernoulli form, with f(wᵀxi) playing the role of P(yi = 1 | xi).
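A reconstruction of that per-instance probability, assuming f maps the linear combination wᵀxi into [0, 1]. When yi = 1 it reduces to f(wᵀxi), and when yi = 0 it reduces to 1 − f(wᵀxi):

```latex
P(y_i \mid x_i; w) = f(w^\top x_i)^{y_i} \, \bigl(1 - f(w^\top x_i)\bigr)^{1 - y_i}, \qquad y_i \in \{0, 1\}
```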
Because the logarithm is monotonically increasing, taking the log of both sides does not change where the maximum is, and it turns the product into a sum.
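Substituting the Bernoulli form P(yi | xi; w) = f(wᵀxi)^yi (1 − f(wᵀxi))^(1 − yi) into the product and taking the log gives the following (reconstructed) log-likelihood:

```latex
\log L(w) = \sum_{i=1}^{n} \Bigl[ y_i \log f(w^\top x_i) + (1 - y_i) \log\bigl(1 - f(w^\top x_i)\bigr) \Bigr]
```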
To find the parameters, we apply Maximum Likelihood Estimation (MLE) and maximize the log-likelihood. Equivalently, we minimize its negative as the objective function.
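Flipping the sign turns the maximization into a minimization. A reconstruction, writing ŵ (our notation) for the estimated weights and keeping f general:

```latex
\hat{w} = \arg\min_{w} \; -\log L(w)
        = \arg\min_{w} \; -\sum_{i=1}^{n} \Bigl[ y_i \log f(w^\top x_i) + (1 - y_i) \log\bigl(1 - f(w^\top x_i)\bigr) \Bigr]
```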
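Here, we can plug in different functions from the generalized linear model family to keep the output of f between 0 and 1. Strictly speaking, f is the inverse of the link function: the sigmoid is the inverse of the logit link, and the standard normal CDF is the inverse of the probit link. So besides the sigmoid function, we can also use the probit model.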
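To see that both choices keep f in (0, 1), here is a small Python comparison (the function names `sigmoid` and `probit_inverse` are hypothetical). The probit version is the standard normal CDF, written with `math.erf` as Φ(z) = (1 + erf(z / √2)) / 2.

```python
import math


def sigmoid(z):
    """Logistic function (inverse of the logit link), branched to avoid exp overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probit_inverse(z):
    """Standard normal CDF (inverse of the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-2.0, 0.0, 2.0):
    # Both map any real z into (0, 1), so either can serve as f.
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  probit={probit_inverse(z):.3f}")
```

Both curves pass through 0.5 at z = 0, but the probit curve approaches 0 and 1 faster because the normal distribution has lighter tails than the logistic distribution.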
After plugging in the sigmoid function, we get the standard form of the cost function.
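A reconstruction of that standard form, assuming the common convention of averaging over the n instances (the factor 1/n rescales the objective but does not move its minimizer), with J(w) as our name for the cost and σ for the sigmoid:

```latex
J(w) = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[ y_i \log \sigma(z_i) + (1 - y_i) \log\bigl(1 - \sigma(z_i)\bigr) \Bigr],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```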
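Here z = wᵀx (and zi = wᵀxi for the i-th instance). Note that, unlike ordinary least-squares linear regression, logistic regression has no closed-form solution, so the weights are found with an iterative method such as gradient descent.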
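Because there is no closed form, in practice we evaluate the cost numerically and hand it to an iterative optimizer. Below is a minimal, dependency-free sketch (the helper names `log_sigmoid` and `cost` are hypothetical) of the averaged cost J(w) = −(1/n) Σ [yi log σ(zi) + (1 − yi) log(1 − σ(zi))], assuming the 1/n averaging convention. It computes log σ(z) in a numerically stable way and uses the identity 1 − σ(z) = σ(−z).

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    # log(sigmoid(z)) = -log(1 + exp(-z)); branch so exp never overflows.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def cost(w, X, y):
    """Average negative log-likelihood J(w) for logistic regression.

    X is a list of feature vectors (lists of floats), y a list of 0/1 labels.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))  # z_i = w^T x_i
        # log(1 - sigmoid(z)) is computed as log(sigmoid(-z)).
        total += y_i * log_sigmoid(z) + (1 - y_i) * log_sigmoid(-z)
    return -total / len(X)


print(cost([0.0, 0.0], [[1.0, 2.0], [1.0, -1.0]], [1, 0]))  # ≈ 0.6931 (log 2)
```

With all-zero weights every prediction is 0.5, so the cost equals log 2 regardless of the labels, which makes a handy sanity check.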
Gradient Derivation
Derivative of the Sigmoid Function
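A sketch of the derivation, writing σ(z) = 1 / (1 + e^(−z)), zi = wᵀxi (so ∂zi/∂w = xi), and J(w) for the averaged negative log-likelihood −(1/n) Σ [yi log σ(zi) + (1 − yi) log(1 − σ(zi))], where the 1/n factor is an assumed convention. The first two lines give the sigmoid derivative; the last line applies it with the chain rule to get the gradient of J:

```latex
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}} \\
\sigma'(z) &= \frac{e^{-z}}{(1 + e^{-z})^2}
            = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
            = \sigma(z)\bigl(1 - \sigma(z)\bigr) \\
\nabla_w J(w) &= -\frac{1}{n} \sum_{i=1}^{n} \Bigl[ y_i \bigl(1 - \sigma(z_i)\bigr) - (1 - y_i)\,\sigma(z_i) \Bigr] x_i
               = \frac{1}{n} \sum_{i=1}^{n} \bigl(\sigma(z_i) - y_i\bigr) x_i
\end{aligned}
```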
Cost Function Algorithm
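One common way to minimize the cost is batch gradient descent. Below is a minimal sketch, not necessarily the exact algorithm this post had in mind, using the gradient (1/n) Σ (σ(zi) − yi) xi. It assumes a constant learning rate `lr`, a fixed iteration count `n_iters`, and a bias folded in as a constant first feature; `gradient` and `fit` are hypothetical helper names.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(w, X, y):
    """Gradient of the averaged cost: (1/n) * sum_i (sigmoid(z_i) - y_i) * x_i."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        err = sigmoid(z) - y_i  # prediction error for this instance
        for j in range(d):
            grad[j] += err * x_i[j] / n
    return grad


def fit(X, y, lr=0.5, n_iters=1000):
    """Batch gradient descent from w = 0; lr and n_iters are illustrative values."""
    w = [0.0] * len(X[0])
    for _ in range(n_iters):
        g = gradient(w, X, y)
        w = [w_j - lr * g_j for w_j, g_j in zip(w, g)]  # step against the gradient
    return w


# Toy data: the first column is a constant 1 acting as the bias term.
# The classes overlap (x = -1 is labeled 1, x = 1 is labeled 0), so the MLE is finite.
X = [[1.0, -3.0], [1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit(X, y)
print(w)
print([round(sigmoid(sum(a * b for a, b in zip(w, x))), 3) for x in X])
```

The toy data deliberately overlaps: on perfectly separable data the log-likelihood has no finite maximizer, and plain gradient descent would keep growing the weights.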
Interview Questions
- What is a decision boundary?
- Can the cost function used in linear regression work in logistic regression?
- What metrics do we use to evaluate logistic regression?
- What is the Maximum Likelihood Estimator (MLE)?
- How can logistic regression be used in multi-class classification?
Thanks for reading the article! Hope this is helpful. Please let me know if you need more information.