# ML Series3: Logistic Regression

A Glimpse Into Generalized Linear Models

I tried researching online how the logistic regression cost function is derived, how it relates to generalized linear models, and how it differs from linear regression. However, I did not find a post that balances depth and understandability with clear-cut logic. So I am writing this post in the hope of accomplishing that goal!

# Framework

Classification is the task of choosing the value of y that maximizes P(Y | X). Because linear models are interpretable and flexible, statisticians found a way to apply them to different tasks, more specifically to different distributions of Y given X. To keep things easy to follow, we will only discuss the binary case (Y = 0 or 1), where Y given X follows a Bernoulli distribution.

Below are some notations we will be using:

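Suppose we have data D = {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)}, where each xᵢ is a feature vector and yᵢ ∈ {0, 1} is its label. We write w for the weight vector and f(x) for the model's predicted probability that y = 1.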

# Derivations

## Cost Function

Assume we are given n i.i.d. training data points. For independent events, the probability that all of them occur, given a model, is the product of their individual probabilities. Therefore, the likelihood of D is the product of the per-example probabilities.
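
In symbols, writing P(yᵢ | xᵢ; w) for the probability that the model with parameters w assigns to the i-th label (this notation is my assumption for illustration), the likelihood can be sketched as:

$$
L(w) = \prod_{i=1}^{n} P(y_i \mid x_i; w)
$$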

Now, because we want to combine the features linearly and yᵢ follows a Bernoulli distribution, each instance contributes the Bernoulli probability of its label: f(xᵢ) when yᵢ = 1 and 1 − f(xᵢ) when yᵢ = 0.
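
Assuming f(xᵢ) denotes the model's predicted probability that yᵢ = 1, the two cases can be folded into a single expression:

$$
P(y_i \mid x_i; w) = f(x_i)^{y_i} \, \bigl(1 - f(x_i)\bigr)^{1 - y_i}
$$

Setting yᵢ = 1 leaves f(xᵢ), and setting yᵢ = 0 leaves 1 − f(xᵢ), so this is just the Bernoulli probability mass function.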

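Because the logarithm is monotonically increasing, taking the log of both sides does not change where the maximum is, and it turns the product into a sum. This gives the log-likelihood.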
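
Under the assumed notation where L(w) is the likelihood and f(xᵢ) is the predicted probability that yᵢ = 1, the log-likelihood works out to:

$$
\log L(w) = \sum_{i=1}^{n} \Bigl[ y_i \log f(x_i) + (1 - y_i) \log\bigl(1 - f(x_i)\bigr) \Bigr]
$$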

To find the parameters, we apply maximum likelihood estimation (MLE) and maximize the log-likelihood. Equivalently, we minimize its negative, which serves as the objective function.
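
Calling the objective J(w) (a name I am introducing here), and again taking f(xᵢ) to be the predicted probability that yᵢ = 1, the negative log-likelihood reads:

$$
J(w) = -\log L(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log f(x_i) + (1 - y_i) \log\bigl(1 - f(x_i)\bigr) \Bigr]
$$

Many references divide this sum by n. That rescaling does not change the minimizing w.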

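Here we can plug in different link functions from the generalized linear model family so that f takes values between 0 and 1. Strictly speaking, the link is the logit and the sigmoid is its inverse. Besides the logit, we can also use the probit link, whose inverse is the standard normal CDF.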
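
To see how the two choices compare, here is a minimal sketch using only the Python standard library. The function names `sigmoid` and `probit_inverse` are my own, not from any library. Both map a real-valued score z to a value strictly between 0 and 1 for moderate z.

```python
import math


def sigmoid(z: float) -> float:
    """Inverse of the logit link: maps a real score z into (0, 1)."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probit_inverse(z: float) -> float:
    """Inverse of the probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Print both curves side by side for a few scores.
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  probit={probit_inverse(z):.4f}")
```

Both curves pass through 0.5 at z = 0 and are S-shaped. The normal CDF approaches 0 and 1 faster than the sigmoid does.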

After plugging in the sigmoid function, we get the standard form of the cost function, commonly called the binary cross-entropy or log loss.
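
Written out under my assumed notation, with J(w) as the objective, σ as the sigmoid, and zᵢ as the linear score of the i-th instance, it is:

$$
J(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(z_i) + (1 - y_i) \log\bigl(1 - \sigma(z_i)\bigr) \Bigr],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$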

where z = wᵀx. Note that, unlike ordinary linear regression, logistic regression has no closed-form solution, so the parameters are found with iterative methods such as gradient descent.
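
Since there is no closed form, the weights have to be found numerically. Below is a minimal sketch of plain batch gradient descent on the averaged cross-entropy, using only the standard library. The helper names (`sigmoid`, `cost`, `fit`), the learning rate, the step count, and the toy data are all invented for illustration. A real project would more likely use an optimized library solver.

```python
import math


def sigmoid(z):
    # Branch on the sign of z to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost(w, X, y):
    """Average negative log-likelihood (binary cross-entropy)."""
    eps = 1e-12  # keeps log() away from exactly 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def fit(X, y, lr=0.1, steps=2000):
    """Batch gradient descent; the gradient is mean((sigmoid(w.x) - y) * x)."""
    n = len(X)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Made-up toy data: the first feature is a constant 1 (intercept),
# the second is a single input. The classes overlap on purpose.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

w0 = [0.0, 0.0]
w = fit(X, y)
print("cost at start:", cost(w0, X, y))
print("cost after fit:", cost(w, X, y))
print("weights:", w)
```

The cost is convex in w, so with a small enough learning rate the loop moves steadily toward the maximum likelihood estimate. The toy labels overlap on purpose: with perfectly separable data the weights would keep growing without bound.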

# Interview Questions

• What is a decision boundary?
• Can the cost function used in linear regression work in logistic regression?
• What metrics do we use to evaluate logistic regression?
• What is the Maximum Likelihood Estimator (MLE)?
• How can logistic regression be used in multi-class classification?