# ML Series 3: Logistic Regression

A Glimpse Into Generalized Linear Models

I searched online for how the logistic regression cost function is derived, how it relates to generalized linear models, and how it differs from linear regression. However, I did not find a satisfying post that **balances depth and understandability with clear-cut logic.** So I am writing this post in the hope of accomplishing that goal!

# Framework

Classification is the task of choosing the value of y that maximizes P(Y | X). Because linear models are interpretable and flexible, statisticians developed a way to apply them to different tasks or, more specifically, to different distributions of Y given X. To keep things simple, we will only discuss the binary case (Y = 0 or 1), in which Y given X follows a **Bernoulli distribution**.

We will use the following notation:

Suppose we have data D = {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)}, where each xᵢ is a feature vector and yᵢ ∈ {0, 1} is its label.

# Derivations

## Cost Function

Assume we are given n i.i.d. training data points. The probability of independent events all happening, given a model, is the product of their individual probabilities. Therefore, the **likelihood** of D under parameters w is

$$L(w) = \prod_{i=1}^{n} P(y_i \mid x_i; w)$$

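Now, because we want to combine the features linearly and yᵢ follows a Bernoulli distribution, for every single instance we have

$$P(y_i \mid x_i; w) = f(w^\top x_i)^{y_i} \, \bigl(1 - f(w^\top x_i)\bigr)^{1 - y_i}$$

where f(wᵀxᵢ) is the modeled probability that yᵢ = 1.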

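Because the logarithm is monotonic, maximizing the log of the likelihood gives the same parameters as maximizing the likelihood itself. Taking the log of both sides, we get

$$\log L(w) = \sum_{i=1}^{n} \Bigl[ y_i \log f(w^\top x_i) + (1 - y_i) \log\bigl(1 - f(w^\top x_i)\bigr) \Bigr]$$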
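
As a quick numerical check that taking the log turns the product of Bernoulli probabilities into a sum, here is a minimal sketch in plain Python. The `probs` and `labels` values are made up for illustration, and the function names are my own, not from any library.

```python
import math

def likelihood(probs, labels):
    """Product of Bernoulli terms p^y * (1 - p)^(1 - y)."""
    result = 1.0
    for p, y in zip(probs, labels):
        result *= p if y == 1 else 1.0 - p
    return result

def log_likelihood(probs, labels):
    """Sum of y*log(p) + (1 - y)*log(1 - p)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, labels)
    )

# Hypothetical modeled probabilities P(y = 1 | x) and observed labels.
probs = [0.9, 0.2, 0.7, 0.4]
labels = [1, 0, 1, 0]

lik = likelihood(probs, labels)
log_lik = log_likelihood(probs, labels)

# exp(log-likelihood) recovers the likelihood up to rounding error.
print(lik, math.exp(log_lik))
```

With many data points the raw product underflows toward zero, which is a practical reason to work with the sum of logs.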

To find the parameters, we apply **Maximum Likelihood Estimation** and maximize the function above. **Equivalently, we minimize the objective function**

$$J(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log f(w^\top x_i) + (1 - y_i) \log\bigl(1 - f(w^\top x_i)\bigr) \Bigr]$$

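Here we can plug in different functions from the **generalized linear model** family so that f maps wᵀx into the range (0, 1). The sigmoid function (the inverse of the logit link) is the usual choice and gives logistic regression. The standard normal CDF (the inverse of the probit link) is an alternative and gives probit regression.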
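
To make the two choices concrete, here is a small sketch using only the standard library. The names `sigmoid` and `normal_cdf` are my own; both functions squash any real number into the range 0 to 1.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)

def normal_cdf(z):
    """Inverse of the probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare the two squashing functions at a few points.
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, sigmoid(z), normal_cdf(z))
```

Both curves pass through 0.5 at z = 0 and are symmetric around it. The normal CDF approaches 0 and 1 faster than the sigmoid does.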

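After plugging in the sigmoid function, we get the standard form of the cost function

$$J(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(z_i) + (1 - y_i) \log\bigl(1 - \sigma(z_i)\bigr) \Bigr]$$

where zᵢ = wᵀxᵢ and σ(z) = 1 / (1 + e⁻ᶻ) is the sigmoid function. **Note that unlike ordinary linear regression, logistic regression does not have a closed-form solution.** Its parameters have to be found with an iterative method such as gradient descent.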
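
The cost can be evaluated directly from w, the features, and the labels. The sketch below is one possible implementation in plain Python; `softplus` and `cost` are names I chose for illustration. It uses the identity that each term of the negative log-likelihood equals log(1 + e^z) - y·z, which avoids taking the log of a value that has been rounded to 0. Many references also divide the sum by n; that rescaling does not change the minimizing w.

```python
import math

def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def cost(w, X, y):
    """Negative log-likelihood of logistic regression (summed, not averaged)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i))  # z = w^T x
        # softplus(z) - y*z == -(y*log(s) + (1 - y)*log(1 - s)), s = sigmoid(z)
        total += softplus(z) - y_i * z
    return total

# Hypothetical toy data: two features per point, labels in {0, 1}.
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]

print(cost([0.0, 0.0], X, y))  # every prediction is 0.5 when w = 0
print(cost([0.0, 1.0], X, y))  # a w that fits the labels better costs less
```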

## Gradient Derivation
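
A sketch of the derivation, assuming the summed negative log-likelihood J(w) with zᵢ = wᵀxᵢ, and using the sigmoid identity σ′(z) = σ(z)(1 − σ(z)). Here x_{ij} denotes the j-th feature of the i-th data point.

$$
\begin{aligned}
\frac{\partial J}{\partial w_j}
&= -\sum_{i=1}^{n} \left[ \frac{y_i}{\sigma(z_i)} - \frac{1 - y_i}{1 - \sigma(z_i)} \right] \sigma'(z_i) \, x_{ij} \\
&= -\sum_{i=1}^{n} \Bigl[ y_i \bigl(1 - \sigma(z_i)\bigr) - (1 - y_i) \, \sigma(z_i) \Bigr] x_{ij} \\
&= \sum_{i=1}^{n} \bigl( \sigma(z_i) - y_i \bigr) \, x_{ij}
\end{aligned}
$$

In vector form, the gradient is the sum over data points of the prediction error (σ(zᵢ) − yᵢ) times the feature vector xᵢ.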

## Derivative of the Sigmoid Function
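
Starting from σ(z) = 1 / (1 + e⁻ᶻ) and applying the chain rule:

$$
\begin{aligned}
\sigma'(z)
&= \frac{e^{-z}}{\bigl(1 + e^{-z}\bigr)^{2}} \\
&= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\
&= \sigma(z) \, \bigl(1 - \sigma(z)\bigr)
\end{aligned}
$$

The last step uses 1 − σ(z) = e⁻ᶻ / (1 + e⁻ᶻ).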

## Cost Function Algorithm
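
One common way to minimize the cost is batch gradient descent: start from w = 0 and repeatedly step against the gradient. The code below is a minimal sketch under stated assumptions, not a production implementation. The function names, `learning_rate`, `n_iters`, and the toy data are all hypothetical. The gradient is averaged over the n points, an assumption made here so that the step size does not depend on the dataset size.

```python
import math

def sigmoid(z):
    """1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """Modeled probability that y = 1 for feature vector x."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)))

def fit_logistic_regression(X, y, learning_rate=0.1, n_iters=2000):
    """Batch gradient descent on the averaged negative log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        grad = [0.0] * d
        for x_i, y_i in zip(X, y):
            error = predict_proba(w, x_i) - y_i  # sigma(z_i) - y_i
            for j in range(d):
                grad[j] += error * x_i[j] / n    # average over the n points
        # Step against the gradient.
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, grad)]
    return w

# Toy data: the first column is a constant 1 that acts as the intercept.
# The classes overlap, so the maximum likelihood estimate is finite.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1, 1]

w = fit_logistic_regression(X, y)
print(w)
print(predict_proba(w, [1.0, 2.0]), predict_proba(w, [1.0, -2.0]))
```

Predicting class 1 whenever `predict_proba` exceeds 0.5 is the same as checking whether wᵀx is positive.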

# Interview Questions

- What is a decision boundary?
- Can the cost function used in linear regression work in logistic regression?
- What metrics do we use to evaluate logistic regression?
- What is the Maximum Likelihood Estimator (MLE)?
- How can logistic regression be used in multi-class classification?

**Thanks for reading the article! Hope this is helpful. Please let me know if you need more information.**