Dec 20, 2009

A Nice Introduction to Logistic Regression

Among the many textbooks and tutorials on logistic regression, the introductory one linked above explains where the logistic regression model comes from:

In a binary classification problem, it is intuitive to decide whether an instance x belongs to class 0 or class 1 by looking at the ratio P(c=1|x) / P(c=0|x). Denoting P = P(c=1|x) and 1-P = P(c=0|x), the ratio becomes the odds P/(1-P).

However, a drawback of the odds is that they are asymmetric with respect to P. For example, swapping the values of P and 1-P does not negate P/(1-P); it turns it into its reciprocal. The swap does, however, negate the logit ln(P/(1-P)). So it becomes reasonable to make the logit, rather than the odds, our dependent variable.
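
The asymmetry is easy to check numerically. The snippet below is only a small sketch; the function names odds and logit and the value P = 0.8 are my own choices for illustration, not something taken from the tutorial.

```python
import math

def odds(p):
    """Odds P / (1 - P) for a probability 0 < p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds ln(P / (1 - P))."""
    return math.log(odds(p))

p = 0.8  # arbitrary example probability

# Swapping P and 1-P turns the odds into their reciprocal, not their negative.
print(odds(p), odds(1.0 - p))

# The same swap flips the sign of the logit.
print(logit(p), logit(1.0 - p))
```

The two odds values multiply to roughly 1, while the two logit values sum to roughly 0, which is the symmetry the argument above relies on.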

Modeling this dependent variable with a linear form, we get:
ln(P/(1-P)) = a + bx
which is equivalent to
P = e^(a+bx) / (1 + e^(a+bx))
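
To convince myself that the two forms really are equivalent, here is a minimal sketch that computes P from a + bx and then takes the logit of the result. The coefficients a = -1.0 and b = 0.5 and the sample x values are hypothetical, picked only for the check.

```python
import math

def logit(p):
    """Log-odds ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))

def logistic(a, b, x):
    """P = e^(a+bx) / (1 + e^(a+bx))."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

a, b = -1.0, 0.5  # hypothetical coefficients, for illustration only

for x in (-4.0, 0.0, 2.0, 6.0):
    p = logistic(a, b, x)
    # Taking the logit of P should give back the linear form a + bx.
    print(x, p, logit(p), a + b * x)
```

For each x, the last two printed columns should agree up to floating-point rounding.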

The tutorial also compares linear regression with logistic regression:
"If you use linear regression, the predicted values will become greater than one and less than zero if you move far enough on the X-axis. Such values are theoretically inadmissible."
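
The quoted point can be reproduced on a toy data set. This is a rough sketch under my own assumptions: the six data points are made up, fit_line is a plain least-squares helper written for this post, and the logistic coefficients a = -5.0 and b = 2.0 are chosen by hand rather than fitted.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def logistic(a, b, x):
    """P = e^(a+bx) / (1 + e^(a+bx))."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

xs = [0, 1, 2, 3, 4, 5]   # made-up inputs
cs = [0, 0, 0, 1, 1, 1]   # made-up binary labels
intercept, slope = fit_line(xs, cs)
a, b = -5.0, 2.0          # hypothetical logistic coefficients, not fitted

# Far from the data, the straight line leaves [0, 1]; the logistic curve does not.
for x in (-5, 2.5, 10):
    print(x, intercept + slope * x, logistic(a, b, x))
```

At x = -5 the straight line gives a negative value and at x = 10 a value above one, while the logistic values stay strictly between 0 and 1.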

This shows that logistic regression does not directly estimate the relation between x and c. Instead, it estimates the relation between x and P(c|x), and then uses P(c|x) to decide whether x belongs to c=1 or c=0. So, despite its name, logistic regression is used here as a classifier rather than as a regression.
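
As a sketch of that last step, the estimated probability can be turned into a class label by thresholding. The 0.5 threshold, the function names, and the coefficients below are assumptions made for illustration; the tutorial does not prescribe them.

```python
import math

def prob_class1(a, b, x):
    """Estimated P(c=1|x) = e^(a+bx) / (1 + e^(a+bx))."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

def classify(a, b, x, threshold=0.5):
    """Assign class 1 when P(c=1|x) reaches the threshold, else class 0."""
    return 1 if prob_class1(a, b, x) >= threshold else 0

a, b = -5.0, 2.0  # hypothetical coefficients, not fitted to any data

for x in (1.0, 2.5, 4.0):
    print(x, prob_class1(a, b, x), classify(a, b, x))
```

With a threshold of 0.5, the decision reduces to checking the sign of a + bx, so with these coefficients the boundary sits at x = 2.5.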
