Tech Notes of Yi Wang: High-dimensional Data Processing

Dec 24, 2009

High-dimensional Data Processing

The three top-performing classes of algorithms for high-dimensional data sets are

logistic regression,
Random Forests and
SVMs.

Although logistic regression can be inferior to non-linear algorithms, e.g. kernel SVMs, on low-dimensional data sets, it often performs equally well in high dimensions, when the number of features exceeds 10,000, because most data sets become linearly separable when the number of features becomes very large.
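The separability claim can be checked on a toy problem. The sketch below is not from the original note: it uses a plain perceptron rather than logistic regression, because the perceptron stops making mistakes exactly when the data is linearly separable, which makes it a convenient test. The sample sizes, the Gaussian features, the random labels, and the helper names (train_perceptron, random_labels_separable) are all assumptions chosen for illustration.

```python
import random


def train_perceptron(points, labels, max_epochs=1000):
    """Train a perceptron on labels in {-1, +1}.

    Returns True if a full pass with zero mistakes was reached, which
    means a separating hyperplane was found.
    """
    dim = len(points[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or exactly on the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def random_labels_separable(n_samples, n_features, seed):
    """Draw Gaussian points with random labels (illustrative assumption)
    and report whether the perceptron separates them."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
              for _ in range(n_samples)]
    labels = [rng.choice([-1, 1]) for _ in range(n_samples)]
    return train_perceptron(points, labels)


# Same number of samples, very different numbers of features.
low_dim_separable = random_labels_separable(30, 2, seed=0)
high_dim_separable = random_labels_separable(30, 100, seed=0)
print("2 features:  ", low_dim_separable)
print("100 features:", high_dim_separable)
```

Even with labels that carry no signal at all, 30 points in 100 dimensions can be split by a hyperplane, while the same 30 points in 2 dimensions almost never can. This is only a sanity check of the geometric argument, and it says nothing about how well a linear model generalizes.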

Because logistic regression is often faster to train than more complex models such as Random Forests and SVMs, it is in many situations the preferable method for high-dimensional data sets.
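One reason logistic regression trains quickly is that each stochastic gradient step only touches the features that are present in the current example. The following is a minimal sketch under stated assumptions, not a reference implementation: it assumes sparse binary features stored as dictionaries, a fixed learning rate, and synthetic data in which each class draws its active features from its own half of the feature space. All names (train_logistic_sgd, predict, N_FEATURES and so on) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_sgd(examples, n_features, epochs=20, lr=0.1, seed=0):
    """Fit logistic regression by SGD.

    examples: list of (features, label), where features is a dict
    {feature_index: value} and label is 0 or 1.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            feats, y = examples[i]
            z = bias + sum(weights[j] * v for j, v in feats.items())
            err = sigmoid(z) - y  # derivative of the log loss w.r.t. z
            bias -= lr * err
            # Only the non-zero features of this example are updated.
            for j, v in feats.items():
                weights[j] -= lr * err * v
    return weights, bias


def predict(weights, bias, feats):
    z = bias + sum(weights[j] * v for j, v in feats.items())
    return 1 if z >= 0 else 0


# Synthetic data (an assumption for illustration): 200 examples,
# 20,000 features, 10 active features per example.
N_FEATURES = 20000
HALF = N_FEATURES // 2
data_rng = random.Random(1)
examples = []
for i in range(200):
    label = i % 2
    pool = range(0, HALF) if label == 1 else range(HALF, N_FEATURES)
    feats = {j: 1.0 for j in data_rng.sample(pool, 10)}
    examples.append((feats, label))

weights, bias = train_logistic_sgd(examples, N_FEATURES)
correct = sum(predict(weights, bias, f) == y for f, y in examples)
accuracy = correct / len(examples)
print("training accuracy:", accuracy)
```

Each update costs time proportional to the number of active features in the example (10 here), not to the 20,000 total, so a full epoch stays cheap even as the feature space grows. A production setup would normally add regularization and a held-out evaluation set, both omitted here for brevity.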
