Dec 24, 2009

High-dimensional Data Processing

The three top-performing classes of algorithms for high-dimensional data sets are
  1. logistic regression,
  2. Random Forests, and
  3. support vector machines (SVMs).
Although logistic regression can be inferior to non-linear algorithms such as kernel SVMs on low-dimensional data sets, it often performs equally well in high dimensions, once the number of features exceeds 10,000, because most data sets become linearly separable when the number of features is very large.
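
The separability claim can be illustrated with a small experiment. The following is only a sketch under stated assumptions, not a benchmark: the data are synthetic Gaussian features with purely random labels, the sizes (30 samples, 2 versus 200 features), the learning rate and the step count are arbitrary choices, and the helper names `train_logreg` and `train_accuracy` are made up for this example. It fits a plain logistic regression by gradient descent and reports accuracy on the training set only, so it says something about separability and nothing about generalization.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_logreg(X, y, lr=0.1, steps=300):
    """Fit logistic regression by batch gradient descent (hypothetical helper)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # gradient of the log loss w.r.t. the logit
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_accuracy(n_samples, n_features, seed=0):
    """Training accuracy on random features with random 0/1 labels (assumed setup)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    w, b = train_logreg(X, y)
    hits = sum((dot(w, xi) + b > 0) == (yi == 1) for xi, yi in zip(X, y))
    return hits / n_samples


acc_low = train_accuracy(30, 2)     # far fewer features than samples
acc_high = train_accuracy(30, 200)  # far more features than samples
print(f"training accuracy with   2 features: {acc_low:.2f}")
print(f"training accuracy with 200 features: {acc_high:.2f}")
```

With many more features than samples, even randomly labeled points can usually be split by a hyperplane, so the linear model should fit the training set far better in the 200-feature case than in the 2-feature case.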

Because logistic regression is often faster to train than more complex models such as Random Forests and SVMs, it is in many situations the preferable method for high-dimensional data sets.
