Dec 24, 2009

High-dimensional Data Processing

The three top-performing classes of algorithms for high-dimensional data sets are
  1. logistic regression,
  2. Random Forests and
  3. SVMs.
Although logistic regression can be inferior to non-linear algorithms such as kernel SVMs on low-dimensional data sets, it often performs just as well in high dimensions. Once the number of features exceeds roughly 10,000, most data sets become linearly separable, so a linear decision boundary is usually sufficient.
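
To make this concrete, here is a minimal sketch using scikit-learn (not part of the original post; the synthetic data set and all parameters are illustrative assumptions) that compares logistic regression with an RBF-kernel SVM on data with 10,000 features. On wide data like this, the linear model typically matches the kernel method's accuracy.

    # Sketch: logistic regression vs. kernel SVM on high-dimensional data.
    # Assumes scikit-learn; data set parameters are illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic data with far more features than samples per class.
    X, y = make_classification(n_samples=2000, n_features=10000,
                               n_informative=100, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                      ("RBF-kernel SVM", SVC(kernel="rbf"))]:
        clf.fit(X_train, y_train)
        print(name, clf.score(X_test, y_test))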

Since logistic regression is usually much faster to train than more complex models such as Random Forests and kernel SVMs, it is often the preferable method for high-dimensional data sets. A rough timing comparison is sketched below.
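
As an illustration of the training-time argument, this sketch (same scikit-learn and synthetic-data assumptions as above) times all three model families on identical data; on wide data the linear model usually finishes first.

    # Sketch: relative training times of the three model families.
    # Assumes scikit-learn; parameters are illustrative only.
    import time

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=2000, n_features=10000,
                               n_informative=100, random_state=0)

    for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                      ("random forest", RandomForestClassifier(n_estimators=100)),
                      ("RBF-kernel SVM", SVC(kernel="rbf"))]:
        start = time.time()
        clf.fit(X, y)
        print("%s: trained in %.1fs" % (name, time.time() - start))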

1 comment:

Yi Wang said...

I guess that as the number of features gets larger and larger, it becomes very important to normalize these features. Otherwise, it is hard to get a linearly separable boundary. So high dimensionality alone may not give us well-separable data, am I right?
