Feature selection can be of value to classification for a variety of reasons. Real-world data sets can be rife with irrelevant features, especially if the data was not gathered specifically for the classification task at hand. For instance, in many business applications hundreds of customer attributes may have been captured in some central data store, whilst only later is it decided what kinds of models actually need to be built. Bag-of-words text classification data will by definition include large numbers of terms that may turn out not to be relevant. Microarray data sets consisting of gene expression profiles are very wide, whilst the number of instances is typically very small. In general, feature selection may help to make models more interpretable, to ensure that models actually generalize rather than overfit, and to speed up model building when costly algorithms are used.
In this study we investigate the specific question: will feature selection improve prediction for a given data set and algorithm? We base our findings on experiments across a large number of data sets (just under 400) and a range of different algorithms.
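To make the question concrete, the following is a minimal sketch of the kind of per-data-set, per-algorithm comparison it implies: cross-validated accuracy of a classifier with and without a simple filter-based feature selection step. The synthetic data set, the decision tree classifier, and the choice of SelectKBest with k=10 are illustrative assumptions, not the protocol used in this study.

```python
# Illustrative sketch only: compare cross-validated accuracy of one algorithm
# on one data set, with and without a filter-based feature selection step.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic "wide" data: many features, only a few of them informative.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)

baseline = DecisionTreeClassifier(random_state=0)
with_selection = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),  # keep the 10 highest-scoring features
    ("clf", DecisionTreeClassifier(random_state=0)),
])

acc_baseline = cross_val_score(baseline, X, y, cv=10).mean()
acc_selected = cross_val_score(with_selection, X, y, cv=10).mean()
print(f"without selection: {acc_baseline:.3f}   with selection: {acc_selected:.3f}")
```

Whether the selected variant wins depends on the data set and the algorithm, which is precisely the question examined across the data sets and algorithms in this study.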