Heterogeneous Ensembles for Data Streams

Created 21-02-2019 by Jan van Rijn Visibility: public
Ensembles of classifiers are among the best-performing classifiers available in many data mining applications. Rather than training one classifier, multiple classifiers are trained, and their predictions are combined according to some voting scheme. An important prerequisite for ensembles to be successful is that the individual models are diverse. One way to vastly increase the diversity among the models is to build a heterogeneous ensemble, composed of fundamentally different model types. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of base-level classifier, most often Hoeffding Trees. We study the use of heterogeneous ensembles for data streams. We introduce the Online Performance Estimation framework, which dynamically weights the votes of individual classifiers. Using an internal evaluation method, it measures how well ensemble members performed on recent data and dynamically updates their weights to improve the ensemble's performance throughout the stream. Experiments over a wide range of data streams show performance that is competitive with state-of-the-art ensemble techniques, including Online Bagging and Leveraging Bagging. All experimental results from this work are easily reproducible and publicly available online.
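The weighting idea described above can be illustrated with a minimal Python sketch. It is not the study's implementation: it assumes that the internal evaluation is plain accuracy over a fixed sliding window of recent instances, and the class names (`MajorityClass`, `LastLabel`, `WindowedWeightedEnsemble`), the `window` parameter, and the toy base learners are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter, defaultdict, deque


class MajorityClass:
    """Toy base learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class LastLabel:
    """Toy base learner (hypothetical): repeats the most recently observed label."""

    def __init__(self):
        self.last = 0

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


class WindowedWeightedEnsemble:
    """Weights each member's vote by its accuracy on the last `window` instances.

    Assumption: sliding-window accuracy stands in for the internal evaluation method.
    """

    def __init__(self, members, window=50):
        self.members = members
        # One bounded history of 0/1 outcomes per ensemble member.
        self.recent = [deque(maxlen=window) for _ in members]

    def weights(self):
        # Members without any history yet get a neutral weight of 1.0.
        return [sum(r) / len(r) if r else 1.0 for r in self.recent]

    def predict(self, x):
        votes = defaultdict(float)
        for member, w in zip(self.members, self.weights()):
            votes[member.predict(x)] += w
        return max(votes, key=votes.get)

    def learn(self, x, y):
        # Test-then-train: score each member on the instance before it learns from it.
        for member, r in zip(self.members, self.recent):
            r.append(1 if member.predict(x) == y else 0)
            member.learn(x, y)


def prequential_accuracy(model, stream):
    """Predict each instance first, then train on it; return overall accuracy."""
    correct = 0
    for x, y in stream:
        correct += model.predict(x) == y
        model.learn(x, y)
    return correct / len(stream)


# A stream with an abrupt concept change: 100 instances of class 0, then 100 of class 1.
stream = [(None, 0)] * 100 + [(None, 1)] * 100
ensemble = WindowedWeightedEnsemble([MajorityClass(), LastLabel()], window=50)
accuracy = prequential_accuracy(ensemble, stream)
```

In this toy stream, the member that adapts quickly after the change gains weight while the slow one loses it, so the ensemble's vote follows whichever member has performed better recently. Real heterogeneous ensembles would replace the toy learners with different model types, such as Hoeffding Trees alongside other online classifiers.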