Importance of hyperparameter tuning

Created 22-02-2018 by Hilde Weerts
Benchmark study on the importance of hyperparameter tuning, using 73 datasets from OpenML-CC18: which hyperparameters are important to tune, and which might instead be set to a default value? For each dataset, the following experiment was run:

* Experiment 1: gather performance data to determine good default parameters (a minimal sketch of this step is given at the end of this page).
  * Flow 8351: 1000 random configurations of sklearn RandomForestClassifier with a training time limit of 3 hours.
  * Flow 8353: 1000 random configurations of sklearn SVC with a training time limit of 3 hours.

For each of the 59 datasets for which more than 900 performance data points were retrieved in the previous experiment, the following experiment was run:

* Experiment 2: RandomizedSearchCV(cv = 5, n\_iter = 100) with one hyperparameter fixed (see the second sketch at the end of this page).
  * Flow 8365: RandomForestClassifier(n\_estimators = 300), 10 random search seeds, 5 hyperparameters (_bootstrap_, _criterion_, _max\_features_, _min\_samples\_leaf_, _min\_samples\_split_) and a control group with no parameters fixed.
  * Flow 8399: SVC(kernel = 'rbf'), 10 random search seeds, 4 hyperparameters (_gamma_, _C_, _tol_, _shrinking_) and a control group with no parameters fixed.

For more information, see: https://github.com/hildeweerts/hyperimp

We thank Microsoft Azure for providing the computational resources for this study.
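As an illustration of Experiment 1, the sketch below evaluates random configurations of a sklearn RandomForestClassifier with cross-validation and records the scores. The parameter ranges, the stand-in dataset, and the number of configurations are assumptions made for illustration only; the study's exact search spaces and 1000-configuration budget are defined in the hyperimp repository.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)
X, y = load_digits(return_X_y=True)  # stand-in dataset; the study used OpenML-CC18 tasks

def sample_config(rng):
    # Illustrative ranges (assumption); the study's search spaces are in the repo.
    return {
        "bootstrap": bool(rng.randint(0, 2)),
        "criterion": str(rng.choice(["gini", "entropy"])),
        "max_features": float(rng.uniform(0.1, 0.9)),
        "min_samples_leaf": int(rng.randint(1, 21)),
        "min_samples_split": int(rng.randint(2, 21)),
    }

results = []
for _ in range(10):  # the study sampled 1000 configurations per dataset
    params = sample_config(rng)
    clf = RandomForestClassifier(n_estimators=300, random_state=0, **params)
    score = cross_val_score(clf, X, y, cv=5).mean()
    results.append({**params, "accuracy": score})
```

The per-configuration scores collected this way are the performance data from which good defaults can then be derived.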
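For Experiment 2, the sketch below fixes one SVC hyperparameter at its sklearn default (by omitting it from the search space) while the remaining hyperparameters are tuned with RandomizedSearchCV, alongside a control search with nothing fixed. The sampling distributions and the stand-in dataset are assumptions; the actual distributions and the 10-seed repetition are in the repository.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in dataset

# Illustrative distributions (assumption, not the study's exact ranges).
param_distributions = {
    "C": loguniform(1e-3, 1e3),
    "gamma": loguniform(1e-5, 1e1),
    "tol": loguniform(1e-5, 1e-1),
    "shrinking": [True, False],
}

def run_search(fixed=None, seed=0):
    dists = dict(param_distributions)
    if fixed is not None:
        # Dropping the key leaves that hyperparameter at its sklearn default.
        dists.pop(fixed)
    search = RandomizedSearchCV(
        SVC(kernel="rbf"),
        dists,
        n_iter=100,  # as in the study
        cv=5,
        random_state=seed,
    )
    return search.fit(X, y).best_score_

control = run_search(seed=0)                    # control group: nothing fixed
fixed_gamma = run_search(fixed="gamma", seed=0)  # the study repeated this over 10 seeds
```

Comparing the fixed-hyperparameter score against the control over many seeds indicates how much tuning that hyperparameter matters.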