Minh-Anh Le

Minh-Anh's datasets

"The speech dataset was also provided by (see citation request) and contains real-world data from recorded English speech. The normal class contains data from persons having an American accent…
1599 runs, 0 likes, 0 downloads, 0 reach, 0 impact
3686 instances - 401 features - 2 classes - 0 missing values
"This UCI dataset contains the handwritten digits 0–9 from 45 different writers. All digits are kept except the digit 4; from this class, only the first 10 instances are kept (similar to…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
6724 instances - 17 features - 2 classes - 0 missing values
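The subsetting rule above can be sketched as follows. This is a minimal illustration: samples are modeled as hypothetical `(features, digit)` pairs, and "first 10" is taken to mean the first 10 in input order. Which class ends up labeled as the anomaly class is not stated in the excerpt, so no relabeling is done here.

```python
from typing import List, Tuple

# Hypothetical representation: (feature vector, digit label 0-9).
Sample = Tuple[List[float], int]


def pen_local_subset(samples: List[Sample], rare_digit: int = 4,
                     keep_rare: int = 10) -> List[Sample]:
    """Keep every sample except those of `rare_digit`, of which only the
    first `keep_rare` (in input order) are retained."""
    kept: List[Sample] = []
    rare_seen = 0
    for features, digit in samples:
        if digit != rare_digit:
            kept.append((features, digit))
        elif rare_seen < keep_rare:
            kept.append((features, digit))
            rare_seen += 1
    return kept
```

For example, 15 samples of digit 4 followed by 5 samples of digit 3 reduce to 15 samples: the first 10 of digit 4 plus all 5 of digit 3.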
"This UCI dataset contains the handwritten digits 0–9 from 45 different writers. Here, in the “global” task, we only keep the digit 8 as the normal class and sample the 10…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
809 instances - 17 features - 2 classes - 0 missing values
This dataset is not the original dataset. The target variable "Target" is relabeled into "Normal" and "Anomaly".
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
11183 instances - 7 features - 2 classes - 0 missing values
E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. "On Evaluation of Outlier Rankings and Outlier Scores." In Proceedings of the 12th SIAM International Conference on Data Mining…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
49999 instances - 28 features - 2 classes - 0 missing values
This dataset is not the original dataset. The target variable "Target" is relabeled into "Normal" and "Anomaly". This dataset is used in "Learning hyperparameters…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
5300 instances - 3 features - 2 classes - 0 missing values
The Covertype data is available at the UCI repository. In our experiments, instances from class 2 are considered normal points and instances from class 4 are anomalies. The anomaly ratio is…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
286048 instances - 11 features - 2 classes - 0 missing values
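The class mapping described above can be sketched as follows. This is a hedged illustration, not the original preprocessing script. It assumes rows are hypothetical `(features, cover_type)` pairs, and that instances from classes other than 2 and 4 are dropped (the description names only those two classes). The `anomaly_ratio` helper is added only to show how the ratio would be computed; the actual value is elided in the description.

```python
from typing import List, Tuple

# Assumption: class 2 -> normal, class 4 -> anomaly; every other class is dropped.
LABEL_MAP = {2: "Normal", 4: "Anomaly"}


def relabel_covertype(rows: List[Tuple[list, int]]) -> List[Tuple[list, str]]:
    """Keep only classes 2 and 4 and replace the class id with a text label."""
    return [(features, LABEL_MAP[label])
            for features, label in rows if label in LABEL_MAP]


def anomaly_ratio(labels: List[str]) -> float:
    """Fraction of labels equal to "Anomaly" (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "Anomaly") / len(labels)
```

For example, rows with classes 2, 4 and 1 become one "Normal" and one "Anomaly" row, and the class-1 row is discarded.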
This dataset is taken from the KDD Cup 1999 data, which is available at the UCI repository. When the data is split by the ‘service’ attribute, the second-largest subset is Smtp (95,156 records), with an anomaly ratio…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
95156 instances - 4 features - 2 classes - 0 missing values
Thanks to NASA for allowing UCL to use the shuttle datasets. "The shuttle dataset describes radiator positions in a NASA space shuttle with 9 attributes and was designed for supervised anomaly…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
49097 instances - 10 features - 2 classes - 0 missing values
Modified “Statlog (Shuttle)” dataset from the UCI Machine Learning Repository for unsupervised anomaly detection. The Statlog (Shuttle) data is available at the UCI repository. In our experiments…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
46464 instances - 10 features - 2 classes - 0 missing values
The satellite dataset comprises features extracted from satellite observations. In particular, each image was taken under four different light wavelengths, two in visible light (green and red) and…
2078 runs, 3 likes, 70 downloads, 73 reach, 34 impact
5100 instances - 37 features - 2 classes - 0 missing values
The data, generated by the synthetic data generator Mulcross (see paper), is available. Mulcross generates a multivariate normal distribution with a selectable number of anomaly clusters. In our…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
262144 instances - 5 features - 2 classes - 0 missing values
This dataset is taken from the KDD Cup 1999 data, which is available at the UCI repository. When the data is split by the ‘service’ attribute, the largest subset is Http (567,497 records), with an anomaly ratio of…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
567497 instances - 4 features - 2 classes - 0 missing values
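The Http and Smtp entries both describe splitting the KDD Cup 1999 data by its ‘service’ attribute and then keeping one of the largest subsets. A minimal sketch of that selection follows, assuming records are hypothetical dicts with a `"service"` key. It does not reproduce which 4 features are kept or how the records are labeled, since neither is stated in the excerpts.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, object]  # hypothetical: one KDD Cup 1999 connection record


def service_subset(records: List[Record], service: str) -> List[Record]:
    """Return the records whose 'service' attribute equals `service`."""
    return [r for r in records if r["service"] == service]


def largest_services(records: List[Record], k: int = 2) -> List[Tuple[str, int]]:
    """The k most frequent 'service' values with their record counts."""
    return Counter(str(r["service"]) for r in records).most_common(k)
```

On the full data, `largest_services(records)` would be expected to rank Http first and Smtp second, matching the record counts quoted in the two entries. The toy records in the test are invented purely for illustration.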
"The features of the breast-cancer dataset are extracted from medical images of a fine needle aspirate (FNA) describing the cell nuclei. The task of the UCI dataset is to separate cancer from…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
367 instances - 31 features - 2 classes - 0 missing values
An artificial test dataset with 4 normal distributions (one of which has low density), a micro-cluster, and local anomalies. This dataset is not the original dataset. The target variable…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
3000 instances - 3 features - 2 classes - 0 missing values
Simulated dataset. 1000 normal observations were drawn from a 4-variable multivariate normal distribution with zero mean, no correlation, and a variance between 1 and 10 for each feature.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1050 instances - 11 features - 2 classes - 0 missing values
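Because the features are uncorrelated, the covariance matrix is diagonal. Sampling from this multivariate normal therefore reduces to drawing each feature independently from N(0, σ²ᵢ). The sketch below covers only the 1000 normal observations. It assumes the per-feature variances are drawn uniformly from [1, 10] (the description only says "between 1 and 10"), and it does not generate the anomalies, whose construction is cut off in the description.

```python
import math
import random
from typing import List, Tuple


def simulate_normal_observations(n: int = 1000, d: int = 4,
                                 var_low: float = 1.0, var_high: float = 10.0,
                                 seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Draw n points from a d-variate normal with zero mean and a diagonal
    covariance whose entries lie in [var_low, var_high]."""
    rng = random.Random(seed)
    # Assumption: each feature's variance is drawn uniformly from the range.
    variances = [rng.uniform(var_low, var_high) for _ in range(d)]
    stds = [math.sqrt(v) for v in variances]
    # No correlation means a diagonal covariance, so each feature is
    # sampled independently from N(0, variance).
    data = [[rng.gauss(0.0, s) for s in stds] for _ in range(n)]
    return data, variances
```

Passing `d=10` gives the analogous 10-variable setting used by another of the simulated datasets.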
Simulated dataset. 1000 normal observations were drawn from a 4-variable multivariate normal distribution with zero mean, no correlation, and a variance between 1 and 10 for each feature.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1100 instances - 5 features - 2 classes - 0 missing values
Simulated dataset. 1000 normal observations were drawn from a 10-variable multivariate normal distribution with zero mean, no correlation, and a variance between 1 and 10 for each feature.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1050 instances - 11 features - 2 classes - 0 missing values
Simulated dataset. 1000 normal observations were drawn from a 4-variable multivariate normal distribution with zero mean, no correlation, and a variance between 1 and 10 for each feature.…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
1050 instances - 5 features - 2 classes - 0 missing values
The original thyroid disease (ann-thyroid) dataset from the UCI Machine Learning Repository is a classification dataset suited for training ANNs. It has 3772 training instances and 3428 testing…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
7200 instances - 7 features - 2 classes - 0 missing values