sonar


active ARFF Publicly available Visibility: public Uploaded 06-04-2014 by Jan van Rijn
  • Data Science Engineering Geoscience Kaggle mythbusting_1 study_1 study_123 study_15 study_20 study_29 study_30 study_41 study_50 study_52 study_7 study_88 uci
Source: Unknown

NAME: Sonar, Mines vs. Rocks

SUMMARY: This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [1]. The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.

SOURCE: The data set was contributed to the benchmark collection by Terry Sejnowski, now at the Salk Institute and the University of California at San Diego. The data set was developed in collaboration with R. Paul Gorman of Allied-Signal Aerospace Technology Center.

MAINTAINER: Scott E. Fahlman

PROBLEM DESCRIPTION: The file "sonar.mines" contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file "sonar.rocks" contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.

Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

The label associated with each record contains the letter "R" if the object is a rock and "M" if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly.

METHODOLOGY: This data set can be used in a number of different ways to test learning speed, quality of ultimate learning, ability to generalize, or combinations of these factors.
In [1], Gorman and Sejnowski report two series of experiments: an "aspect-angle independent" series, in which the whole data set is used without controlling for aspect angle, and an "aspect-angle dependent" series, in which the training and testing sets were carefully controlled to ensure that each set contained cases from each aspect angle in appropriate proportions.

For the aspect-angle independent experiments the combined set of 208 cases is divided randomly into 13 disjoint sets with 16 cases in each. For each experiment, 12 of these sets are used as training data, while the 13th is reserved for testing. The experiment is repeated 13 times so that every case appears once as part of a test set. The reported performance is an average over the entire set of 13 different test sets, each run 10 times.

It was observed that this random division of the sample set led to rather uneven performance. A few of the splits gave poor results, presumably because the test set contained some samples from aspect angles that were under-represented in the corresponding training set. This motivated Gorman and Sejnowski to devise a different set of experiments in which an attempt was made to balance the training and test sets so that each would have a representative number of samples from all aspect angles. Since detailed aspect angle information was not present in the database of samples, the 208 samples were first divided into clusters, using a 60-dimensional Euclidean metric; each of these clusters was then divided between the 104-member training set and the 104-member test set.

The actual training and testing samples used for the "aspect-angle dependent" experiments are marked in the data files. The reported performance is an average over 10 runs with this single division of the data set.

A standard back-propagation network was used for all experiments. The network had 60 inputs and 2 output units, one indicating a cylinder and the other a rock.
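The aspect-angle independent protocol described above (208 cases split at random into 13 disjoint sets of 16, each set serving once as the test set) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `thirteen_fold_splits`, the seed, and the use of integer indices in place of the actual patterns are assumptions.

```python
import random

def thirteen_fold_splits(n_cases=208, n_folds=13, seed=0):
    """Yield (train_indices, test_indices) for a random disjoint partition."""
    if n_cases % n_folds != 0:
        raise ValueError("n_cases must divide evenly into n_folds")
    fold_size = n_cases // n_folds  # 16 for the sonar data
    indices = list(range(n_cases))
    # The seed is an assumption, used only to make the split repeatable.
    random.Random(seed).shuffle(indices)
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    for k, test in enumerate(folds):
        # Training data is the union of the other 12 folds.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

splits = list(thirteen_fold_splits())
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 13 192 16
```

Each case index appears in exactly one test set, which mirrors the statement that every case is tested once. The balanced, cluster-based division used for the aspect-angle dependent series is not reproduced here.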
Experiments were run with no hidden units (direct connections from each input to each output) and with a single hidden layer with 2, 3, 6, 12, or 24 units. Each network was trained for 300 epochs over the entire training set.

The weight-update formulas used in this study were slightly different from the standard form. A learning rate of 2.0 and a momentum of 0.0 were used. Errors less than 0.2 were treated as zero. Initial weights were uniform random values in the range -0.3 to +0.3.

RESULTS: For the angle-independent experiments, Gorman and Sejnowski report the following results for networks with different numbers of hidden units:

Hidden   % Right on     Std.   % Right on   Std.
Units    Training set   Dev.   Test Set     Dev.
------   ------------   ----   ----------   ----
0        89.4           2.1    77.1         8.3
2        96.5           0.7    81.9         6.2
3        98.8           0.4    82.0         7.3
6        99.7           0.2    83.5         5.6
12       99.8           0.1    84.7         5.7
24       99.8           0.1    84.5         5.7

For the angle-dependent experiments Gorman and Sejnowski report the following results:

Hidden   % Right on     Std.   % Right on   Std.
Units    Training set   Dev.   Test Set     Dev.
------   ------------   ----   ----------   ----
0        79.3           3.4    73.1         4.8
2        96.2           2.2    85.7         6.3
3        98.1           1.5    87.6         3.0
6        99.4           0.9    89.3         2.4
12       99.8           0.6    90.4         1.8
24       100.0          0.0    89.2         1.4

Not surprisingly, the network's performance on the test set was somewhat better when the aspect angles in the training and test sets were balanced.

Gorman and Sejnowski further report that a nearest neighbor classifier on the same data gave an 82.7% probability of correct classification.

Three trained human subjects were each tested on 100 signals, chosen at random from the set of 208 returns used to create this data set. Their responses ranged between 88% and 97% correct. However, they may have been using information from the raw sonar signal that is not preserved in the processed data sets presented here.

REFERENCES:

1. Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.
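The training settings given in the METHODOLOGY section (no hidden units, learning rate 2.0, momentum 0.0, errors below 0.2 treated as zero, initial weights uniform in -0.3 to +0.3) can be illustrated with a small sketch. The description states that the study's weight-update formulas differed slightly from the standard form and does not give them, so the code below substitutes the standard per-pattern delta rule with sigmoid outputs. The bias term, the function names, the clamp on the net input, and the two toy patterns are all assumptions made for illustration.

```python
import math
import random

def init_weights(n_inputs, n_outputs, rng, scale=0.3):
    """One weight row per output unit; the last entry is a bias (assumed)."""
    return [[rng.uniform(-scale, scale) for _ in range(n_inputs + 1)]
            for _ in range(n_outputs)]

def forward(weights, x):
    """Sigmoid outputs of a network with direct input-to-output connections."""
    outputs = []
    for row in weights:
        net = row[-1] + sum(w * v for w, v in zip(row, x))
        net = max(-60.0, min(60.0, net))  # guard against overflow in exp
        outputs.append(1.0 / (1.0 + math.exp(-net)))
    return outputs

def thresholded_error(target, output, tolerance=0.2):
    """Treat errors smaller than the tolerance as zero."""
    error = target - output
    return 0.0 if abs(error) < tolerance else error

def train(weights, patterns, epochs=300, learning_rate=2.0):
    """Standard delta-rule updates without momentum (a stand-in, see lead-in)."""
    for _ in range(epochs):
        for x, targets in patterns:
            outputs = forward(weights, x)
            for row, t, o in zip(weights, targets, outputs):
                delta = thresholded_error(t, o) * o * (1.0 - o)
                for i, v in enumerate(x):
                    row[i] += learning_rate * delta * v
                row[-1] += learning_rate * delta
    return weights

rng = random.Random(0)
weights = init_weights(60, 2, rng)
# Two made-up patterns, only to show the call shape; targets are (cylinder, rock).
toy = [([0.9] * 60, (1.0, 0.0)), ([0.1] * 60, (0.0, 1.0))]
train(weights, toy, epochs=10)
print(forward(weights, toy[0][0]))
```

Because of the error threshold, an output unit stops receiving updates for a pattern once its output is within 0.2 of the target. Networks with a hidden layer would additionally need the error to be propagated back through that layer, which this sketch omits.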
Relabeled values in attribute 'Class': R to Rock, M to Mine.
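The record layout from the problem description (60 energies in the range 0.0 to 1.0 followed by a class letter) and the relabeling noted above can be checked with a small parser. This is a sketch only: the comma-separated input format, the function name `parse_record`, and the sample row are assumptions and are not taken from the data files.

```python
RELABEL = {"R": "Rock", "M": "Mine"}

def parse_record(line, n_features=60):
    """Split one comma-separated row into (features, relabeled class)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != n_features + 1:
        raise ValueError(f"expected {n_features + 1} fields, got {len(fields)}")
    features = [float(f) for f in fields[:-1]]
    if not all(0.0 <= v <= 1.0 for v in features):
        raise ValueError("feature outside the documented range 0.0 to 1.0")
    return features, RELABEL[fields[-1]]

# Made-up row for illustration only.
sample = ",".join(["0.5"] * 60 + ["R"])
features, label = parse_record(sample)
print(len(features), label)  # prints: 60 Rock
```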

61 features

Class (target)   nominal   2 unique values     0 missing
attribute_1      numeric   177 unique values   0 missing
attribute_2      numeric   182 unique values   0 missing
attribute_3      numeric   190 unique values   0 missing
attribute_4      numeric   181 unique values   0 missing
attribute_5      numeric   193 unique values   0 missing
attribute_6      numeric   196 unique values   0 missing
attribute_7      numeric   195 unique values   0 missing
attribute_8      numeric   201 unique values   0 missing
attribute_9      numeric   205 unique values   0 missing
attribute_10     numeric   207 unique values   0 missing
attribute_11     numeric   203 unique values   0 missing
attribute_12     numeric   206 unique values   0 missing
attribute_13     numeric   198 unique values   0 missing
attribute_14     numeric   202 unique values   0 missing
attribute_15     numeric   203 unique values   0 missing
attribute_16     numeric   203 unique values   0 missing
attribute_17     numeric   202 unique values   0 missing
attribute_18     numeric   204 unique values   0 missing
attribute_19     numeric   206 unique values   0 missing
attribute_20     numeric   203 unique values   0 missing
attribute_21     numeric   200 unique values   0 missing
attribute_22     numeric   203 unique values   0 missing
attribute_23     numeric   199 unique values   0 missing
attribute_24     numeric   201 unique values   0 missing
attribute_25     numeric   198 unique values   0 missing
attribute_26     numeric   194 unique values   0 missing
attribute_27     numeric   190 unique values   0 missing
attribute_28     numeric   194 unique values   0 missing
attribute_29     numeric   197 unique values   0 missing
attribute_30     numeric   202 unique values   0 missing
attribute_31     numeric   207 unique values   0 missing
attribute_32     numeric   205 unique values   0 missing
attribute_33     numeric   205 unique values   0 missing
attribute_34     numeric   206 unique values   0 missing
attribute_35     numeric   205 unique values   0 missing
attribute_36     numeric   205 unique values   0 missing
attribute_37     numeric   206 unique values   0 missing
attribute_38     numeric   206 unique values   0 missing
attribute_39     numeric   204 unique values   0 missing
attribute_40     numeric   206 unique values   0 missing
attribute_41     numeric   204 unique values   0 missing
attribute_42     numeric   208 unique values   0 missing
attribute_43     numeric   205 unique values   0 missing
attribute_44     numeric   196 unique values   0 missing
attribute_45     numeric   205 unique values   0 missing
attribute_46     numeric   199 unique values   0 missing
attribute_47     numeric   202 unique values   0 missing
attribute_48     numeric   204 unique values   0 missing
attribute_49     numeric   193 unique values   0 missing
attribute_50     numeric   154 unique values   0 missing
attribute_51     numeric   160 unique values   0 missing
attribute_52     numeric   144 unique values   0 missing
attribute_53     numeric   134 unique values   0 missing
attribute_54     numeric   134 unique values   0 missing
attribute_55     numeric   129 unique values   0 missing
attribute_56     numeric   122 unique values   0 missing
attribute_57     numeric   121 unique values   0 missing
attribute_58     numeric   124 unique values   0 missing
attribute_59     numeric   119 unique values   0 missing
attribute_60     numeric   109 unique values   0 missing

107 properties

208  Number of instances (rows) of the dataset.
61  Number of attributes (columns) of the dataset.
2  Number of distinct values of the target attribute (if it is nominal).
0  Number of missing values in the dataset.
0  Number of instances with at least one value missing.
60  Number of numeric attributes.
1  Number of nominal attributes.
0.26  Maximum standard deviation of attributes of the numeric type.
46.63  Percentage of instances belonging to the least frequent class.
98.36  Percentage of numeric attributes.
0.43  Third quartile of means among attributes of the numeric type.
0.65  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.7  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.33  Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
(no value listed)  Average entropy of the attributes.
97  Number of instances belonging to the least frequent class.
1.64  Percentage of nominal attributes.
(no value listed)  Third quartile of mutual information between the nominal attributes and the target attribute.
0.37  Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.3  Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.33  Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
2.28  Mean kurtosis among attributes of the numeric type.
0.79  Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
(no value listed)  First quartile of entropy among attributes.
1.69  Third quartile of skewness among attributes of the numeric type.
0.26  Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.39  Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.7  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
0.28  Mean of means among attributes of the numeric type.
0.33  Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
-0.55  First quartile of kurtosis among attributes of the numeric type.
0.24  Third quartile of standard deviation of attributes of the numeric type.
0.65  Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.7  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.33  Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
(no value listed)  Average mutual information between the nominal attributes and the target attribute.
0.36  Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0.05  First quartile of means among attributes of the numeric type.
0.71  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.37  Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.3  Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.33  Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
(no value listed)  An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
1  Number of binary attributes.
(no value listed)  First quartile of mutual information between the nominal attributes and the target attribute.
0.29  Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.26  Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.39  Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.7  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
2  Average number of distinct values among the attributes of the nominal type.
0.45  First quartile of skewness among attributes of the numeric type.
0.41  Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.65  Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0  Standard deviation of the number of distinct values among attributes of the nominal type.
0.33  Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
0.97  Mean skewness among attributes of the numeric type.
0.04  First quartile of standard deviation of attributes of the numeric type.
0.71  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.37  Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.83  Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
0.33  Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
0.14  Mean standard deviation of attributes of the numeric type.
(no value listed)  Second quartile (Median) of entropy among attributes.
0.29  Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.26  Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.17  Error rate achieved by the landmarker weka.classifiers.lazy.IBk
53.37  Percentage of instances belonging to the most frequent class.
(no value listed)  Minimal entropy among attributes.
0.85  Second quartile (Median) of kurtosis among attributes of the numeric type.
0.41  Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
1  Entropy of the target attribute values.
0.66  Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
111  Number of instances belonging to the most frequent class.
-1.21  Minimum kurtosis among attributes of the numeric type.
0.26  Second quartile (Median) of means among attributes of the numeric type.
0.71  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.65  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
(no value listed)  Maximum entropy among attributes.
0.01  Minimum of means among attributes of the numeric type.
(no value listed)  Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.29  Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.34  Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
20.74  Maximum kurtosis among attributes of the numeric type.
(no value listed)  Minimal mutual information between the nominal attributes and the target attribute.
0.94  Second quartile (Median) of skewness among attributes of the numeric type.
0.41  Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.31  Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
0.7  Maximum of means among attributes of the numeric type.
2  The minimal number of distinct values among attributes of the nominal type.
1.64  Percentage of binary attributes.
0.15  Second quartile (Median) of standard deviation of attributes of the numeric type.
0.7  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.29  Number of attributes divided by the number of instances.
(no value listed)  Maximum mutual information between the nominal attributes and the target attribute.
-0.79  Minimum skewness among attributes of the numeric type.
0  Percentage of instances having missing values.
(no value listed)  Third quartile of entropy among attributes.
0.3  Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
(no value listed)  Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
2  The maximum number of distinct values among attributes of the nominal type.
0.01  Minimum standard deviation of attributes of the numeric type.
0  Percentage of missing values.
3.61  Third quartile of kurtosis among attributes of the numeric type.
1  Average class difference between consecutive instances.
0.39  Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.7  Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
3.4  Maximum skewness among attributes of the numeric type.
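
Several of the simpler properties listed above follow directly from the class counts (111 and 97) and the attribute counts (60 numeric, 1 nominal). The sketch below recomputes them as a consistency check. The variable names are illustrative, and rounding to two decimals is an assumption made to match the way the page displays values.

```python
import math

n_mine, n_rock = 111, 97          # class counts from the description
n_numeric, n_nominal = 60, 1      # attribute counts from the property list
n_instances = n_mine + n_rock
n_attributes = n_numeric + n_nominal

minority_pct = round(100 * min(n_mine, n_rock) / n_instances, 2)
majority_pct = round(100 * max(n_mine, n_rock) / n_instances, 2)
numeric_pct = round(100 * n_numeric / n_attributes, 2)
nominal_pct = round(100 * n_nominal / n_attributes, 2)
dimensionality = round(n_attributes / n_instances, 2)

# Shannon entropy of the class distribution, in bits.
class_entropy = -sum(p * math.log2(p)
                     for p in (n_mine / n_instances, n_rock / n_instances))

print(minority_pct, majority_pct, numeric_pct, nominal_pct,
      dimensionality, round(class_entropy, 2))
# prints: 46.63 53.37 98.36 1.64 0.29 1.0
```

The class entropy is slightly below 1 bit because the classes are not perfectly balanced; it shows as 1 once rounded.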

21 tasks

915 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Class
396 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Class
362 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: Class
346 runs - estimation_procedure: 5 times 2-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Class
31 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: Leave one out - evaluation_measure: predictive_accuracy - target_feature: Class
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: Class
213 runs - estimation_procedure: 10 times 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: Class
85 runs - estimation_procedure: 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: Class
24 runs - estimation_procedure: Interleaved Test then Train - target_feature: Class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering