texture

Status: active. Format: ARFF. Visibility: public. Uploaded 29-07-2016 by Rafael Gomes Mantovani.
  • Health Medicine OpenML-CC18 OpenML100 study_123 study_135 study_14 study_50 study_52 study_98 study_99
Author: Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF), Grenoble - France.
Source: [ELENA project](https://www.elen.ucl.ac.be/neural-nets/Research/Projects/ELENA/databases/REAL/texture/)
Please cite: None

####1. Summary

This database was generated by the Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF) in the development of the Esprit project ELENA No. 6891 and the Esprit working group ATHOS No. 6620.

```
(a) Original source:
    P. Brodatz, "Textures: A Photographic Album for Artists and Designers",
    Dover Publications, Inc., New York, 1966.

(b) Creation:
    Laboratory of Image Processing and Pattern Recognition
    Institut National Polytechnique de Grenoble INPG
    Laboratoire de Traitement d'Image et de Reconnaissance de Formes LTIRF
    Av. Felix Viallet, 46
    F-38031 Grenoble Cedex
    France

(c) Contact:
    Dr. A. Guerin-Dugue, INPG-LTIRF, guerin@tirf.inpg.fr
```

####2. Past Usage:

This database has been used privately at the TIRF laboratory. It was created to study texture discrimination with high order statistics.

```
A. Guerin-Dugue, C. Aviles-Cruz, "High Order Statistics from Natural Textured Images",
In ATHOS workshop on System Identification and High Order Statistics,
Sophia-Antipolis, France, September 1993.

Guerin-Dugue, A. and others, Deliverable R3-B4-P - Task B4: Benchmarks,
Technical report, Elena-NervesII "Enhanced Learning for Evolutive Neural Architecture",
ESPRIT-Basic Research Project Number 6891, June 1995.
```

####3. Relevant Information:

The aim is to distinguish between 11 different textures (Grass lawn, Pressed calf leather, Handmade paper, Raffia looped to a high pile, Cotton canvas, ...), each pattern (pixel) being characterised by 40 attributes built by the estimation of fourth order modified moments in four orientations: 0, 45, 90 and 135 degrees.
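The count of 40 attributes follows from the ten moment index triples listed later in this section, each estimated in four orientations. The sketch below enumerates that layout. The name format (`mm4(000)@0deg`) and the function `attribute_names` are hypothetical, and the orientation-major ordering is an assumption: the source does not state the column order. The summary statistics table does show identical rows for attributes 1, 11, 21 and 31, which is compatible with this ordering but does not prove it.

```python
from itertools import product

# The ten moment index triples listed in the description.
MOMENTS = ["000", "001", "002", "011", "012", "022", "111", "112", "122", "222"]
ORIENTATIONS = [0, 45, 90, 135]  # degrees


def attribute_names(orientation_major=True):
    """Build 40 hypothetical attribute names such as 'mm4(000)@0deg'.

    ASSUMPTION: with orientation_major=True the ten moments for 0 degrees
    come first, then those for 45, 90 and 135 degrees. The source does not
    state the actual column ordering.
    """
    if orientation_major:
        return [f"mm4({m})@{o}deg" for o, m in product(ORIENTATIONS, MOMENTS)]
    return [f"mm4({m})@{o}deg" for m, o in product(MOMENTS, ORIENTATIONS)]


names = attribute_names()
print(len(names))            # 40
print(names[0], names[10])   # mm4(000)@0deg mm4(000)@45deg
```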
A statistical method based on the extraction of fourth order moments was developed for the characterization of natural micro-textures. Called "fourth order modified moments" (mm4) [Guerin93], this method measures, for each texture, the deviation from a first-order Gauss-Markov process. The features were estimated in four directions (0, 45, 90 and 135 degrees) to take into account the possible orientations of the textures. Only correlations between the current pixel, the first neighbourhood and the second neighbourhood are taken into account. This small neighbourhood is adapted to the fine grain property of the textures.

The data set contains 11 classes of 500 instances each, and each class refers to a type of texture in the Brodatz album. The database dimension is 40 plus one for the class label. The 40 attributes were built respectively by the estimation of the following fourth order modified moments in four orientations (0, 45, 90 and 135 degrees): mm4(000), mm4(001), mm4(002), mm4(011), mm4(012), mm4(022), mm4(111), mm4(112), mm4(122) and mm4(222).

Note: patterns are always sorted by class and are presented in increasing order of their class label in each dataset of the texture database (texture.dat, texture_CR.dat, texture_PCA.dat, texture_DFA.dat).

####4. Class:

The class label is a code for the following classes:

```
Class label   Texture
 2            Grass lawn (D09)
 3            Pressed calf leather (D24)
 4            Handmade paper (D57)
 6            Raffia looped to a high pile (D84)
 7            Cotton canvas (D77)
 8            Pigskin (D92)
 9            Beach sand (D28)
10            Beach sand (D29)
12            Oriental straw cloth (D53)
13            Oriental straw cloth (D78)
14            Oriental grass fiber cloth (D79)
```

####5. Summary Statistics:

The table below gives, for each attribute of the database, the dynamic range (min and max values), the mean value and the standard deviation.
```
Attribute      Min      Max     Mean   Standard deviation
 1         -1.4495   0.7741  -1.0983   0.2034
 2         -1.2004   0.3297  -0.5867   0.2055
 3         -1.3099   0.3441  -0.5838   0.3135
 4         -1.1104   0.5878  -0.4046   0.2302
 5         -1.0534   0.4387  -0.3307   0.2360
 6         -1.0029   0.4515  -0.2422   0.2225
 7         -1.2076   0.5246  -0.6026   0.2003
 8         -1.0799   0.3980  -0.4322   0.2210
 9         -1.0570   0.4369  -0.3317   0.2361
10         -1.2580   0.3546  -0.5978   0.3268
11         -1.4495   0.7741  -1.0983   0.2034
12         -1.0831   0.3715  -0.5929   0.2056
13         -1.1194   0.6347  -0.4019   0.3368
14         -1.0182   0.1573  -0.6270   0.1390
15         -0.9435   0.1642  -0.4482   0.1952
16         -0.9944   0.0357  -0.5763   0.1587
17         -1.1722   0.0201  -0.7331   0.1955
18         -1.0174   0.1155  -0.4919   0.2335
19         -1.0044   0.0833  -0.4727   0.2257
20         -1.1800   0.4392  -0.4831   0.3484
21         -1.4495   0.7741  -1.0983   0.2034
22         -1.2275   0.5963  -0.7363   0.2220
23         -1.3412   0.4464  -0.7771   0.3290
24         -1.1774   0.6882  -0.5770   0.2646
25         -1.1369   0.4098  -0.5085   0.2538
26         -1.1099   0.3725  -0.4038   0.2515
27         -1.2393   0.6120  -0.7279   0.2278
28         -1.1540   0.4221  -0.5863   0.2446
29         -1.1323   0.3916  -0.5090   0.2526
30         -1.4224   0.4718  -0.7708   0.3264
31         -1.4495   0.7741  -1.0983   0.2034
32         -1.1789   0.5647  -0.6463   0.1890
33         -1.1473   0.6755  -0.4919   0.3304
34         -1.1228   0.3132  -0.6435   0.1441
35         -1.0145   0.3396  -0.4918   0.1922
36         -1.0298   0.1560  -0.5934   0.1704
37         -1.2534   0.0899  -0.7795   0.1641
38         -1.0966   0.1944  -0.5541   0.2111
39         -1.0765   0.2019  -0.5230   0.2015
40         -1.2155   0.4647  -0.5677   0.3091
```

The attribute values lie in the range [-1.45, 0.775].

The database resulting from the centering and reduction by attribute of the Texture database is on the ftp server in the `REAL/texture/texture_CR.dat.Z` file.

####6. Confusion matrix.

The following confusion matrix of the k_NN classifier was obtained with a Leave_One_Out error counting method on the texture_CR.dat database. k was set to 1, the value that gave the minimum mean error rate: 1.0 +/- 0.8%.
```
Class      2      3      4      6      7      8      9     10     12     13     14
 2      97.0    1.0    0.4    0.0    0.0    0.0    1.6    0.0    0.0    0.0    0.0
 3       0.2   99.0    0.0    0.0    0.0    0.0    0.4    0.0    0.0    0.0    0.4
 4       1.0    0.0   98.8    0.0    0.0    0.0    0.2    0.0    0.0    0.0    0.0
 6       0.0    0.0    0.0   99.4    0.0    0.0    0.0    0.6    0.0    0.0    0.0
 7       0.0    0.0    0.0    0.0  100.0    0.0    0.0    0.0    0.0    0.0    0.0
 8       0.0    0.0    0.0    0.0    0.0   98.6    0.0    1.4    0.0    0.0    0.0
 9       0.4    0.0    0.2    0.0    0.0    0.2   98.8    0.4    0.0    0.0    0.0
10       0.0    0.0    0.0    0.0    0.0    1.4    0.0   98.6    0.0    0.0    0.0
12       0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0  100.0    0.0    0.0
13       0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   99.8    0.2
14       0.0    0.4    0.0    0.0    0.0    0.4    0.0    0.0    0.2    0.0   99.0
```

####7. Result of the Principal Component Analysis:

Principal Component Analysis (PCA) is a classical method in pattern recognition [Duda73]. PCA reduces the sample dimension linearly, giving the best representation in fewer dimensions while keeping the maximum of inertia. The best axis for representation is, however, not necessarily the best axis for discrimination. After PCA, features are selected according to the percentage of initial inertia covered by the different axes, and the number of features is determined by the percentage of initial inertia to keep for the classification process. This selection method has been applied to the texture_CR database.

When quasi-linear correlations exist between some initial features, PCA removes these redundant dimensions, and this preprocessing is then recommended. In this case, before PCA, the determinant of the data covariance matrix is near zero; the database is thus badly conditioned for any process that uses this information (the quadratic classifier, for example).
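The PCA procedure described above (centering and reduction, eigen-decomposition, inertia percentages) can be sketched as follows. This is a minimal illustration with NumPy on synthetic stand-in data, not the code used to build texture_PCA.dat; the function name `pca_inertia` and the 99 % threshold are assumptions chosen for the example. The synthetic data contain one exactly redundant column so that the near-zero eigenvalue mentioned in the text shows up.

```python
import numpy as np


def pca_inertia(X):
    """Center and reduce X, then return eigenvalues, inertia % and projection."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # centering and reduction
    cov = (Z.T @ Z) / len(Z)                   # covariance of standardized data
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    inertia = 100.0 * eigvals / eigvals.sum()  # inertia percentage per axis
    return eigvals, inertia, Z @ eigvecs


# Synthetic stand-in data (NOT the texture database): 5 columns, the last
# being an exact linear combination of the first two.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 4))
X = np.column_stack([A, A[:, 0] + A[:, 1]])

eigvals, inertia, projected = pca_inertia(X)
cumulated = np.cumsum(inertia)
# Number of leading axes needed to keep at least 99 % of the inertia.
n_keep = int(np.searchsorted(cumulated, 99.0) + 1)
reduced = projected[:, :n_keep]
```

For standardized data the eigenvalues sum to the number of attributes. This agrees with the first row of the eigenvalue table in this section: 30.2675 / 40 is about 75.67 %.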
The following file is available for the texture database: "texture_PCA.dat.Z". It is the projection of the "texture_CR" database on its principal components, sorted in decreasing order of the related inertia percentage. To work on the database projected on its first x principal components, keep only the first x attributes of the texture_PCA.dat database and the class labels (last attribute).

The table below gives the inertia percentages associated with the eigenvalues of the principal component axes, sorted in decreasing order of inertia percentage. 99.85 percent of the total database inertia remains if the first 20 principal components are kept.

```
Eigenvalue   Value           Inertia percentage   Cumulated inertia
 1           30.267500000    75.6687000000         75.6687000000
 2           3.6512500000     9.1281300000         84.7969000000
 3           2.2937000000     5.7342400000         90.5311000000
 4           1.7039700000     4.2599300000         94.7910000000
 5           0.6716540000     1.6791300000         96.4702000000
 6           0.5015290000     1.2538200000         97.7240000000
 7           0.1922830000     0.4807070000         98.2047000000
 8           0.1561070000     0.3902670000         98.5950000000
 9           0.1099570000     0.2748920000         98.8699000000
10           0.0890891000     0.2227230000         99.0926000000
11           0.0656016000     0.1640040000         99.2566000000
12           0.0489988000     0.1224970000         99.3791000000
13           0.0433819000     0.1084550000         99.4875000000
14           0.0345022000     0.0862554000         99.5738000000
15           0.0299203000     0.0748007000         99.6486000000
16           0.0248857000     0.0622141000         99.7108000000
17           0.0167901000     0.0419752000         99.7528000000
18           0.0161633000     0.0404083000         99.7932000000
19           0.0128898000     0.0322246000         99.8254000000
20           0.0113884000     0.0284710000         99.8539000000
21           0.0078481400     0.0196204000         99.8735000000
22           0.0071527800     0.0178820000         99.8914000000
23           0.0067661400     0.0169153000         99.9083000000
24           0.0053149500     0.0132874000         99.9216000000
25           0.0051102600     0.0127757000         99.9344000000
26           0.0047116600     0.0117792000         99.9461000000
27           0.0036193700     0.0090484300         99.9552000000
28           0.0033222000     0.0083054900         99.9635000000
29           0.0030722400     0.0076806100         99.9712000000
30           0.0026373300     0.0065933300         99.9778000000
31           0.0020996800     0.0052492000         99.9830000000
32           0.0019376500     0.0048441200         99.9879000000
33           0.0015642300     0.0039105700         99.9918000000
34           0.0009679080     0.0024197700         99.9942000000
35           0.0009578000     0.0023945000         99.9966000000
36           0.0007379780     0.0018449400         99.9984000000
37           0.0006280250     0.0015700600        100.000000000
38           0.0000000040     0.0000000099        100.000000000
39           0.0000000001     0.0000000003        100.000000000
40           0.0000000008     0.0000000019        100.000000000
```

This matrix can be found in the texture_EV.dat file.

The Discriminant Factorial Analysis (DFA) can be applied to a learning database where each learning sample belongs to a particular class [Duda73]. The number of discriminant features selected by DFA is determined by the number of classes (c) and the number of input dimensions (d): it equals the minimum of d and c-1. In the usual case where d is greater than c, the output dimension is the number of classes minus one, and the discriminant axes are selected to maximize the between-class variance and minimize the within-class variance. The discrimination power (the ratio of the projected between-class variance to the projected within-class variance) is not the same for each discriminant axis: this ratio decreases from one axis to the next. So for a problem with many classes, this preprocessing will not always be efficient, as the last output features will be less discriminant. This analysis uses the inverse of the global covariance matrix, so the covariance matrix must be well conditioned (for example, a preliminary PCA must be applied to remove the linearly correlated dimensions).
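The rule that DFA yields min(d, c-1) axes with decreasing discrimination power can be illustrated with a small NumPy sketch. It is written under stated assumptions: it solves the classic Fisher problem with the within-class scatter matrix rather than the global covariance matrix mentioned above (the two formulations give the same axes, since the global scatter is the sum of the within-class and between-class scatters), it runs on synthetic data, and the function name `dfa_axes` is hypothetical.

```python
import numpy as np


def dfa_axes(X, y):
    """Return discriminant axes and their between/within variance ratios."""
    classes = np.unique(y)
    d = X.shape[1]
    n_axes = min(d, len(classes) - 1)        # at most c - 1 useful axes
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))                   # within-class scatter
    S_b = np.zeros((d, d))                   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Whiten with the Cholesky factor of S_w so the generalized problem
    # S_b w = ratio * S_w w becomes an ordinary symmetric eigenproblem.
    # This step fails if S_w is badly conditioned, hence the preliminary PCA.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    ratios, V = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    order = np.argsort(ratios)[::-1][:n_axes]
    return L_inv.T @ V[:, order], ratios[order]


# Synthetic stand-in data (NOT the texture database): 4 classes, 6 dimensions.
rng = np.random.default_rng(1)
n_classes, d, per_class = 4, 6, 50
y = np.repeat(np.arange(n_classes), per_class)
centers = rng.normal(scale=3.0, size=(n_classes, d))
X = centers[y] + rng.normal(size=(len(y), d))

axes, ratios = dfa_axes(X, y)
X_dfa = (X - X.mean(axis=0)) @ axes          # shape (200, 3)
```

With 18 input dimensions and 11 classes the same rule gives min(18, 11 - 1) = 10 output dimensions.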
The Discriminant Factorial Analysis (DFA) has been applied to the first 18 principal components of the texture_PCA database (that is, keeping only the first 18 attributes of this database before applying the DFA preprocessing) in order to build the texture_DFA.dat.Z database file, which has 10 dimensions (the texture database has 11 classes). For the texture database, experiments showed that DFA preprocessing is very useful and most of the time improved classifier performance.

[Duda73] Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.
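The leave-one-out 1-NN evaluation reported in Section 6 can be sketched as follows. This is an illustrative NumPy version on synthetic data, not the original ELENA benchmark code. The function name `loo_1nn_confusion` is hypothetical, and treating rows as true classes and columns as predicted classes is an assumption (the rows of the published matrix sum to 100, which is compatible with it).

```python
import numpy as np


def loo_1nn_confusion(X, y):
    """Leave-one-out 1-NN: confusion matrix in row percentages and error rate.

    ASSUMPTION: rows are true classes, columns are predicted classes.
    """
    classes = np.unique(y)
    # Squared Euclidean distances between all pairs of patterns.
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)            # a pattern cannot vote for itself
    pred = y[np.argmin(dist, axis=1)]         # label of the nearest other pattern
    conf = np.zeros((len(classes), len(classes)))
    for i, true_c in enumerate(classes):
        for j, pred_c in enumerate(classes):
            conf[i, j] = np.sum((y == true_c) & (pred == pred_c))
    conf_pct = 100.0 * conf / conf.sum(axis=1, keepdims=True)
    error_pct = 100.0 * np.mean(pred != y)
    return classes, conf_pct, error_pct


# Synthetic stand-in data (NOT the texture database): 3 well separated classes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
y = np.repeat(np.arange(3), 40)
X = centers[y] + rng.normal(size=(len(y), 2))

classes, conf_pct, error_pct = loo_1nn_confusion(X, y)
```

As a consistency check on the published matrix, the mean of its diagonal is 99.0 %, which matches the stated mean error rate of 1.0 %.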

41 features

Class (target), nominal: 11 unique values, 0 missing
V1, numeric: 861 unique values, 0 missing
V2, numeric: 979 unique values, 0 missing
V3, numeric: 1199 unique values, 0 missing
V4, numeric: 1072 unique values, 0 missing
V5, numeric: 1025 unique values, 0 missing
V6, numeric: 961 unique values, 0 missing
V7, numeric: 965 unique values, 0 missing
V8, numeric: 1003 unique values, 0 missing
V9, numeric: 1032 unique values, 0 missing
V10, numeric: 1234 unique values, 0 missing
V11, numeric: 861 unique values, 0 missing
V12, numeric: 894 unique values, 0 missing
V13, numeric: 1300 unique values, 0 missing
V14, numeric: 696 unique values, 0 missing
V15, numeric: 810 unique values, 0 missing
V16, numeric: 727 unique values, 0 missing
V17, numeric: 805 unique values, 0 missing
V18, numeric: 899 unique values, 0 missing
V19, numeric: 852 unique values, 0 missing
V20, numeric: 1282 unique values, 0 missing
V21, numeric: 861 unique values, 0 missing
V22, numeric: 990 unique values, 0 missing
V23, numeric: 1223 unique values, 0 missing
V24, numeric: 1150 unique values, 0 missing
V25, numeric: 1112 unique values, 0 missing
V26, numeric: 1100 unique values, 0 missing
V27, numeric: 1010 unique values, 0 missing
V28, numeric: 1082 unique values, 0 missing
V29, numeric: 1110 unique values, 0 missing
V30, numeric: 1217 unique values, 0 missing
V31, numeric: 861 unique values, 0 missing
V32, numeric: 898 unique values, 0 missing
V33, numeric: 1309 unique values, 0 missing
V34, numeric: 742 unique values, 0 missing
V35, numeric: 838 unique values, 0 missing
V36, numeric: 750 unique values, 0 missing
V37, numeric: 797 unique values, 0 missing
V38, numeric: 921 unique values, 0 missing
V39, numeric: 865 unique values, 0 missing
V40, numeric: 1252 unique values, 0 missing

62 properties

5500 - Number of instances (rows) of the dataset.
41 - Number of attributes (columns) of the dataset.
11 - Number of distinct values of the target attribute (if it is nominal).
0 - Number of missing values in the dataset.
0 - Number of instances with at least one value missing.
40 - Number of numeric attributes.
1 - Number of nominal attributes.
(no value shown) - Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
0.29 - Mean skewness among attributes of the numeric type.
-0.58 - Second quartile (Median) of means among attributes of the numeric type.
9.09 - Percentage of instances belonging to the most frequent class.
0.23 - Mean standard deviation of attributes of the numeric type.
(no value shown) - Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
500 - Number of instances belonging to the most frequent class.
(no value shown) - Minimal entropy among attributes.
0.03 - Second quartile (Median) of skewness among attributes of the numeric type.
(no value shown) - Maximum entropy among attributes.
-1.08 - Minimum kurtosis among attributes of the numeric type.
0 - Percentage of binary attributes.
0.22 - Second quartile (Median) of standard deviation of attributes of the numeric type.
11.45 - Maximum kurtosis among attributes of the numeric type.
-1.1 - Minimum of means among attributes of the numeric type.
0 - Percentage of instances having missing values.
(no value shown) - Third quartile of entropy among attributes.
-0.24 - Maximum of means among attributes of the numeric type.
(no value shown) - Minimal mutual information between the nominal attributes and the target attribute.
0 - Percentage of missing values.
1.09 - Third quartile of kurtosis among attributes of the numeric type.
(no value shown) - Maximum mutual information between the nominal attributes and the target attribute.
11 - The minimal number of distinct values among attributes of the nominal type.
97.56 - Percentage of numeric attributes.
-0.49 - Third quartile of means among attributes of the numeric type.
11 - The maximum number of distinct values among attributes of the nominal type.
-1.14 - Minimum skewness among attributes of the numeric type.
2.44 - Percentage of nominal attributes.
(no value shown) - Third quartile of mutual information between the nominal attributes and the target attribute.
2.67 - Maximum skewness among attributes of the numeric type.
0.14 - Minimum standard deviation of attributes of the numeric type.
(no value shown) - First quartile of entropy among attributes.
0.52 - Third quartile of skewness among attributes of the numeric type.
0.35 - Maximum standard deviation of attributes of the numeric type.
9.09 - Percentage of instances belonging to the least frequent class.
-0.62 - First quartile of kurtosis among attributes of the numeric type.
0.25 - Third quartile of standard deviation of attributes of the numeric type.
(no value shown) - Average entropy of the attributes.
500 - Number of instances belonging to the least frequent class.
-0.71 - First quartile of means among attributes of the numeric type.
0 - Standard deviation of the number of distinct values among attributes of the nominal type.
1.2 - Mean kurtosis among attributes of the numeric type.
0 - Number of binary attributes.
(no value shown) - First quartile of mutual information between the nominal attributes and the target attribute.
-0.61 - Mean of means among attributes of the numeric type.
-0.29 - First quartile of skewness among attributes of the numeric type.
1 - Average class difference between consecutive instances.
(no value shown) - Average mutual information between the nominal attributes and the target attribute.
0.2 - First quartile of standard deviation of attributes of the numeric type.
3.46 - Entropy of the target attribute values.
(no value shown) - An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
(no value shown) - Second quartile (Median) of entropy among attributes.
0.01 - Number of attributes divided by the number of instances.
11 - Average number of distinct values among the attributes of the nominal type.
0.03 - Second quartile (Median) of kurtosis among attributes of the numeric type.
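Several of the properties above follow directly from the dataset counts (5500 instances, 41 attributes, 11 balanced classes of 500). The short check below recomputes them. It is a sketch that assumes the class entropy is measured in bits and that the displayed values are rounded to two decimals.

```python
import math

n_instances, n_attributes, n_numeric = 5500, 41, 40
n_classes, per_class = 11, 500

# Entropy of a balanced class distribution is log2 of the number of classes.
class_entropy = math.log2(n_classes)
majority_pct = 100.0 * per_class / n_instances
numeric_pct = 100.0 * n_numeric / n_attributes
nominal_pct = 100.0 - numeric_pct
dimensionality = n_attributes / n_instances

print(round(class_entropy, 2))    # 3.46
print(round(majority_pct, 2))     # 9.09
print(round(numeric_pct, 2))      # 97.56
print(round(nominal_pct, 2))      # 2.44
print(round(dimensionality, 2))   # 0.01
```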

20 tasks

20390 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
31 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
1 runs - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: Class
0 runs - estimation_procedure: 33% Holdout set - target_feature: Class
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - target_feature: Class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
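Because the description states that patterns are sorted by class, cross-validation splits cut from contiguous blocks of rows would leave most classes out of each fold. The sketch below contrasts contiguous folds with shuffled, stratified folds on stand-in labels. It is an illustration under assumptions, not a description of how the tasks listed above generate their splits; the function names and the labels 0 to 10 are hypothetical.

```python
import numpy as np


def contiguous_folds(n, k):
    """Split row indices 0..n-1 into k consecutive blocks."""
    return np.array_split(np.arange(n), k)


def stratified_folds(y, k, seed=0):
    """Shuffle within each class and deal its rows evenly over k folds."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].extend(part.tolist())
    return [np.array(sorted(f)) for f in folds]


# Stand-in labels 0..10, sorted by class like the texture files (500 each).
y = np.repeat(np.arange(11), 500)

naive = contiguous_folds(len(y), 10)
strat = stratified_folds(y, 10)

print(len(np.unique(y[naive[0]])))   # 2
print(len(np.unique(y[strat[0]])))   # 11
```

With 10 folds, each stratified fold holds 550 patterns, 50 from every class.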