{ "data_id": "529", "name": "pollen", "exact_name": "pollen", "version": 1, "version_label": null, "description": "**Author**: \n**Source**: Unknown - Date unknown \n**Please cite**: \n\nThis dataset is synthetic. It was generated by David Coleman\nat RCA Laboratories in Princeton, N.J. For convenience, we will\nrefer to it as the POLLEN DATA. The first three variables are the\nlengths of geometric features observed in sampled pollen grains - in the\nx, y, and z dimensions: a \"ridge\" along x, a \"nub\" in the y\ndirection, and a \"crack\" along the z dimension. The fourth\nvariable is pollen grain weight, and the fifth is density.\n\nThere are 3848 observations, in random order (for people whose\nsoftware packages cannot handle this much data, it is recommended\nthat the data be sampled). The dataset is broken up into eight\npieces, POLLEN1.DAT - POLLEN8.DAT, each with 481 observations.\nWe will call the variables:\n\n1. RIDGE\n2. NUB\n3. CRACK\n4. WEIGHT\n5. DENSITY\n\n6. OBSERVATION NUMBER (for convenience)\n\nThe data analyst is advised that there is more than one \"feature\" to\nthese data. Each feature can be observed through various graphical\ntechniques, but analytic methods, as well, can help \"crack\" the\ndataset.\n\nAdditional Info:\n\nI no longer have the description handed out during the JSM, but can\ntell you how I generated the data, in minitab.\n\n1. Part A was generated: 5000 (I think) 5-variable, uncorrelated, i.i.d.\nGaussian observations.\n\n2. To get part B, I duplicated part A, then reversed the sign on the\nobservations for 3 of the 5 variables.\n\n3. Part B was appended to Part A.\n\n4. The order of the observations was randomized.\n\n5. While waiting for my tardy car-pool companion, I took a piece of\ngraph paper, and figured out a dot-matrix representation of the word,\n\"EUREKA.\" I then added these observations to the \"center\" of the\ndataset.\n\n6. The data were scaled, by variable (something like 1,3,5,7,11).\n\n7. 
The data were rotated, then translated.\n\n8. A few points in space within the datacloud were chosen as ellipsoid\ncenters, then for each center, all observations within a (scaled and\nrotated) radius were identified, and eliminated - to form ellipsoidal\nvoids.\n\n9. The variables were given entirely fictitious names.\n\nFYI, only the folks at Bell Labs, Murray Hill, found everything,\nincluding the voids.\n\nHope this is helpful!\n\nReferences:\n\nBecker, R.A., Denby, L., McGill, R., and Wilks,\nA. (1986). Datacryptanalysis: A Case Study.\nProceedings of the Section on Statistical Graphics, 92-97.\n\nSlomka, M. (1986). The Analysis of a Synthetic Data Set.\nProceedings of the Section on Statistical Graphics, 113-116.\n\n\n\nInformation about the dataset\nCLASSTYPE: numeric\nCLASSINDEX: none specific", "format": "ARFF", "uploader": "Joaquin Vanschoren", "uploader_id": 2, "visibility": "public", "creator": "David Coleman", "contributor": null, "date": "2014-09-29 00:08:04", "update_comment": "set targets, ignores", "last_update": "2014-10-07 01:24:32", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/52641\/pollen.arff", "default_target_attribute": "DENSITY", "row_id_attribute": null, "ignore_attribute": "\"OBSERVATION_NUMBER\"", "runs": 0, "suggest": { "input": [ "pollen", "This dataset is synthetic. It was generated by David Coleman at RCA Laboratories in Princeton, N.J. For convenience, we will refer to it as the POLLEN DATA. The first three variables are the lengths of geometric features observed in sampled pollen grains - in the x, y, and z dimensions: a \"ridge\" along x, a \"nub\" in the y direction, and a \"crack\" along the z dimension. The fourth variable is pollen grain weight, and the fifth is density. 
There are 3848 observations, in random order (for people w " ], "weight": 5 }, "qualities": { "NumberOfInstances": 3848, "NumberOfFeatures": 5, "NumberOfClasses": 0, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 5, "NumberOfSymbolicFeatures": 0, "MeanSkewnessOfNumericAtts": 0.02061748290880903, "Quartile1StdDevOfNumericAtts": 4.1653525715308275, "REPTreeDepth2AUC": null, "CfsSubsetEval_kNN1NErrRate": null, "kNN1NAUC": null, "J48.001.Kappa": null, "MeanStdDevOfNumericAtts": 6.529446437142662, "Quartile2AttributeEntropy": null, "REPTreeDepth2ErrRate": null, "CfsSubsetEval_kNN1NKappa": null, "kNN1NErrRate": null, "MajorityClassPercentage": null, "MinAttributeEntropy": null, "Quartile2KurtosisOfNumericAtts": -0.15547870901963412, "REPTreeDepth2Kappa": null, "ClassEntropy": null, "kNN1NKappa": null, "MajorityClassSize": null, "MinKurtosisOfNumericAtts": -0.3087882324123812, "Quartile2MeansOfNumericAtts": 0.00016629417879406245, "REPTreeDepth3AUC": null, "DecisionStumpAUC": null, "MaxAttributeEntropy": null, "MinMeansOfNumericAtts": -0.0036366424116424435, "Quartile2MutualInformation": null, "REPTreeDepth3ErrRate": null, "DecisionStumpErrRate": null, "MaxKurtosisOfNumericAtts": 0.19518735117786212, "MinMutualInformation": null, "Quartile2SkewnessOfNumericAtts": 0.0721900276939728, "REPTreeDepth3Kappa": null, "DecisionStumpKappa": null, "MaxMeansOfNumericAtts": 0.004237032224532028, "MinNominalAttDistinctValues": null, "PercentageOfBinaryFeatures": 0, "Quartile2StdDevOfNumericAtts": 6.398236557183355, "RandomTreeDepth1AUC": null, "Dimensionality": 0.0012993762993762994, "MaxMutualInformation": null, "MinSkewnessOfNumericAtts": -0.13058045094003162, "PercentageOfInstancesWithMissingValues": 0, "Quartile3AttributeEntropy": null, "RandomTreeDepth1ErrRate": null, "EquivalentNumberOfAtts": null, "MaxNominalAttDistinctValues": null, "MinStdDevOfNumericAtts": 3.1443945904294375, "PercentageOfMissingValues": 0, 
"Quartile3KurtosisOfNumericAtts": 0.07075610416706746, "AutoCorrelation": -2.5595993501429675, "RandomTreeDepth1Kappa": null, "J48.00001.AUC": null, "MaxSkewnessOfNumericAtts": 0.10979375313044712, "MinorityClassPercentage": null, "PercentageOfNumericFeatures": 100, "Quartile3MeansOfNumericAtts": 0.0036700753638252257, "CfsSubsetEval_DecisionStumpAUC": null, "RandomTreeDepth2AUC": null, "J48.00001.ErrRate": null, "MaxStdDevOfNumericAtts": 10.043091650980065, "MinorityClassSize": null, "PercentageOfSymbolicFeatures": 0, "Quartile3MutualInformation": null, "CfsSubsetEval_DecisionStumpErrRate": null, "RandomTreeDepth2ErrRate": null, "J48.00001.Kappa": null, "MeanAttributeEntropy": null, "NaiveBayesAUC": null, "Quartile1AttributeEntropy": null, "Quartile3SkewnessOfNumericAtts": 0.10926401384749387, "CfsSubsetEval_DecisionStumpKappa": null, "RandomTreeDepth2Kappa": null, "J48.0001.AUC": null, "MeanKurtosisOfNumericAtts": -0.09660835571472726, "NaiveBayesErrRate": null, "Quartile1KurtosisOfNumericAtts": -0.23453763894406854, "Quartile3StdDevOfNumericAtts": 8.95914524273415, "CfsSubsetEval_NaiveBayesAUC": null, "RandomTreeDepth3AUC": null, "J48.0001.ErrRate": null, "MeanMeansOfNumericAtts": 0.0008058939708938329, "MeanMutualInformation": null, "NaiveBayesKappa": null, "Quartile1MeansOfNumericAtts": -0.0017384875259876746, "REPTreeDepth1AUC": null, "CfsSubsetEval_NaiveBayesErrRate": null, "RandomTreeDepth3ErrRate": null, "J48.0001.Kappa": null, "MeanNoiseToSignalRatio": null, "NumberOfBinaryFeatures": 0, "Quartile1MutualInformation": null, "REPTreeDepth1ErrRate": null, "CfsSubsetEval_NaiveBayesKappa": null, "RandomTreeDepth3Kappa": null, "J48.001.AUC": null, "MeanNominalAttDistinctValues": null, "Quartile1SkewnessOfNumericAtts": -0.09381532042245767, "REPTreeDepth1Kappa": null, "CfsSubsetEval_kNN1NAUC": null, "StdvNominalAttDistinctValues": null, "J48.001.ErrRate": null }, "tags": [ { "tag": "https:\/\/www.openml.org\/s\/130", "uploader": "5824" } ], "features": [ { 
"name": "DENSITY", "index": "4", "type": "numeric", "distinct": "3784", "missing": "0", "target": "1", "min": "-12", "max": "11", "mean": "0", "stdev": "3" }, { "name": "RIDGE", "index": "0", "type": "numeric", "distinct": "3809", "missing": "0", "min": "-23", "max": "21", "mean": "0", "stdev": "6" }, { "name": "NUB", "index": "1", "type": "numeric", "distinct": "3811", "missing": "0", "min": "-16", "max": "17", "mean": "0", "stdev": "5" }, { "name": "CRACK", "index": "2", "type": "numeric", "distinct": "3816", "missing": "0", "min": "-31", "max": "30", "mean": "0", "stdev": "8" }, { "name": "WEIGHT", "index": "3", "type": "numeric", "distinct": "3826", "missing": "0", "min": "-34", "max": "36", "mean": "0", "stdev": "10" }, { "name": "OBSERVATION_NUMBER", "index": "5", "type": "numeric", "distinct": "3848", "missing": "0", "ignore": "1", "min": "1", "max": "3848", "mean": "1925", "stdev": "1111" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }