{ "data_id": "199", "name": "fruitfly", "exact_name": "fruitfly", "version": 1, "version_label": "1", "description": "**Author**: \n**Source**: Unknown - \n**Please cite**: \n\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n\n Identifier attribute deleted.\n\n !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n\n NAME: Sexual activity and the lifespan of male fruitflies\n TYPE: Designed (almost factorial) experiment\n SIZE: 125 observations, 5 variables\n \n DESCRIPTIVE ABSTRACT:\n A cost of increased reproduction in terms of reduced longevity has been\n shown for female fruitflies, but not for males. The flies used were an\n outbred stock. Sexual activity was manipulated by supplying individual\n males with one or eight receptive virgin females per day. The\n longevity of these males was compared with that of two control types.\n The first control consisted of two sets of individual males kept with\n one or eight newly inseminated females. Newly inseminated females will\n not usually remate for at least two days, and thus served as a control\n for any effect of competition with the male for food or space. The\n second control was a set of individual males kept with no females.\n There were 25 males in each of the five groups, which were treated\n identically in number of anaesthetizations (using CO2) and provision of\n fresh food medium.\n \n SOURCE:\n Figure 2 in the article \"Sexual Activity and the Lifespan of Male\n Fruitflies\" by Linda Partridge and Marion Farquhar. _Nature_, 294,\n 580-581, 1981.\n \n VARIABLE DESCRIPTIONS:\n Columns Variable Description\n ------- -------- -----------\n 1- 2 ID Serial No. (1-25) within each group of 25\n (the order in which data points were abstracted)\n \n 4 PARTNERS Number of companions (0, 1 or 8)\n \n 6 TYPE Type of companion\n 0: newly pregnant female\n 1: virgin female\n 9: not applicable (when PARTNERS=0)\n \n 8- 9 LONGEVITY Lifespan, in days\n \n 11-14 THORAX Length of thorax, in mm (x.xx)\n \n 16-17 SLEEP Percentage of each day spent sleeping\n \n \n SPECIAL NOTES:\n `Compliance' of the males in the two experimental groups was documented\n as follows: On two days per week throughout the life of each\n experimental male, the females that had been supplied as virgins to\n that male were kept and examined for fertile eggs. The insemination\n rate declined from approximately 7 females\/day at age one week to just\n under 2\/day at age eight weeks in the males supplied with eight virgin\n females per day, and from just under 1\/day at age one week to\n approximately 0.6\/day at age eight weeks in the males supplied with one\n virgin female per day. These `compliance' data were not supplied for\n individual males, but the authors say that \"There were no significant\n differences between the individual males within each experimental\n group.\"\n \n STORY BEHIND THE DATA:\n James Hanley found this dataset in _Nature_ and was attracted by the\n way the raw data were presented in classical analysis of covariance\n style in Figure 2. He read the data points from the graphs and brought\n them to the attention of a colleague with whom he was teaching the\n applied statistics course. Dr. Liddell thought that with only three\n explanatory variables (THORAX, plus PARTNERS and TYPE to describe the\n five groups), it would not be challenging enough as a data-analysis\n project. He suggested adding another variable. James Hanley added\n SLEEP, a variable not mentioned in the published article. Teachers can\n contact us about the construction of this variable. (We prefer to\n divulge the details at the end of the data-analysis project.)\n \n Further discussion of the background and pedagogical use of this\n dataset can be found in Hanley (1983) and in Hanley and Shapiro\n (1994). To obtain the Hanley and Shapiro article, send the one-line\n e-mail message:\n send jse\/v2n1\/datasets.hanley\n to the address archive@jse.stat.ncsu.edu\n \n PEDAGOGICAL NOTES:\n This has been the most successful and the most memorable dataset we\n have used in an \"applications of statistics\" course, which we have\n taught for ten years. The most common analysis techniques have been\n analysis of variance, classical analysis of covariance, and multiple\n regression. Because the variable THORAX is so strong (it explains\n about 1\/3 of the variance in LONGEVITY), it is important to consider it\n to increase the precision of between-group contrasts. When students\n first check and find that the distributions of thorax length, and in\n particular, the mean thorax length, are very similar in the different\n groups, many of them are willing to say (in epidemiological\n terminology) that THORAX is not a confounding variable, and that it can\n be omitted from the analysis.\n \n There is usually lively discussion about the primary contrast. The\n five groups and their special structure allow opportunities for\n students to understand and verbalize what we mean by the term\n \"statistical interaction.\"\n \n There is also much debate as to whether one should take the SLEEP\n variable into account. Some students say that it is an `intermediate'\n variable. Some students formally test the mean level of SLEEP across\n groups, find one pair where there is a statistically significant\n difference, and want to treat it as a confounding variable. A few\n students muse about how it was measured.\n \n There is heteroscedasticity in the LONGEVITY variable.\n \n One very observant student (now a professor) argued that THORAX cannot\n be used as a predictor or explanatory variable for the LONGEVITY\n outcome since fruitflies who die young may not be fully grown, i.e., it\n is also an intermediate variable. One Ph.D. student who had studied\n entomology assured us that fruitflies do not grow longer after birth;\n therefore, the THORAX length is not time-dependent!\n \n Curiously, the dataset has seldom been analyzed using techniques from\n survival analysis. The fact that there are no censored observations is\n not really an excuse, and one could easily devise a way to introduce\n censoring of LONGEVITY.\n \n REFERENCES:\n Hanley, J. A. (1983), \"Appropriate Uses of Multivariate Analysis,\"\n _Annual Review of Public Health_, 4, 155-180.\n \n Hanley, J. A., and Shapiro, S. H. (1994), \"Sexual Activity and the\n Lifespan of Male Fruitflies: A Dataset That Gets Attention,\" _Journal\n of Statistics Education_, Volume 2, Number 1.\n \n SUBMITTED BY:\n James A. Hanley and Stanley H. Shapiro\n Department of Epidemiology and Biostatistics\n McGill University\n 1020 Pine Avenue West\n Montreal, Quebec, H3A 1A2\n Canada\n tel: +1 (514) 398-6270 (JH) \n +1 (514) 398-6272 (SS)\n fax: +1 (514) 398-4503\n INJH@musicb.mcgill.ca, StanS@epid.lan.mcgill.ca", "format": "ARFF", "uploader": "Jan van Rijn", "uploader_id": 1, "visibility": "public", "creator": "L. Partridge and M. Farquhar", "contributor": "James A. Hanley and Stanley H. Shapiro", "date": "2014-04-23 13:16:42", "update_comment": null, "last_update": "2014-04-23 13:16:42", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/3636\/dataset_2185_fruitfly.arff", "default_target_attribute": "class", "row_id_attribute": null, "ignore_attribute": null, "runs": 4, "suggest": { "input": [ "fruitfly", "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Identifier attribute deleted. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! NAME: Sexual activity and the lifespan of male fruitflies TYPE: Designed (almost factorial) experiment SIZE: 125 observations, 5 variables DESCRIPTIVE ABSTRACT: A cost of increased reproduction in terms of reduced longevity has been shown for female fruitflies, but not for males. The flies used were an outbred stock. Sexual activity was manipulated by supplying individual males with one or eight r " ], "weight": 5 }, "qualities": { "NumberOfInstances": 125, "NumberOfFeatures": 5, "NumberOfClasses": 0, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 3, "NumberOfSymbolicFeatures": 2, "Quartile2SkewnessOfNumericAtts": -0.011618715414205413, "REPTreeDepth3Kappa": null, "DecisionStumpKappa": null, "MaxMeansOfNumericAtts": 57.44, "MinMutualInformation": null, "PercentageOfBinaryFeatures": 0, "Quartile2StdDevOfNumericAtts": 15.878847768387132, "RandomTreeDepth1AUC": null, "Dimensionality": 0.04, "MaxMutualInformation": null, "MinNominalAttDistinctValues": 3, "PercentageOfInstancesWithMissingValues": 0, "Quartile3AttributeEntropy": null, "RandomTreeDepth1ErrRate": null, "EquivalentNumberOfAtts": null, "MaxNominalAttDistinctValues": 3, "MinSkewnessOfNumericAtts": -0.6380573853536728, "PercentageOfMissingValues": 0, "Quartile3KurtosisOfNumericAtts": 3.1484095157236704, "AutoCorrelation": -16.653225806451612, "RandomTreeDepth1Kappa": null, "J48.00001.AUC": null, "MaxSkewnessOfNumericAtts": 1.5903052309118162, "MinStdDevOfNumericAtts": 0.07745366981455389, "PercentageOfNumericFeatures": 60, "Quartile3MeansOfNumericAtts": 57.44, "CfsSubsetEval_DecisionStumpAUC": null, "RandomTreeDepth2AUC": null, "J48.00001.ErrRate": null, "MaxStdDevOfNumericAtts": 17.563892580537072, "MinorityClassPercentage": null, "PercentageOfSymbolicFeatures": 40, "Quartile3MutualInformation": null, "CfsSubsetEval_DecisionStumpErrRate": null, "RandomTreeDepth2ErrRate": null, "J48.00001.Kappa": null, "MeanAttributeEntropy": null, "MinorityClassSize": null, "Quartile1AttributeEntropy": null, "Quartile3SkewnessOfNumericAtts": 1.5903052309118162, "CfsSubsetEval_DecisionStumpKappa": null, "RandomTreeDepth2Kappa": null, "J48.0001.AUC": null, "MeanKurtosisOfNumericAtts": 0.7789944524450039, "NaiveBayesAUC": null, "Quartile1KurtosisOfNumericAtts": -0.410404642598019, "Quartile3StdDevOfNumericAtts": 17.563892580537072, "CfsSubsetEval_NaiveBayesAUC": null, "RandomTreeDepth3AUC": null, "J48.0001.ErrRate": null, "MeanMeansOfNumericAtts": 27.241653333333332, "NaiveBayesErrRate": null, "Quartile1MeansOfNumericAtts": 0.82096, "REPTreeDepth1AUC": null, "CfsSubsetEval_NaiveBayesErrRate": null, "RandomTreeDepth3ErrRate": null, "J48.0001.Kappa": null, "MeanMutualInformation": null, "NaiveBayesKappa": null, "Quartile1MutualInformation": null, "REPTreeDepth1ErrRate": null, "CfsSubsetEval_NaiveBayesKappa": null, "RandomTreeDepth3Kappa": null, "J48.001.AUC": null, "MeanNoiseToSignalRatio": null, "NumberOfBinaryFeatures": 0, "Quartile1SkewnessOfNumericAtts": -0.6380573853536728, "REPTreeDepth1Kappa": null, "CfsSubsetEval_kNN1NAUC": null, "StdvNominalAttDistinctValues": 0, "J48.001.ErrRate": null, "MeanNominalAttDistinctValues": 3, "Quartile1StdDevOfNumericAtts": 0.07745366981455389, "REPTreeDepth2AUC": null, "CfsSubsetEval_kNN1NErrRate": null, "kNN1NAUC": null, "J48.001.Kappa": null, "MeanSkewnessOfNumericAtts": 0.3135430433813126, "Quartile2AttributeEntropy": null, "REPTreeDepth2ErrRate": null, "CfsSubsetEval_kNN1NKappa": null, "kNN1NErrRate": null, "MajorityClassPercentage": null, "MeanStdDevOfNumericAtts": 11.173398006246252, "Quartile2KurtosisOfNumericAtts": -0.4010215157906396, "REPTreeDepth2Kappa": null, "ClassEntropy": null, "kNN1NKappa": null, "MajorityClassSize": null, "MinAttributeEntropy": null, "Quartile2MeansOfNumericAtts": 23.464, "REPTreeDepth3AUC": null, "DecisionStumpAUC": null, "MaxAttributeEntropy": null, "MinKurtosisOfNumericAtts": -0.410404642598019, "Quartile2MutualInformation": null, "REPTreeDepth3ErrRate": null, "DecisionStumpErrRate": null, "MaxKurtosisOfNumericAtts": 3.1484095157236704, "MinMeansOfNumericAtts": 0.82096 }, "tags": [ { "uploader": "38960", "tag": "Life Science" }, { "uploader": "38960", "tag": "Statistics" }, { "uploader": "7210", "tag": "survival" }, { "uploader": "7210", "tag": "survival-analysis" } ], "features": [ { "name": "class", "index": "4", "type": "numeric", "distinct": "47", "missing": "0", "target": "1", "min": "1", "max": "83", "mean": "23", "stdev": "16" }, { "name": "PARTNERS", "index": "0", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "TYPE", "index": "1", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "THORAX", "index": "2", "type": "numeric", "distinct": "46", "missing": "0", "min": "16", "max": "97", "mean": "57", "stdev": "18" }, { "name": "SLEEP", "index": "3", "type": "numeric", "distinct": "14", "missing": "0", "min": "1", "max": "1", "mean": "1", "stdev": "0" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }