{ "data_id": "45558", "name": "Pulsar-Dataset-HTRU2", "exact_name": "Pulsar-Dataset-HTRU2", "version": 2, "version_label": null, "description": "Pulsar candidates collected during the HTRU survey. Pulsars are a type of star, of considerable scientific interest. Candidates must be classified into pulsar and non-pulsar classes to aid discovery.\n\n## Source:\n\nDr Robert Lyon, University of Manchester, School of Physics and Astronomy, Alan Turing Building, Manchester M13 9PL, United Kingdom, robert.lyon '@' manchester.ac.uk\n\n## Data Set Information:\n\nHTRU2 is a data set which describes a sample of pulsar candidates collected during the High Time Resolution Universe Survey (South) [1].\n\nPulsars are a rare type of neutron star that produces radio emission detectable here on Earth. They are of considerable scientific interest as probes of space-time, the interstellar medium, and states of matter (see [2] for more uses).\n\nAs pulsars rotate, their emission beam sweeps across the sky, and when this crosses our line of sight, it produces a detectable pattern of broadband radio emission. As pulsars rotate rapidly, this pattern repeats periodically. Thus pulsar search involves looking for periodic radio signals with large radio telescopes.\n\nEach pulsar produces a slightly different emission pattern, which varies slightly with each rotation (see [2] for an introduction to pulsar astrophysics to find out why). Thus a potential signal detection, known as a 'candidate', is averaged over many rotations of the pulsar, as determined by the length of an observation. In the absence of additional information, each candidate could potentially describe a real pulsar. However, in practice almost all detections are caused by radio frequency interference (RFI) and noise, making legitimate signals hard to find.\n\nMachine learning tools are now being used to automatically label pulsar candidates to facilitate rapid analysis. 
Classification systems in particular, which treat the candidate data sets as binary classification problems, are being widely adopted (see [4,5,6,7,8,9]). Here the legitimate pulsar examples are a minority positive class, and spurious examples the majority negative class. At present multi-class labels are unavailable, given the costs associated with data annotation.\n\nThe data set shared here contains 16,259 spurious examples caused by RFI\/noise, and 1,639 real pulsar examples. These examples have all been checked by human annotators.\n\nThe data is presented in two formats: CSV and ARFF (used by the WEKA data mining tool). In both files, each candidate is stored in a separate row. Each row lists the variables first, and the class label is the final entry. The class labels used are 0 (negative) and 1 (positive).\n\nPlease note that the data contains no positional information or other astronomical details. It is simply feature data extracted from candidate files using the PulsarFeatureLab tool (see [10]).\n\n## Attribute Information:\n\nEach candidate is described by 8 continuous variables, and a single class variable. The first four are simple statistics obtained from the integrated pulse profile (folded profile). This is an array of continuous variables that describe a longitude-resolved version of the signal that has been averaged in both time and frequency (see [3] for more details). The remaining four variables are similarly obtained from the DM-SNR curve (again see [3] for more details). These are summarised below:\n\n1. Mean of the integrated profile.\n2. Standard deviation of the integrated profile.\n3. Excess kurtosis of the integrated profile.\n4. Skewness of the integrated profile.\n5. Mean of the DM-SNR curve.\n6. Standard deviation of the DM-SNR curve.\n7. Excess kurtosis of the DM-SNR curve.\n8. Skewness of the DM-SNR curve.\n9. 
Class\n\nHTRU2 Summary\n17,898 total examples.\n1,639 positive examples.\n16,259 negative examples.\n\n## Relevant Papers:\n\n[1] M. J. Keith et al., 'The High Time Resolution Universe Pulsar Survey - I. System Configuration and Initial Discoveries', 2010, Monthly Notices of the Royal Astronomical Society, vol. 409, pp. 619-627. DOI: 10.1111\/j.1365-2966.2010.17325.x\n\n[2] D. R. Lorimer and M. Kramer, 'Handbook of Pulsar Astronomy', Cambridge University Press, 2005.\n\n[3] R. J. Lyon, 'Why Are Pulsars Hard To Find?', PhD Thesis, University of Manchester, 2016.\n\n[4] R. J. Lyon et al., 'Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach', Monthly Notices of the Royal Astronomical Society 459 (1), 1104-1123, DOI: 10.1093\/mnras\/stw656\n\n[5] R. P. Eatough et al., 'Selection of radio pulsar candidates using artificial neural networks', Monthly Notices of the Royal Astronomical Society, vol. 407, no. 4, pp. 2443-2450, 2010.\n\n[6] S. D. Bates et al., 'The high time resolution universe pulsar survey vi. an artificial neural network and timing of 75 pulsars', Monthly Notices of the Royal Astronomical Society, vol. 427, no. 2, pp. 1052-1065, 2012.\n\n[7] D. Thornton, 'The High Time Resolution Radio Sky', PhD thesis, University of Manchester, Jodrell Bank Centre for Astrophysics School of Physics and Astronomy, 2013.\n\n[8] K. J. Lee et al., 'PEACE: pulsar evaluation algorithm for candidate extraction - a software package for post-analysis processing of pulsar survey candidates', Monthly Notices of the Royal Astronomical Society, vol. 433, no. 1, pp. 688-694, 2013.\n\n[9] V. Morello et al., 'SPINN: a straightforward machine learning solution to the pulsar candidate selection problem', Monthly Notices of the Royal Astronomical Society, vol. 443, no. 2, pp. 1651-1662, 2014.\n\n[10] R. J. 
Lyon, 'PulsarFeatureLab', 2015, [Web Link](https:\/\/dx.doi.org\/10.6084\/m9.figshare.1536472.v1).\n\n## Note\n\nCompared to v1 this contains one additional sample, which was mistakenly used for the attribute names in v1.", "format": "arff", "uploader": "Matthias Feurer", "uploader_id": 86, "visibility": "public", "creator": null, "contributor": null, "date": "2023-06-05 09:37:32", "update_comment": null, "last_update": "2023-06-05 09:37:32", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/api.openml.org\/data\/download\/22116526\/dataset", "kaggle_url": null, "default_target_attribute": "class", "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "Pulsar-Dataset-HTRU2", "Pulsar candidates collected during the HTRU survey. Pulsars are a type of star, of considerable scientific interest. Candidates must be classified in to pulsar and non-pulsar classes to aid discovery. ## Source: Dr Robert Lyon, University of Manchester, School of Physics and Astronomy, Alan Turing Building, Manchester M13 9PL, United Kingdom, robert.lyon '@' manchester.ac.uk ## Data Set Information: HTRU2 is a data set which describes a sample of pulsar candidates collected during the High Time " ], "weight": 5 }, "qualities": { "NumberOfInstances": 17898, "NumberOfFeatures": 9, "NumberOfClasses": 2, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 8, "NumberOfSymbolicFeatures": 1, "PercentageOfBinaryFeatures": 11.11111111111111, "PercentageOfInstancesWithMissingValues": 0, "PercentageOfMissingValues": 0, "AutoCorrelation": 0.8615410403978321, "PercentageOfNumericFeatures": 88.88888888888889, "Dimensionality": 0.0005028494803888703, "PercentageOfSymbolicFeatures": 11.11111111111111, "MajorityClassPercentage": 90.8425522404738, "MajorityClassSize": 16259, "MinorityClassPercentage": 9.157447759526203, "MinorityClassSize": 1639, "NumberOfBinaryFeatures": 1 }, "tags": [], "features": [ { 
"name": "class", "index": "8", "type": "nominal", "distinct": "2", "missing": "0", "target": "1", "distr": [ [ "0", "1" ], [ [ "16259", "0" ], [ "0", "1639" ] ] ] }, { "name": "Profile_mean", "index": "0", "type": "numeric", "distinct": "8626", "missing": "0", "min": "6", "max": "193", "mean": "111", "stdev": "26" }, { "name": "Profile_stdev", "index": "1", "type": "numeric", "distinct": "17862", "missing": "0", "min": "25", "max": "99", "mean": "47", "stdev": "7" }, { "name": "Profile_skewness", "index": "2", "type": "numeric", "distinct": "17897", "missing": "0", "min": "-2", "max": "8", "mean": "0", "stdev": "1" }, { "name": "Profile_kurtosis", "index": "3", "type": "numeric", "distinct": "17898", "missing": "0", "min": "-2", "max": "68", "mean": "2", "stdev": "6" }, { "name": "DM_mean", "index": "4", "type": "numeric", "distinct": "9000", "missing": "0", "min": "0", "max": "223", "mean": "13", "stdev": "29" }, { "name": "DM_stdev", "index": "5", "type": "numeric", "distinct": "17894", "missing": "0", "min": "7", "max": "111", "mean": "26", "stdev": "19" }, { "name": "DM_skewness", "index": "6", "type": "numeric", "distinct": "17895", "missing": "0", "min": "-3", "max": "35", "mean": "8", "stdev": "5" }, { "name": "DM_kurtosis", "index": "7", "type": "numeric", "distinct": "17895", "missing": "0", "min": "-2", "max": "1191", "mean": "105", "stdev": "107" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }