{ "data_id": "46250", "name": "US-Births", "exact_name": "US-Births", "version": 1, "version_label": null, "description": "Number of births in the United States.\n\nFrom original source:\n-----\nNumber of births in the United States. There are several data sets covering different date ranges and\nobtaining data from different sources.\n-----\n\nThis dataset corresponds to the births between 1968 and 1988.\n\nThere are 4 columns:\n\nid_series: The id of the time series.\n\ndate: The date of the time series in the format \"%Y-%m-%d\".\n\ntime_step: The time step of the time series.\n\nvalue_0: The values of the time series, which will be used for the forecasting task.\n\nPreprocessing:\n\n1 - Created 'date' column from columns 'year', 'month', 'day', in the format %Y-%m-%d.\n\n2 - Dropped columns 'year', 'month', 'day', 'day_of_year', 'day_of_week'.\n\n3 - Created the column 'id_series' with value 0, since there is only one long time series.\n\n4 - Ensured that there are no missing dates and that the frequency of the time series is daily.\n\n5 - Created column 'time_step' with increasing values of time step for the time series.\n\n6 - Renamed column 'births' to 'value_0'.\n\n7 - Cast column 'value_0' to int. Defined 'id_series' as 'category'.", "format": "arff", "uploader": "Bruno Belucci Teixeira", "uploader_id": 30703, "visibility": "public", "creator": "\"Randall Pruim, Daniel Kaplan, Nicholas Horton\"", "contributor": "\"Bruno Belucci\"", "date": "2024-06-25 01:00:19", "update_comment": null, "last_update": "2024-06-25 01:00:19", "licence": "Creative Commons Attribution 4.0 International", "status": "active", "error_message": null, "url": "https:\/\/api.openml.org\/data\/download\/22120714\/dataset", "default_target_attribute": null, "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "US-Births", "Number of births in the United States. From original source: ----- Number of births in the United States. 
There are several data sets covering different date ranges and obtaining data from different sources. ----- This dataset corresponds to the births between 1968 and 1988. There are 4 columns: id_series: The id of the time series. date: The date of the time series in the format \"%Y-%m-%d\". time_step: The time step of the time series. value_0: The values of the time series, which will be used for t " ], "weight": 5 }, "qualities": { "NumberOfInstances": 7305, "NumberOfFeatures": 4, "NumberOfClasses": null, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 2, "NumberOfSymbolicFeatures": 1, "PercentageOfNumericFeatures": 50, "Dimensionality": 0.0005475701574264203, "PercentageOfSymbolicFeatures": 25, "MajorityClassPercentage": null, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null, "NumberOfBinaryFeatures": 0, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 0, "PercentageOfMissingValues": 0, "AutoCorrelation": null }, "tags": [], "features": [ { "name": "id_series", "index": "0", "type": "nominal", "distinct": "1", "missing": "0", "distr": [] }, { "name": "date", "index": "1", "type": "string", "distinct": "7305", "missing": "0" }, { "name": "value_0", "index": "2", "type": "numeric", "distinct": "3538", "missing": "0", "min": "6675", "max": "12851", "mean": "9649", "stdev": "1127" }, { "name": "time_step", "index": "3", "type": "numeric", "distinct": "7305", "missing": "0", "min": "0", "max": "7304", "mean": "3652", "stdev": "2109" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }