

Status: active
Format: ARFF
License: Database: Open Database, Contents: Database Contents
Visibility: public
Uploaded 24-03-2022 by Dustin Carrion
  • Computer Systems Machine Learning
Context

This dataset deals with pollution in the U.S. Pollution in the U.S. has been well documented by the U.S. EPA, but it is a pain to download all the data and arrange them in a format that interests data scientists. Hence I gathered four major pollutants (Nitrogen Dioxide, Sulphur Dioxide, Carbon Monoxide and Ozone) for every day from 2000 to 2016 and placed them neatly in a CSV file.

Content

There are 28 fields in total (the feature list on this page counts 29 because it also includes an Unnamed:_0 column). The four pollutants (NO2, O3, SO2 and CO) each have 5 specific columns. Observations total over 1.4 million. This kernel provides a good introduction to this dataset! For observations on specific columns, visit the Column Metadata on the Data tab.

Acknowledgements

All the data is scraped from the database of the U.S. EPA.

Inspiration

I did a related project with some of my friends in college, and decided to open source our dataset so that data scientists don't need to re-scrape the U.S. EPA site for historical pollution data.

29 features

Unnamed:_0 (numeric): 134576 unique values, 0 missing
State_Code (numeric): 47 unique values, 0 missing
County_Code (numeric): 73 unique values, 0 missing
Site_Num (numeric): 110 unique values, 0 missing
Address (string): 204 unique values, 0 missing
State (string): 47 unique values, 0 missing
County (string): 133 unique values, 0 missing
City (string): 144 unique values, 0 missing
Date_Local (string): 5996 unique values, 0 missing
NO2_Units (string): 1 unique value, 0 missing
NO2_Mean (numeric): 31859 unique values, 0 missing
NO2_1st_Max_Value (numeric): 990 unique values, 0 missing
NO2_1st_Max_Hour (numeric): 24 unique values, 0 missing
NO2_AQI (numeric): 129 unique values, 0 missing
O3_Units (string): 1 unique value, 0 missing
O3_Mean (numeric): 8196 unique values, 0 missing
O3_1st_Max_Value (numeric): 134 unique values, 0 missing
O3_1st_Max_Hour (numeric): 24 unique values, 0 missing
O3_AQI (numeric): 125 unique values, 0 missing
SO2_Units (string): 1 unique value, 0 missing
SO2_Mean (numeric): 12736 unique values, 0 missing
SO2_1st_Max_Value (numeric): 921 unique values, 0 missing
SO2_1st_Max_Hour (numeric): 24 unique values, 0 missing
SO2_AQI (numeric): 140 unique values, 872907 missing
CO_Units (string): 1 unique value, 0 missing
CO_Mean (numeric): 34123 unique values, 0 missing
CO_1st_Max_Value (numeric): 2698 unique values, 0 missing
CO_1st_Max_Hour (numeric): 24 unique values, 0 missing
CO_AQI (numeric): 107 unique values, 873323 missing
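
The feature list above follows a regular pattern: eight location and date columns, then five columns for each of the four pollutants. The sketch below rebuilds the column names from that pattern and checks the counts. The grouping into `LOCATION_AND_DATE`, `POLLUTANTS` and `PER_POLLUTANT_SUFFIXES`, and the helper `expected_columns`, are illustrative names and not part of the dataset. It also assumes that `Unnamed:_0` is a leftover row-index column, which would explain why the description counts 28 fields while this page lists 29 features.

```python
# Column groups as they appear in the feature list on this page.
# The grouping itself is an illustrative assumption, not dataset metadata.
LOCATION_AND_DATE = [
    "State_Code", "County_Code", "Site_Num", "Address",
    "State", "County", "City", "Date_Local",
]
POLLUTANTS = ["NO2", "O3", "SO2", "CO"]
PER_POLLUTANT_SUFFIXES = ["Units", "Mean", "1st_Max_Value", "1st_Max_Hour", "AQI"]


def expected_columns(include_index=True):
    """Return the column names in the order used by the feature list."""
    # Assumption: "Unnamed:_0" is a leftover row index, not a measured field.
    columns = ["Unnamed:_0"] if include_index else []
    columns += LOCATION_AND_DATE
    for pollutant in POLLUTANTS:
        for suffix in PER_POLLUTANT_SUFFIXES:
            columns.append(f"{pollutant}_{suffix}")
    return columns


if __name__ == "__main__":
    print(len(expected_columns(include_index=False)))  # 8 + 4 * 5 fields
    print(len(expected_columns()))                     # plus the index column
```

A list like this can be compared against the header of a downloaded copy to catch renamed or missing columns before any analysis.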

19 properties

Number of instances (rows) of the dataset.
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
Number of missing values in the dataset.
Number of instances with at least one value missing.
Number of numeric attributes.
Number of nominal attributes.
Number of attributes divided by the number of instances.
Percentage of numeric attributes.
Percentage of instances belonging to the most frequent class.
Percentage of nominal attributes.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
Number of binary attributes.
Percentage of binary attributes.
Percentage of instances having missing values.
Average class difference between consecutive instances.
Percentage of missing values.
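
Several of the properties defined above are simple counts and ratios. As a minimal sketch, the code below computes a few of them for a tiny in-memory table. The function `describe`, the dictionary keys, and the four sample rows are all hypothetical and made up for illustration; they are not values from this dataset, and this is not necessarily how the hosting site computes its properties. Missing values are represented as `None`.

```python
def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def describe(rows, columns):
    """Compute a few dataset-level properties for a list of row dicts."""
    n_rows = len(rows)
    n_cols = len(columns)
    n_cells = n_rows * n_cols

    # Count missing cells and rows that contain at least one missing cell.
    missing_cells = sum(1 for row in rows for c in columns if row.get(c) is None)
    rows_with_missing = sum(
        1 for row in rows if any(row.get(c) is None for c in columns)
    )

    # A column counts as numeric if it has at least one value and
    # every non-missing value is a number.
    numeric_cols = [
        c for c in columns
        if any(row.get(c) is not None for row in rows)
        and all(row.get(c) is None or is_number(row.get(c)) for row in rows)
    ]

    return {
        "instances": n_rows,
        "attributes": n_cols,
        "missing_values": missing_cells,
        "instances_with_missing": rows_with_missing,
        "numeric_attributes": len(numeric_cols),
        # Number of attributes divided by the number of instances.
        "dimensionality": n_cols / n_rows if n_rows else 0.0,
        "pct_numeric_attributes": 100.0 * len(numeric_cols) / n_cols if n_cols else 0.0,
        "pct_missing_values": 100.0 * missing_cells / n_cells if n_cells else 0.0,
        "pct_instances_with_missing": 100.0 * rows_with_missing / n_rows if n_rows else 0.0,
    }


# Made-up sample rows; only the column names come from the feature list.
SAMPLE_COLUMNS = ["State", "NO2_Mean", "SO2_AQI", "CO_AQI"]
SAMPLE_ROWS = [
    {"State": "Arizona", "NO2_Mean": 19.0, "SO2_AQI": 13.0, "CO_AQI": None},
    {"State": "Arizona", "NO2_Mean": 22.5, "SO2_AQI": None, "CO_AQI": 25.0},
    {"State": "Texas", "NO2_Mean": 8.0, "SO2_AQI": 4.0, "CO_AQI": 6.0},
    {"State": "Texas", "NO2_Mean": 11.5, "SO2_AQI": None, "CO_AQI": None},
]

if __name__ == "__main__":
    for name, value in describe(SAMPLE_ROWS, SAMPLE_COLUMNS).items():
        print(f"{name}: {value}")
```

The class-related properties (most and least frequent class, number of distinct target values) are left out because no target attribute is defined for this dataset on this page.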

0 tasks
