Context
This dataset deals with pollution in the U.S. Pollution in the U.S. has been well documented by the U.S. EPA but it is a pain to download all the data and arrange them in a format that interests data scientists. Hence I gathered four major pollutants (Nitrogen Dioxide, Sulphur Dioxide, Carbon Monoxide and Ozone) for every day from 2000 - 2016 and place them neatly in a CSV file.
Content
There is a total of 28 fields. The four pollutants (NO2, O3, SO2 and O3) each has 5 specific columns. Observations totaled to over 1.4 million. This kernel provides a good introduction to this dataset!
For observations on specific columns visit the Column Metadata on the Data tab.
Acknowledgements
All the data is scraped from the database of U.S. EPA : https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html
Inspiration
I did a related project with some of my friends in college, and decided to open source our dataset so that data scientists don't need to re-scrape the U.S. EPA site for historical pollution data.