Water_Quality

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 31-05-2024 by Iwo Godzwon.
Description: The dataset named "water-quality-1.csv" comprises a collection of water quality measurements from various sites, recorded to monitor environmental health and pollution levels. It covers a range of parameters such as Fecal Coliform, Conductivity Field, Temperature, Total Nitrogen, and Nitrite + Nitrate Nitrogen, which are used to assess water quality in ecosystems. The data spans multiple years, offering insights into temporal changes affecting water bodies.

Attribute Description:
- Sample ID: Unique identifier for each sample (e.g., 58086).
- Grab ID: Identifier for the specific collection instance, with some entries missing.
- Profile ID: Unique profile number associated with each sample site (e.g., 46937).
- Sample Number: A distinct code for each sample, combining letters and numbers (e.g., 'L47270-122').
- Collect DateTime: Date and time when the sample was collected, in MM/DD/YYYY HH:MM:SS AM/PM format.
- Depth (m): Depth at which the sample was collected, in meters (e.g., 1.0).
- Site Type: Classification of the water body from which the sample was taken (e.g., Large Lakes).
- Area: Geographic location or name of the water body (e.g., Central Puget Sound).
- Locator: A unique code for the site's location (e.g., KTHA03).
- Site: Detailed description of the sample location (e.g., Lake Sammamish near Issaquah Creek).
- Parameter: The water quality parameter measured (e.g., Fecal Coliform).
- Value: The measured value for the parameter, with some missing entries.
- Units: Measurement units for the parameter values (e.g., umhos/cm).
- QualityId: A numerical value indicating the quality of the data (e.g., 2).
- Lab Qualifier, MDL, RDL, Text Value, Sample Info, Steward Note, Replicates, Replicate Of, Method, Date Analyzed, Data Source: These fields contain additional information about the laboratory procedures, data quality, analysis methods, and sources.
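Because Collect DateTime is stored as a string in MM/DD/YYYY HH:MM:SS AM/PM format, it usually has to be parsed before any temporal analysis. The following is a minimal sketch using only the Python standard library. The format string, the helper name `parse_collect_datetime`, and the sample value are illustrative assumptions, not part of the dataset; the page does not state a time zone, so the result is a naive datetime.

```python
from datetime import datetime

# Assumed strptime pattern for "MM/DD/YYYY HH:MM:SS AM/PM" (12-hour clock).
COLLECT_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_collect_datetime(text: str) -> datetime:
    """Parse one Collect DateTime string into a naive datetime.

    Hypothetical helper: no time zone is attached because the dataset
    description does not specify one.
    """
    return datetime.strptime(text.strip(), COLLECT_FORMAT)


# Hypothetical sample value, not copied from the dataset.
example = parse_collect_datetime("05/31/2024 02:15:00 PM")
print(example.isoformat())  # 2024-05-31T14:15:00
```

Parsing up front makes it straightforward to group measurements by year or month afterwards.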
Use Case: This dataset is invaluable for researchers and environmentalists looking to study water quality trends, identify pollution hotspots, and evaluate the effectiveness of environmental policies over time. It can aid in comparative analysis across different water bodies and help in the formulation of strategies for water conservation and pollution control. Moreover, policymakers can utilize this data to enforce environmental regulations and initiate cleanup efforts in degraded aquatic ecosystems.

25 features

- Sample ID (numeric): 154694 unique values, 0 missing
- Grab ID (numeric): 112985 unique values, 376778 missing
- Profile ID (numeric): 54951 unique values, 0 missing
- Sample Number (string): 154694 unique values, 0 missing
- Collect DateTime (string): 102284 unique values, 0 missing
- Depth (m) (numeric): 646 unique values, 376778 missing
- Site Type (string): 6 unique values, 0 missing
- Area (string): 67 unique values, 133 missing
- Locator (string): 180 unique values, 0 missing
- Site (string): 178 unique values, 0 missing
- Parameter (string): 47 unique values, 0 missing
- Value (numeric): 6012 unique values, 109085 missing
- Units (string): 23 unique values, 780 missing
- QualityId (numeric): 8 unique values, 0 missing
- Lab Qualifier (string): 51 unique values, 1110071 missing
- MDL (numeric): 165 unique values, 651711 missing
- RDL (numeric): 471 unique values, 653298 missing
- Text Value (string): 24856 unique values, 1030752 missing
- Sample Info (string): 353 unique values, 1256301 missing
- Steward Note (string): 64 unique values, 1258764 missing
- Replicates (numeric): 202 unique values, 1257803 missing
- Replicate Of (numeric): 202 unique values, 1257913 missing
- Method (string): 202 unique values, 190439 missing
- Date Analyzed (string): 4610 unique values, 691662 missing
- Data Source (string): 1 unique value, 0 missing
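The per-feature missing counts are easier to interpret as a share of the 1259444 rows reported for this dataset. The sketch below hard-codes the counts listed for the 25 features and computes that share; the 90% threshold and the names `missing_share` and `mostly_empty` are arbitrary illustrative choices, not something the page defines.

```python
# Row count and per-feature missing counts as listed on this page.
TOTAL_ROWS = 1259444

missing_counts = {
    "Sample ID": 0, "Grab ID": 376778, "Profile ID": 0,
    "Sample Number": 0, "Collect DateTime": 0, "Depth (m)": 376778,
    "Site Type": 0, "Area": 133, "Locator": 0, "Site": 0,
    "Parameter": 0, "Value": 109085, "Units": 780, "QualityId": 0,
    "Lab Qualifier": 1110071, "MDL": 651711, "RDL": 653298,
    "Text Value": 1030752, "Sample Info": 1256301,
    "Steward Note": 1258764, "Replicates": 1257803,
    "Replicate Of": 1257913, "Method": 190439,
    "Date Analyzed": 691662, "Data Source": 0,
}


def missing_share(count: int, total: int = TOTAL_ROWS) -> float:
    """Return the percentage of rows in which a feature is missing."""
    return 100.0 * count / total


shares = {name: missing_share(c) for name, c in missing_counts.items()}

# Features missing in more than 90% of rows (threshold is an assumption).
mostly_empty = sorted(name for name, pct in shares.items() if pct > 90.0)

for name in mostly_empty:
    print(f"{name}: {shares[name]:.2f}% missing")
```

Features that are almost entirely empty, such as the replicate and note fields, are natural candidates to drop or handle separately before modeling.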

19 properties

- Number of instances (rows) of the dataset: 1259444
- Number of attributes (columns) of the dataset: 25
- Number of distinct values of the target attribute (if it is nominal): not listed
- Number of missing values in the dataset: 10222268
- Number of instances with at least one value missing: 1259444
- Number of numeric attributes: 10
- Number of nominal attributes: 0
- Percentage of binary attributes: 0
- Percentage of instances having missing values: 100
- Percentage of missing values: 32.47
- Average class difference between consecutive instances: not listed
- Percentage of numeric attributes: 40
- Number of attributes divided by the number of instances: 0
- Percentage of nominal attributes: 0
- Percentage of instances belonging to the most frequent class: not listed
- Number of instances belonging to the most frequent class: not listed
- Percentage of instances belonging to the least frequent class: not listed
- Number of instances belonging to the least frequent class: not listed
- Number of binary attributes: 0
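Several of these properties can be reproduced from the raw counts, which also explains why the attributes-to-instances ratio is shown as 0. The sketch below assumes the straightforward formulas implied by the property descriptions (for example, missing values divided by rows times columns); whether the hosting platform computes them in exactly this way is an assumption.

```python
# Counts taken from the properties listed on this page.
n_rows = 1259444
n_cols = 25
n_missing = 10222268
n_rows_with_missing = 1259444
n_numeric = 10

# Assumed formulas, inferred from the property descriptions.
pct_missing_values = 100.0 * n_missing / (n_rows * n_cols)
pct_rows_with_missing = 100.0 * n_rows_with_missing / n_rows
pct_numeric = 100.0 * n_numeric / n_cols
attrs_per_instance = n_cols / n_rows  # tiny, so it rounds to 0

print(round(pct_missing_values, 2))     # 32.47
print(round(pct_rows_with_missing, 2))  # 100.0
print(round(pct_numeric, 2))            # 40.0
print(round(attrs_per_instance, 2))     # 0.0
```

The unrounded attributes-to-instances ratio is roughly 0.00002, so the listed 0 reflects rounding rather than a true zero.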

0 tasks
