communities_and_crime

Status: active. Format: ARFF. License: CC BY 4.0. Visibility: public. Uploaded 23-07-2024 by Bruno Belucci Teixeira.
From original source:
-----
Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.

Additional Information

Many variables are included so that algorithms that select or learn weights for attributes can be tested. However, clearly unrelated attributes were not included; attributes were picked if there was any plausible connection to crime (N=122), plus the attribute to be predicted (Per Capita Violent Crimes). The variables describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units.

The per capita violent crimes variable was calculated using population and the sum of the crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states concerning the counting of rapes. This resulted in missing values for rape, which in turn resulted in incorrect values for per capita violent crime. These cities are not included in the dataset. Many of the omitted communities were from the midwestern USA.

Data is described below based on original values. All numeric data was normalized into the decimal range 0.00-1.00 using an unsupervised, equal-interval binning method. Attributes retain their distribution and skew (hence, for example, the population attribute has a mean value of 0.06 because most communities are small). For example, an attribute described as 'mean people per household' is actually the normalized (0-1) version of that value.

The normalization preserves rough ratios of values WITHIN an attribute (e.g. double the value for double the population, within the available precision), except for extreme values: all values more than 3 SD above the mean are normalized to 1.00, and all values more than 3 SD below the mean are normalized to 0.00. However, the normalization does not preserve relationships between values BETWEEN attributes (e.g. it would not be meaningful to compare the value for whitePerCap with the value for blackPerCap for a community).

A limitation is that the LEMAS survey covered only police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both census and crime datasets were omitted. Many communities are missing LEMAS data.
-----
Columns with index 0, 1, 2, 3, 4 were deleted from the dataset, usually because they related to some kind of index.
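The normalization described above can be sketched roughly as follows. This is only one reading of the description, not the original procedure: it assumes values are first clipped at the mean plus or minus 3 SD, then scaled linearly between the smallest and largest clipped value, and rounded to two decimals. The function name `normalize_attribute` and the sample data are hypothetical.

```python
import statistics


def normalize_attribute(values):
    """Map raw values to 0.00-1.00 (assumed scheme, not the original code)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    low, high = mean - 3 * sd, mean + 3 * sd

    # Extreme values are pulled back to mean +/- 3 SD before scaling,
    # so they end up at exactly 0.00 or 1.00.
    clipped = [min(max(v, low), high) for v in values]

    lo, hi = min(clipped), max(clipped)
    if hi == lo:
        # Constant attribute: nothing to scale.
        return [0.0 for _ in clipped]

    # Two-decimal precision, matching the 0.00-1.00 range in the description.
    return [round((v - lo) / (hi - lo), 2) for v in clipped]


# Hypothetical skewed attribute: many small communities and one very large one.
populations = [10.0] * 99 + [10000.0]
normalized = normalize_attribute(populations)
```

Under this reading the single large community is capped at 1.00 while the small ones sit at the bottom of the range, which is in line with the low mean reported for the population attribute.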

123 features

127 (target): numeric, 98 unique values, 0 missing
5: numeric, 66 unique values, 0 missing
6: numeric, 93 unique values, 0 missing
7: numeric, 100 unique values, 0 missing
8: numeric, 99 unique values, 0 missing
9: numeric, 91 unique values, 0 missing
10: numeric, 91 unique values, 0 missing
11: numeric, 93 unique values, 0 missing
12: numeric, 89 unique values, 0 missing
13: numeric, 94 unique values, 0 missing
14: numeric, 98 unique values, 0 missing
15: numeric, 67 unique values, 0 missing
16: numeric, 64 unique values, 0 missing
17: numeric, 99 unique values, 0 missing
18: numeric, 96 unique values, 0 missing
19: numeric, 99 unique values, 0 missing
20: numeric, 96 unique values, 0 missing
21: numeric, 96 unique values, 0 missing
22: numeric, 101 unique values, 0 missing
23: numeric, 93 unique values, 0 missing
24: numeric, 98 unique values, 0 missing
25: numeric, 98 unique values, 0 missing
26: numeric, 101 unique values, 0 missing
27: numeric, 91 unique values, 0 missing
28: numeric, 86 unique values, 0 missing
29: numeric, 98 unique values, 0 missing
30: numeric, 97 unique values, 1 missing
31: numeric, 94 unique values, 0 missing
32: numeric, 66 unique values, 0 missing
33: numeric, 100 unique values, 0 missing
34: numeric, 97 unique values, 0 missing
35: numeric, 99 unique values, 0 missing
36: numeric, 96 unique values, 0 missing
37: numeric, 98 unique values, 0 missing
38: numeric, 96 unique values, 0 missing
39: numeric, 100 unique values, 0 missing
40: numeric, 96 unique values, 0 missing
41: numeric, 98 unique values, 0 missing
42: numeric, 99 unique values, 0 missing
43: numeric, 98 unique values, 0 missing
44: numeric, 96 unique values, 0 missing
45: numeric, 91 unique values, 0 missing
46: numeric, 94 unique values, 0 missing
47: numeric, 92 unique values, 0 missing
48: numeric, 101 unique values, 0 missing
49: numeric, 97 unique values, 0 missing
50: numeric, 99 unique values, 0 missing
51: numeric, 96 unique values, 0 missing
52: numeric, 95 unique values, 0 missing
53: numeric, 98 unique values, 0 missing
54: numeric, 55 unique values, 0 missing
55: numeric, 97 unique values, 0 missing
56: numeric, 47 unique values, 0 missing
57: numeric, 99 unique values, 0 missing
58: numeric, 100 unique values, 0 missing
59: numeric, 97 unique values, 0 missing
60: numeric, 97 unique values, 0 missing
61: numeric, 95 unique values, 0 missing
62: numeric, 97 unique values, 0 missing
63: numeric, 98 unique values, 0 missing
64: numeric, 100 unique values, 0 missing
65: numeric, 98 unique values, 0 missing
66: numeric, 94 unique values, 0 missing
67: numeric, 99 unique values, 0 missing
68: numeric, 96 unique values, 0 missing
69: numeric, 96 unique values, 0 missing
70: numeric, 94 unique values, 0 missing
71: numeric, 98 unique values, 0 missing
72: numeric, 100 unique values, 0 missing
73: numeric, 94 unique values, 0 missing
74: numeric, 100 unique values, 0 missing
75: numeric, 3 unique values, 0 missing
76: numeric, 70 unique values, 0 missing
77: numeric, 92 unique values, 0 missing
78: numeric, 99 unique values, 0 missing
79: numeric, 97 unique values, 0 missing
80: numeric, 98 unique values, 0 missing
81: numeric, 49 unique values, 0 missing
82: numeric, 99 unique values, 0 missing
83: numeric, 91 unique values, 0 missing
84: numeric, 99 unique values, 0 missing
85: numeric, 100 unique values, 0 missing
86: numeric, 98 unique values, 0 missing
87: numeric, 101 unique values, 0 missing
88: numeric, 99 unique values, 0 missing
89: numeric, 99 unique values, 0 missing
90: numeric, 100 unique values, 0 missing
91: numeric, 95 unique values, 0 missing
92: numeric, 97 unique values, 0 missing
93: numeric, 70 unique values, 0 missing
94: numeric, 54 unique values, 0 missing
95: numeric, 53 unique values, 0 missing
96: numeric, 96 unique values, 0 missing
97: numeric, 99 unique values, 0 missing
98: numeric, 99 unique values, 0 missing
99: numeric, 100 unique values, 0 missing
100: numeric, 97 unique values, 0 missing
101: numeric, 38 unique values, 1675 missing
102: numeric, 52 unique values, 1675 missing
103: numeric, 34 unique values, 1675 missing
104: numeric, 55 unique values, 1675 missing
105: numeric, 44 unique values, 1675 missing
106: numeric, 59 unique values, 1675 missing
107: numeric, 75 unique values, 1675 missing
108: numeric, 52 unique values, 1675 missing
109: numeric, 76 unique values, 1675 missing
110: numeric, 74 unique values, 1675 missing
111: numeric, 73 unique values, 1675 missing
112: numeric, 54 unique values, 1675 missing
113: numeric, 50 unique values, 1675 missing
114: numeric, 72 unique values, 1675 missing
115: numeric, 30 unique values, 1675 missing
116: numeric, 15 unique values, 1675 missing
117: numeric, 77 unique values, 1675 missing
118: numeric, 61 unique values, 0 missing
119: numeric, 96 unique values, 0 missing
120: numeric, 98 unique values, 0 missing
121: numeric, 63 unique values, 1675 missing
122: numeric, 38 unique values, 1675 missing
123: numeric, 72 unique values, 1675 missing
124: numeric, 3 unique values, 1675 missing
125: numeric, 80 unique values, 0 missing
126: numeric, 51 unique values, 1675 missing
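To make the missing-value pattern in the feature list easier to act on, here is a small sketch that tallies the counts shown above and flags sparsely populated features. The feature indices and counts are taken from the list; the 50% drop threshold is an assumption chosen for illustration, not something the dataset page prescribes.

```python
N_ROWS = 1994  # number of instances reported for the dataset

# Features listed with 1675 missing values (indices as shown in the list).
mostly_missing = list(range(101, 118)) + [121, 122, 123, 124, 126]

missing_per_feature = {idx: 1675 for idx in mostly_missing}
missing_per_feature[30] = 1  # feature 30 has a single missing value

total_missing = sum(missing_per_feature.values())
share_missing = {idx: n / N_ROWS for idx, n in missing_per_feature.items()}

# Hypothetical rule: drop any feature that is missing in more than half the rows.
to_drop = sorted(idx for idx, share in share_missing.items() if share > 0.5)

print(len(to_drop), total_missing)  # prints "22 36851"
```

Whether to drop, impute, or keep these features depends on the model; the sketch only shows that 22 features account for all but one of the missing values.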

19 properties

Number of instances (rows) of the dataset: 1994
Number of attributes (columns) of the dataset: 123
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 36851
Number of instances with at least one value missing: 1675
Number of numeric attributes: 123
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 84
Percentage of missing values: 15.03
Average class difference between consecutive instances: 0.76
Percentage of numeric attributes: 100
Number of attributes divided by the number of instances: 0.06
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
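Several of the properties above are simple ratios of the others, so they can be cross-checked by hand. The short calculation below uses only the counts listed on this page; the variable names are illustrative.

```python
n_rows = 1994                # number of instances
n_cols = 123                 # number of attributes
n_missing_values = 36851     # missing values in the dataset
n_rows_with_missing = 1675   # instances with at least one missing value

# Percentage of instances having missing values.
pct_rows_with_missing = 100 * n_rows_with_missing / n_rows

# Percentage of missing values over all cells (rows x columns).
pct_missing_values = 100 * n_missing_values / (n_rows * n_cols)

# Number of attributes divided by the number of instances.
dimensionality = n_cols / n_rows

print(round(pct_rows_with_missing), round(pct_missing_values, 2), round(dimensionality, 2))
# prints "84 15.03 0.06"
```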

1 task

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: 127
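For readers unfamiliar with the estimation procedure named in the task, a minimal sketch of 10-fold cross-validation index generation follows. It uses only the standard library; the shuffling, the seed, and the helper name `k_fold_indices` are assumptions, and the sketch does not reproduce whatever splits the task itself defines.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # assumed: shuffle once, fixed seed

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 1994 rows, as reported for this dataset.
splits = list(k_fold_indices(1994))
fold_sizes = sorted(len(test) for _, test in splits)
```

With 1994 rows, four folds hold 200 test rows and six hold 199, and each row appears in exactly one test fold.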