OpenML
communities-and-crime

active | ARFF | Publicly available | Visibility: public | Uploaded 28-05-2021 by Hage Tuin
Author: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA

Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized) - 2011

Please cite:

U. S. Department of Commerce, Bureau of the Census, Census Of Population And Housing 1990 United States: Summary Tape File 1a & 3a (Computer Files), U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

U.S. Department of Justice, Bureau of Justice Statistics, Law Enforcement Management And Administrative Statistics (Computer File) U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States (Computer File) (1995)

Description

The source datasets needed to be combined via programming. Many variables are included so that algorithms that select or learn weights for attributes could be tested. However, clearly unrelated attributes were not included; attributes were picked if there was any plausible connection to crime (N=125), plus the crime variables, which are potential dependent variables. The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The crime attributes (N=18) that could be predicted are the 8 crimes considered 'Index Crimes' by the FBI (Murders, Rape, Robbery, ...), per capita (actually per 100,000 population) versions of each, and Per Capita Violent Crimes and Per Capita Nonviolent Crimes.
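The "per capita" crime attributes are expressed per 100,000 population. A minimal sketch of that conversion follows; the helper name `per_100k` and the example numbers are hypothetical and do not come from the dataset.

```python
import math


def per_100k(count, population):
    """Convert a raw crime count into a rate per 100,000 population.

    Returns NaN when either input is missing (None) or the population
    is not positive, so that missing data is not silently turned into 0.
    """
    if count is None or population is None or population <= 0:
        return math.nan
    # Multiply before dividing to keep the arithmetic exact for integers.
    return count * 100_000 / population


# Hypothetical community: 12 robberies among 48,000 residents.
example_rate = per_100k(12, 48_000)
print(example_rate)
```

Returning NaN for missing inputs is a design choice of this sketch, not something the dataset description prescribes.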
A limitation was that the LEMAS survey covered police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both census and crime datasets were omitted. Many communities are missing LEMAS data.

The per capita crime variables were calculated using population values included in the 1995 FBI data (which differ from the 1990 Census values). The per capita violent crimes variable was calculated using population and the sum of crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states concerning the counting of rapes. This resulted in missing values for rape, which in turn resulted in missing values for per capita violent crime. Many of these omitted communities were from the midwestern USA (Minnesota, Illinois, and Michigan have many of these). The per capita nonviolent crime variable was calculated using the sum of crime variables considered non-violent crimes in the United States: burglaries, larcenies, auto thefts and arsons. (There are many other types of crimes; these only include FBI 'Index Crimes'.)

Some further pre-processing of the dataset must be done. Choose the desired dependent variable from among the 18 possible. It would not be interesting or appropriate to predict total crime (e.g. violent crime) while including subtotals (e.g. murders) as independent variables. There are also identifying variables (community name, county code, community code) that are not predictive, and would get in the way of some algorithms. Weka's Unsupervised Attribute Remove Filter can be used to remove unwanted attributes.

The FBI notes that use of this data to evaluate communities is over-simplistic, as many relevant factors are not included. For example, communities with large numbers of visitors will have higher per capita crime (measured by residents) than communities with fewer visitors, other things being equal.
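The pre-processing described above (pick one dependent variable, drop the other crime variables, drop the identifying variables) can be sketched in plain Python. This is not the Weka filter mentioned in the description, only an analogue; the function name `select_target` and every column name in the toy rows are hypothetical, since the features on this page are listed by index only.

```python
def select_target(rows, target, crime_cols, id_cols):
    """Split rows into (X, y) for one chosen crime variable.

    rows       -- list of dicts, one per community
    target     -- name of the crime column to predict
    crime_cols -- names of all crime columns (the target included)
    id_cols    -- names of identifying columns that are not predictive
    """
    if target not in crime_cols:
        raise ValueError(f"{target!r} is not one of the crime columns")
    # Remove identifiers and every crime column, so that subtotals
    # cannot leak into a prediction of a total (or vice versa).
    dropped = set(id_cols) | set(crime_cols)
    X, y = [], []
    for row in rows:
        if row.get(target) is None:
            continue  # skip communities where the target is missing
        X.append({k: v for k, v in row.items() if k not in dropped})
        y.append(row[target])
    return X, y


# Toy data with made-up column names and values.
toy_rows = [
    {"community_name": "A", "median_income": 30000, "murders": 2, "violent_per_100k": 410.5},
    {"community_name": "B", "median_income": 45000, "murders": 0, "violent_per_100k": None},
    {"community_name": "C", "median_income": 38000, "murders": 1, "violent_per_100k": 120.0},
]
X, y = select_target(
    toy_rows,
    target="violent_per_100k",
    crime_cols=["murders", "violent_per_100k"],
    id_cols=["community_name"],
)
```

Dropping rows with a missing target is one possible policy; imputing or choosing a different dependent variable may suit other uses better.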

147 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| 0 | string | 2018 | 0 |
| 1 | string | 48 | 0 |
| 2 | string | 114 | 1221 |
| 3 | string | 959 | 1224 |
| 4 | numeric | 10 | 0 |
| 5 | numeric | 2154 | 0 |
| 6 | numeric | 198 | 0 |
| 7 | numeric | 1172 | 0 |
| 8 | numeric | 1609 | 0 |
| 9 | numeric | 667 | 0 |
| 10 | numeric | 1026 | 0 |
| 11 | numeric | 950 | 0 |
| 12 | numeric | 1184 | 0 |
| 13 | numeric | 947 | 0 |
| 14 | numeric | 1221 | 0 |
| 15 | numeric | 1600 | 0 |
| 16 | numeric | 293 | 0 |
| 17 | numeric | 2141 | 0 |
| 18 | numeric | 1536 | 0 |
| 19 | numeric | 290 | 0 |
| 20 | numeric | 1774 | 0 |
| 21 | numeric | 1548 | 0 |
| 22 | numeric | 1125 | 0 |
| 23 | numeric | 1258 | 0 |
| 24 | numeric | 2150 | 0 |
| 25 | numeric | 2069 | 0 |
| 26 | numeric | 2074 | 0 |
| 27 | numeric | 2035 | 0 |
| 28 | numeric | 1925 | 0 |
| 29 | numeric | 2088 | 0 |
| 30 | string | 1917 | 1 |
| 31 | numeric | 2045 | 0 |
| 32 | numeric | 1904 | 0 |
| 33 | numeric | 1462 | 0 |
| 34 | numeric | 1275 | 0 |
| 35 | numeric | 1690 | 0 |
| 36 | numeric | 1660 | 0 |
| 37 | numeric | 868 | 0 |
| 38 | numeric | 1544 | 0 |
| 39 | numeric | 1543 | 0 |
| 40 | numeric | 1364 | 0 |
| 41 | numeric | 1429 | 0 |
| 42 | numeric | 1529 | 0 |
| 43 | numeric | 948 | 0 |
| 44 | numeric | 1411 | 0 |
| 45 | numeric | 1052 | 0 |
| 46 | numeric | 994 | 0 |
| 47 | numeric | 161 | 0 |
| 48 | numeric | 1664 | 0 |
| 49 | numeric | 1687 | 0 |
| 50 | numeric | 1679 | 0 |
| 51 | numeric | 1635 | 0 |
| 52 | numeric | 1546 | 0 |
| 53 | numeric | 1433 | 0 |
| 54 | numeric | 1184 | 0 |
| 55 | numeric | 768 | 0 |
| 56 | numeric | 1666 | 0 |
| 57 | numeric | 1481 | 0 |
| 58 | numeric | 1658 | 0 |
| 59 | numeric | 1755 | 0 |
| 60 | numeric | 1808 | 0 |
| 61 | numeric | 448 | 0 |
| 62 | numeric | 568 | 0 |
| 63 | numeric | 670 | 0 |
| 64 | numeric | 738 | 0 |
| 65 | numeric | 1416 | 0 |
| 66 | numeric | 625 | 0 |
| 67 | numeric | 785 | 0 |
| 68 | numeric | 680 | 0 |
| 69 | numeric | 185 | 0 |
| 70 | numeric | 185 | 0 |
| 71 | numeric | 214 | 0 |
| 72 | numeric | 1802 | 0 |
| 73 | numeric | 850 | 0 |
| 74 | numeric | 1755 | 0 |
| 75 | numeric | 4 | 0 |
| 76 | numeric | 1314 | 0 |
| 77 | numeric | 1019 | 0 |
| 78 | numeric | 1801 | 0 |
| 79 | numeric | 715 | 0 |
| 80 | numeric | 1780 | 0 |
| 81 | numeric | 49 | 0 |
| 82 | numeric | 972 | 0 |
| 83 | numeric | 181 | 0 |
| 84 | numeric | 1196 | 0 |
| 85 | numeric | 1270 | 0 |
| 86 | numeric | 1344 | 0 |
| 87 | numeric | 900 | 0 |
| 88 | numeric | 555 | 0 |
| 89 | numeric | 622 | 0 |
| 90 | numeric | 659 | 0 |
| 91 | numeric | 370 | 0 |
| 92 | numeric | 609 | 0 |
| 93 | numeric | 159 | 0 |
| 94 | numeric | 154 | 0 |
| 95 | numeric | 87 | 0 |
| 96 | numeric | 274 | 0 |
| 97 | numeric | 128 | 0 |
| 98 | numeric | 1180 | 0 |
| 99 | numeric | 1833 | 0 |
| 100 | numeric | 1690 | 0 |
| 101 | numeric | 1623 | 0 |
| 102 | numeric | 1337 | 0 |
| 103 | string | 220 | 1872 |
| 104 | string | 343 | 1872 |
| 105 | string | 215 | 1872 |
| 106 | string | 342 | 1872 |
| 107 | string | 319 | 1872 |
| 108 | string | 343 | 1872 |
| 109 | string | 336 | 1872 |
| 110 | string | 322 | 1872 |
| 111 | string | 317 | 1872 |
| 112 | string | 317 | 1872 |
| 113 | string | 288 | 1872 |
| 114 | string | 223 | 1872 |
| 115 | string | 113 | 1872 |
| 116 | string | 313 | 1872 |
| 117 | string | 67 | 1872 |
| 118 | string | 15 | 1872 |
| 119 | string | 304 | 1872 |
| 120 | numeric | 572 | 0 |
| 121 | numeric | 2162 | 0 |
| 122 | numeric | 746 | 0 |
| 123 | string | 193 | 1872 |
| 124 | string | 341 | 1872 |
| 125 | string | 307 | 1872 |
| 126 | string | 3 | 1872 |
| 127 | numeric | 266 | 0 |
| 128 | string | 343 | 1872 |
| 129 | numeric | 92 | 0 |
| 130 | numeric | 906 | 0 |
| 131 | string | 171 | 208 |
| 132 | string | 1621 | 208 |
| 133 | string | 417 | 1 |
| 134 | string | 2060 | 1 |
| 135 | string | 573 | 13 |
| 136 | string | 2149 | 13 |
| 137 | string | 909 | 3 |
| 138 | string | 2199 | 3 |
| 139 | string | 1454 | 3 |
| 140 | string | 2211 | 3 |
| 141 | string | 648 | 3 |
| 142 | string | 2172 | 3 |
| 143 | string | 178 | 91 |
| 144 | string | 1577 | 91 |
| 145 | string | 1973 | 221 |
| 146 | string | 2113 | 97 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2215 |
| Number of attributes (columns) of the dataset. | 147 |
| Number of distinct values of the target attribute (if it is nominal). | |
| Number of missing values in the dataset. | 44592 |
| Number of instances with at least one value missing. | 2104 |
| Number of numeric attributes. | 104 |
| Number of nominal attributes. | 0 |
| Number of attributes divided by the number of instances. | 0.07 |
| Percentage of numeric attributes. | 70.75 |
| Percentage of instances belonging to the most frequent class. | |
| Percentage of nominal attributes. | 0 |
| Number of instances belonging to the most frequent class. | |
| Percentage of instances belonging to the least frequent class. | |
| Number of instances belonging to the least frequent class. | |
| Number of binary attributes. | 0 |
| Percentage of binary attributes. | 0 |
| Percentage of instances having missing values. | 94.99 |
| Average class difference between consecutive instances. | |
| Percentage of missing values. | 13.7 |
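The ratio and percentage properties follow from the raw counts reported on this page. The short check below recomputes them; the variable names are illustrative, and treating "percentage of missing values" as missing cells divided by rows times columns is an assumption that happens to match the reported figure.

```python
# Counts as reported in the property list.
n_rows = 2215
n_cols = 147
n_numeric = 104
n_missing_values = 44592
n_rows_with_missing = 2104

# Derived properties.
dimensionality = n_cols / n_rows                      # attributes per instance
pct_numeric = 100 * n_numeric / n_cols                # share of numeric attributes
pct_rows_missing = 100 * n_rows_with_missing / n_rows
pct_missing = 100 * n_missing_values / (n_rows * n_cols)

print(round(dimensionality, 2))
print(round(pct_numeric, 2))
print(round(pct_rows_missing, 2))
print(round(pct_missing, 1))
```

After rounding, the four results agree with the reported 0.07, 70.75, 94.99 and 13.7.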

0 tasks
