OpenML


us_crime

Status: active. Format: ARFF. Visibility: public. Uploaded 25-08-2014 by Tobias Kuehn.

Source: Unknown - 2009

Title: Communities and Crime

Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.

Data Set Characteristics: Multivariate
Attribute Characteristics: Real
Associated Tasks: Regression
Number of Instances: 1994
Number of Attributes: 128
Missing Values? Yes
Area: Social
Date Donated: 2009-07-13

Source:
Creator: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA. Culled from the 1990 US Census, the 1995 US FBI Uniform Crime Report, and the 1990 US Law Enforcement Management and Administrative Statistics Survey, available from ICPSR at the University of Michigan.
Donor: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
Date: July 2009

Data Set Information:
Many variables are included so that algorithms that select or learn weights for attributes can be tested. However, clearly unrelated attributes were not included. Attributes were picked if there was any plausible connection to crime (N=122), plus the attribute to be predicted (per capita violent crimes). The remaining five attributes (state, county, community, communityname and fold) are identifiers or a cross-validation fold number rather than predictors. The variables describe the community, such as the percentage of the population considered urban and the median family income. They also describe law enforcement, such as the per capita number of police officers and the percentage of officers assigned to drug units.

The per capita violent crimes variable was calculated using population and the sum of crime variables considered violent crimes in the United States: murder, rape, robbery and assault. There was apparently some controversy in some states concerning the counting of rapes. This resulted in missing values for rape and, consequently, incorrect values for per capita violent crime. These cities are not included in the dataset.
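The per capita calculation described above can be sketched as follows. It assumes raw yearly counts per community. The function name and signature are hypothetical. Note that the published ViolentCrimesPerPop column holds this ratio after the 0.00-1.00 normalization, not the raw value. A missing component count yields no value. This mirrors why cities with uncounted rapes were left out.

```python
from typing import Optional


def violent_crimes_per_capita(
    population: int,
    murders: Optional[int],
    rapes: Optional[int],
    robberies: Optional[int],
    assaults: Optional[int],
) -> Optional[float]:
    """Hypothetical helper: (murder + rape + robbery + assault) / population.

    Returns None when any component count is missing (for example an
    unreported rape count), so the community would be excluded.
    """
    counts = (murders, rapes, robberies, assaults)
    if population <= 0 or any(c is None for c in counts):
        return None
    return sum(counts) / population


print(violent_crimes_per_capita(1000, 1, 2, 3, 4))     # 0.01
print(violent_crimes_per_capita(1000, 1, None, 3, 4))  # None
```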
Many of the communities omitted because of the rape-count problem were from the midwestern USA.

Data is described below based on original values. All numeric data was normalized into the decimal range 0.00-1.00 using an unsupervised, equal-interval binning method. Attributes retain their distribution and skew. For example, the population attribute has a mean value of 0.06 because most communities are small. Likewise, an attribute described as 'mean people per household' is actually the normalized (0-1) version of that value. The normalization preserves rough ratios of values WITHIN an attribute (e.g. double the value for double the population, within the available precision). The exception is extreme values: all values more than 3 SD above the mean are normalized to 1.00, and all values more than 3 SD below the mean are normalized to 0.00. However, the normalization does not preserve relationships between values BETWEEN attributes. For example, it would not be meaningful to compare the value for whitePerCap with the value for blackPerCap for a community.

A limitation was that the LEMAS survey covered only police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both the census and crime datasets were omitted. Many communities are missing LEMAS data (1675 of the 1994 instances, per the feature list).
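The exact binning procedure is not documented here. The sketch below is one plausible reading of the 0.00-1.00 normalization described above, not the donor's actual code. It makes these assumptions:

- Population standard deviation is used.
- Bounds are clamped to mean ± 3 SD.
- Values are scaled linearly between the clamped bounds.
- Results are rounded to two decimals, giving equal-width levels.

Rounding to two decimals allows at most 101 distinct values. This is consistent with the roughly 100 unique values per attribute in the feature list.

```python
import statistics
from typing import List


def normalize_attribute(values: List[float]) -> List[float]:
    """Hypothetical sketch: scale one attribute into 0.00-1.00 with +/-3 SD clamping."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Values beyond mean +/- 3 SD are clamped, so they map to exactly 0.00 or 1.00.
    lo = max(min(values), mean - 3 * sd)
    hi = min(max(values), mean + 3 * sd)
    if hi == lo:
        # Constant attribute: no spread to scale.
        return [0.0 for _ in values]
    scaled = []
    for v in values:
        clamped = min(max(v, lo), hi)
        # Linear scaling into [0, 1], rounded to two decimals (equal-width levels).
        scaled.append(round((clamped - lo) / (hi - lo), 2))
    return scaled


populations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]
print(normalize_attribute(populations))
```

Because the outlier (1000) lies more than 3 SD above the mean, it is clamped to 1.00. Most small communities end up near 0, which matches the skew described above.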

128 features

ViolentCrimesPerPop (target)	numeric	98 unique values 0 missing
state	numeric	46 unique values 0 missing
county	numeric	108 unique values 1174 missing
community	numeric	799 unique values 1177 missing
communityname	string	1828 unique values 0 missing
fold	numeric	10 unique values 0 missing
population	numeric	66 unique values 0 missing
householdsize	numeric	93 unique values 0 missing
racepctblack	numeric	100 unique values 0 missing
racePctWhite	numeric	99 unique values 0 missing
racePctAsian	numeric	91 unique values 0 missing
racePctHisp	numeric	91 unique values 0 missing
agePct12t21	numeric	93 unique values 0 missing
agePct12t29	numeric	89 unique values 0 missing
agePct16t24	numeric	94 unique values 0 missing
agePct65up	numeric	98 unique values 0 missing
numbUrban	numeric	67 unique values 0 missing
pctUrban	numeric	64 unique values 0 missing
medIncome	numeric	99 unique values 0 missing
pctWWage	numeric	96 unique values 0 missing
pctWFarmSelf	numeric	99 unique values 0 missing
pctWInvInc	numeric	96 unique values 0 missing
pctWSocSec	numeric	96 unique values 0 missing
pctWPubAsst	numeric	101 unique values 0 missing
pctWRetire	numeric	93 unique values 0 missing
medFamInc	numeric	98 unique values 0 missing
perCapInc	numeric	98 unique values 0 missing
whitePerCap	numeric	101 unique values 0 missing
blackPerCap	numeric	91 unique values 0 missing
indianPerCap	numeric	86 unique values 0 missing
AsianPerCap	numeric	98 unique values 0 missing
OtherPerCap	numeric	97 unique values 1 missing
HispPerCap	numeric	94 unique values 0 missing
NumUnderPov	numeric	66 unique values 0 missing
PctPopUnderPov	numeric	100 unique values 0 missing
PctLess9thGrade	numeric	97 unique values 0 missing
PctNotHSGrad	numeric	99 unique values 0 missing
PctBSorMore	numeric	96 unique values 0 missing
PctUnemployed	numeric	98 unique values 0 missing
PctEmploy	numeric	96 unique values 0 missing
PctEmplManu	numeric	100 unique values 0 missing
PctEmplProfServ	numeric	96 unique values 0 missing
PctOccupManu	numeric	98 unique values 0 missing
PctOccupMgmtProf	numeric	99 unique values 0 missing
MalePctDivorce	numeric	98 unique values 0 missing
MalePctNevMarr	numeric	96 unique values 0 missing
FemalePctDiv	numeric	91 unique values 0 missing
TotalPctDiv	numeric	94 unique values 0 missing
PersPerFam	numeric	92 unique values 0 missing
PctFam2Par	numeric	101 unique values 0 missing
PctKids2Par	numeric	97 unique values 0 missing
PctYoungKids2Par	numeric	99 unique values 0 missing
PctTeen2Par	numeric	96 unique values 0 missing
PctWorkMomYoungKids	numeric	95 unique values 0 missing
PctWorkMom	numeric	98 unique values 0 missing
NumIlleg	numeric	55 unique values 0 missing
PctIlleg	numeric	97 unique values 0 missing
NumImmig	numeric	47 unique values 0 missing
PctImmigRecent	numeric	99 unique values 0 missing
PctImmigRec5	numeric	100 unique values 0 missing
PctImmigRec8	numeric	97 unique values 0 missing
PctImmigRec10	numeric	97 unique values 0 missing
PctRecentImmig	numeric	95 unique values 0 missing
PctRecImmig5	numeric	97 unique values 0 missing
PctRecImmig8	numeric	98 unique values 0 missing
PctRecImmig10	numeric	100 unique values 0 missing
PctSpeakEnglOnly	numeric	98 unique values 0 missing
PctNotSpeakEnglWell	numeric	94 unique values 0 missing
PctLargHouseFam	numeric	99 unique values 0 missing
PctLargHouseOccup	numeric	96 unique values 0 missing
PersPerOccupHous	numeric	96 unique values 0 missing
PersPerOwnOccHous	numeric	94 unique values 0 missing
PersPerRentOccHous	numeric	98 unique values 0 missing
PctPersOwnOccup	numeric	100 unique values 0 missing
PctPersDenseHous	numeric	94 unique values 0 missing
PctHousLess3BR	numeric	100 unique values 0 missing
MedNumBR	numeric	3 unique values 0 missing
HousVacant	numeric	70 unique values 0 missing
PctHousOccup	numeric	92 unique values 0 missing
PctHousOwnOcc	numeric	99 unique values 0 missing
PctVacantBoarded	numeric	97 unique values 0 missing
PctVacMore6Mos	numeric	98 unique values 0 missing
MedYrHousBuilt	numeric	49 unique values 0 missing
PctHousNoPhone	numeric	99 unique values 0 missing
PctWOFullPlumb	numeric	91 unique values 0 missing
OwnOccLowQuart	numeric	99 unique values 0 missing
OwnOccMedVal	numeric	100 unique values 0 missing
OwnOccHiQuart	numeric	98 unique values 0 missing
RentLowQ	numeric	101 unique values 0 missing
RentMedian	numeric	99 unique values 0 missing
RentHighQ	numeric	99 unique values 0 missing
MedRent	numeric	100 unique values 0 missing
MedRentPctHousInc	numeric	95 unique values 0 missing
MedOwnCostPctInc	numeric	97 unique values 0 missing
MedOwnCostPctIncNoMtg	numeric	70 unique values 0 missing
NumInShelters	numeric	54 unique values 0 missing
NumStreet	numeric	53 unique values 0 missing
PctForeignBorn	numeric	96 unique values 0 missing
PctBornSameState	numeric	99 unique values 0 missing
PctSameHouse85	numeric	99 unique values 0 missing
PctSameCity85	numeric	100 unique values 0 missing
PctSameState85	numeric	97 unique values 0 missing
LemasSwornFT	numeric	38 unique values 1675 missing
LemasSwFTPerPop	numeric	52 unique values 1675 missing
LemasSwFTFieldOps	numeric	34 unique values 1675 missing
LemasSwFTFieldPerPop	numeric	55 unique values 1675 missing
LemasTotalReq	numeric	44 unique values 1675 missing
LemasTotReqPerPop	numeric	59 unique values 1675 missing
PolicReqPerOffic	numeric	75 unique values 1675 missing
PolicPerPop	numeric	52 unique values 1675 missing
RacialMatchCommPol	numeric	76 unique values 1675 missing
PctPolicWhite	numeric	74 unique values 1675 missing
PctPolicBlack	numeric	73 unique values 1675 missing
PctPolicHisp	numeric	54 unique values 1675 missing
PctPolicAsian	numeric	50 unique values 1675 missing
PctPolicMinor	numeric	72 unique values 1675 missing
OfficAssgnDrugUnits	numeric	30 unique values 1675 missing
NumKindsDrugsSeiz	numeric	15 unique values 1675 missing
PolicAveOTWorked	numeric	77 unique values 1675 missing
LandArea	numeric	61 unique values 0 missing
PopDens	numeric	96 unique values 0 missing
PctUsePubTrans	numeric	98 unique values 0 missing
PolicCars	numeric	63 unique values 1675 missing
PolicOperBudg	numeric	38 unique values 1675 missing
LemasPctPolicOnPatr	numeric	72 unique values 1675 missing
LemasGangUnitDeploy	numeric	3 unique values 1675 missing
LemasPctOfficDrugUn	numeric	80 unique values 0 missing
PolicBudgPerPop	numeric	51 unique values 1675 missing


107 properties

NumberOfInstances

1994

Number of instances (rows) of the dataset.

NumberOfFeatures

128

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

39202

Number of missing values in the dataset.
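The value 39202 can be checked against the feature list. Twenty-two police and LEMAS features each have 1675 missing values, county has 1174, community has 1177 and OtherPerCap has 1. Every other feature has none. The short tally below, with counts copied from the list, reproduces the total.

```python
# Missing-value counts per feature, copied from the feature list of this dataset.
LEMAS_FEATURES = [
    "LemasSwornFT", "LemasSwFTPerPop", "LemasSwFTFieldOps", "LemasSwFTFieldPerPop",
    "LemasTotalReq", "LemasTotReqPerPop", "PolicReqPerOffic", "PolicPerPop",
    "RacialMatchCommPol", "PctPolicWhite", "PctPolicBlack", "PctPolicHisp",
    "PctPolicAsian", "PctPolicMinor", "OfficAssgnDrugUnits", "NumKindsDrugsSeiz",
    "PolicAveOTWorked", "PolicCars", "PolicOperBudg", "LemasPctPolicOnPatr",
    "LemasGangUnitDeploy", "PolicBudgPerPop",
]
# LemasPctOfficDrugUn is listed with 0 missing values, so it is deliberately absent.

missing_per_feature = {name: 1675 for name in LEMAS_FEATURES}
missing_per_feature.update({"county": 1174, "community": 1177, "OtherPerCap": 1})
# All remaining features report 0 missing values and add nothing to the sum.

number_of_missing_values = sum(missing_per_feature.values())
print(number_of_missing_values)  # 39202
```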

NumberOfInstancesWithMissingValues

1871

Number of instances with at least one value missing.

NumberOfNumericFeatures

127

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

Quartile2SkewnessOfNumericAtts

1.1

Second quartile (Median) of skewness among attributes of the numeric type.
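The median-skewness property is computed in two steps. First a skewness value is computed for each numeric attribute. Then the median is taken over those values. The sketch below uses the moment-based Fisher-Pearson coefficient g1 = m3 / m2^1.5. The exact estimator OpenML uses (for example a bias-corrected variant) is an assumption here. A zero-variance attribute is assigned skewness 0 by convention.

```python
import statistics
from typing import List


def skewness(values: List[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    # Convention for constant attributes, whose skewness is otherwise undefined.
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def median_skewness(columns: List[List[float]]) -> float:
    """Second quartile (median) of the per-attribute skewness values."""
    return statistics.median([skewness(col) for col in columns])


print(median_skewness([[1, 2, 3], [1, 1, 1, 10], [10, 10, 10, 1]]))  # 0.0
```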

REPTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
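The landmarker kappa properties report Cohen's kappa, (p_o - p_e) / (1 - p_e), for a classifier's predictions. Here p_o is the observed agreement and p_e is the agreement expected by chance from the label frequencies. The sketch below shows only that formula. It does not reproduce Weka's evaluation. Kappa needs a nominal target, so these properties carry no value for us_crime. When p_e equals 1, kappa is undefined; returning 1.0 in that case is a convention chosen here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(actual)
    # Observed agreement: fraction of positions where prediction equals the label.
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the two marginal frequencies, summed over labels.
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    # Undefined when p_e == 1 (a single label everywhere); treat it as perfect agreement.
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


print(cohens_kappa(["a", "b", "a", "b"], ["a", "b", "a", "b"]))  # 1.0
print(cohens_kappa(["a", "b", "a", "b"], ["b", "a", "b", "a"]))  # -1.0
```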

MaxMeansOfNumericAtts

46188.34

Maximum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

PercentageOfBinaryFeatures

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

0.2

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.06

Number of attributes divided by the number of instances.
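Dimensionality is a simple ratio, and so is PercentageOfNumericFeatures. The arithmetic below uses only counts stated on this page: 128 features and 1994 instances. Of the 128 features, 127 are numeric; the string attribute communityname is the exception.

```python
number_of_features = 128
number_of_instances = 1994
number_of_numeric_features = 127  # every feature except the string attribute communityname

dimensionality = number_of_features / number_of_instances
percentage_numeric = 100 * number_of_numeric_features / number_of_features

print(round(dimensionality, 2))      # 0.06
print(round(percentage_numeric, 2))  # 99.22
```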

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfInstancesWithMissingValues

93.83

Percentage of instances having missing values.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
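The formula above can be illustrated with Shannon entropy in bits and with mutual information computed as I(X; Y) = H(X) + H(Y) - H(X, Y). The sketch assumes the attributes and the target are already discrete. How OpenML discretizes attributes is not covered here. These entropy-based properties have no value for us_crime because its target is numeric. The function fails with a division by zero if every attribute has zero mutual information with the target.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), with the joint taken over paired values."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def equivalent_number_of_atts(attributes: List[Sequence[Hashable]], target: Sequence[Hashable]) -> float:
    """ClassEntropy divided by the mean attribute-target mutual information."""
    mean_mi = sum(mutual_information(a, target) for a in attributes) / len(attributes)
    return entropy(target) / mean_mi


target = [0, 0, 1, 1]
attributes = [[0, 0, 1, 1], [0, 1, 0, 1]]  # one copy of the target, one independent attribute
print(equivalent_number_of_atts(attributes, target))  # 2.0
```

In the example, the first attribute carries the full 1 bit about the target and the second carries none. The mean mutual information is therefore 0.5 bits, so two such attributes would be "needed" on average.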

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-5.05

Minimum skewness among attributes of the numeric type.

PercentageOfMissingValues

15.36

Percentage of missing values.
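Both missing-value percentages follow from counts on this page. PercentageOfMissingValues divides the 39202 missing cells by all 1994 × 128 cells, including the target column. PercentageOfInstancesWithMissingValues divides the 1871 incomplete rows by 1994.

```python
number_of_instances = 1994
number_of_features = 128  # includes the target column
number_of_missing_values = 39202
instances_with_missing = 1871

percentage_missing = 100 * number_of_missing_values / (number_of_instances * number_of_features)
percentage_instances_missing = 100 * instances_with_missing / number_of_instances

print(round(percentage_missing, 2))            # 15.36
print(round(percentage_instances_missing, 2))  # 93.83
```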

Quartile3KurtosisOfNumericAtts

4.58

Third quartile of kurtosis among attributes of the numeric type.
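The kurtosis quartiles are computed the same way as the skewness summaries: first a per-attribute value, then quartiles over those values. The sketch uses moment-based excess kurtosis, g2 = m4 / m2^2 - 3. The negative minimum reported for this dataset (-1.45) suggests an excess (minus 3) definition, but the exact estimator is an assumption. The quartiles use Python's `statistics.quantiles` with its default method, which may differ from OpenML's interpolation.

```python
import statistics
from typing import List


def excess_kurtosis(values: List[float]) -> float:
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0.0 for constant attributes)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3 if m2 > 0 else 0.0


def kurtosis_quartiles(columns: List[List[float]]) -> List[float]:
    """First, second and third quartiles of the per-attribute kurtosis values."""
    return statistics.quantiles([excess_kurtosis(c) for c in columns], n=4)


print(excess_kurtosis([1, -1, 1, -1]))  # -2.0
```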

AutoCorrelation

0.76

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

7.47

Maximum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.09

Minimum standard deviation of attributes of the numeric type.

PercentageOfNumericFeatures

99.22

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

0.49

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

25299.73

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

Percentage of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

Percentage of nominal attributes.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

Average entropy of the attributes.

MinorityClassSize

Number of instances belonging to the least frequent class.

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

2.08

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

5.8

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

0.08

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

0.22

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

364.76

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

0.22

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
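The ratio above is direct arithmetic on two summary properties. The helper below is a hypothetical illustration, and its inputs are made-up values, not figures for this dataset.

```python
def mean_noise_to_signal_ratio(mean_attribute_entropy: float, mean_mutual_information: float) -> float:
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information <= 0:
        raise ValueError("mean mutual information must be positive")
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


# Hypothetical inputs: attributes carry 2.0 bits on average, 0.5 bits of which relate to the target.
print(mean_noise_to_signal_ratio(2.0, 0.5))  # 3.0
```

A ratio of 3.0 means that, on average, three bits of each attribute's information are irrelevant to the target for every bit that is relevant.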

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1SkewnessOfNumericAtts

0.06

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

Quartile1StdDevOfNumericAtts

0.17

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

1.3

Mean skewness among attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

Percentage of instances belonging to the most frequent class.

MeanStdDevOfNumericAtts

200.55

Mean standard deviation of attributes of the numeric type.

Quartile2KurtosisOfNumericAtts

1.57

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

Entropy of the target attribute values.

kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

Number of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2MeansOfNumericAtts

0.36

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

Maximum entropy among attributes.

MinKurtosisOfNumericAtts

-1.45

Minimum kurtosis among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

REPTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

69.18

Maximum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

0.02

Minimum of means among attributes of the numeric type.


13 tasks

Supervised Regression on us_crime

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: ViolentCrimesPerPop

Supervised Regression on us_crime

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: ViolentCrimesPerPop

Clustering on us_crime
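The regression tasks above score models by mean absolute error on ViolentCrimesPerPop under 10-fold (or 10 times 10-fold) cross-validation. The sketch below illustrates that estimation procedure in general terms. OpenML tasks ship their own predefined splits, so the shuffled folds here are not the task's splits. The predict-the-training-mean baseline and all function names are hypothetical. The sketch requires at least k instances.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and deal the indices into k folds; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_mae_of_mean_baseline(y: Sequence[float], k: int = 10, seed: int = 0) -> float:
    """Average test-fold MAE of a model that always predicts the training-fold mean."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        mean_train = sum(y[j] for j in train) / len(train)
        scores.append(mean_absolute_error([y[j] for j in test], [mean_train] * len(test)))
    return sum(scores) / len(scores)


def repeated_cv_mae_of_mean_baseline(y: Sequence[float], k: int = 10, repeats: int = 10) -> float:
    """'repeats times k-fold': average the k-fold MAE over differently seeded shuffles."""
    return sum(cv_mae_of_mean_baseline(y, k, seed) for seed in range(repeats)) / repeats
```

A constant-mean baseline like this is a common reference point. Any useful model for these tasks should achieve a lower mean absolute error.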