colleges

active, ARFF, NA, Visibility: public, Uploaded 17-11-2020 by Pieter Gijsbers


Modified version for the AutoML benchmark. Gathers information on about 7800 US colleges, including geographical information, statistics on the attending student population, and post-graduation career earnings.

45 features

percent_pell_grant (target): numeric, 4502 unique values, 0 missing
UNITID (row identifier): numeric, 7063 unique values, 0 missing
school_name (ignore): string, 6961 unique values, 0 missing
city: nominal, 2460 unique values, 0 missing
state: nominal, 59 unique values, 0 missing
zip: nominal, 6039 unique values, 0 missing
school_webpage (ignore): string, 5710 unique values, 56 missing
latitude: numeric, 6508 unique values, 342 missing
longitude: numeric, 6598 unique values, 342 missing
admission_rate: numeric, 1739 unique values, 4847 missing
sat_verbal_midrange: numeric, 164 unique values, 5763 missing
sat_math_midrange: numeric, 163 unique values, 5749 missing
sat_writing_midrange: numeric, 131 unique values, 6270 missing
act_combined_midrange: numeric, 22 unique values, 5722 missing
act_english_midrange: numeric, 24 unique values, 5899 missing
act_math_midrange: numeric, 24 unique values, 5898 missing
act_writing_midrange: numeric, 8 unique values, 6763 missing
sat_total_average: numeric, 477 unique values, 5644 missing
undergrad_size: numeric, 3020 unique values, 1 missing
percent_white: numeric, 4453 unique values, 1 missing
percent_black: numeric, 3277 unique values, 1 missing
percent_hispanic: numeric, 2802 unique values, 1 missing
percent_asian: numeric, 1239 unique values, 1 missing
percent_part_time: numeric, 3466 unique values, 6 missing
average_cost_academic_year: numeric, 3802 unique values, 2928 missing
average_cost_program_year: numeric, 2349 unique values, 4522 missing
tuition_(instate): numeric, 2981 unique values, 2926 missing
tuition_(out_of_state): numeric, 3039 unique values, 2926 missing
spend_per_student: numeric, 5294 unique values, 11 missing
faculty_salary: numeric, 3297 unique values, 2672 missing
percent_part_time_faculty: numeric, 2332 unique values, 3161 missing
completion_rate: numeric, 1912 unique values, 4617 missing
predominant_degree: nominal, 3 unique values, 75 missing
highest_degree: nominal, 5 unique values, 0 missing
ownership: nominal, 3 unique values, 0 missing
region: nominal, 10 unique values, 0 missing
gender: nominal, 3 unique values, 0 missing
carnegie_basic_classification: nominal, 33 unique values, 2986 missing
carnegie_undergraduate: nominal, 13 unique values, 3506 missing
carnegie_size: nominal, 17 unique values, 3505 missing
religious_affiliation: nominal, 55 unique values, 6260 missing
percent_female: numeric, 101 unique values, 1510 missing
agege24: numeric, 99 unique values, 1510 missing
faminc: numeric, 4777 unique values, 1510 missing
mean_earnings_6_years: numeric, 494 unique values, 1451 missing
median_earnings_6_years: numeric, 489 unique values, 1451 missing
mean_earnings_10_years: numeric, 607 unique values, 1736 missing
median_earnings_10_years: numeric, 568 unique values, 1736 missing

19 properties

Number of instances (rows) of the dataset: 7063
Number of attributes (columns) of the dataset: 45
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 104249
Number of instances with at least one value missing: 7063
Number of numeric attributes: 33
Number of nominal attributes: 12
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 73.33
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 26.67
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Average class difference between consecutive instances: 0.79
Percentage of missing values: 32.8
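
The ratio and percentage properties can be rechecked from the raw counts listed above. The following is a minimal sketch, not the site's own computation: the variable names are made up for illustration, and it assumes that the percentage of missing values is taken over instances times attributes (7063 x 45 cells), which is the denominator that reproduces the listed 32.8.

```python
# Counts copied from the property list; the variable names are illustrative only.
n_instances = 7063
n_attributes = 45
n_missing_values = 104249
n_numeric = 33
n_nominal = 12


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Number of attributes divided by the number of instances.
dimensionality = round(n_attributes / n_instances, 2)

# Share of numeric and nominal attributes among the 45 attributes.
pct_numeric = pct(n_numeric, n_attributes)
pct_nominal = pct(n_nominal, n_attributes)

# Assumption: the denominator is every cell of the 7063 x 45 table.
pct_missing = pct(n_missing_values, n_instances * n_attributes)

print(dimensionality, pct_numeric, pct_nominal, pct_missing)
```

Under that assumption the four computed values agree with the listed 0.01, 73.33, 26.67 and 32.8.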

1 task

3 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: root_mean_squared_error - target_feature: percent_pell_grant
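
For readers unfamiliar with the task settings, here is a rough sketch of what 10-fold cross-validation scored by root mean squared error involves. It uses only the Python standard library, a mean-predictor baseline, and synthetic target values in place of percent_pell_grant; the function names, the random seed, the fold assignment, and the value range of the synthetic target are all assumptions for illustration and do not reproduce the task's actual splits or any of its runs.

```python
import math
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def cross_validated_rmse(y, k=10, seed=0):
    """Score a mean-predictor baseline, returning one RMSE per held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        baseline = sum(train) / len(train)  # "model" fitted on the training folds only
        test = [y[i] for i in test_idx]
        scores.append(rmse(test, [baseline] * len(test)))
    return scores


# Synthetic stand-in for the target column (NOT the real data).
rng = random.Random(42)
y = [rng.random() for _ in range(200)]

scores = cross_validated_rmse(y)
mean_rmse = sum(scores) / len(scores)
print(len(scores), mean_rmse)
```

A real model would replace the baseline line with a fit on the training rows and a prediction on the held-out rows; the fold loop and the scoring stay the same.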