

active ARFF Publicly available Visibility: public Uploaded 29-09-2014 by Joaquin Vanschoren
Author:
Source: Unknown - Date unknown
Please cite:

The USNEWS dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on over 1300 American colleges and universities. The data may be obtained in either of two formats. USNEWS.DATA contains the raw data in comma delimited fields with a single data line for each school. The order of variables is the same as given below for the fixed column version, although the spacing varies for each school. USNEWS3.DATA has the data arranged in fixed columns, with three data lines for each school and a maximum line length of 80 characters.

This dataset is taken from the 1995 U.S. News & World Report's Guide to America's Best Colleges. This dataset is protected by copyright, is reproduced with permission of the copyright holder(s), and may not be downloaded or otherwise copied, except solely for the purpose of analysis in connection with the American Statistical Association's 1995 Data Analysis Exposition. The data are reproduced with the permission of the publisher. Most of the data are for the 1993-94 school year. You may wish to consult a copy of the U.S. News source for more detailed descriptions of the variables.

KEY FOR USNEWS3.DATA
Fixed column format with three data lines per school

Line #1
   1 -  5   FICE (Federal ID number)
   7 - 51   College name
  53 - 54   State (postal code)

Line #2
   1 -  2   Public/private indicator (public=1, private=2)
   3 -  6   Average Math SAT score
   7 - 10   Average Verbal SAT score
  11 - 15   Average Combined SAT score
  16 - 18   Average ACT score
  19 - 22   First quartile - Math SAT
  23 - 26   Third quartile - Math SAT
  27 - 30   First quartile - Verbal SAT
  31 - 34   Third quartile - Verbal SAT
  35 - 37   First quartile - ACT
  38 - 40   Third quartile - ACT
  41 - 46   Number of applications received
  47 - 52   Number of applicants accepted
  53 - 57   Number of new students enrolled
  58 - 61   Pct. new students from top 10% of H.S. class
  62 - 65   Pct. new students from top 25% of H.S. class

Line #3
   1 -  6   Number of fulltime undergraduates
   7 - 12   Number of parttime undergraduates
  13 - 18   In-state tuition
  19 - 24   Out-of-state tuition
  25 - 29   Room and board costs
  30 - 34   Room costs
  35 - 39   Board costs
  40 - 44   Additional fees
  45 - 49   Estimated book costs
  50 - 54   Estimated personal spending
  55 - 58   Pct. of faculty with Ph.D.'s
  59 - 62   Pct. of faculty with terminal degree
  63 - 67   Student/faculty ratio
  68 - 70   Pct.alumni who donate
  71 - 76   Instructional expenditure per student
  77 - 80   Graduation rate

Missing values are denoted with *

To obtain the dataset from Statlib, send one of the single line messages below to the address
  send from colleges
or
  send from colleges

For more information on the ASA Statistical Graphics Section's 1995 Data Analysis Exposition send the message
  send readme from colleges

INFORMATION

WHAT'S WHAT AMONG AMERICAN COLLEGES AND UNIVERSITIES?

This is the subject of the 1995 Data Analysis Exposition sponsored by the Statistical Graphics Section of the American Statistical Association. The purpose of the Exposition is to encourage statisticians to demonstrate techniques, especially graphical, for analyzing data and displaying the results of an analysis. Individuals and groups will work with the same set of data and present their analyses at a special session as part of the annual Joint Statistical Meetings in Orlando, Florida on August 13th-17th, 1995.

The datasets for 1995 are drawn from two sources, U.S. News & World Report's Guide to America's Best Colleges and the AAUP (American Association of University Professors) 1994 Salary Survey which appeared in the March-April 1994 issue of Academe. The U.S. News data contains information on tuition, room & board costs, SAT or ACT scores, application/acceptance rates, graduation rate, student/faculty ratio, spending per student, and a number of other variables for 1300+ schools.
The AAUP data includes average salary, overall compensation, and number of faculty broken down by full, associate, and assistant professor ranks.

The raw data and documentation are contained in the files described below. To obtain any of these files send a message to of the following form (substituting the file you want for XXXXX)
  send XXXXX from colleges

Available files
  usnews.doc   Documentation for the U.S. News data
               U.S. News data in comma delimited format
               U.S. News data in fixed column format
  aaup.doc     Documentation for the AAUP salary data
               AAUP salary data in comma delimited format
               AAUP salary data in fixed column format

Two versions of each dataset are provided to accommodate users with different software constraints. The comma delimited versions (USNEWS.DATA and AAUP.DATA) contain information for each college on a separate line with values delimited by commas. The fixed column versions (USNEWS3.DATA and AAUP2.DATA) use 2 or 3 data lines per school and a maximum line length of 80 characters.

To participate in the 1995 Data Analysis Exposition you must send an abstract form to the American Statistical Association by February 1st, 1995. Information is available from the ASA Meetings Department by e-mail, phone (703-684-1221), fax (703-684-2037), or surface mail (ASA, 1429 Duke St., Alexandria, VA 22314). Your initial abstract may be fairly general since you may do the bulk of your analysis after the February 1 deadline.

You may choose your own path to proceed in analyzing the data or use some of the suggested questions below to get started.
  ... How well can we model tuition using the other variables?
  ... How might we cluster colleges into similar comparison groups?
  ... How can we best display faculty salary structure?
  ... Can we find a reasonable way to rank the schools?

You may work on your own or put together a team. Show off the capabilities of your favorite software package or use the data for a class project and display your students' results.
You may choose to consider just a subset of schools or examine regional patterns. The main point is to find innovative ways to display the interesting features of the data.

Further questions about the 1995 Exposition can be directed to Robin Lock, Mathematics Department, St. Lawrence University, Canton, NY 13617 e-mail

If you would like to be informed about any subsequent adjustments or error fixes to the 1995 Exposition data, please send an e-mail message to register your interest to

Special thanks for providing data for the 1995 Exposition to:
  Robert Morse, Director of Research for America's Best Colleges at U.S. News & World Report
  Maryse Eymonerie, Consultant to AAUP.

Information about the dataset
CLASSTYPE: numeric
CLASSINDEX: none specific
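The KEY FOR USNEWS3.DATA above gives 1-based, inclusive column ranges, so each field maps directly onto a string slice. The following is a minimal Python sketch of that idea, not the official reader for the file. It covers only a subset of the fields, the helper names (slice_fields, to_number, parse_school, build_line) are hypothetical, and the sample record is made up for illustration.

```python
# Column ranges copied from the KEY FOR USNEWS3.DATA (1-based, inclusive).
LINE1 = [("FICE", 1, 5), ("College name", 7, 51), ("State", 53, 54)]
LINE2_SUBSET = [
    ("Public/private indicator", 1, 2),
    ("Average Math SAT score", 3, 6),
    ("Average Verbal SAT score", 7, 10),
    ("Average Combined SAT score", 11, 15),
    ("Average ACT score", 16, 18),
]
LINE3_SUBSET = [
    ("Number of fulltime undergraduates", 1, 6),
    ("In-state tuition", 13, 18),
    ("Out-of-state tuition", 19, 24),
    ("Graduation rate", 77, 80),
]


def slice_fields(line, layout):
    """Cut one fixed-column line into stripped strings keyed by field name."""
    padded = line.rstrip("\n").ljust(80)  # short lines are padded to 80 columns
    return {name: padded[start - 1:end].strip() for name, start, end in layout}


def to_number(text):
    """Return None for '*' (the missing marker) or blanks, else int or float."""
    if text in ("", "*"):
        return None
    return float(text) if "." in text else int(text)


def parse_school(line1, line2, line3):
    """Combine the three data lines of one school into a single dict."""
    record = slice_fields(line1, LINE1)  # identifiers are kept as text
    for line, layout in ((line2, LINE2_SUBSET), (line3, LINE3_SUBSET)):
        for name, text in slice_fields(line, layout).items():
            record[name] = to_number(text)
    return record


def build_line(values, layout):
    """Demo helper only: right-justify values into their column ranges."""
    chars = [" "] * 80
    for (_, start, end), value in zip(layout, values):
        chars[start - 1:end] = list(str(value).rjust(end - start + 1))
    return "".join(chars)


# Made-up record, not taken from the real data.
line1 = build_line(["01234", "Example College", "NY"], LINE1)
line2 = build_line([2, 510, "*", 1000, 23], LINE2_SUBSET)
line3 = build_line([1500, 12000, 12000, 65], LINE3_SUBSET)
school = parse_school(line1, line2, line3)
print(school)
```

Keeping the layout as data rather than hard-coded slices means the remaining fields of the key can be added by extending the lists.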

35 features

Feature                                          Role    Type     Unique  Missing
Graduation_rate                                  target  numeric      89       98
FICE                                             ignore  numeric    1302        0
College_name                                     ignore  nominal    1274        0
State                                            -       nominal      51        0
Public/private_indicator                         -       numeric       2        0
Average_Math_SAT_score                           -       numeric     248      525
Average_Verbal_SAT_score                         -       numeric     222      525
Average_Combined_SAT_score                       -       numeric     339      523
Average_ACT_score                                -       numeric      17      588
First_quartile-Math_SAT                          -       numeric      85      530
Third_quartile-Math_SAT                          -       numeric      85      530
First_quartile-Verbal_SAT                        -       numeric      66      530
Third_quartile-Verbal_SAT                        -       numeric      85      530
First_quartile-ACT                               -       numeric      20      639
Third_quartile-ACT                               -       numeric      19      639
Number_of_applications_received                  -       numeric    1127       10
Number_of_applicants_accepted                    -       numeric    1065       11
Number_of_new_students_enrolled                  -       numeric     870        5
Pct._new_students_from_top_10Perc_of_H.S._class  -       numeric      90      235
Pct._new_students_from_top_25Perc_of_H.S._class  -       numeric      93      202
Number_of_fulltime_undergraduates                -       numeric    1151        3
Number_of_parttime_undergraduates                -       numeric     883       32
In-state_tuition                                 -       numeric     948       30
Out-of-state_tuition                             -       numeric     963       20
Room_and_board_costs                             -       numeric     798       76
Room_costs                                       -       numeric     598      321
Board_costs                                      -       numeric     465      498
Additional_fees                                  -       numeric     433      274
Estimated_book_costs                             -       numeric     164       48
Estimated_personal_spending                      -       numeric     406      181
Pct._of_faculty_with_Ph.D.s                      -       numeric      90       32
Pct._of_faculty_with_terminal_degree             -       numeric      77       30
Student/faculty_ratio                            -       numeric     208        2
Pct.alumni_who_donate                            -       numeric      62      222
Instructional_expenditure_per_student            -       numeric    1181       39
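
The missing counts listed for the features can be turned into shares to see which columns are too sparse to use without imputation. The sketch below is illustrative only. It assumes the dataset has 1302 rows, inferred from FICE having 1302 unique values and no missing entries; the page itself does not state the row count. Only a handful of features are included, and the function names are hypothetical.

```python
# Assumption: one row per FICE value (1302 unique values, 0 missing).
N_ROWS = 1302

# Missing counts copied from the feature list for a few features.
missing_counts = {
    "Graduation_rate": 98,
    "Average_Math_SAT_score": 525,
    "Average_ACT_score": 588,
    "First_quartile-ACT": 639,
    "Board_costs": 498,
    "Student/faculty_ratio": 2,
}


def missing_share(count, n_rows=N_ROWS):
    """Fraction of rows in which a feature is missing."""
    return count / n_rows


def features_above(counts, threshold):
    """Names of features whose missing share exceeds the threshold, sorted."""
    return sorted(name for name, c in counts.items() if missing_share(c) > threshold)


sparse = features_above(missing_counts, 0.40)
for name in sparse:
    print(f"{name}: {missing_share(missing_counts[name]):.1%} missing")
```

Under that row-count assumption, the SAT and ACT columns are missing for roughly 40 to 50 percent of schools, which is consistent with schools reporting one test or the other.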


13 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: Graduation_rate
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: Graduation_rate
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
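
The first task above pairs 10-fold cross-validation with mean absolute error on Graduation_rate. The following is a rough sketch of how such a score could be computed, using a trivial baseline that always predicts the training-fold mean. It is not OpenML's own evaluation code: the fold assignment, the per-fold averaging, the function names, and the sample graduation rates are all assumptions made for illustration. Rows with a missing target are dropped before splitting.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds (needs n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_mae(targets, k=10, seed=0):
    """Mean of per-fold MAE for a baseline that predicts the training mean."""
    targets = [t for t in targets if t is not None]  # drop missing targets
    scores = []
    for test_idx in k_fold_indices(len(targets), k, seed):
        held_out = set(test_idx)
        train = [t for i, t in enumerate(targets) if i not in held_out]
        baseline = sum(train) / len(train)
        y_true = [targets[i] for i in test_idx]
        scores.append(mean_absolute_error(y_true, [baseline] * len(y_true)))
    return sum(scores) / len(scores)


# Made-up graduation rates (percent); None marks a missing value.
rates = [55, 72, None, 64, 81, 47, 90, 68, None, 59, 77, 62,
         70, 51, 85, 66, 73, 58, 49, 88, 61, 75]
print(f"baseline MAE over 10 folds: {cross_validated_mae(rates):.2f}")
```

A real model would replace the baseline line with a fit on the training rows; the baseline score is the number such a model has to beat.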