OpenML

Titanic

active | ARFF | Publicly available | Visibility: public | Uploaded 16-10-2017 by Joaquin Vanschoren

Author: Frank E. Harrell Jr., Thomas Cason

Source: [Vanderbilt Biostatistics](http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html)

Please cite:

The original Titanic dataset describes the survival status of individual passengers on the Titanic. The titanic data does not contain information from the crew, but it does contain actual ages of half of the passengers. The principal source for data about Titanic passengers is the Encyclopedia Titanica. The datasets used here were begun by a variety of researchers. One of the original sources is Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, which includes a passenger list created by many researchers and edited by Michael A. Findlay.

Thomas Cason of UVa has greatly updated and improved the titanic data frame using the Encyclopedia Titanica and created the dataset here. Some duplicate passengers have been dropped, many errors corrected, many missing ages filled in, and new variables created.

For more information about how this dataset was constructed, see http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt

### Attribute information

The variables in our extracted dataset are pclass, survived, name, age, embarked, home.dest, room, ticket, boat, and sex. pclass refers to passenger class (1st, 2nd, 3rd) and is a proxy for socio-economic class. Age is in years, and some infants have fractional values.

The titanic2 data frame has no missing data and includes records for the crew, but age is dichotomized at adult vs. child. These data were obtained from Robert Dawson, Saint Mary's University, E-mail. The variables are pclass, age, sex, survived.

These data frames are useful for demonstrating many of the functions in Hmisc, as well as binary logistic regression analysis using the Design library. For more details and references, see Simonoff, Jeffrey S (1997): The "unusual episode" and a second statistics course. J Statistics Education, Vol. 5 No. 1.

14 features

Feature	Type	Unique values	Missing values
survived (target)	nominal	2	0
pclass	numeric	3	0
name	string	1307	0
sex	nominal	2	0
age	numeric	98	263
sibsp	numeric	7	0
parch	numeric	8	0
ticket	string	929	0
fare	numeric	281	1
cabin	string	186	1014
embarked	nominal	3	2
boat	string	27	823
body	numeric	121	1188
home.dest	string	369	564
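
The per-feature missing counts in the table above can be cross-checked against the dataset-level totals reported on this page (3855 missing values, 21.04 percent). The sketch below is only an arithmetic check on the published numbers; it assumes the percentage is taken over all cells (instances times features), and the names `MISSING`, `N_INSTANCES`, and `missing_summary` are illustrative.

```python
# Per-feature missing counts copied from the feature table above.
MISSING = {
    "survived": 0, "pclass": 0, "name": 0, "sex": 0, "age": 263,
    "sibsp": 0, "parch": 0, "ticket": 0, "fare": 1, "cabin": 1014,
    "embarked": 2, "boat": 823, "body": 1188, "home.dest": 564,
}
N_INSTANCES = 1309  # NumberOfInstances as listed on this page


def missing_summary(missing, n_instances):
    """Return (total missing cells, percentage of all cells that are missing)."""
    total = sum(missing.values())
    cells = n_instances * len(missing)  # assumption: denominator is rows x columns
    return total, 100.0 * total / cells


total, pct = missing_summary(MISSING, N_INSTANCES)
print(total, round(pct, 2))  # 3855 21.04
```

Both figures agree with NumberOfMissingValues and PercentageOfMissingValues as listed.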

62 properties

NumberOfInstances

1309

Number of instances (rows) of the dataset.

NumberOfFeatures

14

Number of attributes (columns) of the dataset.

NumberOfClasses

2

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

3855

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

1309

Number of instances with at least one value missing.

NumberOfNumericFeatures

6

Number of numeric attributes.

NumberOfSymbolicFeatures

3

Number of nominal attributes.

MeanKurtosisOfNumericAtts

11.03

Mean kurtosis among attributes of the numeric type.
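
As an illustration of the skewness and kurtosis statistics summarised in these properties, here is a minimal sketch using population (biased) moment estimators. This page does not state which estimator is used, so the formulas are an assumption; the negative kurtosis values listed here do suggest the excess convention (kurtosis minus 3), since raw kurtosis can never fall below 1. The function names and sample data are illustrative.

```python
def central_moment(values, k):
    """k-th central moment with a 1/n denominator (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third standardized moment; undefined for constant data (zero variance)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


sample = [1, 2, 3, 4, 5]  # toy data, symmetric around 3
skew = skewness(sample)
kurt = excess_kurtosis(sample)
print(round(skew, 2), round(kurt, 2))  # 0.0 -1.3
```

Missing values would have to be dropped before applying these functions to columns such as age or body.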

NumberOfBinaryFeatures

2

Number of binary attributes.

Quartile1MutualInformation

0.02

First quartile of mutual information between the nominal attributes and the target attribute.

MeanMeansOfNumericAtts

37.86

Mean of means among attributes of the numeric type.

Quartile1SkewnessOfNumericAtts

-0.08

First quartile of skewness among attributes of the numeric type.

MeanMutualInformation

0.12

Average mutual information between the nominal attributes and the target attribute.
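
Mutual information between a nominal attribute and the target can be estimated from co-occurrence counts. The sketch below uses the standard plug-in estimate in bits; the base-2 logarithm and the handling of missing values are assumptions, since this page does not specify them. The toy inputs are made up and are not Titanic data.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Attribute fully determines the target: one full bit of information.
mi_dependent = mutual_information_bits(["a", "a", "b", "b"], [0, 0, 1, 1])
# Attribute independent of the target: zero bits.
mi_independent = mutual_information_bits(["a", "b", "a", "b"], [0, 0, 1, 1])
print(mi_dependent, mi_independent)
```

Averaging this quantity over the nominal attributes would give a value comparable in spirit to MeanMutualInformation.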

Quartile1StdDevOfNumericAtts

0.86

First quartile of standard deviation of attributes of the numeric type.

AutoCorrelation

0.61

Average class difference between consecutive instances.

MeanNoiseToSignalRatio

8.09

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
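
The formula above can be written out directly. This is a sketch using the rounded values shown on this page (MeanAttributeEntropy 1.05, MeanMutualInformation 0.12); the function name is illustrative.

```python
def noise_to_signal(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ValueError("ratio is undefined when mutual information is zero")
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


ratio = noise_to_signal(1.05, 0.12)
print(round(ratio, 2))  # 7.75
```

The result differs from the listed 8.09, presumably because the inputs displayed here are rounded to two decimals; a small change in the denominator moves the ratio noticeably.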

Quartile2AttributeEntropy

1.05

Second quartile (Median) of entropy among attributes.

ClassEntropy

0.96

Entropy of the target attribute values.
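
The listed value can be reproduced from the two class sizes reported on this page (809 and 500), assuming Shannon entropy with a base-2 logarithm. The function name is illustrative.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


class_entropy = entropy_bits([809, 500])
print(round(class_entropy, 2))  # 0.96
```

A perfectly balanced two-class target would score 1.0, so 0.96 reflects the moderate imbalance between the classes.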

MeanNominalAttDistinctValues

2.33

Average number of distinct values among the attributes of the nominal type.

Quartile2KurtosisOfNumericAtts

10.1

Second quartile (Median) of kurtosis among attributes of the numeric type.

Dimensionality

0.01

Number of attributes divided by the number of instances.

MeanSkewnessOfNumericAtts

1.96

Mean skewness among attributes of the numeric type.

Quartile2MeansOfNumericAtts

16.09

Second quartile (Median) of means among attributes of the numeric type.

EquivalentNumberOfAtts

8.34

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
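
A sketch of the ratio described above, using the rounded values shown on this page (ClassEntropy 0.96, MeanMutualInformation 0.12). The function name is illustrative.

```python
def equivalent_number_of_atts(class_entropy, mean_mutual_information):
    """ClassEntropy divided by MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ValueError("undefined when mean mutual information is zero")
    return class_entropy / mean_mutual_information


equivalent = equivalent_number_of_atts(0.96, 0.12)
print(round(equivalent, 2))
```

With these rounded inputs the result is about 8.0 rather than the listed 8.34; the gap is presumably due to the two-decimal rounding of the displayed inputs.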

MeanStdDevOfNumericAtts

27.77

Mean standard deviation of attributes of the numeric type.

Quartile2MutualInformation

0.12

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

MajorityClassPercentage

61.8

Percentage of instances belonging to the most frequent class.

MinAttributeEntropy

0.94

Minimal entropy among attributes.

Quartile2SkewnessOfNumericAtts

2.04

Second quartile (Median) of skewness among attributes of the numeric type.

MajorityClassSize

809

Number of instances belonging to the most frequent class.
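
The class sizes and class percentages listed on this page are related by simple division over the 1309 instances. In the sketch below the labels "majority" and "minority" are placeholders, because this page does not say which value of survived each size belongs to.

```python
def class_percentages(class_sizes):
    """Map each class label to its share of all instances, in percent."""
    total = sum(class_sizes.values())
    return {label: 100.0 * size / total for label, size in class_sizes.items()}


sizes = {"majority": 809, "minority": 500}  # sizes as listed on this page
percentages = class_percentages(sizes)
print({label: round(p, 1) for label, p in percentages.items()})
# {'majority': 61.8, 'minority': 38.2}
```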

MaxAttributeEntropy

1.15

Maximum entropy among attributes.

MinKurtosisOfNumericAtts

-1.32

Minimum kurtosis among attributes of the numeric type.

PercentageOfBinaryFeatures

14.29

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

7.73

Second quartile (Median) of standard deviation of attributes of the numeric type.

MaxKurtosisOfNumericAtts

27.03

Maximum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

0.39

Minimum of means among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

100

Percentage of instances having missing values.

Quartile3AttributeEntropy

1.15

Third quartile of entropy among attributes.

MaxMeansOfNumericAtts

160.81

Maximum of means among attributes of the numeric type.

MinMutualInformation

0.02

Minimal mutual information between the nominal attributes and the target attribute.

PercentageOfMissingValues

21.04

Percentage of missing values.

Quartile3KurtosisOfNumericAtts

22.91

Third quartile of kurtosis among attributes of the numeric type.

MaxMutualInformation

0.21

Maximum mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

2

The minimal number of distinct values among attributes of the nominal type.

PercentageOfNumericFeatures

42.86

Percentage of numeric attributes.
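
The feature-type percentages follow from counting types in the 14-row feature table on this page. The sketch below copies the types from that table; the name `FEATURE_TYPES` is illustrative.

```python
from collections import Counter

# Feature types copied from the feature table on this page.
FEATURE_TYPES = {
    "survived": "nominal", "pclass": "numeric", "name": "string",
    "sex": "nominal", "age": "numeric", "sibsp": "numeric",
    "parch": "numeric", "ticket": "string", "fare": "numeric",
    "cabin": "string", "embarked": "nominal", "boat": "string",
    "body": "numeric", "home.dest": "string",
}

counts = Counter(FEATURE_TYPES.values())
n_features = len(FEATURE_TYPES)
shares = {t: round(100.0 * c / n_features, 2) for t, c in counts.items()}
print(counts["numeric"], shares["numeric"])  # 6 42.86
print(counts["nominal"], shares["nominal"])  # 3 21.43
```

The remaining five features are strings, which are counted as neither numeric nor nominal here.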

Quartile3MeansOfNumericAtts

65.17

Third quartile of means among attributes of the numeric type.

MaxNominalAttDistinctValues

3

The maximum number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-0.6

Minimum skewness among attributes of the numeric type.

PercentageOfSymbolicFeatures

21.43

Percentage of nominal attributes.

Quartile3MutualInformation

0.21

Third quartile of mutual information between the nominal attributes and the target attribute.

MaxSkewnessOfNumericAtts

4.37

Maximum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.84

Minimum standard deviation of attributes of the numeric type.

Quartile1AttributeEntropy

0.94

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

3.98

Third quartile of skewness among attributes of the numeric type.

MaxStdDevOfNumericAtts

97.7

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

38.2

Percentage of instances belonging to the least frequent class.

Quartile1KurtosisOfNumericAtts

-1.27

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

63.24

Third quartile of standard deviation of attributes of the numeric type.

MeanAttributeEntropy

1.05

Average entropy of the attributes.

MinorityClassSize

500

Number of instances belonging to the least frequent class.

Quartile1MeansOfNumericAtts

0.47

First quartile of means among attributes of the numeric type.

StdvNominalAttDistinctValues

0.58

Standard deviation of the number of distinct values among attributes of the nominal type.
