eye_movements

Status: active. Format: ARFF. Visibility: public. Uploaded 06-10-2014 by Joaquin Vanschoren.
Tags: Chemistry, grouped_data, Life Science, study_1, study_144, study_41, study_7, time_series, study_236, study_293, study_424, study_425, study_426


Author: not given
Source: Unknown. Date: unknown.
Please cite: Jarkko Salojarvi, Kai Puolamaki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in Computer and Information Science, Report A82. 3 March 2005.

Data set at http://www.cis.hut.fi/eyechallenge2005/

Competition 1 (preprocessed data)

A straightforward classification task. We provide pre-computed feature vectors for each word in the eye movement trajectory, with class labels.

The dataset consists of several assignments. Each assignment consists of a question followed by ten sentences (titles of news articles). One of the sentences is the correct answer to the question (C) and five of the sentences are irrelevant to the question (I). Four of the sentences are relevant to the question (R), but they do not answer it.

* Features are in columns, feature vectors in rows.
* Each assignment is a time sequence of 22-dimensional feature vectors.
* The first column is the line number, the second the assignment number, and the next 22 columns (3 to 24) are the different features. Columns 25 to 27 contain extra information about the example. The training data set contains the classification label in the 28th column: "0" for irrelevant, "1" for relevant and "2" for the correct answer.
* Each example (row) represents a single word. You are asked to return the classification of each read sentence.
* The 22 features provided are commonly used in psychological studies on eye movement. Not all of them are necessarily relevant in this context.

The objective of the Challenge is to predict the classification labels (I, R, C).

Please see the technical report cited above for information on eye movements, the experimental setup, baseline methods and references.

Modified by TunedIT (converted to ARFF format).

FEATURES

The values in columns marked with an asterisk (*) are the same for all occurrences of the word.

COL  NAME             DESCRIPTION
1    #line            Line number
2    #assg            Assignment number
3    fixcount         Number of fixations to the word
4*   firstPassCnt     Number of fixations to the word when it is first encountered
5*   P1stFixation     '1' if a fixation occurred when the sentence containing the word was encountered the first time
6*   P2stFixation     '1' if a fixation occurred when the sentence containing the word was encountered the second time
7*   prevFixDur       Duration of the previous fixation
8*   firstfixDur      Duration of the first fixation when the word is first encountered
9*   firstPassFixDur  Sum of durations of fixations when the word is first encountered
10*  nextFixDur       Duration of the next fixation when gaze initially moves from the word
11   firstSaccLen     Length of the first saccade
12   lastSaccLen      Distance between fixation on the word and the next fixation
13   prevFixPos       Distance between the first fixation preceding the word and the beginning of the word
14   landingPos       Distance between the first fixation on the word and the beginning of the word
15   leavingPos       Distance between the last fixation on the word and the beginning of the word
16   totalFixDur      Sum of all durations of fixations to the word
17   meanFixDur       Mean duration of the fixations to the word
18*  nRegressFrom     Number of regressions leaving from the word
19*  regressLen       Sum of durations of regressions initiating from this word
20*  nextWordRegress  '1' if a regression initiated from the following word
21*  regressDur       Sum of durations of the fixations on the word during regression
22   pupilDiamMax     Maximum pupil diameter
23   pupilDiamLag     Maximum pupil diameter 0.5 to 1.5 seconds after the beginning of fixation
24   timePrtctg       First fixation duration divided by the total number of fixations
25   nWordsInTitle    Number of words in the sentence (title) this word is in
26   titleNo          Title number
27   wordNo           Word number (ordinal) in this title
28   label            Classification for training data ('0'=irrelevant, '1'=relevant, '2'=correct)
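The description asks for one classification per read sentence, while each row is a single word. A minimal sketch of one way to bridge the two, assuming word-level predictions are already available and that the pair of assignment number and title number (ARFF attributes `assgNo` and `titleNo`) identifies a sentence: group the words of each sentence and take a majority vote. The function name `sentence_labels`, the dict-per-row input and the tie-breaking rule are hypothetical illustration choices, not the challenge's baseline methods (those are described in the technical report).

```python
from collections import Counter, defaultdict

# Label codes as given in the description: 0 = irrelevant (I), 1 = relevant (R), 2 = correct (C).
LABEL_NAMES = {0: "I", 1: "R", 2: "C"}


def sentence_labels(rows, word_predictions):
    """Aggregate word-level predictions into one label per read sentence (hypothetical helper).

    rows: sequence of dicts with at least 'assgNo' and 'titleNo' keys,
          one dict per word (i.e. per row of the dataset).
    word_predictions: sequence of integer labels (0/1/2), aligned with rows.
    Returns a dict mapping (assgNo, titleNo) -> majority label.
    """
    votes = defaultdict(Counter)
    for row, pred in zip(rows, word_predictions):
        votes[(row["assgNo"], row["titleNo"])][pred] += 1

    result = {}
    for key, counter in votes.items():
        # Highest vote count wins; on a tie the smaller label code wins (an arbitrary assumption).
        result[key] = min(counter, key=lambda label: (-counter[label], label))
    return result
```

For example, three words of title 3 predicted as 2, 2, 0 and one word of title 7 predicted as 1 (all in assignment 1) yield `{(1, 3): 2, (1, 7): 1}`. A real submission might also enforce that exactly one sentence per assignment is labelled correct; this sketch does not.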

28 features

NAME             TYPE     UNIQUE VALUES  MISSING
label (target)   nominal  3              0
lineNo           numeric  10936          0
assgNo           numeric  336            0
fixcount         numeric  8              0
firstPassCnt     numeric  7              0
P1stFixation     nominal  2              0
P2stFixation     nominal  2              0
prevFixDur       numeric  61             0
firstfixDur      numeric  63             0
firstPassFixDur  numeric  111            0
nextFixDur       numeric  68             0
firstSaccLen     numeric  9548           0
lastSaccLen      numeric  9350           0
prevFixPos       numeric  7866           0
landingPos       numeric  6847           0
leavingPos       numeric  6900           0
totalFixDur      numeric  149            0
meanFixDur       numeric  254            0
nRegressFrom     numeric  6              0
regressLen       numeric  572            0
nextWordRegress  nominal  2              0
regressDur       numeric  381            0
pupilDiamMax     numeric  3810           0
pupilDiamLag     numeric  2517           0
timePrtctg       numeric  1065           0
nWordsInTitle    numeric  9              0
titleNo          numeric  10             0
wordNo           numeric  10             0
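The per-attribute summary above (type, number of distinct values, number of missing values) can be reproduced from parsed columns. A minimal stdlib sketch, assuming the columns have already been loaded into Python lists and that missing entries appear as `None` or NaN after parsing (ARFF itself writes them as `?`). The type is taken from the declared ARFF attribute type rather than guessed from the values, since columns such as `P1stFixation` hold 0/1 values but are declared nominal. The functions `is_missing`, `column_stats` and `summary_table` are hypothetical names.

```python
import math


def is_missing(value):
    # Assumption: after parsing, missing entries are None or NaN (ARFF writes them as '?').
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_stats(values):
    """Return (number of distinct non-missing values, number of missing entries)."""
    present = [v for v in values if not is_missing(v)]
    return len(set(present)), len(values) - len(present)


def summary_table(columns, declared_types):
    """Build (name, type, unique values, missing) rows like the table above.

    columns: dict mapping attribute name -> list of values.
    declared_types: dict mapping attribute name -> 'numeric' or 'nominal',
    taken from the ARFF @attribute declarations.
    """
    rows = []
    for name, values in columns.items():
        n_unique, n_missing = column_stats(values)
        rows.append((name, declared_types[name], n_unique, n_missing))
    return rows
```

For instance, a numeric column `[1, 2, 2, None]` is summarized as 2 distinct values and 1 missing entry.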

107 properties

Number of instances (rows) of the dataset: 10936
Number of attributes (columns) of the dataset: 28
Number of distinct values of the target attribute (if it is nominal): 3
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 24
Number of nominal attributes: 4
Minimal mutual information between the nominal attributes and the target attribute: 0
Second quartile (Median) of skewness among attributes of the numeric type: 2.53
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3: 0.3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump: 0.15
Maximum of means among attributes of the numeric type: 5468.5
The minimal number of distinct values among attributes of the nominal type: 2
Percentage of binary attributes: 10.71
Second quartile (Median) of standard deviation of attributes of the numeric type: 81.35
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: 0.63
Number of attributes divided by the number of instances: 0
Maximum mutual information between the nominal attributes and the target attribute: 0.04
Minimum skewness among attributes of the numeric type: 0
Percentage of instances having missing values: 0
Third quartile of entropy among attributes: 0.97
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: 0.48
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes; equals ClassEntropy divided by MeanMutualInformation): 80.08
The maximum number of distinct values among attributes of the nominal type: 3
Minimum standard deviation of attributes of the numeric type: 0.03
Percentage of missing values: 0
Third quartile of kurtosis among attributes of the numeric type: 24.16
Average class difference between consecutive instances: 0.74
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: 0.27
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001: 0.7
Maximum skewness among attributes of the numeric type: 9.11
Percentage of instances belonging to the least frequent class: 26.24
Percentage of numeric attributes: 85.71
Third quartile of means among attributes of the numeric type: 215.52
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.77
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: 0.63
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001: 0.41
Maximum standard deviation of attributes of the numeric type: 3157.1
Number of instances belonging to the least frequent class: 2870
Percentage of nominal attributes: 14.29
Third quartile of mutual information between the nominal attributes and the target attribute: 0.04
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.37
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: 0.48
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001: 0.38
Average entropy of the attributes: 0.81
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes: 0.61
First quartile of entropy among attributes: 0.71
Third quartile of skewness among attributes of the numeric type: 3.72
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.44
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: 0.27
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001: 0.7
Mean kurtosis among attributes of the numeric type: 23.61
Mean of means among attributes of the numeric type: 349.45
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes: 0.58
First quartile of kurtosis among attributes of the numeric type: 3.03
Third quartile of standard deviation of attributes of the numeric type: 179.45
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.77
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: 0.63
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001: 0.41
Average mutual information between the nominal attributes and the target attribute: 0.02
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes: 0.11
First quartile of means among attributes of the numeric type: 1.64
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1: 0.7
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.37
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: 0.48
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001: 0.38
An estimate of the amount of irrelevant information in the attributes regarding the class (equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation): 40.61
Number of binary attributes: 3
First quartile of mutual information between the nominal attributes and the target attribute: 0
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1: 0.46
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.44
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: 0.27
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001: 0.7
Average number of distinct values among the attributes of the nominal type: 2.25
First quartile of skewness among attributes of the numeric type: 1.26
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1: 0.3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.77
Standard deviation of the number of distinct values among attributes of the nominal type: 0.5
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001: 0.41
Mean skewness among attributes of the numeric type: 3.07
First quartile of standard deviation of attributes of the numeric type: 0.91
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2: 0.7
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.37
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk: 0.67
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001: 0.38
Mean standard deviation of attributes of the numeric type: 303.34
Second quartile (Median) of entropy among attributes: 0.76
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2: 0.46
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.44
Error rate achieved by the landmarker weka.classifiers.lazy.IBk: 0.43
Percentage of instances belonging to the most frequent class: 38.97
Minimal entropy among attributes: 0.71
Second quartile (Median) of kurtosis among attributes of the numeric type: 13.01
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2: 0.3
Entropy of the target attribute values: 1.57
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk: 0.34
Number of instances belonging to the most frequent class: 4262
Minimum kurtosis among attributes of the numeric type: -1.2
Second quartile (Median) of means among attributes of the numeric type: 161.65
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3: 0.7
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump: 0.6
Maximum entropy among attributes: 0.97
Minimum of means among attributes of the numeric type: 0.03
Second quartile (Median) of mutual information between the nominal attributes and the target attribute: 0.02
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3: 0.46
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump: 0.57
Maximum kurtosis among attributes of the numeric type: 98.9
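Several of the properties above follow from simple formulas stated in their descriptions. The sketch below, with hypothetical function names, computes the target entropy from class counts, the "number of attributes needed to optimally describe the class" (ClassEntropy / MeanMutualInformation), the irrelevant-information estimate ((MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation), and Cohen's kappa, which is assumed here to be the kappa statistic reported for the landmarkers. Entropy is assumed to be in bits (log base 2), which is consistent with the listed 1.57. The count of the third class, 3804, is derived as 10936 - 4262 - 2870 rather than read from the listing.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a discrete distribution given by counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def attributes_needed(class_entropy, mean_mutual_info):
    # ClassEntropy / MeanMutualInformation, as defined in the property list.
    return class_entropy / mean_mutual_info


def irrelevant_information(mean_attr_entropy, mean_mutual_info):
    # (MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation.
    return (mean_attr_entropy - mean_mutual_info) / mean_mutual_info


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length lists of labels.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement),
    where chance agreement comes from the label frequencies of each list.
    Raises ZeroDivisionError when chance agreement is 1 (kappa undefined).
    """
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - chance) / (1 - chance)


# Class counts: most frequent 4262, least frequent 2870, remaining class derived.
class_counts = [4262, 10936 - 4262 - 2870, 2870]
print(round(entropy(class_counts), 2))          # entropy of the target attribute
print(round(100 * 2870 / 10936, 2))             # percentage in the least frequent class
print(round(100 * 3 / 28, 2))                   # percentage of binary attributes
```

With the two-decimal values from the list (1.57, 0.02, 0.81), the last two formulas give about 78.5 and 39.5 rather than the listed 80.08 and 40.61; the gap presumably comes from rounding of the small mean mutual information.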

16 tasks

287 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: label
153 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: label
0 runs - estimation_procedure: 33% Holdout set - target_feature: label
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: label
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: label
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
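The first two tasks estimate predictive accuracy with 10-fold cross-validation. A minimal stdlib sketch of plain, unstratified k-fold splitting and the accuracy measure, for illustration only: OpenML tasks come with their own predefined splits, so this is not how the listed runs were produced. `k_fold_indices` and `predictive_accuracy` are hypothetical names.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain (unstratified) k-fold CV.

    Rows are shuffled once with a fixed seed, then cut into k folds whose
    sizes differ by at most one. Every row lands in exactly one test fold.
    """
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    # The first n_rows % k folds get one extra row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size


def predictive_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With 10936 rows and k=10 this produces six test folds of 1094 rows and four of 1093. The "10 times 10-fold" procedure corresponds to repeating the split with ten different seeds and averaging. Because rows from the same assignment are strongly related (the dataset is tagged grouped_data), keeping each `assgNo` within a single fold would likely give a more conservative estimate; this sketch does not do that.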