christine_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman


Subsampling of the dataset christine (41142) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
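To make the row-sampling step concrete, here is a minimal, standard-library-only sketch of stratified subsampling. It draws rows from each class in proportion to that class's frequency, which is the effect the `stratify=` argument of `train_test_split` aims for in the source code. This is not the code that generated the dataset: `stratified_sample` and the toy labels are hypothetical names introduced for illustration, and the largest-remainder rounding rule is an assumption that may differ from what scikit-learn does internally.

```python
import random
from collections import defaultdict


def stratified_sample(labels, n, seed):
    """Return sorted row indices of a class-proportional sample of size n.

    Hypothetical helper for illustration; not part of the original source.
    """
    if n > len(labels):
        raise ValueError("n must not exceed the number of rows")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal (fractional) quota per class, proportional to class frequency.
    quotas = {c: n * len(rows) / len(labels) for c, rows in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}

    # Hand out the rows lost to rounding, largest remainder first.
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Sample without replacement inside each class.
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, counts[c]))
    return sorted(chosen)


# Toy data (an assumption, not the christine dataset): 6,000 rows in two
# balanced classes, subsampled to 2,000 rows.
labels = [i % 2 for i in range(6000)]
picked = stratified_sample(labels, n=2000, seed=0)
per_class = {c: sum(1 for i in picked if labels[i] == c) for c in (0, 1)}
print(len(picked), per_class)  # prints: 2000 {0: 1000, 1: 1000}
```

Because the quota is computed per class, the class proportions of the sample match those of the input up to rounding, and fixing `seed` makes the selection repeatable.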

101 features

class (target) (nominal): 2 unique values, 0 missing
V5 (numeric): 447 unique values, 0 missing
V9 (numeric): 269 unique values, 0 missing
V14 (numeric): 407 unique values, 0 missing
V26 (numeric): 498 unique values, 0 missing
V35 (numeric): 433 unique values, 0 missing
V45 (numeric): 499 unique values, 0 missing
V53 (numeric): 440 unique values, 0 missing
V64 (numeric): 468 unique values, 0 missing
V80 (numeric): 210 unique values, 0 missing
V117 (numeric): 477 unique values, 0 missing
V118 (numeric): 538 unique values, 0 missing
V127 (numeric): 477 unique values, 0 missing
V138 (numeric): 118 unique values, 0 missing
V141 (numeric): 430 unique values, 0 missing
V197 (numeric): 314 unique values, 0 missing
V217 (numeric): 407 unique values, 0 missing
V271 (numeric): 100 unique values, 0 missing
V276 (numeric): 380 unique values, 0 missing
V371 (numeric): 87 unique values, 0 missing
V408 (numeric): 241 unique values, 0 missing
V413 (numeric): 224 unique values, 0 missing
V416 (numeric): 165 unique values, 0 missing
V430 (numeric): 266 unique values, 0 missing
V432 (numeric): 337 unique values, 0 missing
V472 (numeric): 397 unique values, 0 missing
V475 (numeric): 272 unique values, 0 missing
V499 (numeric): 23 unique values, 0 missing
V522 (numeric): 35 unique values, 0 missing
V534 (numeric): 308 unique values, 0 missing
V549 (numeric): 116 unique values, 0 missing
V579 (numeric): 564 unique values, 0 missing
V604 (numeric): 232 unique values, 0 missing
V605 (numeric): 347 unique values, 0 missing
V610 (numeric): 387 unique values, 0 missing
V616 (numeric): 12 unique values, 0 missing
V617 (numeric): 143 unique values, 0 missing
V623 (numeric): 31 unique values, 0 missing
V636 (numeric): 79 unique values, 0 missing
V637 (numeric): 358 unique values, 0 missing
V667 (numeric): 53 unique values, 0 missing
V681 (numeric): 413 unique values, 0 missing
V733 (numeric): 464 unique values, 0 missing
V758 (numeric): 373 unique values, 0 missing
V781 (numeric): 21 unique values, 0 missing
V783 (numeric): 397 unique values, 0 missing
V787 (numeric): 495 unique values, 0 missing
V819 (numeric): 34 unique values, 0 missing
V833 (nominal): 1 unique value, 0 missing
V844 (numeric): 290 unique values, 0 missing
V845 (numeric): 305 unique values, 0 missing
V852 (numeric): 93 unique values, 0 missing
V859 (numeric): 384 unique values, 0 missing
V867 (numeric): 385 unique values, 0 missing
V871 (numeric): 549 unique values, 0 missing
V925 (numeric): 241 unique values, 0 missing
V928 (numeric): 246 unique values, 0 missing
V941 (numeric): 71 unique values, 0 missing
V964 (numeric): 394 unique values, 0 missing
V977 (numeric): 416 unique values, 0 missing
V980 (numeric): 364 unique values, 0 missing
V982 (numeric): 565 unique values, 0 missing
V1005 (nominal): 1 unique value, 0 missing
V1018 (numeric): 546 unique values, 0 missing
V1026 (numeric): 346 unique values, 0 missing
V1039 (numeric): 459 unique values, 0 missing
V1046 (numeric): 357 unique values, 0 missing
V1061 (numeric): 54 unique values, 0 missing
V1088 (numeric): 389 unique values, 0 missing
V1093 (numeric): 243 unique values, 0 missing
V1101 (numeric): 290 unique values, 0 missing
V1126 (numeric): 420 unique values, 0 missing
V1133 (numeric): 444 unique values, 0 missing
V1143 (numeric): 133 unique values, 0 missing
V1158 (numeric): 289 unique values, 0 missing
V1159 (numeric): 444 unique values, 0 missing
V1165 (numeric): 484 unique values, 0 missing
V1167 (numeric): 521 unique values, 0 missing
V1198 (numeric): 542 unique values, 0 missing
V1213 (numeric): 279 unique values, 0 missing
V1237 (numeric): 454 unique values, 0 missing
V1258 (numeric): 347 unique values, 0 missing
V1272 (numeric): 499 unique values, 0 missing
V1281 (numeric): 87 unique values, 0 missing
V1308 (numeric): 230 unique values, 0 missing
V1327 (numeric): 391 unique values, 0 missing
V1340 (numeric): 348 unique values, 0 missing
V1342 (numeric): 223 unique values, 0 missing
V1356 (nominal): 1 unique value, 0 missing
V1357 (numeric): 354 unique values, 0 missing
V1363 (numeric): 457 unique values, 0 missing
V1402 (numeric): 443 unique values, 0 missing
V1413 (numeric): 398 unique values, 0 missing
V1434 (numeric): 118 unique values, 0 missing
V1450 (numeric): 155 unique values, 0 missing
V1455 (numeric): 265 unique values, 0 missing
V1506 (numeric): 513 unique values, 0 missing
V1508 (numeric): 90 unique values, 0 missing
V1516 (numeric): 379 unique values, 0 missing
V1562 (numeric): 265 unique values, 0 missing
V1586 (nominal): 1 unique value, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 96
Number of nominal attributes: 5
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.5
Percentage of numeric attributes: 95.05
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 4.95
Percentage of instances belonging to the most frequent class: 50
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 1000
Number of binary attributes: 1
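
Several of the properties above are simple ratios of the raw counts, so they can be cross-checked by hand. The sketch below recomputes them from the counts listed on this page. The variable names and the `pct` helper are hypothetical, and rounding to two decimals is an assumption chosen to match the displayed values.

```python
# Raw counts as listed on this page.
n_rows = 2000
n_features = 101
n_numeric = 96
n_nominal = 5
n_binary = 1
majority_class_size = 1000


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric": pct(n_numeric, n_features),               # 96 / 101
    "pct_nominal": pct(n_nominal, n_features),               # 5 / 101
    "pct_binary": pct(n_binary, n_features),                 # 1 / 101
    "pct_majority_class": pct(majority_class_size, n_rows),  # 1000 / 2000
    "dimensionality": round(n_features / n_rows, 2),         # 101 / 2000
}
print(derived)
```

The recomputed values (95.05, 4.95, 0.99, 50.0 and 0.05) agree with the listed properties, and the numeric and nominal counts sum to the 101 attributes reported.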

0 tasks
