fabert_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.

Subsampling of the dataset fabert (41164) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,  # NOTE: not used below; rows are always stratified
) -> Dataset:  # `Dataset` is the generating project's own class (not shown)
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            # NOTE: value_counts() is ordered by frequency, not in the order
            # of `classes`, so these probabilities need not line up with it.
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: this filters on `classes`, not `selected_classes`, so no rows
        # are dropped. The target here has 7 classes (within nclasses_max=10),
        # so this branch does not run for this dataset.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
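The three steps above (class, column, and row subsampling) can be sketched as a standalone function on a plain pandas DataFrame and Series. This is a hypothetical rewrite, not the code that produced this dataset: `subsample_frame` is an illustrative name, the class-sampling probabilities are taken from the same `value_counts()` as the labels and rows are filtered on the drawn classes, and the stratified row step uses proportional allocation with largest-remainder rounding in NumPy instead of scikit-learn's `train_test_split(stratify=...)`. The rows chosen for a given seed will therefore differ from the real subsample.

```python
import numpy as np
import pandas as pd


def subsample_frame(
    x: pd.DataFrame,
    y: pd.Series,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
) -> tuple[pd.DataFrame, pd.Series]:
    # Hypothetical sketch of the subsampling steps; not the generating code.
    # Assumes x and y are row-aligned (same index, same order).
    rng = np.random.default_rng(seed)

    # 1. Keep at most nclasses_max classes, drawn without replacement with
    #    probability proportional to class frequency. Labels and probabilities
    #    both come from value_counts(), so they stay aligned.
    vcs = y.value_counts()
    if len(vcs) > nclasses_max:
        selected = rng.choice(
            vcs.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),
        )
        mask = y.isin(selected).to_numpy()
        x, y = x.loc[mask], y.loc[mask]

    # 2. Keep at most ncols_max columns, chosen uniformly, in original order.
    if x.shape[1] > ncols_max:
        col_idxs = np.sort(rng.choice(x.shape[1], size=ncols_max, replace=False))
        x = x.iloc[:, col_idxs]

    # 3. Keep at most nrows_max rows, giving each class a share of rows
    #    proportional to its frequency (largest-remainder rounding).
    if len(x) > nrows_max:
        counts = y.value_counts()
        exact = counts / counts.sum() * nrows_max
        quota = np.floor(exact).astype(int)
        remainder = nrows_max - int(quota.sum())
        if remainder > 0:
            # Leftover rows go to the classes with the largest fractional parts.
            top = (exact - quota).sort_values(ascending=False).index[:remainder]
            quota.loc[top] += 1
        y_arr = y.to_numpy()
        picked = [
            rng.choice(np.flatnonzero(y_arr == label), size=k, replace=False)
            for label, k in quota.items()
        ]
        positions = np.sort(np.concatenate(picked))
        x, y = x.iloc[positions], y.iloc[positions]

    return x, y
```

Sorting the sampled column positions keeps the surviving columns in their original order, which is also why the feature names listed below appear in ascending order.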

101 features

class (target): nominal, 7 unique values, 0 missing
V2: numeric, 3 unique values, 0 missing
V4: numeric, 9 unique values, 0 missing
V24: numeric, 4 unique values, 0 missing
V28: numeric, 37 unique values, 0 missing
V33: numeric, 18 unique values, 0 missing
V58: numeric, 78 unique values, 0 missing
V61: numeric, 59 unique values, 0 missing
V67: numeric, 47 unique values, 0 missing
V70: numeric, 53 unique values, 0 missing
V82: numeric, 20 unique values, 0 missing
V105: numeric, 23 unique values, 0 missing
V115: numeric, 42 unique values, 0 missing
V126: numeric, 22 unique values, 0 missing
V127: numeric, 18 unique values, 0 missing
V128: numeric, 8 unique values, 0 missing
V144: numeric, 21 unique values, 0 missing
V160: numeric, 7 unique values, 0 missing
V167: numeric, 2 unique values, 0 missing
V171: numeric, 5 unique values, 0 missing
V178: numeric, 3 unique values, 0 missing
V185: numeric, 1 unique value, 0 missing
V190: numeric, 48 unique values, 0 missing
V192: numeric, 4 unique values, 0 missing
V194: numeric, 7 unique values, 0 missing
V209: numeric, 5 unique values, 0 missing
V217: numeric, 2 unique values, 0 missing
V223: numeric, 26 unique values, 0 missing
V228: numeric, 16 unique values, 0 missing
V231: numeric, 66 unique values, 0 missing
V235: numeric, 3 unique values, 0 missing
V236: numeric, 6 unique values, 0 missing
V237: numeric, 33 unique values, 0 missing
V259: numeric, 4 unique values, 0 missing
V283: numeric, 33 unique values, 0 missing
V286: numeric, 6 unique values, 0 missing
V303: numeric, 20 unique values, 0 missing
V305: numeric, 45 unique values, 0 missing
V306: numeric, 14 unique values, 0 missing
V309: numeric, 5 unique values, 0 missing
V313: numeric, 16 unique values, 0 missing
V316: numeric, 2 unique values, 0 missing
V326: numeric, 18 unique values, 0 missing
V339: numeric, 7 unique values, 0 missing
V343: numeric, 23 unique values, 0 missing
V356: numeric, 2 unique values, 0 missing
V360: numeric, 18 unique values, 0 missing
V375: numeric, 11 unique values, 0 missing
V378: numeric, 6 unique values, 0 missing
V396: numeric, 12 unique values, 0 missing
V400: numeric, 3 unique values, 0 missing
V413: numeric, 37 unique values, 0 missing
V428: numeric, 32 unique values, 0 missing
V441: numeric, 6 unique values, 0 missing
V443: numeric, 7 unique values, 0 missing
V466: numeric, 16 unique values, 0 missing
V469: numeric, 10 unique values, 0 missing
V473: numeric, 18 unique values, 0 missing
V478: numeric, 25 unique values, 0 missing
V480: numeric, 36 unique values, 0 missing
V485: numeric, 18 unique values, 0 missing
V487: numeric, 33 unique values, 0 missing
V496: numeric, 20 unique values, 0 missing
V498: numeric, 29 unique values, 0 missing
V506: numeric, 6 unique values, 0 missing
V510: numeric, 52 unique values, 0 missing
V514: numeric, 16 unique values, 0 missing
V516: numeric, 43 unique values, 0 missing
V525: numeric, 24 unique values, 0 missing
V528: numeric, 8 unique values, 0 missing
V537: numeric, 11 unique values, 0 missing
V538: numeric, 40 unique values, 0 missing
V539: numeric, 7 unique values, 0 missing
V554: numeric, 4 unique values, 0 missing
V562: numeric, 6 unique values, 0 missing
V566: numeric, 12 unique values, 0 missing
V569: numeric, 7 unique values, 0 missing
V576: numeric, 28 unique values, 0 missing
V577: numeric, 8 unique values, 0 missing
V585: numeric, 25 unique values, 0 missing
V601: numeric, 51 unique values, 0 missing
V615: numeric, 6 unique values, 0 missing
V642: numeric, 7 unique values, 0 missing
V643: numeric, 22 unique values, 0 missing
V647: numeric, 19 unique values, 0 missing
V649: numeric, 35 unique values, 0 missing
V660: numeric, 26 unique values, 0 missing
V669: numeric, 38 unique values, 0 missing
V679: numeric, 18 unique values, 0 missing
V696: numeric, 8 unique values, 0 missing
V698: numeric, 18 unique values, 0 missing
V700: numeric, 18 unique values, 0 missing
V704: numeric, 10 unique values, 0 missing
V714: numeric, 10 unique values, 0 missing
V716: numeric, 7 unique values, 0 missing
V721: numeric, 16 unique values, 0 missing
V725: numeric, 21 unique values, 0 missing
V757: numeric, 12 unique values, 0 missing
V758: numeric, 8 unique values, 0 missing
V763: numeric, 28 unique values, 0 missing
V796: numeric, 9 unique values, 0 missing
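A per-feature summary in the same shape as the list above (type, distinct values, missing count) can be reproduced with pandas. This is a sketch under the assumption that numeric dtypes map to "numeric" and everything else, such as a categorical target, maps to "nominal"; OpenML's own computation may differ in details, and `feature_summary` is a hypothetical helper.

```python
import pandas as pd


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: numeric dtypes are reported as "numeric",
    # everything else (e.g. a categorical target) as "nominal".
    kinds = [
        "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "nominal"
        for col in df.columns
    ]
    return pd.DataFrame(
        {
            "type": kinds,
            # Distinct non-missing values per column.
            "unique_values": df.nunique(dropna=True).to_numpy(),
            # Missing (NaN/None) entries per column.
            "missing": df.isna().sum().to_numpy(),
        },
        index=df.columns,
    )
```

For example, `feature_summary(pd.concat((x, y), axis="columns"))` on the subsampled features and target would give one row per attribute.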

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 7
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.15
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 23.4
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 468
Percentage of instances belonging to the least frequent class: 6.1
Number of instances belonging to the least frequent class: 122
Number of binary attributes: 0
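Several of these properties are simple ratios of the counts listed: 101 attributes over 2000 instances, 100 numeric and 1 nominal attribute, and 468 and 122 instances in the most and least frequent classes. A small worked check in Python, rounding to two decimals; the function and key names are hypothetical, not OpenML's quality identifiers.

```python
def derived_properties(
    n_instances: int,
    n_numeric: int,
    n_nominal: int,
    majority_count: int,
    minority_count: int,
) -> dict[str, float]:
    # Illustrative names only; these are not OpenML's quality identifiers.
    n_attributes = n_numeric + n_nominal
    return {
        "attributes_per_instance": round(n_attributes / n_instances, 2),
        "pct_numeric": round(100 * n_numeric / n_attributes, 2),
        "pct_nominal": round(100 * n_nominal / n_attributes, 2),
        "pct_majority_class": round(100 * majority_count / n_instances, 2),
        "pct_minority_class": round(100 * minority_count / n_instances, 2),
    }


# Counts taken from the property list above.
props = derived_properties(2000, 100, 1, 468, 122)
```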

0 tasks
