fabert_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman


Subsampling of the dataset fabert (41164) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as written, this filters on `classes` (every label), not on
        # `selected_classes`, so no class is actually dropped here.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
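The row-subsampling step in the source code relies on a stratified draw, so that each class keeps roughly the same share in the subset as in the full data. The following is a minimal, stdlib-only sketch of that idea, not the code used to generate this dataset (which calls scikit-learn's `train_test_split` with `stratify=`). The helper name `stratified_sample` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n, seed):
    """Pick about n indices so each class keeps roughly its original share.

    Hypothetical helper for illustration only. Because each class quota is
    rounded separately, the total can differ slightly from n in general.
    """
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    picked = []
    for label, idxs in by_class.items():
        # Proportional allocation, rounded to the nearest integer
        k = round(n * len(idxs) / total)
        picked.extend(rng.sample(idxs, k))
    return sorted(picked)


# Toy labels with a 60/30/10 class split (assumed data, not fabert)
labels = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
subset = stratified_sample(labels, n=200, seed=1)
counts = Counter(labels[i] for i in subset)
print(counts)
```

With the toy labels, the 200-row subset keeps the 60/30/10 split of the original 1000 rows. Which rows are picked depends on the seed, but the per-class counts do not.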

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 7 | 0 |
| V15 | numeric | 25 | 0 |
| V20 | numeric | 9 | 0 |
| V25 | numeric | 9 | 0 |
| V31 | numeric | 12 | 0 |
| V44 | numeric | 26 | 0 |
| V47 | numeric | 13 | 0 |
| V49 | numeric | 13 | 0 |
| V62 | numeric | 24 | 0 |
| V70 | numeric | 55 | 0 |
| V88 | numeric | 6 | 0 |
| V89 | numeric | 15 | 0 |
| V91 | numeric | 6 | 0 |
| V99 | numeric | 50 | 0 |
| V102 | numeric | 26 | 0 |
| V119 | numeric | 25 | 0 |
| V122 | numeric | 2 | 0 |
| V151 | numeric | 8 | 0 |
| V169 | numeric | 24 | 0 |
| V177 | numeric | 20 | 0 |
| V184 | numeric | 33 | 0 |
| V195 | numeric | 14 | 0 |
| V205 | numeric | 9 | 0 |
| V209 | numeric | 5 | 0 |
| V210 | numeric | 24 | 0 |
| V221 | numeric | 6 | 0 |
| V222 | numeric | 4 | 0 |
| V224 | numeric | 5 | 0 |
| V240 | numeric | 3 | 0 |
| V245 | numeric | 36 | 0 |
| V272 | numeric | 5 | 0 |
| V280 | numeric | 6 | 0 |
| V282 | numeric | 55 | 0 |
| V287 | numeric | 15 | 0 |
| V293 | numeric | 1 | 0 |
| V297 | numeric | 3 | 0 |
| V302 | numeric | 69 | 0 |
| V321 | numeric | 24 | 0 |
| V325 | numeric | 16 | 0 |
| V330 | numeric | 17 | 0 |
| V331 | numeric | 30 | 0 |
| V332 | numeric | 24 | 0 |
| V350 | numeric | 24 | 0 |
| V358 | numeric | 24 | 0 |
| V360 | numeric | 24 | 0 |
| V362 | numeric | 55 | 0 |
| V372 | numeric | 14 | 0 |
| V390 | numeric | 24 | 0 |
| V394 | numeric | 4 | 0 |
| V395 | numeric | 53 | 0 |
| V404 | numeric | 46 | 0 |
| V406 | numeric | 4 | 0 |
| V409 | numeric | 45 | 0 |
| V411 | numeric | 14 | 0 |
| V436 | numeric | 10 | 0 |
| V461 | numeric | 7 | 0 |
| V462 | numeric | 13 | 0 |
| V467 | numeric | 5 | 0 |
| V473 | numeric | 18 | 0 |
| V478 | numeric | 28 | 0 |
| V502 | numeric | 58 | 0 |
| V531 | numeric | 48 | 0 |
| V535 | numeric | 22 | 0 |
| V545 | numeric | 32 | 0 |
| V546 | numeric | 41 | 0 |
| V550 | numeric | 41 | 0 |
| V557 | numeric | 47 | 0 |
| V562 | numeric | 15 | 0 |
| V574 | numeric | 14 | 0 |
| V582 | numeric | 23 | 0 |
| V591 | numeric | 16 | 0 |
| V593 | numeric | 58 | 0 |
| V596 | numeric | 9 | 0 |
| V597 | numeric | 11 | 0 |
| V600 | numeric | 51 | 0 |
| V603 | numeric | 7 | 0 |
| V606 | numeric | 23 | 0 |
| V618 | numeric | 9 | 0 |
| V625 | numeric | 10 | 0 |
| V634 | numeric | 18 | 0 |
| V656 | numeric | 6 | 0 |
| V664 | numeric | 59 | 0 |
| V667 | numeric | 17 | 0 |
| V669 | numeric | 40 | 0 |
| V670 | numeric | 10 | 0 |
| V672 | numeric | 20 | 0 |
| V678 | numeric | 21 | 0 |
| V699 | numeric | 14 | 0 |
| V709 | numeric | 39 | 0 |
| V717 | numeric | 25 | 0 |
| V722 | numeric | 2 | 0 |
| V730 | numeric | 41 | 0 |
| V731 | numeric | 3 | 0 |
| V732 | numeric | 6 | 0 |
| V734 | numeric | 44 | 0 |
| V738 | numeric | 12 | 0 |
| V740 | numeric | 3 | 0 |
| V745 | numeric | 25 | 0 |
| V775 | numeric | 26 | 0 |
| V780 | numeric | 30 | 0 |
| V781 | numeric | 33 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 7 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 0 |
| Percentage of instances having missing values. | 0 |
| Average class difference between consecutive instances. | 0.16 |
| Percentage of missing values. | 0 |
| Number of attributes divided by the number of instances. | 0.05 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of instances belonging to the most frequent class. | 23.4 |
| Percentage of nominal attributes. | 0.99 |
| Number of instances belonging to the most frequent class. | 468 |
| Percentage of instances belonging to the least frequent class. | 6.1 |
| Number of instances belonging to the least frequent class. | 122 |
| Number of binary attributes. | 0 |
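
The ratio and percentage properties follow directly from the raw counts listed here. As a quick consistency check, the sketch below recomputes them from those counts. The variable names are illustrative and are not OpenML identifiers.

```python
# Raw counts as reported in the properties of this dataset
n_rows = 2000
n_attributes = 101  # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
majority_size = 468
minority_size = 122

# Derived properties
dimensionality = n_attributes / n_rows            # attributes per instance
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes
pct_majority = 100 * majority_size / n_rows
pct_minority = 100 * minority_size / n_rows

print(round(dimensionality, 2))  # 0.05
print(round(pct_numeric, 2))     # 99.01
print(round(pct_nominal, 2))     # 0.99
print(round(pct_majority, 1))    # 23.4
print(round(pct_minority, 1))    # 6.1
```

The recomputed values match the reported ones after rounding to the precision shown on the page.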

0 tasks
