madeline_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
Subsampling of the dataset madeline (41144) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
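The row-subsampling step in the source code above relies on `train_test_split` with an integer `test_size` and keeps only the "test" part. The following is a minimal sketch of that one step on made-up data, not the code that produced this dataset: the toy frame, its column names `a`, `b`, `c`, and the 60/40 class balance are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 5000 rows, 3 numeric features, a 60/40 binary target.
rng = np.random.default_rng(1)
x = pd.DataFrame(rng.normal(size=(5000, 3)), columns=["a", "b", "c"])
y = pd.Series([0] * 3000 + [1] * 2000, name="class")

nrows_max = 2000
data = pd.concat((x, y), axis="columns")

# An integer test_size requests exactly that many rows in the second
# returned part; stratify keeps the class proportions of the target.
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data["class"],
    shuffle=True,
    random_state=1,
)

counts = subset["class"].value_counts().to_dict()
print(len(subset), counts)
```

Because the split is stratified, the subset should keep the 60/40 class ratio of the toy target while containing exactly `nrows_max` rows.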

101 features

class (target, nominal): 2 unique values, 0 missing
V4 (numeric): 120 unique values, 0 missing
V5 (numeric): 136 unique values, 0 missing
V6 (numeric): 31 unique values, 0 missing
V10 (numeric): 12 unique values, 0 missing
V13 (numeric): 88 unique values, 0 missing
V14 (numeric): 161 unique values, 0 missing
V15 (numeric): 117 unique values, 0 missing
V16 (numeric): 145 unique values, 0 missing
V20 (numeric): 134 unique values, 0 missing
V24 (numeric): 66 unique values, 0 missing
V25 (numeric): 39 unique values, 0 missing
V26 (numeric): 162 unique values, 0 missing
V35 (numeric): 198 unique values, 0 missing
V39 (numeric): 73 unique values, 0 missing
V41 (numeric): 199 unique values, 0 missing
V42 (numeric): 220 unique values, 0 missing
V45 (numeric): 195 unique values, 0 missing
V47 (numeric): 99 unique values, 0 missing
V53 (numeric): 254 unique values, 0 missing
V57 (numeric): 86 unique values, 0 missing
V58 (numeric): 124 unique values, 0 missing
V60 (numeric): 28 unique values, 0 missing
V62 (numeric): 155 unique values, 0 missing
V63 (numeric): 208 unique values, 0 missing
V65 (numeric): 224 unique values, 0 missing
V66 (numeric): 184 unique values, 0 missing
V71 (numeric): 223 unique values, 0 missing
V72 (numeric): 201 unique values, 0 missing
V73 (numeric): 77 unique values, 0 missing
V75 (numeric): 209 unique values, 0 missing
V76 (numeric): 43 unique values, 0 missing
V79 (numeric): 22 unique values, 0 missing
V83 (numeric): 165 unique values, 0 missing
V85 (numeric): 102 unique values, 0 missing
V86 (numeric): 422 unique values, 0 missing
V87 (numeric): 50 unique values, 0 missing
V88 (numeric): 75 unique values, 0 missing
V93 (numeric): 117 unique values, 0 missing
V96 (numeric): 144 unique values, 0 missing
V98 (numeric): 98 unique values, 0 missing
V99 (numeric): 36 unique values, 0 missing
V100 (numeric): 222 unique values, 0 missing
V102 (numeric): 140 unique values, 0 missing
V103 (numeric): 223 unique values, 0 missing
V109 (numeric): 151 unique values, 0 missing
V110 (numeric): 20 unique values, 0 missing
V114 (numeric): 222 unique values, 0 missing
V115 (numeric): 127 unique values, 0 missing
V116 (numeric): 74 unique values, 0 missing
V123 (numeric): 112 unique values, 0 missing
V125 (numeric): 222 unique values, 0 missing
V128 (numeric): 214 unique values, 0 missing
V130 (numeric): 211 unique values, 0 missing
V137 (numeric): 190 unique values, 0 missing
V141 (numeric): 119 unique values, 0 missing
V144 (numeric): 191 unique values, 0 missing
V146 (numeric): 206 unique values, 0 missing
V148 (numeric): 45 unique values, 0 missing
V151 (numeric): 114 unique values, 0 missing
V153 (numeric): 65 unique values, 0 missing
V155 (numeric): 54 unique values, 0 missing
V156 (numeric): 9 unique values, 0 missing
V157 (numeric): 57 unique values, 0 missing
V159 (numeric): 220 unique values, 0 missing
V169 (numeric): 42 unique values, 0 missing
V170 (numeric): 54 unique values, 0 missing
V177 (numeric): 66 unique values, 0 missing
V178 (numeric): 146 unique values, 0 missing
V181 (numeric): 82 unique values, 0 missing
V187 (numeric): 96 unique values, 0 missing
V188 (numeric): 107 unique values, 0 missing
V190 (numeric): 184 unique values, 0 missing
V192 (numeric): 321 unique values, 0 missing
V193 (numeric): 36 unique values, 0 missing
V198 (numeric): 485 unique values, 0 missing
V199 (numeric): 60 unique values, 0 missing
V201 (numeric): 190 unique values, 0 missing
V204 (numeric): 144 unique values, 0 missing
V205 (numeric): 38 unique values, 0 missing
V206 (numeric): 29 unique values, 0 missing
V208 (numeric): 68 unique values, 0 missing
V209 (numeric): 78 unique values, 0 missing
V210 (numeric): 120 unique values, 0 missing
V211 (numeric): 202 unique values, 0 missing
V212 (numeric): 122 unique values, 0 missing
V213 (numeric): 230 unique values, 0 missing
V223 (numeric): 109 unique values, 0 missing
V228 (numeric): 62 unique values, 0 missing
V229 (numeric): 242 unique values, 0 missing
V230 (numeric): 226 unique values, 0 missing
V232 (numeric): 207 unique values, 0 missing
V241 (numeric): 80 unique values, 0 missing
V242 (numeric): 84 unique values, 0 missing
V243 (numeric): 197 unique values, 0 missing
V244 (numeric): 98 unique values, 0 missing
V246 (numeric): 68 unique values, 0 missing
V250 (numeric): 116 unique values, 0 missing
V254 (numeric): 213 unique values, 0 missing
V258 (numeric): 43 unique values, 0 missing
V259 (numeric): 63 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 50.3
Number of instances belonging to the most frequent class: 1006
Percentage of instances belonging to the least frequent class: 49.7
Number of instances belonging to the least frequent class: 994
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.5
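
Several of the properties above are simple ratios of the listed counts. The sketch below recomputes them from the raw numbers as a consistency check. The variable names and the rounding to two decimals are assumptions made for illustration, not OpenML's own implementation.

```python
# Raw counts taken from the property list above.
n_rows = 2000
n_cols = 101
n_numeric = 100
n_nominal = 1
majority = 1006
minority = 994

# Derived properties, rounded to two decimals (assumed display precision).
derived = {
    "pct_numeric": round(100 * n_numeric / n_cols, 2),
    "pct_nominal": round(100 * n_nominal / n_cols, 2),
    "dimensionality": round(n_cols / n_rows, 2),
    "pct_majority": round(100 * majority / n_rows, 2),
    "pct_minority": round(100 * minority / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class counts also sum to the number of rows (1006 + 994 = 2000), which is consistent with a target that has two distinct values and no missing entries.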

0 tasks
