madeline_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset madeline (41144) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
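The column-sampling step in the source code can be tried in isolation. The following is a minimal sketch, not the uploader's code: the helper name `sample_column_indices` is hypothetical, and the column count of 259 is an assumption taken from the highest feature name in the list (V259). It is not guaranteed to reproduce the exact columns of this dataset, since the original generator state depends on the earlier steps of `subsample`.

```python
import numpy as np


def sample_column_indices(n_columns: int, ncols_max: int, seed: int) -> list[int]:
    """Pick at most ncols_max column positions without replacement, ascending."""
    if n_columns <= ncols_max:
        # Nothing to drop: keep every column in its original order.
        return list(range(n_columns))
    rng = np.random.default_rng(seed)
    chosen = rng.choice(n_columns, size=ncols_max, replace=False)
    # Sorting preserves the original left-to-right column order.
    return sorted(int(i) for i in chosen)


# Illustrative values only: 259 columns is an assumption, seed=2 as in the description.
idxs = sample_column_indices(n_columns=259, ncols_max=100, seed=2)
print(len(idxs), idxs[:5])
```

Sorting the sampled positions is what keeps the surviving feature names in ascending order, which matches how the feature list reads.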

101 features

class (target): nominal, 2 unique values, 0 missing
V8: numeric, 75 unique values, 0 missing
V10: numeric, 12 unique values, 0 missing
V15: numeric, 119 unique values, 0 missing
V16: numeric, 146 unique values, 0 missing
V18: numeric, 134 unique values, 0 missing
V19: numeric, 185 unique values, 0 missing
V23: numeric, 100 unique values, 0 missing
V24: numeric, 66 unique values, 0 missing
V26: numeric, 157 unique values, 0 missing
V28: numeric, 134 unique values, 0 missing
V33: numeric, 206 unique values, 0 missing
V36: numeric, 200 unique values, 0 missing
V38: numeric, 51 unique values, 0 missing
V42: numeric, 224 unique values, 0 missing
V43: numeric, 73 unique values, 0 missing
V45: numeric, 193 unique values, 0 missing
V48: numeric, 86 unique values, 0 missing
V49: numeric, 181 unique values, 0 missing
V51: numeric, 228 unique values, 0 missing
V56: numeric, 206 unique values, 0 missing
V57: numeric, 83 unique values, 0 missing
V67: numeric, 199 unique values, 0 missing
V68: numeric, 138 unique values, 0 missing
V70: numeric, 152 unique values, 0 missing
V75: numeric, 208 unique values, 0 missing
V78: numeric, 130 unique values, 0 missing
V80: numeric, 50 unique values, 0 missing
V81: numeric, 48 unique values, 0 missing
V86: numeric, 422 unique values, 0 missing
V94: numeric, 192 unique values, 0 missing
V99: numeric, 36 unique values, 0 missing
V100: numeric, 224 unique values, 0 missing
V101: numeric, 48 unique values, 0 missing
V102: numeric, 136 unique values, 0 missing
V104: numeric, 38 unique values, 0 missing
V105: numeric, 222 unique values, 0 missing
V107: numeric, 218 unique values, 0 missing
V110: numeric, 20 unique values, 0 missing
V112: numeric, 203 unique values, 0 missing
V113: numeric, 47 unique values, 0 missing
V117: numeric, 216 unique values, 0 missing
V118: numeric, 84 unique values, 0 missing
V121: numeric, 192 unique values, 0 missing
V122: numeric, 131 unique values, 0 missing
V124: numeric, 70 unique values, 0 missing
V125: numeric, 214 unique values, 0 missing
V126: numeric, 24 unique values, 0 missing
V127: numeric, 137 unique values, 0 missing
V130: numeric, 209 unique values, 0 missing
V133: numeric, 161 unique values, 0 missing
V134: numeric, 158 unique values, 0 missing
V135: numeric, 31 unique values, 0 missing
V136: numeric, 193 unique values, 0 missing
V138: numeric, 118 unique values, 0 missing
V139: numeric, 12 unique values, 0 missing
V140: numeric, 185 unique values, 0 missing
V142: numeric, 44 unique values, 0 missing
V148: numeric, 47 unique values, 0 missing
V150: numeric, 69 unique values, 0 missing
V152: numeric, 30 unique values, 0 missing
V153: numeric, 64 unique values, 0 missing
V154: numeric, 94 unique values, 0 missing
V159: numeric, 220 unique values, 0 missing
V161: numeric, 119 unique values, 0 missing
V165: numeric, 100 unique values, 0 missing
V169: numeric, 42 unique values, 0 missing
V171: numeric, 182 unique values, 0 missing
V173: numeric, 128 unique values, 0 missing
V177: numeric, 63 unique values, 0 missing
V178: numeric, 144 unique values, 0 missing
V180: numeric, 182 unique values, 0 missing
V181: numeric, 82 unique values, 0 missing
V183: numeric, 135 unique values, 0 missing
V184: numeric, 549 unique values, 0 missing
V187: numeric, 95 unique values, 0 missing
V191: numeric, 174 unique values, 0 missing
V193: numeric, 35 unique values, 0 missing
V194: numeric, 127 unique values, 0 missing
V196: numeric, 230 unique values, 0 missing
V198: numeric, 484 unique values, 0 missing
V202: numeric, 67 unique values, 0 missing
V204: numeric, 146 unique values, 0 missing
V205: numeric, 36 unique values, 0 missing
V206: numeric, 30 unique values, 0 missing
V209: numeric, 80 unique values, 0 missing
V211: numeric, 202 unique values, 0 missing
V212: numeric, 122 unique values, 0 missing
V216: numeric, 147 unique values, 0 missing
V218: numeric, 214 unique values, 0 missing
V220: numeric, 177 unique values, 0 missing
V232: numeric, 202 unique values, 0 missing
V233: numeric, 293 unique values, 0 missing
V239: numeric, 184 unique values, 0 missing
V240: numeric, 73 unique values, 0 missing
V243: numeric, 196 unique values, 0 missing
V245: numeric, 218 unique values, 0 missing
V248: numeric, 126 unique values, 0 missing
V255: numeric, 224 unique values, 0 missing
V256: numeric, 171 unique values, 0 missing
V259: numeric, 62 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of missing values: 0
Average class difference between consecutive instances: 0.49
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 50.3
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 1006
Percentage of instances belonging to the least frequent class: 49.7
Number of instances belonging to the least frequent class: 994
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
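Several of the properties above are ratios of the listed counts. As a quick consistency check, the sketch below recomputes them from the raw counts on this page. The variable and key names are illustrative, not OpenML's own quality identifiers.

```python
# Raw counts copied from the property list above.
n_rows = 2000
n_numeric = 100
n_nominal = 1
majority_count = 1006
minority_count = 994

n_attributes = n_numeric + n_nominal  # 101 columns, including the target

derived = {
    "attributes_per_instance": round(n_attributes / n_rows, 2),
    "pct_numeric": round(100 * n_numeric / n_attributes, 2),
    "pct_nominal": round(100 * n_nominal / n_attributes, 2),
    "pct_majority_class": round(100 * majority_count / n_rows, 2),
    "pct_minority_class": round(100 * minority_count / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class counts sum to the 2000 rows, and the rounded ratios agree with the percentages reported in the list.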

0 tasks
