madeline_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset madeline (41144) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
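The row-subsampling step in the source code above calls scikit-learn's `train_test_split` with `test_size=nrows_max` and a `stratify` column, and keeps the "test" part as the subset. As a rough illustration of what stratification means here, the following standard-library sketch draws a fixed-size sample whose class proportions approximate those of the full data. The helper name `stratified_sample` and the largest-remainder rounding are assumptions made for illustration; this is not scikit-learn's implementation and will not reproduce the rows of this dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n, seed=0):
    """Return sorted indices of an n-element sample with near-proportional class counts.

    Hypothetical helper for illustration only; not the code used to build this dataset.
    """
    if n > len(labels):
        raise ValueError("n must not exceed the number of labels")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal (fractional) share of the sample for each class.
    total = len(labels)
    quotas = {c: n * len(idxs) / total for c, idxs in by_class.items()}

    # Largest-remainder rounding so the per-class counts sum to exactly n.
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Draw without replacement inside each class.
    chosen = []
    for c, idxs in by_class.items():
        chosen.extend(rng.sample(idxs, counts[c]))
    return sorted(chosen)


# Toy labels with a 60/40 split; a 10-row sample should keep that ratio.
labels = ["a"] * 60 + ["b"] * 40
sample = stratified_sample(labels, n=10, seed=0)
sample_counts = Counter(labels[i] for i in sample)
print(sample_counts)
```

For this dataset the same idea explains the near-even class balance reported in the properties: a stratified draw of 2000 rows preserves the class ratio of the rows it was drawn from.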

101 features

| Feature | Type | Unique values | Missing |
| --- | --- | --- | --- |
| class (target) | nominal | 2 | 0 |
| V1 | numeric | 198 | 0 |
| V2 | numeric | 121 | 0 |
| V3 | numeric | 135 | 0 |
| V5 | numeric | 140 | 0 |
| V6 | numeric | 30 | 0 |
| V7 | numeric | 156 | 0 |
| V13 | numeric | 91 | 0 |
| V16 | numeric | 150 | 0 |
| V18 | numeric | 132 | 0 |
| V22 | numeric | 7 | 0 |
| V26 | numeric | 157 | 0 |
| V30 | numeric | 90 | 0 |
| V31 | numeric | 212 | 0 |
| V34 | numeric | 119 | 0 |
| V44 | numeric | 214 | 0 |
| V50 | numeric | 49 | 0 |
| V51 | numeric | 230 | 0 |
| V54 | numeric | 32 | 0 |
| V58 | numeric | 125 | 0 |
| V60 | numeric | 26 | 0 |
| V62 | numeric | 157 | 0 |
| V67 | numeric | 203 | 0 |
| V72 | numeric | 201 | 0 |
| V73 | numeric | 76 | 0 |
| V79 | numeric | 22 | 0 |
| V81 | numeric | 45 | 0 |
| V82 | numeric | 148 | 0 |
| V83 | numeric | 166 | 0 |
| V84 | numeric | 398 | 0 |
| V85 | numeric | 102 | 0 |
| V86 | numeric | 427 | 0 |
| V87 | numeric | 51 | 0 |
| V96 | numeric | 141 | 0 |
| V97 | numeric | 119 | 0 |
| V98 | numeric | 100 | 0 |
| V99 | numeric | 36 | 0 |
| V100 | numeric | 222 | 0 |
| V103 | numeric | 225 | 0 |
| V104 | numeric | 37 | 0 |
| V105 | numeric | 218 | 0 |
| V106 | numeric | 202 | 0 |
| V109 | numeric | 147 | 0 |
| V111 | numeric | 84 | 0 |
| V112 | numeric | 198 | 0 |
| V114 | numeric | 219 | 0 |
| V121 | numeric | 188 | 0 |
| V123 | numeric | 109 | 0 |
| V125 | numeric | 220 | 0 |
| V127 | numeric | 135 | 0 |
| V128 | numeric | 218 | 0 |
| V129 | numeric | 63 | 0 |
| V131 | numeric | 10 | 0 |
| V134 | numeric | 159 | 0 |
| V137 | numeric | 185 | 0 |
| V138 | numeric | 120 | 0 |
| V143 | numeric | 15 | 0 |
| V144 | numeric | 196 | 0 |
| V146 | numeric | 210 | 0 |
| V148 | numeric | 46 | 0 |
| V149 | numeric | 212 | 0 |
| V153 | numeric | 66 | 0 |
| V157 | numeric | 59 | 0 |
| V159 | numeric | 216 | 0 |
| V161 | numeric | 113 | 0 |
| V162 | numeric | 42 | 0 |
| V164 | numeric | 324 | 0 |
| V167 | numeric | 192 | 0 |
| V168 | numeric | 49 | 0 |
| V169 | numeric | 40 | 0 |
| V173 | numeric | 124 | 0 |
| V176 | numeric | 129 | 0 |
| V182 | numeric | 11 | 0 |
| V185 | numeric | 68 | 0 |
| V187 | numeric | 92 | 0 |
| V189 | numeric | 97 | 0 |
| V193 | numeric | 35 | 0 |
| V197 | numeric | 115 | 0 |
| V204 | numeric | 150 | 0 |
| V205 | numeric | 36 | 0 |
| V208 | numeric | 68 | 0 |
| V210 | numeric | 122 | 0 |
| V211 | numeric | 199 | 0 |
| V213 | numeric | 231 | 0 |
| V215 | numeric | 137 | 0 |
| V216 | numeric | 141 | 0 |
| V222 | numeric | 215 | 0 |
| V223 | numeric | 116 | 0 |
| V224 | numeric | 196 | 0 |
| V230 | numeric | 224 | 0 |
| V234 | numeric | 177 | 0 |
| V235 | numeric | 65 | 0 |
| V236 | numeric | 92 | 0 |
| V237 | numeric | 88 | 0 |
| V241 | numeric | 80 | 0 |
| V248 | numeric | 127 | 0 |
| V249 | numeric | 87 | 0 |
| V250 | numeric | 113 | 0 |
| V255 | numeric | 216 | 0 |
| V256 | numeric | 162 | 0 |
| V258 | numeric | 43 | 0 |

19 properties

| Property | Value |
| --- | --- |
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 0.99 |
| Percentage of instances having missing values. | 0 |
| Percentage of missing values. | 0 |
| Average class difference between consecutive instances. | 0.5 |
| Percentage of numeric attributes. | 99.01 |
| Number of attributes divided by the number of instances. | 0.05 |
| Percentage of nominal attributes. | 0.99 |
| Percentage of instances belonging to the most frequent class. | 50.3 |
| Number of instances belonging to the most frequent class. | 1006 |
| Percentage of instances belonging to the least frequent class. | 49.7 |
| Number of instances belonging to the least frequent class. | 994 |
| Number of binary attributes. | 1 |
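
Several of the properties above are simple ratios of the listed counts. The sketch below recomputes them from the raw numbers on this page as a consistency check. The variable and key names are illustrative, and rounding to two decimals is an assumption about how the page displays values.

```python
# Counts taken from the property list above.
n_instances = 2000
n_attributes = 101  # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
n_binary = 1
majority_class_size = 1006
minority_class_size = 994

# Derived ratios, rounded to two decimals (display rounding is an assumption).
derived = {
    "pct_numeric": round(100 * n_numeric / n_attributes, 2),
    "pct_nominal": round(100 * n_nominal / n_attributes, 2),
    "pct_binary": round(100 * n_binary / n_attributes, 2),
    "dimensionality": round(n_attributes / n_instances, 2),
    "pct_majority": round(100 * majority_class_size / n_instances, 2),
    "pct_minority": round(100 * minority_class_size / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed ones: the single nominal attribute is the binary target, which is why the nominal and binary percentages coincide.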

0 tasks