micro-mass_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
Subsampling of the dataset micro-mass (1515) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )

        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
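The class-selection step in the source code above can be illustrated on toy data. The following is only a sketch: the target Series, its labels, and `nclasses_max = 2` are made up for illustration and are not the micro-mass data. In the source code the row filter is built from `classes` (every unique label), which keeps all rows; that is consistent with the 20 target values this page reports despite args.nclasses=10. The sketch contrasts this with a filter built from the sampled `selected_classes`. Whether the latter was the intended behaviour is an assumption.

```python
import numpy as np
import pandas as pd

# Toy target with 4 classes (illustrative data, not micro-mass).
y = pd.Series(["a", "a", "a", "b", "b", "c", "d", "d"], name="Class")
nclasses_max = 2  # hypothetical limit for this example

rng = np.random.default_rng(4)

# Class frequencies; taking the labels from the same object keeps
# labels and probabilities in the same order.
vcs = y.value_counts()
classes = vcs.index.to_numpy()

# Sample classes without replacement, weighted by class frequency.
selected_classes = rng.choice(
    classes,
    size=nclasses_max,
    replace=False,
    p=(vcs / vcs.sum()).to_numpy(),
)

# Filtering on all classes keeps every row ...
mask_all = y.isin(classes)
# ... while filtering on the sampled classes keeps only their rows.
mask_selected = y.isin(selected_classes)

y_selected = y[mask_selected]
print(f"rows kept with all classes: {mask_all.sum()} of {len(y)}")
print(f"rows kept with sampled classes: {mask_selected.sum()} of {len(y)}")
```

Boolean masks are used here instead of `iloc` with index labels, so the sketch does not depend on the Series having a default integer index.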

101 features

Class (target) (nominal): 20 unique values, 0 missing
V34 (numeric): 100 unique values, 0 missing
V45 (numeric): 12 unique values, 0 missing
V54 (numeric): 1 unique value, 0 missing
V73 (numeric): 49 unique values, 0 missing
V92 (numeric): 63 unique values, 0 missing
V99 (numeric): 68 unique values, 0 missing
V111 (numeric): 34 unique values, 0 missing
V126 (numeric): 1 unique value, 0 missing
V145 (numeric): 4 unique values, 0 missing
V163 (numeric): 3 unique values, 0 missing
V164 (numeric): 4 unique values, 0 missing
V169 (numeric): 66 unique values, 0 missing
V199 (numeric): 23 unique values, 0 missing
V205 (numeric): 8 unique values, 0 missing
V217 (numeric): 154 unique values, 0 missing
V219 (numeric): 44 unique values, 0 missing
V227 (numeric): 66 unique values, 0 missing
V242 (numeric): 53 unique values, 0 missing
V249 (numeric): 6 unique values, 0 missing
V255 (numeric): 1 unique value, 0 missing
V273 (numeric): 60 unique values, 0 missing
V278 (numeric): 46 unique values, 0 missing
V282 (numeric): 50 unique values, 0 missing
V335 (numeric): 7 unique values, 0 missing
V356 (numeric): 1 unique value, 0 missing
V371 (numeric): 7 unique values, 0 missing
V380 (numeric): 12 unique values, 0 missing
V394 (numeric): 1 unique value, 0 missing
V423 (numeric): 62 unique values, 0 missing
V433 (numeric): 1 unique value, 0 missing
V448 (numeric): 1 unique value, 0 missing
V449 (numeric): 82 unique values, 0 missing
V451 (numeric): 32 unique values, 0 missing
V459 (numeric): 4 unique values, 0 missing
V463 (numeric): 159 unique values, 0 missing
V486 (numeric): 3 unique values, 0 missing
V489 (numeric): 1 unique value, 0 missing
V499 (numeric): 44 unique values, 0 missing
V519 (numeric): 97 unique values, 0 missing
V558 (numeric): 40 unique values, 0 missing
V565 (numeric): 1 unique value, 0 missing
V574 (numeric): 117 unique values, 0 missing
V583 (numeric): 64 unique values, 0 missing
V598 (numeric): 69 unique values, 0 missing
V607 (numeric): 51 unique values, 0 missing
V608 (numeric): 121 unique values, 0 missing
V612 (numeric): 1 unique value, 0 missing
V613 (numeric): 27 unique values, 0 missing
V616 (numeric): 1 unique value, 0 missing
V617 (numeric): 18 unique values, 0 missing
V625 (numeric): 26 unique values, 0 missing
V640 (numeric): 1 unique value, 0 missing
V648 (numeric): 55 unique values, 0 missing
V650 (numeric): 149 unique values, 0 missing
V666 (numeric): 51 unique values, 0 missing
V680 (numeric): 1 unique value, 0 missing
V681 (numeric): 64 unique values, 0 missing
V691 (numeric): 1 unique value, 0 missing
V696 (numeric): 1 unique value, 0 missing
V715 (numeric): 86 unique values, 0 missing
V717 (numeric): 15 unique values, 0 missing
V726 (numeric): 10 unique values, 0 missing
V742 (numeric): 30 unique values, 0 missing
V797 (numeric): 61 unique values, 0 missing
V815 (numeric): 3 unique values, 0 missing
V823 (numeric): 47 unique values, 0 missing
V837 (numeric): 96 unique values, 0 missing
V848 (numeric): 72 unique values, 0 missing
V860 (numeric): 13 unique values, 0 missing
V873 (numeric): 6 unique values, 0 missing
V888 (numeric): 79 unique values, 0 missing
V947 (numeric): 62 unique values, 0 missing
V952 (numeric): 4 unique values, 0 missing
V973 (numeric): 2 unique values, 0 missing
V1020 (numeric): 2 unique values, 0 missing
V1025 (numeric): 26 unique values, 0 missing
V1054 (numeric): 1 unique value, 0 missing
V1057 (numeric): 50 unique values, 0 missing
V1080 (numeric): 83 unique values, 0 missing
V1092 (numeric): 72 unique values, 0 missing
V1099 (numeric): 1 unique value, 0 missing
V1118 (numeric): 44 unique values, 0 missing
V1125 (numeric): 91 unique values, 0 missing
V1128 (numeric): 3 unique values, 0 missing
V1129 (numeric): 103 unique values, 0 missing
V1153 (numeric): 37 unique values, 0 missing
V1163 (numeric): 204 unique values, 0 missing
V1165 (numeric): 57 unique values, 0 missing
V1169 (numeric): 60 unique values, 0 missing
V1175 (numeric): 9 unique values, 0 missing
V1183 (numeric): 103 unique values, 0 missing
V1189 (numeric): 17 unique values, 0 missing
V1227 (numeric): 22 unique values, 0 missing
V1235 (numeric): 76 unique values, 0 missing
V1236 (numeric): 5 unique values, 0 missing
V1245 (numeric): 1 unique value, 0 missing
V1261 (numeric): 259 unique values, 0 missing
V1266 (numeric): 21 unique values, 0 missing
V1270 (numeric): 21 unique values, 0 missing
V1284 (numeric): 20 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 571
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 20
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of instances belonging to the least frequent class: 1.93
Number of instances belonging to the least frequent class: 11
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.7
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.18
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 10.51
Number of instances belonging to the most frequent class: 60
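Several of the properties above are simple ratios of the reported counts. As a quick cross-check, the sketch below recomputes them from the counts listed on this page; the variable names and the `pct` helper are illustrative, and rounding to two decimals is an assumption about how the page displays values.

```python
# Counts as reported in the property list (no data is loaded here).
n_instances = 571
n_attributes = 101
n_numeric = 100
n_nominal = 1
minority_class_size = 11
majority_class_size = 60


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


minority_pct = pct(minority_class_size, n_instances)
majority_pct = pct(majority_class_size, n_instances)
numeric_pct = pct(n_numeric, n_attributes)
nominal_pct = pct(n_nominal, n_attributes)

# Attributes per instance is a plain ratio, not a percentage.
dimensionality = round(n_attributes / n_instances, 2)

print(minority_pct, majority_pct, numeric_pct, nominal_pct, dimensionality)
```

The recomputed values agree with the percentages and the attribute-to-instance ratio listed among the properties.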

0 tasks
