arcene_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Publicly available (visibility: public). Uploaded 17-11-2022 by Eddie Bergman.
Tags: Machine Learning, Manufacturing
Subsample of the dataset arcene (41157), generated with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10 and args.no_stratify=True using the following source code:

```python
from __future__ import annotations  # the return annotation refers to the surrounding Dataset class

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes, weighted by how often each one occurs
    vcs = y.value_counts()
    if len(vcs) > nclasses_max:
        selected_classes = rng.choice(
            vcs.index.to_numpy(),  # classes in the same order as their counts
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x.loc[mask]
        y = y.loc[mask]

    # Uniformly sample columns if required, keeping their original order
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    # Subsample rows, stratifying on the target when requested
    if len(x) > nrows_max:
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(columns=target_name)
        y = subset[target_name]

    # Keep the categorical flags of the retained columns, in the same order
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same dataset, but it is the one it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
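For this particular dataset only the column step of `subsample` changes anything: the properties listed below report 100 rows and 2 classes, both under the `nrows_max=2000` and `nclasses_max=10` limits, so the class filter and the stratified row split are skipped. The sketch below isolates the two random-selection steps with NumPy's `Generator.choice`. The helper names `sample_column_indices` and `sample_classes` are hypothetical, and because each helper creates its own generator from `seed`, it is not guaranteed to reproduce the exact column list of this dataset.

```python
import numpy as np


def sample_column_indices(n_cols: int, ncols_max: int, seed: int) -> list[int]:
    """Hypothetical helper: pick up to ncols_max distinct column positions, sorted."""
    if n_cols <= ncols_max:
        return list(range(n_cols))  # nothing to drop
    rng = np.random.default_rng(seed)
    picked = rng.choice(n_cols, size=ncols_max, replace=False)
    return sorted(int(i) for i in picked)


def sample_classes(labels: np.ndarray, nclasses_max: int, seed: int) -> np.ndarray:
    """Hypothetical helper: pick up to nclasses_max classes, weighted by frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    if len(classes) <= nclasses_max:
        return classes  # few enough classes: keep them all
    rng = np.random.default_rng(seed)
    # classes and counts come from the same np.unique call, so p lines up with classes
    return rng.choice(classes, size=nclasses_max, replace=False, p=counts / counts.sum())


# Column step on a source with 10,000 columns (arcene's feature count)
cols = sample_column_indices(10_000, 100, seed=1)
print(len(cols), cols[:5])

# Class step on two made-up labels: only 2 classes, so both are kept
labels = np.array(["a"] * 56 + ["b"] * 44)
print(sample_classes(labels, 10, seed=1))
```

Sorting the sampled positions mirrors `sorted_column_idxs` in the original code, so the retained columns keep their order from the source dataset.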

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| V198 | numeric | 71 | 0 |
| V274 | numeric | 42 | 0 |
| V346 | numeric | 92 | 0 |
| V395 | numeric | 60 | 0 |
| V542 | numeric | 23 | 0 |
| V617 | numeric | 21 | 0 |
| V623 | numeric | 88 | 0 |
| V851 | numeric | 40 | 0 |
| V917 | numeric | 69 | 0 |
| V1155 | numeric | 17 | 0 |
| V1164 | numeric | 58 | 0 |
| V1231 | numeric | 58 | 0 |
| V1236 | numeric | 76 | 0 |
| V1332 | numeric | 26 | 0 |
| V1429 | numeric | 42 | 0 |
| V1479 | numeric | 22 | 0 |
| V1600 | numeric | 7 | 0 |
| V2022 | numeric | 61 | 0 |
| V2146 | numeric | 87 | 0 |
| V2470 | numeric | 23 | 0 |
| V2549 | numeric | 33 | 0 |
| V2598 | numeric | 83 | 0 |
| V2608 | numeric | 86 | 0 |
| V2708 | numeric | 4 | 0 |
| V2757 | numeric | 13 | 0 |
| V2789 | numeric | 63 | 0 |
| V2918 | numeric | 40 | 0 |
| V2919 | numeric | 52 | 0 |
| V3011 | numeric | 31 | 0 |
| V3091 | numeric | 89 | 0 |
| V3213 | numeric | 48 | 0 |
| V3273 | numeric | 25 | 0 |
| V3443 | numeric | 81 | 0 |
| V3618 | numeric | 57 | 0 |
| V3674 | numeric | 31 | 0 |
| V3809 | numeric | 88 | 0 |
| V4006 | numeric | 80 | 0 |
| V4058 | numeric | 86 | 0 |
| V4195 | numeric | 29 | 0 |
| V4197 | numeric | 13 | 0 |
| V4232 | numeric | 34 | 0 |
| V4262 | numeric | 14 | 0 |
| V4494 | numeric | 58 | 0 |
| V4505 | numeric | 51 | 0 |
| V4541 | numeric | 4 | 0 |
| V4564 | numeric | 36 | 0 |
| V4584 | numeric | 58 | 0 |
| V4686 | numeric | 5 | 0 |
| V4826 | numeric | 61 | 0 |
| V4956 | numeric | 86 | 0 |
| V4992 | numeric | 47 | 0 |
| V5069 | numeric | 28 | 0 |
| V5091 | numeric | 51 | 0 |
| V5106 | numeric | 40 | 0 |
| V5142 | numeric | 50 | 0 |
| V5274 | numeric | 31 | 0 |
| V5341 | numeric | 25 | 0 |
| V5388 | numeric | 61 | 0 |
| V5451 | numeric | 84 | 0 |
| V5811 | numeric | 60 | 0 |
| V5922 | numeric | 90 | 0 |
| V6112 | numeric | 32 | 0 |
| V6214 | numeric | 40 | 0 |
| V6385 | numeric | 60 | 0 |
| V6402 | numeric | 39 | 0 |
| V6733 | numeric | 8 | 0 |
| V7168 | numeric | 26 | 0 |
| V7214 | numeric | 42 | 0 |
| V7228 | numeric | 26 | 0 |
| V7453 | numeric | 80 | 0 |
| V7461 | numeric | 3 | 0 |
| V7477 | numeric | 62 | 0 |
| V7479 | numeric | 60 | 0 |
| V7528 | numeric | 30 | 0 |
| V7688 | numeric | 29 | 0 |
| V7706 | numeric | 26 | 0 |
| V7742 | numeric | 81 | 0 |
| V7828 | numeric | 25 | 0 |
| V8034 | numeric | 85 | 0 |
| V8115 | numeric | 46 | 0 |
| V8153 | numeric | 26 | 0 |
| V8197 | numeric | 21 | 0 |
| V8206 | numeric | 59 | 0 |
| V8315 | numeric | 18 | 0 |
| V8391 | numeric | 4 | 0 |
| V8513 | numeric | 37 | 0 |
| V8588 | numeric | 39 | 0 |
| V8613 | numeric | 24 | 0 |
| V8701 | numeric | 85 | 0 |
| V8982 | numeric | 8 | 0 |
| V9148 | numeric | 62 | 0 |
| V9208 | numeric | 54 | 0 |
| V9400 | numeric | 42 | 0 |
| V9414 | numeric | 15 | 0 |
| V9569 | numeric | 38 | 0 |
| V9661 | numeric | 57 | 0 |
| V9704 | numeric | 32 | 0 |
| V9757 | numeric | 49 | 0 |
| V9805 | numeric | 89 | 0 |
| V9999 | numeric | 60 | 0 |
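The unique-value and missing-value counts in the feature listing above are per-column statistics. A minimal pandas sketch produces the same kind of summary, assuming the data is already loaded into a DataFrame and that "nominal" simply means a non-numeric dtype (an assumption; OpenML derives attribute types from the ARFF header). The helper `feature_summary` and the toy frame are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with type, distinct values, missing count."""
    kinds = pd.Series(
        ["numeric" if is_numeric_dtype(dtype) else "nominal" for dtype in df.dtypes],
        index=df.columns,
    )
    return pd.DataFrame(
        {
            "type": kinds,
            "unique_values": df.nunique(),  # NaN is not counted as a value
            "missing": df.isna().sum(),
        }
    )


# Tiny made-up frame, not data from this dataset
toy = pd.DataFrame(
    {
        "V1": [0.0, 3.0, 3.0, None],
        "class": pd.Categorical(["a", "b", "a", "b"]),
    }
)
print(feature_summary(toy))
```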

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 100 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 100 |
| Number of nominal attributes | 1 |
| Percentage of binary attributes | 0.99 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0.39 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 1.01 |
| Percentage of numeric attributes | 99.01 |
| Percentage of instances belonging to the most frequent class | 56 |
| Percentage of nominal attributes | 0.99 |
| Number of instances belonging to the most frequent class | 56 |
| Percentage of instances belonging to the least frequent class | 44 |
| Number of instances belonging to the least frequent class | 44 |
| Number of binary attributes | 1 |
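Most of these properties are plain counts and ratios; for example, 101 attributes over 100 instances gives 1.01, and 100 numeric attributes out of 101 gives 99.01%. A minimal pure-Python sketch reproduces several of them from the class labels and attribute counts. The helper `basic_properties`, its key names and the placeholder labels are hypothetical (they are not OpenML's quality identifiers), and the average class difference is left out because its exact definition is not given here.

```python
from collections import Counter
from typing import Hashable, Sequence


def basic_properties(
    labels: Sequence[Hashable],
    n_numeric: int,
    n_nominal: int,
    n_binary: int,
) -> dict[str, float]:
    """Hypothetical helper: a few dataset properties from labels and attribute counts.

    The target column is assumed to be counted among the nominal (and, with two
    values, binary) attributes, as in the property table above.
    """
    n_instances = len(labels)
    n_attributes = n_numeric + n_nominal
    class_counts = Counter(labels)
    majority = max(class_counts.values())
    minority = min(class_counts.values())
    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "classes": len(class_counts),
        "majority_class_size": majority,
        "majority_class_pct": 100 * majority / n_instances,
        "minority_class_size": minority,
        "minority_class_pct": 100 * minority / n_instances,
        "attributes_per_instance": n_attributes / n_instances,
        "numeric_pct": 100 * n_numeric / n_attributes,
        "nominal_pct": 100 * n_nominal / n_attributes,
        "binary_pct": 100 * n_binary / n_attributes,
    }


# Counts from this page: 56 + 44 instances, 100 numeric attributes and one
# nominal, binary target. The label values themselves are placeholders.
labels = ["majority"] * 56 + ["minority"] * 44
props = basic_properties(labels, n_numeric=100, n_nominal=1, n_binary=1)
for name, value in props.items():
    print(f"{name}: {round(value, 2)}")
```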

0 tasks
