arcene_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
  • Computer Systems Machine Learning
Subsampling of the dataset arcene (41157) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
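For this particular dataset, only the column-sampling branch appears to have had any effect: the result has 100 rows and 2 classes, both below their limits, while the feature names run up to V9843, well above ncols=100. The following is a minimal sketch of that threshold logic only. The helper `plan_subsample` and the `SubsamplePlan` type are hypothetical (they are not part of the original code), and the source width of 10,000 feature columns is an illustrative assumption.

```python
from typing import NamedTuple


class SubsamplePlan(NamedTuple):
    """Which branches of the subsampling routine fire (hypothetical helper type)."""

    sample_classes: bool
    sample_columns: bool
    sample_rows: bool
    max_rows: int
    n_feature_columns: int


def plan_subsample(
    nrows: int,
    ncols: int,
    nclasses: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
) -> SubsamplePlan:
    """Mirror the three size checks; `nrows` is the row count reaching the row check."""
    return SubsamplePlan(
        sample_classes=nclasses > nclasses_max,
        sample_columns=ncols > ncols_max,
        sample_rows=nrows > nrows_max,
        max_rows=min(nrows, nrows_max),
        n_feature_columns=min(ncols, ncols_max),
    )


# Assumed source shape: 100 rows, 2 classes, 10,000 feature columns (illustrative).
plan = plan_subsample(nrows=100, ncols=10_000, nclasses=2)
print(plan)
```

Under these assumptions the plan keeps all 100 rows and 100 feature columns, which together with the target column matches the 101 features listed for this dataset.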

101 features

class (target): nominal, 2 unique values, 0 missing
V387: numeric, 42 unique values, 0 missing
V396: numeric, 43 unique values, 0 missing
V547: numeric, 4 unique values, 0 missing
V569: numeric, 23 unique values, 0 missing
V765: numeric, 60 unique values, 0 missing
V911: numeric, 86 unique values, 0 missing
V967: numeric, 12 unique values, 0 missing
V1007: numeric, 21 unique values, 0 missing
V1042: numeric, 20 unique values, 0 missing
V1068: numeric, 30 unique values, 0 missing
V1083: numeric, 69 unique values, 0 missing
V1490: numeric, 5 unique values, 0 missing
V1862: numeric, 69 unique values, 0 missing
V1863: numeric, 63 unique values, 0 missing
V1998: numeric, 56 unique values, 0 missing
V2012: numeric, 28 unique values, 0 missing
V2127: numeric, 66 unique values, 0 missing
V2162: numeric, 45 unique values, 0 missing
V2183: numeric, 2 unique values, 0 missing
V2228: numeric, 48 unique values, 0 missing
V2582: numeric, 23 unique values, 0 missing
V2587: numeric, 5 unique values, 0 missing
V2591: numeric, 69 unique values, 0 missing
V2728: numeric, 33 unique values, 0 missing
V2957: numeric, 81 unique values, 0 missing
V3033: numeric, 37 unique values, 0 missing
V3166: numeric, 69 unique values, 0 missing
V3311: numeric, 27 unique values, 0 missing
V3319: numeric, 63 unique values, 0 missing
V3388: numeric, 91 unique values, 0 missing
V3440: numeric, 88 unique values, 0 missing
V3778: numeric, 24 unique values, 0 missing
V3892: numeric, 38 unique values, 0 missing
V4055: numeric, 49 unique values, 0 missing
V4099: numeric, 59 unique values, 0 missing
V4199: numeric, 65 unique values, 0 missing
V4295: numeric, 64 unique values, 0 missing
V4374: numeric, 86 unique values, 0 missing
V4397: numeric, 49 unique values, 0 missing
V4438: numeric, 71 unique values, 0 missing
V4471: numeric, 47 unique values, 0 missing
V4475: numeric, 51 unique values, 0 missing
V4526: numeric, 48 unique values, 0 missing
V4688: numeric, 55 unique values, 0 missing
V4707: numeric, 28 unique values, 0 missing
V4750: numeric, 43 unique values, 0 missing
V4853: numeric, 47 unique values, 0 missing
V4975: numeric, 23 unique values, 0 missing
V4977: numeric, 65 unique values, 0 missing
V5011: numeric, 59 unique values, 0 missing
V5083: numeric, 67 unique values, 0 missing
V5153: numeric, 50 unique values, 0 missing
V5205: numeric, 45 unique values, 0 missing
V5226: numeric, 61 unique values, 0 missing
V5536: numeric, 38 unique values, 0 missing
V5579: numeric, 46 unique values, 0 missing
V5669: numeric, 27 unique values, 0 missing
V5765: numeric, 8 unique values, 0 missing
V5920: numeric, 67 unique values, 0 missing
V5921: numeric, 27 unique values, 0 missing
V5922: numeric, 90 unique values, 0 missing
V5947: numeric, 66 unique values, 0 missing
V6128: numeric, 24 unique values, 0 missing
V6289: numeric, 25 unique values, 0 missing
V6324: numeric, 77 unique values, 0 missing
V6426: numeric, 23 unique values, 0 missing
V6522: numeric, 54 unique values, 0 missing
V6645: numeric, 30 unique values, 0 missing
V6738: numeric, 9 unique values, 0 missing
V6776: numeric, 59 unique values, 0 missing
V6787: numeric, 11 unique values, 0 missing
V6887: numeric, 71 unique values, 0 missing
V6908: numeric, 83 unique values, 0 missing
V6919: numeric, 77 unique values, 0 missing
V6974: numeric, 66 unique values, 0 missing
V7222: numeric, 35 unique values, 0 missing
V7436: numeric, 23 unique values, 0 missing
V7687: numeric, 66 unique values, 0 missing
V7716: numeric, 55 unique values, 0 missing
V7829: numeric, 72 unique values, 0 missing
V8060: numeric, 40 unique values, 0 missing
V8066: numeric, 59 unique values, 0 missing
V8282: numeric, 27 unique values, 0 missing
V8293: numeric, 35 unique values, 0 missing
V8466: numeric, 25 unique values, 0 missing
V8603: numeric, 83 unique values, 0 missing
V8623: numeric, 10 unique values, 0 missing
V8724: numeric, 55 unique values, 0 missing
V8728: numeric, 59 unique values, 0 missing
V8813: numeric, 31 unique values, 0 missing
V8864: numeric, 20 unique values, 0 missing
V8907: numeric, 32 unique values, 0 missing
V9030: numeric, 47 unique values, 0 missing
V9198: numeric, 87 unique values, 0 missing
V9287: numeric, 61 unique values, 0 missing
V9386: numeric, 70 unique values, 0 missing
V9541: numeric, 46 unique values, 0 missing
V9611: numeric, 50 unique values, 0 missing
V9717: numeric, 24 unique values, 0 missing
V9843: numeric, 66 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 100
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Number of attributes divided by the number of instances: 1.01
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 56
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 56
Percentage of instances belonging to the least frequent class: 44
Number of instances belonging to the least frequent class: 44
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.39
Percentage of missing values: 0
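The ratio and percentage properties follow directly from the raw counts listed above. As a rough check, the sketch below recomputes them with plain arithmetic. The variable names are illustrative, and the class labels are not given on this page, so the two classes are referred to only by size.

```python
# Raw counts taken from the property list.
n_rows = 100
n_numeric = 100
n_nominal = 1          # the nominal target column
n_binary = 1           # the target has 2 unique values
majority_count = 56    # instances in the most frequent class
minority_count = 44    # instances in the least frequent class

n_attributes = n_numeric + n_nominal

# Derived properties, rounded to two decimals as on the page.
dimensionality = round(n_attributes / n_rows, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_binary = round(100 * n_binary / n_attributes, 2)
pct_majority = round(100 * majority_count / n_rows, 2)
pct_minority = round(100 * minority_count / n_rows, 2)

print(dimensionality, pct_numeric, pct_nominal, pct_binary, pct_majority, pct_minority)
```

Because the dataset has exactly 100 rows, the class counts and class percentages coincide (56 and 44).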

0 tasks
