arcene_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Publicly available (visibility: public). Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset arcene (41157) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10 and args.no_stratify=True. Generated with the following source code:

```python
# Imports assumed by this method; `Dataset` is the author's own class.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        # Note: `vcs` is ordered by frequency, not in the order of `classes`
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # Note: this tests membership in `classes`, not `selected_classes`,
        # so no rows are actually dropped here
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
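The method above depends on the author's `Dataset` class and on scikit-learn. As a rough, self-contained illustration of two of its steps, here is a NumPy-only sketch of the uniform column sampling and of frequency-weighted class selection. The function names `sample_column_indices` and `class_mask` are hypothetical; unlike the original, the class step aligns the weights with the classes and filters on the classes it actually drew. The sketch is not guaranteed to reproduce the exact columns chosen for this dataset. Judging by the 2 classes and 100 instances reported for this dataset, only the column step should have had an effect here.

```python
import numpy as np


def sample_column_indices(n_cols: int, ncols_max: int, seed: int) -> list[int]:
    """Uniformly pick up to `ncols_max` column positions without replacement, sorted."""
    if n_cols <= ncols_max:
        return list(range(n_cols))  # nothing to drop
    rng = np.random.default_rng(seed)
    picked = rng.choice(n_cols, size=ncols_max, replace=False)
    return sorted(int(i) for i in picked)  # keep the original column order


def class_mask(y: np.ndarray, nclasses_max: int, seed: int) -> np.ndarray:
    """Boolean row mask keeping at most `nclasses_max` classes, each drawn with
    probability proportional to its frequency (hypothetical helper)."""
    classes, counts = np.unique(y, return_counts=True)  # counts aligned with classes
    if len(classes) <= nclasses_max:
        return np.ones(len(y), dtype=bool)  # keep every row
    rng = np.random.default_rng(seed)
    chosen = rng.choice(classes, size=nclasses_max, replace=False, p=counts / counts.sum())
    return np.isin(y, chosen)  # filter on the classes that were actually drawn


cols = sample_column_indices(n_cols=10_000, ncols_max=100, seed=0)
print(len(cols), cols == sorted(cols))  # 100 True
```

Sorting the drawn indices keeps the surviving columns in their original order, which matches what the source does with `sorted(columns_idxs)`.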

101 features

class (target, nominal): 2 unique values, 0 missing
V28 (numeric): 61 unique values, 0 missing
V54 (numeric): 63 unique values, 0 missing
V83 (numeric): 92 unique values, 0 missing
V164 (numeric): 8 unique values, 0 missing
V220 (numeric): 22 unique values, 0 missing
V282 (numeric): 19 unique values, 0 missing
V334 (numeric): 34 unique values, 0 missing
V406 (numeric): 42 unique values, 0 missing
V486 (numeric): 81 unique values, 0 missing
V728 (numeric): 58 unique values, 0 missing
V746 (numeric): 75 unique values, 0 missing
V799 (numeric): 63 unique values, 0 missing
V840 (numeric): 84 unique values, 0 missing
V887 (numeric): 49 unique values, 0 missing
V1236 (numeric): 76 unique values, 0 missing
V1347 (numeric): 6 unique values, 0 missing
V1737 (numeric): 60 unique values, 0 missing
V1745 (numeric): 14 unique values, 0 missing
V2271 (numeric): 15 unique values, 0 missing
V2542 (numeric): 74 unique values, 0 missing
V2560 (numeric): 35 unique values, 0 missing
V2638 (numeric): 66 unique values, 0 missing
V2672 (numeric): 24 unique values, 0 missing
V2752 (numeric): 89 unique values, 0 missing
V2979 (numeric): 40 unique values, 0 missing
V3050 (numeric): 63 unique values, 0 missing
V3094 (numeric): 34 unique values, 0 missing
V3214 (numeric): 5 unique values, 0 missing
V3279 (numeric): 1 unique value, 0 missing
V3376 (numeric): 22 unique values, 0 missing
V3571 (numeric): 34 unique values, 0 missing
V3744 (numeric): 40 unique values, 0 missing
V3772 (numeric): 26 unique values, 0 missing
V3779 (numeric): 65 unique values, 0 missing
V3819 (numeric): 65 unique values, 0 missing
V3876 (numeric): 81 unique values, 0 missing
V3912 (numeric): 82 unique values, 0 missing
V3913 (numeric): 90 unique values, 0 missing
V4009 (numeric): 14 unique values, 0 missing
V4202 (numeric): 83 unique values, 0 missing
V4219 (numeric): 83 unique values, 0 missing
V4588 (numeric): 91 unique values, 0 missing
V4782 (numeric): 54 unique values, 0 missing
V4846 (numeric): 44 unique values, 0 missing
V4993 (numeric): 43 unique values, 0 missing
V5038 (numeric): 9 unique values, 0 missing
V5062 (numeric): 47 unique values, 0 missing
V5229 (numeric): 21 unique values, 0 missing
V5238 (numeric): 84 unique values, 0 missing
V5303 (numeric): 72 unique values, 0 missing
V5380 (numeric): 50 unique values, 0 missing
V5392 (numeric): 53 unique values, 0 missing
V5503 (numeric): 82 unique values, 0 missing
V5554 (numeric): 79 unique values, 0 missing
V5706 (numeric): 18 unique values, 0 missing
V5770 (numeric): 61 unique values, 0 missing
V5935 (numeric): 78 unique values, 0 missing
V6015 (numeric): 49 unique values, 0 missing
V6124 (numeric): 40 unique values, 0 missing
V6230 (numeric): 7 unique values, 0 missing
V6271 (numeric): 63 unique values, 0 missing
V6308 (numeric): 69 unique values, 0 missing
V6437 (numeric): 91 unique values, 0 missing
V6439 (numeric): 31 unique values, 0 missing
V6479 (numeric): 60 unique values, 0 missing
V6658 (numeric): 89 unique values, 0 missing
V6671 (numeric): 31 unique values, 0 missing
V6715 (numeric): 29 unique values, 0 missing
V6827 (numeric): 34 unique values, 0 missing
V6859 (numeric): 13 unique values, 0 missing
V7014 (numeric): 8 unique values, 0 missing
V7139 (numeric): 29 unique values, 0 missing
V7171 (numeric): 56 unique values, 0 missing
V7185 (numeric): 84 unique values, 0 missing
V7192 (numeric): 57 unique values, 0 missing
V7234 (numeric): 28 unique values, 0 missing
V7246 (numeric): 16 unique values, 0 missing
V7595 (numeric): 31 unique values, 0 missing
V7599 (numeric): 61 unique values, 0 missing
V7604 (numeric): 31 unique values, 0 missing
V8014 (numeric): 56 unique values, 0 missing
V8060 (numeric): 40 unique values, 0 missing
V8095 (numeric): 53 unique values, 0 missing
V8327 (numeric): 59 unique values, 0 missing
V8371 (numeric): 40 unique values, 0 missing
V8408 (numeric): 15 unique values, 0 missing
V8423 (numeric): 72 unique values, 0 missing
V8429 (numeric): 49 unique values, 0 missing
V8511 (numeric): 41 unique values, 0 missing
V8575 (numeric): 35 unique values, 0 missing
V8721 (numeric): 82 unique values, 0 missing
V8874 (numeric): 49 unique values, 0 missing
V8896 (numeric): 58 unique values, 0 missing
V9048 (numeric): 51 unique values, 0 missing
V9276 (numeric): 71 unique values, 0 missing
V9320 (numeric): 45 unique values, 0 missing
V9463 (numeric): 37 unique values, 0 missing
V9625 (numeric): 46 unique values, 0 missing
V9766 (numeric): 20 unique values, 0 missing
V9927 (numeric): 24 unique values, 0 missing
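Each feature line above reports a type, a count of distinct values and a count of missing values. A minimal standard-library sketch of the last two follows. It assumes "missing" means `None` or a float NaN, and the helper name `summarize_column` is hypothetical; this is not OpenML's own computation.

```python
import math
from typing import Sequence


def summarize_column(values: Sequence[object]) -> tuple[int, int]:
    """Return (distinct non-missing values, missing count) for one column."""

    def is_missing(v: object) -> bool:
        # Assumption: None and float NaN are the only missing markers.
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    return len(set(present)), len(values) - len(present)


print(summarize_column([0, 0, 12, 35, 12]))         # (3, 0)
print(summarize_column([1.0, float("nan"), None]))  # (1, 2)
```

Missing values are excluded before counting distinct values, since NaN never compares equal to itself and would otherwise be counted once per occurrence.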

19 properties

Number of instances (rows) of the dataset: 100
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.39
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 1.01
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 56
Number of instances belonging to the most frequent class: 56
Percentage of instances belonging to the least frequent class: 44
Number of instances belonging to the least frequent class: 44
Number of binary attributes: 1
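Several of the properties above follow directly from the target column and the attribute count. For example, 101 attributes over 100 instances gives 1.01, and 56 of 100 instances in the most frequent class gives 56%. The sketch below recomputes those values. The helper `class_balance` and the label names are hypothetical (the actual class labels are not listed here), and only the class sizes 56 and 44 come from this page.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def class_balance(labels: Sequence[Hashable], n_attributes: int) -> dict[str, float]:
    """Recompute a few dataset-level properties from the target column."""
    n = len(labels)
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return {
        "n_classes": len(counts),
        "majority_count": majority,
        "majority_pct": 100 * majority / n,
        "minority_count": minority,
        "minority_pct": 100 * minority / n,
        "attrs_per_instance": n_attributes / n,
    }


# Class sizes taken from the property list: 56 and 44 out of 100 instances.
labels = ["class_a"] * 56 + ["class_b"] * 44
print(class_balance(labels, n_attributes=101))
```

The same percentages would hold for any labelling with these class sizes, because the computation only uses the counts.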
Number of binary attributes.

0 tasks
