OpenML
robert_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Active, ARFF, publicly available. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset robert (41165) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
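The method above depends on a `Dataset` class that is not shown on this page. As a rough illustration of its three stages (class selection, column selection, stratified row subsampling), here is a minimal standalone sketch. The function name `subsample_frame` and the synthetic data are hypothetical and not part of the original code. The sketch also deviates from the listing in two places, both assumptions about the intent: it draws classes from `value_counts().index` so that the sampling probabilities line up with the class labels, and it filters rows on the selected classes rather than on all classes.

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_frame(
    x: pd.DataFrame,
    y: pd.Series,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical standalone variant of the subsampling steps."""
    rng = np.random.default_rng(seed)

    # Step 1: keep at most nclasses_max classes, drawn with probability
    # proportional to class frequency.
    vcs = y.value_counts()
    if len(vcs) > nclasses_max:
        selected = rng.choice(
            vcs.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),
        )
        # Assumption: filter on the selected classes only.
        mask = y.isin(selected)
        x, y = x.loc[mask], y.loc[mask]

    # Step 2: keep at most ncols_max columns, preserving their original order.
    if x.shape[1] > ncols_max:
        col_idxs = np.sort(rng.choice(x.shape[1], size=ncols_max, replace=False))
        x = x.iloc[:, col_idxs]

    # Step 3: stratified row subsample; the "test" split is the part we keep.
    if len(x) > nrows_max:
        _, x, _, y = train_test_split(
            x, y, test_size=nrows_max, stratify=y, shuffle=True, random_state=seed
        )
    return x, y


# Synthetic stand-in data: 5000 rows, 150 numeric columns, 12 classes.
data_rng = np.random.default_rng(0)
x_full = pd.DataFrame(
    data_rng.normal(size=(5_000, 150)), columns=[f"V{i}" for i in range(150)]
)
y_full = pd.Series(data_rng.integers(0, 12, size=5_000), name="class")

x_sub, y_sub = subsample_frame(x_full, y_full, seed=0)
print(x_sub.shape, y_sub.nunique())
```

Sorting the sampled column indices keeps the surviving columns in their original order, which would explain why the feature names of this dataset (V20, V39, V60, ...) appear in ascending order.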

101 features

class (target, nominal): 10 unique values, 0 missing
V20 (numeric): 589 unique values, 0 missing
V39 (numeric): 533 unique values, 0 missing
V60 (numeric): 535 unique values, 0 missing
V118 (numeric): 253 unique values, 0 missing
V158 (numeric): 639 unique values, 0 missing
V203 (numeric): 665 unique values, 0 missing
V240 (numeric): 254 unique values, 0 missing
V292 (numeric): 630 unique values, 0 missing
V350 (numeric): 609 unique values, 0 missing
V524 (numeric): 644 unique values, 0 missing
V535 (numeric): 543 unique values, 0 missing
V574 (numeric): 554 unique values, 0 missing
V605 (numeric): 564 unique values, 0 missing
V637 (numeric): 253 unique values, 0 missing
V888 (numeric): 614 unique values, 0 missing
V969 (numeric): 664 unique values, 0 missing
V1246 (numeric): 658 unique values, 0 missing
V1253 (numeric): 717 unique values, 0 missing
V1635 (numeric): 420 unique values, 0 missing
V1829 (numeric): 620 unique values, 0 missing
V1840 (numeric): 494 unique values, 0 missing
V1899 (numeric): 687 unique values, 0 missing
V1917 (numeric): 524 unique values, 0 missing
V1975 (numeric): 640 unique values, 0 missing
V2140 (numeric): 604 unique values, 0 missing
V2188 (numeric): 483 unique values, 0 missing
V2226 (numeric): 252 unique values, 0 missing
V2313 (numeric): 478 unique values, 0 missing
V2360 (numeric): 574 unique values, 0 missing
V2429 (numeric): 544 unique values, 0 missing
V2569 (numeric): 585 unique values, 0 missing
V2693 (numeric): 485 unique values, 0 missing
V2716 (numeric): 532 unique values, 0 missing
V2717 (numeric): 632 unique values, 0 missing
V2745 (numeric): 559 unique values, 0 missing
V2787 (numeric): 567 unique values, 0 missing
V2809 (numeric): 620 unique values, 0 missing
V2816 (numeric): 719 unique values, 0 missing
V2880 (numeric): 252 unique values, 0 missing
V3018 (numeric): 252 unique values, 0 missing
V3034 (numeric): 251 unique values, 0 missing
V3297 (numeric): 543 unique values, 0 missing
V3435 (numeric): 614 unique values, 0 missing
V3486 (numeric): 591 unique values, 0 missing
V3583 (numeric): 599 unique values, 0 missing
V3626 (numeric): 245 unique values, 0 missing
V3631 (numeric): 499 unique values, 0 missing
V3757 (numeric): 252 unique values, 0 missing
V3767 (numeric): 590 unique values, 0 missing
V3815 (numeric): 533 unique values, 0 missing
V3864 (numeric): 663 unique values, 0 missing
V3870 (numeric): 681 unique values, 0 missing
V3951 (numeric): 255 unique values, 0 missing
V3987 (numeric): 558 unique values, 0 missing
V4105 (numeric): 565 unique values, 0 missing
V4150 (numeric): 561 unique values, 0 missing
V4271 (numeric): 551 unique values, 0 missing
V4316 (numeric): 694 unique values, 0 missing
V4401 (numeric): 253 unique values, 0 missing
V4485 (numeric): 545 unique values, 0 missing
V4500 (numeric): 488 unique values, 0 missing
V4524 (numeric): 638 unique values, 0 missing
V4618 (numeric): 703 unique values, 0 missing
V4627 (numeric): 628 unique values, 0 missing
V4658 (numeric): 478 unique values, 0 missing
V4779 (numeric): 612 unique values, 0 missing
V4793 (numeric): 521 unique values, 0 missing
V4831 (numeric): 529 unique values, 0 missing
V4908 (numeric): 545 unique values, 0 missing
V4931 (numeric): 580 unique values, 0 missing
V5043 (numeric): 536 unique values, 0 missing
V5139 (numeric): 600 unique values, 0 missing
V5158 (numeric): 625 unique values, 0 missing
V5170 (numeric): 655 unique values, 0 missing
V5172 (numeric): 682 unique values, 0 missing
V5192 (numeric): 546 unique values, 0 missing
V5203 (numeric): 550 unique values, 0 missing
V5453 (numeric): 534 unique values, 0 missing
V5464 (numeric): 624 unique values, 0 missing
V5469 (numeric): 664 unique values, 0 missing
V5760 (numeric): 551 unique values, 0 missing
V5783 (numeric): 649 unique values, 0 missing
V5811 (numeric): 604 unique values, 0 missing
V5996 (numeric): 601 unique values, 0 missing
V6018 (numeric): 622 unique values, 0 missing
V6037 (numeric): 596 unique values, 0 missing
V6041 (numeric): 540 unique values, 0 missing
V6062 (numeric): 638 unique values, 0 missing
V6110 (numeric): 619 unique values, 0 missing
V6158 (numeric): 248 unique values, 0 missing
V6271 (numeric): 250 unique values, 0 missing
V6383 (numeric): 252 unique values, 0 missing
V6403 (numeric): 628 unique values, 0 missing
V6492 (numeric): 593 unique values, 0 missing
V6658 (numeric): 528 unique values, 0 missing
V6705 (numeric): 602 unique values, 0 missing
V6802 (numeric): 537 unique values, 0 missing
V6907 (numeric): 600 unique values, 0 missing
V7019 (numeric): 628 unique values, 0 missing
V7135 (numeric): 251 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of missing values: 0
Average class difference between consecutive instances: 0.1
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 10.45
Number of instances belonging to the most frequent class: 209
Percentage of instances belonging to the least frequent class: 9.6
Number of instances belonging to the least frequent class: 192
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
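
The percentage and ratio properties listed above follow directly from the raw counts. A quick arithmetic check, assuming the reported values are rounded to two decimal places (the variable names are illustrative only):

```python
# Raw counts taken from the properties listed above.
n_rows, n_cols = 2000, 101
n_numeric, n_nominal = 100, 1
majority_count, minority_count = 209, 192

# Derived properties, rounded to two decimals (assumed rounding).
pct_numeric = round(100 * n_numeric / n_cols, 2)        # 99.01
pct_nominal = round(100 * n_nominal / n_cols, 2)        # 0.99
dimensionality = round(n_cols / n_rows, 2)              # 0.05
pct_majority = round(100 * majority_count / n_rows, 2)  # 10.45
pct_minority = round(100 * minority_count / n_rows, 2)  # 9.6

print(pct_numeric, pct_nominal, dimensionality, pct_majority, pct_minority)
```

The narrow gap between the most and least frequent class (209 versus 192 instances out of 2000) indicates that the ten classes are close to balanced in this subsample.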

0 tasks
