OpenML
robert_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset robert (41165) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
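As an aside on the class-selection step in the source above: the row filter there tests `y.isin(classes)`, which matches every row, and the sampling probabilities come from `value_counts()` (sorted by frequency) while the labels come from `unique()` (order of appearance), so the two may not line up. The following is a minimal sketch of what that step appears intended to do, not the code that generated this dataset. The helper name `select_classes` and its signature are hypothetical.

```python
import numpy as np
import pandas as pd


def select_classes(x: pd.DataFrame, y: pd.Series, nclasses_max: int, seed: int):
    """Hypothetical helper: keep rows whose label is among the sampled classes."""
    rng = np.random.default_rng(seed)

    # value_counts() yields labels and frequencies together, so the labels
    # and their sampling probabilities are guaranteed to be aligned.
    vcs = y.value_counts()
    if len(vcs) <= nclasses_max:
        return x, y

    selected = rng.choice(
        vcs.index.to_numpy(),
        size=nclasses_max,
        replace=False,
        p=(vcs / vcs.sum()).to_numpy(),
    )

    # Filter on the *selected* classes with a boolean mask, which does not
    # depend on the index labels being positional.
    mask = y.isin(selected).to_numpy()
    return x.loc[mask], y.loc[mask]


# Toy data (made up for illustration): three classes, keep two of them.
y_toy = pd.Series(["a"] * 5 + ["b"] * 3 + ["c"] * 2, name="class")
x_toy = pd.DataFrame({"f": range(10)})
x_sub, y_sub = select_classes(x_toy, y_toy, nclasses_max=2, seed=2)
```

When the target already has no more than `nclasses_max` classes, the sketch returns the data unchanged, matching the behaviour of the original branch condition.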

101 features

class (target): nominal, 10 unique values, 0 missing
V278: numeric, 633 unique values, 0 missing
V284: numeric, 552 unique values, 0 missing
V393: numeric, 699 unique values, 0 missing
V410: numeric, 702 unique values, 0 missing
V550: numeric, 585 unique values, 0 missing
V654: numeric, 604 unique values, 0 missing
V696: numeric, 556 unique values, 0 missing
V725: numeric, 563 unique values, 0 missing
V749: numeric, 542 unique values, 0 missing
V768: numeric, 506 unique values, 0 missing
V777: numeric, 479 unique values, 0 missing
V1070: numeric, 553 unique values, 0 missing
V1337: numeric, 563 unique values, 0 missing
V1434: numeric, 583 unique values, 0 missing
V1447: numeric, 546 unique values, 0 missing
V1530: numeric, 628 unique values, 0 missing
V1556: numeric, 561 unique values, 0 missing
V1567: numeric, 253 unique values, 0 missing
V1601: numeric, 623 unique values, 0 missing
V1854: numeric, 643 unique values, 0 missing
V1858: numeric, 510 unique values, 0 missing
V1863: numeric, 523 unique values, 0 missing
V1958: numeric, 743 unique values, 0 missing
V2121: numeric, 588 unique values, 0 missing
V2177: numeric, 578 unique values, 0 missing
V2275: numeric, 632 unique values, 0 missing
V2378: numeric, 611 unique values, 0 missing
V2381: numeric, 525 unique values, 0 missing
V2439: numeric, 615 unique values, 0 missing
V2471: numeric, 588 unique values, 0 missing
V2714: numeric, 571 unique values, 0 missing
V2796: numeric, 536 unique values, 0 missing
V2916: numeric, 634 unique values, 0 missing
V2941: numeric, 566 unique values, 0 missing
V3015: numeric, 661 unique values, 0 missing
V3083: numeric, 644 unique values, 0 missing
V3147: numeric, 617 unique values, 0 missing
V3164: numeric, 622 unique values, 0 missing
V3194: numeric, 549 unique values, 0 missing
V3208: numeric, 621 unique values, 0 missing
V3217: numeric, 607 unique values, 0 missing
V3255: numeric, 558 unique values, 0 missing
V3369: numeric, 551 unique values, 0 missing
V3386: numeric, 611 unique values, 0 missing
V3415: numeric, 543 unique values, 0 missing
V3492: numeric, 625 unique values, 0 missing
V3579: numeric, 482 unique values, 0 missing
V3581: numeric, 587 unique values, 0 missing
V3608: numeric, 504 unique values, 0 missing
V3652: numeric, 599 unique values, 0 missing
V3706: numeric, 710 unique values, 0 missing
V3742: numeric, 608 unique values, 0 missing
V3762: numeric, 543 unique values, 0 missing
V3973: numeric, 622 unique values, 0 missing
V4005: numeric, 613 unique values, 0 missing
V4075: numeric, 512 unique values, 0 missing
V4142: numeric, 553 unique values, 0 missing
V4258: numeric, 602 unique values, 0 missing
V4259: numeric, 537 unique values, 0 missing
V4267: numeric, 253 unique values, 0 missing
V4409: numeric, 580 unique values, 0 missing
V4516: numeric, 254 unique values, 0 missing
V4552: numeric, 548 unique values, 0 missing
V4621: numeric, 689 unique values, 0 missing
V4681: numeric, 533 unique values, 0 missing
V4771: numeric, 454 unique values, 0 missing
V4838: numeric, 553 unique values, 0 missing
V4872: numeric, 610 unique values, 0 missing
V4875: numeric, 491 unique values, 0 missing
V4948: numeric, 613 unique values, 0 missing
V4965: numeric, 619 unique values, 0 missing
V4980: numeric, 575 unique values, 0 missing
V5013: numeric, 650 unique values, 0 missing
V5182: numeric, 616 unique values, 0 missing
V5338: numeric, 548 unique values, 0 missing
V5533: numeric, 730 unique values, 0 missing
V5544: numeric, 254 unique values, 0 missing
V5632: numeric, 255 unique values, 0 missing
V5783: numeric, 678 unique values, 0 missing
V5786: numeric, 530 unique values, 0 missing
V5948: numeric, 251 unique values, 0 missing
V5960: numeric, 620 unique values, 0 missing
V6088: numeric, 510 unique values, 0 missing
V6189: numeric, 558 unique values, 0 missing
V6193: numeric, 599 unique values, 0 missing
V6263: numeric, 462 unique values, 0 missing
V6269: numeric, 631 unique values, 0 missing
V6337: numeric, 597 unique values, 0 missing
V6369: numeric, 494 unique values, 0 missing
V6409: numeric, 605 unique values, 0 missing
V6490: numeric, 545 unique values, 0 missing
V6610: numeric, 597 unique values, 0 missing
V6670: numeric, 602 unique values, 0 missing
V6739: numeric, 614 unique values, 0 missing
V6859: numeric, 614 unique values, 0 missing
V6902: numeric, 527 unique values, 0 missing
V6982: numeric, 555 unique values, 0 missing
V7063: numeric, 539 unique values, 0 missing
V7140: numeric, 580 unique values, 0 missing
V7176: numeric, 552 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Number of instances belonging to the least frequent class: 192
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.1
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 10.45
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 209
Percentage of instances belonging to the least frequent class: 9.6
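
The ratio and percentage properties above follow directly from the raw counts on this page. The short sketch below recomputes them as a consistency check. The function name `derived_properties` and the dictionary keys are illustrative choices, not OpenML identifiers.

```python
def derived_properties(
    n_rows: int,
    n_attributes: int,
    n_numeric: int,
    n_nominal: int,
    majority_count: int,
    minority_count: int,
) -> dict:
    """Recompute ratio/percentage properties from raw counts (illustrative)."""
    return {
        # Attributes divided by instances
        "dimensionality": n_attributes / n_rows,
        "pct_numeric": 100 * n_numeric / n_attributes,
        "pct_nominal": 100 * n_nominal / n_attributes,
        "pct_majority_class": 100 * majority_count / n_rows,
        "pct_minority_class": 100 * minority_count / n_rows,
    }


# Counts taken from the property list on this page.
props = derived_properties(
    n_rows=2000,
    n_attributes=101,
    n_numeric=100,
    n_nominal=1,
    majority_count=209,
    minority_count=192,
)
for name, value in props.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these agree with the listed values (0.05, 99.01, 0.99, 10.45 and 9.6).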

0 tasks
