robert_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active | Format: ARFF | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset robert (41165) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
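In the class-selection branch of the source code above, `selected_classes` is computed but the row filter tests membership in `classes` (every class), so that branch keeps all rows. The following is a minimal standard-library sketch of what the branch may have been meant to do, assuming the intent was to keep only rows whose label is among the selected classes. The function names `select_classes` and `filter_rows` are hypothetical, and the sequential weighted draws used here are not guaranteed to reproduce the samples NumPy's `rng.choice` would give.

```python
import random
from collections import Counter


def select_classes(labels, nclasses_max, seed):
    """Pick at most `nclasses_max` distinct classes, weighted by frequency.

    Hypothetical helper: draws one class at a time without replacement,
    with probability proportional to its remaining count.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) <= nclasses_max:
        # Nothing to drop: every class is kept.
        return set(counts)

    remaining = dict(counts)
    selected = set()
    while len(selected) < nclasses_max:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        selected.add(pick)
        del remaining[pick]
    return selected


def filter_rows(rows, labels, selected):
    """Keep only the (row, label) pairs whose label is in `selected`."""
    kept = [(row, label) for row, label in zip(rows, labels) if label in selected]
    return [row for row, _ in kept], [label for _, label in kept]


if __name__ == "__main__":
    labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
    rows = list(range(len(labels)))
    chosen = select_classes(labels, nclasses_max=2, seed=4)
    kept_rows, kept_labels = filter_rows(rows, labels, chosen)
    print(sorted(chosen), len(kept_rows))
```

Filtering on the selected set, rather than on the full set of classes, is the one behavioral difference from the original branch.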

101 features

class (target): nominal, 10 unique values, 0 missing
V193: numeric, 253 unique values, 0 missing
V250: numeric, 540 unique values, 0 missing
V432: numeric, 621 unique values, 0 missing
V568: numeric, 545 unique values, 0 missing
V575: numeric, 619 unique values, 0 missing
V645: numeric, 254 unique values, 0 missing
V828: numeric, 557 unique values, 0 missing
V920: numeric, 250 unique values, 0 missing
V954: numeric, 618 unique values, 0 missing
V994: numeric, 553 unique values, 0 missing
V1181: numeric, 530 unique values, 0 missing
V1228: numeric, 669 unique values, 0 missing
V1242: numeric, 658 unique values, 0 missing
V1269: numeric, 533 unique values, 0 missing
V1291: numeric, 587 unique values, 0 missing
V1372: numeric, 540 unique values, 0 missing
V1438: numeric, 253 unique values, 0 missing
V1485: numeric, 684 unique values, 0 missing
V1558: numeric, 636 unique values, 0 missing
V1564: numeric, 701 unique values, 0 missing
V1593: numeric, 654 unique values, 0 missing
V1603: numeric, 249 unique values, 0 missing
V1921: numeric, 703 unique values, 0 missing
V2014: numeric, 667 unique values, 0 missing
V2171: numeric, 621 unique values, 0 missing
V2409: numeric, 530 unique values, 0 missing
V2505: numeric, 539 unique values, 0 missing
V2602: numeric, 654 unique values, 0 missing
V2606: numeric, 558 unique values, 0 missing
V2637: numeric, 581 unique values, 0 missing
V2640: numeric, 468 unique values, 0 missing
V2678: numeric, 254 unique values, 0 missing
V2753: numeric, 677 unique values, 0 missing
V2917: numeric, 623 unique values, 0 missing
V3068: numeric, 633 unique values, 0 missing
V3225: numeric, 705 unique values, 0 missing
V3306: numeric, 742 unique values, 0 missing
V3313: numeric, 514 unique values, 0 missing
V3400: numeric, 251 unique values, 0 missing
V3418: numeric, 536 unique values, 0 missing
V3444: numeric, 553 unique values, 0 missing
V3528: numeric, 544 unique values, 0 missing
V3531: numeric, 712 unique values, 0 missing
V3560: numeric, 598 unique values, 0 missing
V3565: numeric, 256 unique values, 0 missing
V3574: numeric, 677 unique values, 0 missing
V3579: numeric, 465 unique values, 0 missing
V3633: numeric, 576 unique values, 0 missing
V3727: numeric, 555 unique values, 0 missing
V3740: numeric, 253 unique values, 0 missing
V3830: numeric, 528 unique values, 0 missing
V3867: numeric, 665 unique values, 0 missing
V3873: numeric, 470 unique values, 0 missing
V4092: numeric, 652 unique values, 0 missing
V4126: numeric, 528 unique values, 0 missing
V4134: numeric, 651 unique values, 0 missing
V4172: numeric, 565 unique values, 0 missing
V4207: numeric, 552 unique values, 0 missing
V4319: numeric, 651 unique values, 0 missing
V4348: numeric, 545 unique values, 0 missing
V4460: numeric, 451 unique values, 0 missing
V4594: numeric, 590 unique values, 0 missing
V4754: numeric, 671 unique values, 0 missing
V4757: numeric, 615 unique values, 0 missing
V4811: numeric, 558 unique values, 0 missing
V4850: numeric, 575 unique values, 0 missing
V4974: numeric, 634 unique values, 0 missing
V5035: numeric, 253 unique values, 0 missing
V5159: numeric, 730 unique values, 0 missing
V5227: numeric, 673 unique values, 0 missing
V5537: numeric, 663 unique values, 0 missing
V5624: numeric, 639 unique values, 0 missing
V5705: numeric, 565 unique values, 0 missing
V5788: numeric, 663 unique values, 0 missing
V5816: numeric, 612 unique values, 0 missing
V6120: numeric, 664 unique values, 0 missing
V6205: numeric, 683 unique values, 0 missing
V6261: numeric, 542 unique values, 0 missing
V6326: numeric, 519 unique values, 0 missing
V6365: numeric, 606 unique values, 0 missing
V6396: numeric, 532 unique values, 0 missing
V6426: numeric, 607 unique values, 0 missing
V6442: numeric, 596 unique values, 0 missing
V6630: numeric, 689 unique values, 0 missing
V6631: numeric, 648 unique values, 0 missing
V6669: numeric, 629 unique values, 0 missing
V6682: numeric, 660 unique values, 0 missing
V6698: numeric, 563 unique values, 0 missing
V6730: numeric, 595 unique values, 0 missing
V6736: numeric, 574 unique values, 0 missing
V6860: numeric, 582 unique values, 0 missing
V6876: numeric, 653 unique values, 0 missing
V6896: numeric, 523 unique values, 0 missing
V6913: numeric, 608 unique values, 0 missing
V6927: numeric, 552 unique values, 0 missing
V6938: numeric, 523 unique values, 0 missing
V7003: numeric, 630 unique values, 0 missing
V7018: numeric, 694 unique values, 0 missing
V7146: numeric, 254 unique values, 0 missing
V7164: numeric, 627 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.09
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 10.4
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 208
Percentage of instances belonging to the least frequent class: 9.6
Number of instances belonging to the least frequent class: 192
Number of binary attributes: 0
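Several of the properties above are ratios of the raw counts listed alongside them. As a sanity check, the short sketch below recomputes them from those counts. The function name `derived_properties` and the dictionary keys are hypothetical labels chosen for illustration, not OpenML identifiers, and rounding to two decimals is an assumption about how the page displays values.

```python
def derived_properties(n_rows, n_cols, n_numeric, n_nominal, majority, minority):
    """Recompute ratio-style properties from raw counts, rounded to 2 decimals."""
    return {
        "attributes_per_instance": round(n_cols / n_rows, 2),
        "pct_numeric_attributes": round(100 * n_numeric / n_cols, 2),
        "pct_nominal_attributes": round(100 * n_nominal / n_cols, 2),
        "pct_majority_class": round(100 * majority / n_rows, 2),
        "pct_minority_class": round(100 * minority / n_rows, 2),
    }


if __name__ == "__main__":
    # Raw counts taken from the property list of this dataset.
    props = derived_properties(
        n_rows=2000,
        n_cols=101,
        n_numeric=100,
        n_nominal=1,
        majority=208,
        minority=192,
    )
    for name, value in props.items():
        print(f"{name}: {value}")
```

With these counts the recomputed values agree with the listed ones: 0.05, 99.01, 0.99, 10.4 and 9.6.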

0 tasks
