riccardo_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset riccardo (41161) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)
    x = self.x
    y = self.y

    # Sample classes, weighted by their frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            vcs.index.to_numpy(),  # labels in the same order as the probabilities
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),
        )

        # Select the rows where one of the selected classes is present
        idxs = y.index[y.isin(selected_classes)]
        x = x.loc[idxs]  # idxs are index labels, so use .loc rather than .iloc
        y = y.loc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify on the target only if requested
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Carry over the categorical mask for the selected columns
    # (categorical columns must be converted to string for OpenML)
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
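The method above depends on a `Dataset` class that is not shown on this page. As a hedged sketch, the same three steps (frequency-weighted class selection, uniform column selection, optionally stratified row selection) can be written as a standalone function over a pandas `DataFrame` and `Series`. The name `subsample_frame` is hypothetical, and the sketch assumes scikit-learn's `train_test_split` for the row step, as the original code does; it is not the code that produced this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_frame(
    x: pd.DataFrame,
    y: pd.Series,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical standalone version of the subsampling steps (a sketch)."""
    rng = np.random.default_rng(seed)

    # 1. Keep at most nclasses_max classes, drawn with probability proportional
    #    to class frequency; labels and probabilities come from the same Series.
    counts = y.value_counts()
    if len(counts) > nclasses_max:
        keep = rng.choice(
            counts.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(counts / counts.sum()).to_numpy(),
        )
        mask = y.isin(keep)
        x, y = x[mask], y[mask]

    # 2. Keep at most ncols_max columns, chosen uniformly, in their original order.
    if x.shape[1] > ncols_max:
        col_idxs = np.sort(rng.choice(x.shape[1], size=ncols_max, replace=False))
        x = x.iloc[:, col_idxs]

    # 3. Keep at most nrows_max rows; the "test" split of size nrows_max is the
    #    subsample, stratified on the target when requested.
    if len(x) > nrows_max:
        _, x, _, y = train_test_split(
            x,
            y,
            test_size=nrows_max,
            stratify=y if stratified else None,
            shuffle=True,
            random_state=seed,
        )
    return x, y


# Usage on a small synthetic dataset
rng_demo = np.random.default_rng(0)
x_toy = pd.DataFrame(rng_demo.normal(size=(600, 8)), columns=[f"V{i}" for i in range(8)])
y_toy = pd.Series(rng_demo.integers(0, 5, size=600), name="class")
xs, ys = subsample_frame(x_toy, y_toy, seed=3, nrows_max=200, ncols_max=4, nclasses_max=3)
```

Using `train_test_split` with an integer `test_size` makes the subsample exactly `nrows_max` rows, and passing `stratify` keeps the class proportions of the filtered data.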

101 features

class (target): nominal, 2 unique values, 0 missing
V7: numeric, 699 unique values, 0 missing
V21: numeric, 1197 unique values, 0 missing
V130: numeric, 545 unique values, 0 missing
V138: numeric, 1039 unique values, 0 missing
V166: numeric, 1006 unique values, 0 missing
V183: numeric, 658 unique values, 0 missing
V327: numeric, 931 unique values, 0 missing
V360: numeric, 951 unique values, 0 missing
V388: numeric, 901 unique values, 0 missing
V396: numeric, 788 unique values, 0 missing
V440: numeric, 1089 unique values, 0 missing
V480: numeric, 970 unique values, 0 missing
V591: numeric, 916 unique values, 0 missing
V673: numeric, 820 unique values, 0 missing
V731: numeric, 920 unique values, 0 missing
V754: numeric, 763 unique values, 0 missing
V762: numeric, 993 unique values, 0 missing
V810: numeric, 936 unique values, 0 missing
V884: numeric, 988 unique values, 0 missing
V936: numeric, 1021 unique values, 0 missing
V965: numeric, 1017 unique values, 0 missing
V995: numeric, 581 unique values, 0 missing
V1040: numeric, 1137 unique values, 0 missing
V1059: numeric, 743 unique values, 0 missing
V1081: numeric, 568 unique values, 0 missing
V1083: numeric, 687 unique values, 0 missing
V1116: numeric, 701 unique values, 0 missing
V1203: numeric, 1047 unique values, 0 missing
V1240: numeric, 677 unique values, 0 missing
V1250: numeric, 1145 unique values, 0 missing
V1266: numeric, 383 unique values, 0 missing
V1274: numeric, 525 unique values, 0 missing
V1333: numeric, 1132 unique values, 0 missing
V1354: numeric, 751 unique values, 0 missing
V1397: numeric, 815 unique values, 0 missing
V1398: numeric, 974 unique values, 0 missing
V1594: numeric, 741 unique values, 0 missing
V1651: numeric, 1083 unique values, 0 missing
V1659: numeric, 886 unique values, 0 missing
V1693: numeric, 998 unique values, 0 missing
V1774: numeric, 542 unique values, 0 missing
V1819: numeric, 127 unique values, 0 missing
V1823: numeric, 311 unique values, 0 missing
V1837: numeric, 224 unique values, 0 missing
V1907: numeric, 684 unique values, 0 missing
V2004: numeric, 665 unique values, 0 missing
V2017: numeric, 642 unique values, 0 missing
V2018: numeric, 630 unique values, 0 missing
V2082: numeric, 910 unique values, 0 missing
V2181: numeric, 706 unique values, 0 missing
V2213: numeric, 875 unique values, 0 missing
V2218: numeric, 420 unique values, 0 missing
V2448: numeric, 897 unique values, 0 missing
V2479: numeric, 885 unique values, 0 missing
V2486: numeric, 819 unique values, 0 missing
V2548: numeric, 720 unique values, 0 missing
V2594: numeric, 867 unique values, 0 missing
V2595: numeric, 778 unique values, 0 missing
V2615: numeric, 405 unique values, 0 missing
V2690: numeric, 654 unique values, 0 missing
V2745: numeric, 674 unique values, 0 missing
V2753: numeric, 859 unique values, 0 missing
V2795: numeric, 665 unique values, 0 missing
V2814: numeric, 757 unique values, 0 missing
V2815: numeric, 1124 unique values, 0 missing
V2816: numeric, 898 unique values, 0 missing
V2820: numeric, 807 unique values, 0 missing
V2913: numeric, 206 unique values, 0 missing
V2924: numeric, 348 unique values, 0 missing
V2948: numeric, 978 unique values, 0 missing
V3009: numeric, 798 unique values, 0 missing
V3087: numeric, 918 unique values, 0 missing
V3096: numeric, 1104 unique values, 0 missing
V3119: numeric, 479 unique values, 0 missing
V3169: numeric, 855 unique values, 0 missing
V3199: numeric, 780 unique values, 0 missing
V3254: numeric, 1029 unique values, 0 missing
V3288: numeric, 675 unique values, 0 missing
V3325: numeric, 1071 unique values, 0 missing
V3367: numeric, 938 unique values, 0 missing
V3406: numeric, 994 unique values, 0 missing
V3514: numeric, 916 unique values, 0 missing
V3551: numeric, 1088 unique values, 0 missing
V3606: numeric, 1206 unique values, 0 missing
V3649: numeric, 892 unique values, 0 missing
V3654: numeric, 836 unique values, 0 missing
V3682: numeric, 739 unique values, 0 missing
V3746: numeric, 1105 unique values, 0 missing
V3769: numeric, 768 unique values, 0 missing
V3787: numeric, 693 unique values, 0 missing
V3915: numeric, 570 unique values, 0 missing
V3958: numeric, 1030 unique values, 0 missing
V3972: numeric, 1143 unique values, 0 missing
V3978: numeric, 1027 unique values, 0 missing
V4004: numeric, 1090 unique values, 0 missing
V4044: numeric, 1056 unique values, 0 missing
V4103: numeric, 1784 unique values, 0 missing
V4128: numeric, 1799 unique values, 0 missing
V4280: numeric, 1786 unique values, 0 missing
V4291: numeric, 1783 unique values, 0 missing
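Each entry above reports a feature's type, its number of distinct values and its number of missing values. A minimal sketch of how such a per-feature summary could be computed with pandas follows; `feature_summary` is a hypothetical helper, and it assumes nominal features are stored with non-numeric dtypes (for example strings or `category`), which may not match how the page itself derives types.

```python
import pandas as pd


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column type, distinct-value count and missing count (hypothetical helper)."""
    return pd.DataFrame(
        {
            # Assumption: numeric dtypes are "numeric", everything else is "nominal".
            "type": [
                "numeric" if pd.api.types.is_numeric_dtype(df[c]) else "nominal"
                for c in df.columns
            ],
            # nunique() skips NaN by default, so missing values are not counted as a value.
            "unique_values": df.nunique(),
            "missing": df.isna().sum(),
        },
        index=df.columns,
    )
```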

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0.99%
Percentage of instances having missing values: 0%
Percentage of missing values: 0%
Average class difference between consecutive instances: 0.63
Percentage of numeric attributes: 99.01%
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99%
Percentage of instances belonging to the most frequent class: 75%
Number of instances belonging to the most frequent class: 1500
Percentage of instances belonging to the least frequent class: 25%
Number of instances belonging to the least frequent class: 500
Number of binary attributes: 1
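Most of these properties are simple ratios of counts: for example, 101 attributes / 2000 instances = 0.0505 (shown as 0.05), 1 nominal attribute out of 101 = 0.99%, 100 numeric attributes out of 101 = 99.01%, and 1500 / 2000 = 75% for the most frequent class. The sketch below recomputes several of them from a pandas `DataFrame`. The helper `dataset_properties` and its key names are hypothetical; it assumes non-numeric dtypes mark nominal attributes and that a "binary" attribute is any column with exactly two distinct values. It leaves out the average class difference, whose exact definition is not given on this page.

```python
import pandas as pd


def dataset_properties(df: pd.DataFrame, target: str) -> dict:
    """Recompute several of the listed properties (hypothetical helper, a sketch)."""
    n_rows, n_cols = df.shape
    # Assumption: numeric dtypes are "numeric", everything else is "nominal".
    n_numeric = sum(pd.api.types.is_numeric_dtype(df[c]) for c in df.columns)
    # Assumption: a column is "binary" if it has exactly two distinct non-missing values.
    n_binary = int((df.nunique() == 2).sum())
    counts = df[target].value_counts()
    return {
        "instances": n_rows,
        "attributes": n_cols,
        "target_values": int(df[target].nunique()),
        "missing_values": int(df.isna().sum().sum()),
        "instances_with_missing": int(df.isna().any(axis=1).sum()),
        "numeric_attributes": n_numeric,
        "nominal_attributes": n_cols - n_numeric,
        "binary_attributes": n_binary,
        "pct_binary": 100 * n_binary / n_cols,
        "pct_numeric": 100 * n_numeric / n_cols,
        "pct_nominal": 100 * (n_cols - n_numeric) / n_cols,
        "attributes_per_instance": n_cols / n_rows,
        "majority_class_count": int(counts.max()),
        "majority_class_pct": 100 * counts.max() / n_rows,
        "minority_class_count": int(counts.min()),
        "minority_class_pct": 100 * counts.min() / n_rows,
    }
```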

0 tasks
