riccardo_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset riccardo (41161) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )

        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
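The row-subsampling step in the source code above relies on stratification to keep class proportions intact. The following is a minimal standard-library sketch of that idea, not the code used to generate this dataset: the function name `stratified_subsample` and the toy label list are hypothetical, and the toy labels are simply chosen to mirror the 75/25 class split reported for this subsample.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_samples, seed=0):
    """Return sorted row indices whose class proportions follow `labels`.

    Hypothetical helper for illustration only. Per-class quotas are
    rounded, so in general the result may differ from `n_samples` by a
    few rows.
    """
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    chosen = []
    for label, idxs in by_class.items():
        # Allocate rows to each class in proportion to its frequency
        quota = round(n_samples * len(idxs) / total)
        chosen.extend(rng.sample(idxs, quota))
    return sorted(chosen)


# Hypothetical toy labels: 75% class "0", 25% class "1"
labels = ["0"] * 7500 + ["1"] * 2500
subset = stratified_subsample(labels, n_samples=2000, seed=0)
counts = Counter(labels[i] for i in subset)
print(counts["0"], counts["1"])
```

Because each class is sampled in proportion to its frequency, the subset keeps the same class balance as the full label list regardless of the seed.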

101 features

class (target) (nominal): 2 unique values, 0 missing
V12 (numeric): 751 unique values, 0 missing
V23 (numeric): 1062 unique values, 0 missing
V36 (numeric): 830 unique values, 0 missing
V70 (numeric): 437 unique values, 0 missing
V94 (numeric): 719 unique values, 0 missing
V121 (numeric): 897 unique values, 0 missing
V142 (numeric): 893 unique values, 0 missing
V173 (numeric): 982 unique values, 0 missing
V209 (numeric): 806 unique values, 0 missing
V312 (numeric): 844 unique values, 0 missing
V317 (numeric): 950 unique values, 0 missing
V341 (numeric): 695 unique values, 0 missing
V361 (numeric): 822 unique values, 0 missing
V378 (numeric): 607 unique values, 0 missing
V527 (numeric): 965 unique values, 0 missing
V576 (numeric): 769 unique values, 0 missing
V737 (numeric): 899 unique values, 0 missing
V743 (numeric): 1001 unique values, 0 missing
V975 (numeric): 1087 unique values, 0 missing
V1090 (numeric): 966 unique values, 0 missing
V1093 (numeric): 890 unique values, 0 missing
V1132 (numeric): 872 unique values, 0 missing
V1134 (numeric): 694 unique values, 0 missing
V1170 (numeric): 1159 unique values, 0 missing
V1269 (numeric): 548 unique values, 0 missing
V1294 (numeric): 1032 unique values, 0 missing
V1325 (numeric): 202 unique values, 0 missing
V1378 (numeric): 863 unique values, 0 missing
V1407 (numeric): 645 unique values, 0 missing
V1448 (numeric): 793 unique values, 0 missing
V1530 (numeric): 858 unique values, 0 missing
V1602 (numeric): 969 unique values, 0 missing
V1615 (numeric): 998 unique values, 0 missing
V1620 (numeric): 1191 unique values, 0 missing
V1630 (numeric): 502 unique values, 0 missing
V1657 (numeric): 917 unique values, 0 missing
V1664 (numeric): 878 unique values, 0 missing
V1679 (numeric): 675 unique values, 0 missing
V1709 (numeric): 733 unique values, 0 missing
V1791 (numeric): 915 unique values, 0 missing
V1806 (numeric): 1112 unique values, 0 missing
V1959 (numeric): 887 unique values, 0 missing
V2038 (numeric): 922 unique values, 0 missing
V2075 (numeric): 555 unique values, 0 missing
V2120 (numeric): 910 unique values, 0 missing
V2147 (numeric): 681 unique values, 0 missing
V2161 (numeric): 771 unique values, 0 missing
V2231 (numeric): 1023 unique values, 0 missing
V2242 (numeric): 805 unique values, 0 missing
V2272 (numeric): 1119 unique values, 0 missing
V2291 (numeric): 875 unique values, 0 missing
V2292 (numeric): 920 unique values, 0 missing
V2341 (numeric): 784 unique values, 0 missing
V2361 (numeric): 1047 unique values, 0 missing
V2446 (numeric): 989 unique values, 0 missing
V2468 (numeric): 967 unique values, 0 missing
V2545 (numeric): 1081 unique values, 0 missing
V2554 (numeric): 609 unique values, 0 missing
V2613 (numeric): 882 unique values, 0 missing
V2664 (numeric): 1041 unique values, 0 missing
V2674 (numeric): 632 unique values, 0 missing
V2675 (numeric): 696 unique values, 0 missing
V2733 (numeric): 802 unique values, 0 missing
V2747 (numeric): 878 unique values, 0 missing
V2769 (numeric): 1183 unique values, 0 missing
V2831 (numeric): 931 unique values, 0 missing
V2845 (numeric): 965 unique values, 0 missing
V2878 (numeric): 1052 unique values, 0 missing
V2917 (numeric): 728 unique values, 0 missing
V2932 (numeric): 857 unique values, 0 missing
V2999 (numeric): 793 unique values, 0 missing
V3065 (numeric): 987 unique values, 0 missing
V3071 (numeric): 49 unique values, 0 missing
V3073 (numeric): 523 unique values, 0 missing
V3077 (numeric): 862 unique values, 0 missing
V3081 (numeric): 952 unique values, 0 missing
V3084 (numeric): 656 unique values, 0 missing
V3232 (numeric): 343 unique values, 0 missing
V3245 (numeric): 1013 unique values, 0 missing
V3260 (numeric): 1045 unique values, 0 missing
V3421 (numeric): 884 unique values, 0 missing
V3423 (numeric): 441 unique values, 0 missing
V3442 (numeric): 1049 unique values, 0 missing
V3571 (numeric): 931 unique values, 0 missing
V3578 (numeric): 868 unique values, 0 missing
V3579 (numeric): 1177 unique values, 0 missing
V3606 (numeric): 1239 unique values, 0 missing
V3620 (numeric): 773 unique values, 0 missing
V3652 (numeric): 791 unique values, 0 missing
V3730 (numeric): 1048 unique values, 0 missing
V3800 (numeric): 950 unique values, 0 missing
V3818 (numeric): 950 unique values, 0 missing
V3841 (numeric): 1005 unique values, 0 missing
V3943 (numeric): 773 unique values, 0 missing
V3993 (numeric): 736 unique values, 0 missing
V4043 (numeric): 734 unique values, 0 missing
V4088 (numeric): 763 unique values, 0 missing
V4171 (numeric): 1765 unique values, 0 missing
V4239 (numeric): 1763 unique values, 0 missing
V4296 (numeric): 1752 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of instances belonging to the most frequent class: 75
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 1500
Percentage of instances belonging to the least frequent class: 25
Number of instances belonging to the least frequent class: 500
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.62
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
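
Several of the properties above are simple ratios of the listed counts. As a quick consistency check, the short sketch below recomputes them from the raw counts; the variable names are illustrative, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Raw counts as listed in the properties above
n_rows = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
n_majority = 1500
n_minority = 500

# Derived percentages and ratios (two-decimal rounding is assumed)
pct_majority = round(100 * n_majority / n_rows, 2)
pct_minority = round(100 * n_minority / n_rows, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)
attrs_per_instance = round(n_attributes / n_rows, 2)

print(pct_majority, pct_minority, pct_numeric, pct_nominal, attrs_per_instance)
```

The recomputed values agree with the listed ones: 75 and 25 for the class percentages, 99.01 and 0.99 for the attribute-type percentages, and 0.05 for attributes per instance.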

0 tasks
