riccardo_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset riccardo (41161) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
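The row-subsampling step above delegates to a stratified split, which keeps each class at roughly its original share of the rows. As an illustration only, the following standard-library sketch shows the idea with a simple largest-remainder allocation. It is not the code used to generate this dataset and does not reproduce scikit-learn's exact sampling. The function name `stratified_sample`, the labels `"a"` and `"b"`, and the parent class counts (15000 and 5000) are hypothetical; they are not stated on this page.

```python
import random
from collections import Counter


def stratified_sample(labels, n, seed):
    """Pick n indices so that each class keeps roughly its original share."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Proportional share per class: take the floor first, then give the
    # leftover rows to the classes with the largest fractional remainders.
    total = len(labels)
    exact = {c: n * len(ix) / total for c, ix in by_class.items()}
    quota = {c: int(e) for c, e in exact.items()}
    leftover = n - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    # Draw each class's quota without replacement.
    picked = []
    for c, ix in by_class.items():
        picked.extend(rng.sample(ix, quota[c]))
    return sorted(picked)


# Hypothetical parent data with a 75% / 25% class split.
labels = ["a"] * 15000 + ["b"] * 5000
subset = stratified_sample(labels, n=2000, seed=2)
counts = Counter(labels[i] for i in subset)
print(counts["a"], counts["b"])
```

Under that assumed 75/25 split, a 2000-row stratified sample contains 1500 and 500 rows per class, which matches the class sizes reported in the properties of this dataset.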

101 features

class (target): nominal, 2 unique values, 0 missing
V166: numeric, 1033 unique values, 0 missing
V169: numeric, 1143 unique values, 0 missing
V233: numeric, 699 unique values, 0 missing
V245: numeric, 905 unique values, 0 missing
V328: numeric, 962 unique values, 0 missing
V387: numeric, 779 unique values, 0 missing
V415: numeric, 359 unique values, 0 missing
V433: numeric, 799 unique values, 0 missing
V445: numeric, 938 unique values, 0 missing
V457: numeric, 379 unique values, 0 missing
V459: numeric, 901 unique values, 0 missing
V634: numeric, 864 unique values, 0 missing
V792: numeric, 839 unique values, 0 missing
V794: numeric, 557 unique values, 0 missing
V849: numeric, 940 unique values, 0 missing
V860: numeric, 824 unique values, 0 missing
V911: numeric, 801 unique values, 0 missing
V929: numeric, 955 unique values, 0 missing
V951: numeric, 1038 unique values, 0 missing
V1098: numeric, 795 unique values, 0 missing
V1099: numeric, 977 unique values, 0 missing
V1111: numeric, 517 unique values, 0 missing
V1159: numeric, 801 unique values, 0 missing
V1254: numeric, 965 unique values, 0 missing
V1290: numeric, 1029 unique values, 0 missing
V1351: numeric, 773 unique values, 0 missing
V1409: numeric, 1003 unique values, 0 missing
V1411: numeric, 911 unique values, 0 missing
V1454: numeric, 1032 unique values, 0 missing
V1467: numeric, 989 unique values, 0 missing
V1610: numeric, 911 unique values, 0 missing
V1659: numeric, 918 unique values, 0 missing
V1736: numeric, 1010 unique values, 0 missing
V1739: numeric, 985 unique values, 0 missing
V1787: numeric, 716 unique values, 0 missing
V1827: numeric, 325 unique values, 0 missing
V1874: numeric, 548 unique values, 0 missing
V1885: numeric, 823 unique values, 0 missing
V1897: numeric, 654 unique values, 0 missing
V1904: numeric, 1073 unique values, 0 missing
V1913: numeric, 385 unique values, 0 missing
V1937: numeric, 483 unique values, 0 missing
V2002: numeric, 698 unique values, 0 missing
V2016: numeric, 445 unique values, 0 missing
V2030: numeric, 1105 unique values, 0 missing
V2080: numeric, 915 unique values, 0 missing
V2129: numeric, 836 unique values, 0 missing
V2134: numeric, 1087 unique values, 0 missing
V2153: numeric, 939 unique values, 0 missing
V2167: numeric, 736 unique values, 0 missing
V2206: numeric, 827 unique values, 0 missing
V2226: numeric, 923 unique values, 0 missing
V2243: numeric, 989 unique values, 0 missing
V2352: numeric, 845 unique values, 0 missing
V2372: numeric, 489 unique values, 0 missing
V2421: numeric, 808 unique values, 0 missing
V2458: numeric, 970 unique values, 0 missing
V2525: numeric, 830 unique values, 0 missing
V2534: numeric, 954 unique values, 0 missing
V2535: numeric, 443 unique values, 0 missing
V2536: numeric, 1181 unique values, 0 missing
V2627: numeric, 703 unique values, 0 missing
V2678: numeric, 896 unique values, 0 missing
V2714: numeric, 791 unique values, 0 missing
V2750: numeric, 977 unique values, 0 missing
V2772: numeric, 979 unique values, 0 missing
V2828: numeric, 886 unique values, 0 missing
V2867: numeric, 903 unique values, 0 missing
V2891: numeric, 4 unique values, 0 missing
V2898: numeric, 886 unique values, 0 missing
V2937: numeric, 817 unique values, 0 missing
V2950: numeric, 724 unique values, 0 missing
V2969: numeric, 829 unique values, 0 missing
V2979: numeric, 466 unique values, 0 missing
V3066: numeric, 1024 unique values, 0 missing
V3163: numeric, 919 unique values, 0 missing
V3292: numeric, 575 unique values, 0 missing
V3299: numeric, 925 unique values, 0 missing
V3352: numeric, 767 unique values, 0 missing
V3422: numeric, 1178 unique values, 0 missing
V3516: numeric, 718 unique values, 0 missing
V3552: numeric, 702 unique values, 0 missing
V3622: numeric, 1136 unique values, 0 missing
V3673: numeric, 553 unique values, 0 missing
V3685: numeric, 938 unique values, 0 missing
V3707: numeric, 532 unique values, 0 missing
V3722: numeric, 495 unique values, 0 missing
V3768: numeric, 797 unique values, 0 missing
V3781: numeric, 1046 unique values, 0 missing
V3818: numeric, 932 unique values, 0 missing
V3854: numeric, 603 unique values, 0 missing
V3927: numeric, 1081 unique values, 0 missing
V3954: numeric, 1056 unique values, 0 missing
V3994: numeric, 996 unique values, 0 missing
V4078: numeric, 960 unique values, 0 missing
V4093: numeric, 961 unique values, 0 missing
V4146: numeric, 1766 unique values, 0 missing
V4179: numeric, 1779 unique values, 0 missing
V4207: numeric, 1760 unique values, 0 missing
V4292: numeric, 1776 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Number of instances belonging to the least frequent class: 500
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.62
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 75
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 1500
Percentage of instances belonging to the least frequent class: 25
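The ratio and percentage properties listed above follow directly from the raw counts. A short check, using only the numbers reported on this page and assuming the derived values are rounded to two decimals (the variable and key names are made up for illustration):

```python
# Raw counts as reported in the properties list.
n_rows, n_cols = 2000, 101
n_numeric, n_nominal, n_binary = 100, 1, 1
majority, minority = 1500, 500

derived = {
    # Attributes divided by instances.
    "attrs_per_instance": round(n_cols / n_rows, 2),
    # Attribute-type shares, in percent of all 101 columns.
    "pct_numeric": round(100 * n_numeric / n_cols, 2),
    "pct_nominal": round(100 * n_nominal / n_cols, 2),
    "pct_binary": round(100 * n_binary / n_cols, 2),
    # Class shares, in percent of all 2000 rows.
    "pct_majority": round(100 * majority / n_rows, 2),
    "pct_minority": round(100 * minority / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

Note that the 0.99 figures are percentages (1 of 101 columns), not fractions.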

0 tasks
