riccardo_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset riccardo (41161) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the author's own dataset wrapper; `Dataset` is defined elsewhere
# in the generating script.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by class frequency
    vcs = y.value_counts()
    if len(vcs) > nclasses_max:
        selected_classes = rng.choice(
            vcs.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),  # same order as vcs.index
        )
        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x[mask]
        y = y[mask]

    # Uniformly sample columns if required, keeping their original order
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    # Note: `stratified` is never read; the row subsample is always
    # stratified on the target.
    if len(x) > nrows_max:
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Categorical flags of the kept columns, in the same (sorted) order
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Not strictly the same dataset, but the one this subsample was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
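The two sampling steps that shape this subsample, sorted column selection and stratified row selection, can be illustrated with a small NumPy-only sketch. This is not the author's code: the real script stratifies with scikit-learn's `train_test_split(..., random_state=seed)`, so `default_rng(4)` here will not reproduce the actual rows or columns. The helper names, the hypothetical 15,000/5,000 parent labels, and the hypothetical 4,000-column parent are all assumptions for illustration; proportional allocation is the same idea as stratified splitting, with a simplified rounding fix-up.

```python
import numpy as np


def sample_columns(rng: np.random.Generator, n_cols: int, ncols_max: int) -> list[int]:
    """Pick ncols_max distinct column indices uniformly at random, in ascending order."""
    if n_cols <= ncols_max:
        return list(range(n_cols))
    idxs = rng.choice(n_cols, size=ncols_max, replace=False)
    return sorted(int(i) for i in idxs)


def stratified_row_sample(rng: np.random.Generator, y: np.ndarray, n: int) -> np.ndarray:
    """Return n row indices whose class proportions match those of y.

    Proportional allocation: each class gets round(n * share) rows, which is exact
    whenever n * share is an integer (as with a 75/25 split and n = 2000).
    """
    classes, counts = np.unique(y, return_counts=True)
    alloc = np.round(n * counts / counts.sum()).astype(int)
    # Simplified fix-up: push any rounding drift onto the largest class
    alloc[np.argmax(alloc)] += n - alloc.sum()
    picked = [
        rng.choice(np.flatnonzero(y == c), size=k, replace=False)
        for c, k in zip(classes, alloc)
    ]
    return np.sort(np.concatenate(picked))


rng = np.random.default_rng(4)
# Hypothetical parent labels with a 75/25 class split (not the real riccardo data)
y_parent = np.array([0] * 15_000 + [1] * 5_000)
rows = stratified_row_sample(rng, y_parent, 2_000)
print(np.bincount(y_parent[rows]).tolist())  # [1500, 500]

# Hypothetical parent width of 4,000 columns
cols = sample_columns(rng, n_cols=4_000, ncols_max=100)
print(len(cols), cols == sorted(cols))  # 100 True
```

Sorting the sampled column indices keeps the kept columns in their original order, which is consistent with the feature names listed below appearing in ascending order (V115 through V4272).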

101 features

class (target): nominal, 2 unique values, 0 missing
V115: numeric, 836 unique values, 0 missing
V150: numeric, 689 unique values, 0 missing
V256: numeric, 1024 unique values, 0 missing
V338: numeric, 761 unique values, 0 missing
V340: numeric, 759 unique values, 0 missing
V383: numeric, 793 unique values, 0 missing
V493: numeric, 743 unique values, 0 missing
V549: numeric, 518 unique values, 0 missing
V567: numeric, 894 unique values, 0 missing
V589: numeric, 856 unique values, 0 missing
V702: numeric, 729 unique values, 0 missing
V733: numeric, 990 unique values, 0 missing
V736: numeric, 1022 unique values, 0 missing
V753: numeric, 1031 unique values, 0 missing
V770: numeric, 936 unique values, 0 missing
V817: numeric, 536 unique values, 0 missing
V855: numeric, 978 unique values, 0 missing
V882: numeric, 537 unique values, 0 missing
V926: numeric, 925 unique values, 0 missing
V928: numeric, 926 unique values, 0 missing
V950: numeric, 649 unique values, 0 missing
V953: numeric, 906 unique values, 0 missing
V1143: numeric, 1028 unique values, 0 missing
V1192: numeric, 960 unique values, 0 missing
V1288: numeric, 751 unique values, 0 missing
V1427: numeric, 719 unique values, 0 missing
V1489: numeric, 911 unique values, 0 missing
V1548: numeric, 671 unique values, 0 missing
V1551: numeric, 1087 unique values, 0 missing
V1564: numeric, 899 unique values, 0 missing
V1572: numeric, 192 unique values, 0 missing
V1585: numeric, 747 unique values, 0 missing
V1643: numeric, 906 unique values, 0 missing
V1731: numeric, 796 unique values, 0 missing
V1818: numeric, 797 unique values, 0 missing
V1908: numeric, 430 unique values, 0 missing
V1966: numeric, 418 unique values, 0 missing
V1971: numeric, 1020 unique values, 0 missing
V2014: numeric, 728 unique values, 0 missing
V2035: numeric, 1100 unique values, 0 missing
V2053: numeric, 303 unique values, 0 missing
V2097: numeric, 835 unique values, 0 missing
V2100: numeric, 929 unique values, 0 missing
V2114: numeric, 933 unique values, 0 missing
V2117: numeric, 760 unique values, 0 missing
V2126: numeric, 938 unique values, 0 missing
V2127: numeric, 408 unique values, 0 missing
V2148: numeric, 960 unique values, 0 missing
V2213: numeric, 872 unique values, 0 missing
V2224: numeric, 1009 unique values, 0 missing
V2278: numeric, 996 unique values, 0 missing
V2294: numeric, 800 unique values, 0 missing
V2304: numeric, 895 unique values, 0 missing
V2427: numeric, 819 unique values, 0 missing
V2447: numeric, 925 unique values, 0 missing
V2453: numeric, 1097 unique values, 0 missing
V2482: numeric, 781 unique values, 0 missing
V2495: numeric, 1135 unique values, 0 missing
V2555: numeric, 983 unique values, 0 missing
V2580: numeric, 882 unique values, 0 missing
V2639: numeric, 769 unique values, 0 missing
V2732: numeric, 917 unique values, 0 missing
V2824: numeric, 1 unique value, 0 missing
V2833: numeric, 1178 unique values, 0 missing
V2848: numeric, 925 unique values, 0 missing
V2888: numeric, 1076 unique values, 0 missing
V2963: numeric, 781 unique values, 0 missing
V2988: numeric, 823 unique values, 0 missing
V3049: numeric, 1083 unique values, 0 missing
V3099: numeric, 723 unique values, 0 missing
V3299: numeric, 899 unique values, 0 missing
V3333: numeric, 1 unique value, 0 missing
V3377: numeric, 745 unique values, 0 missing
V3452: numeric, 1067 unique values, 0 missing
V3464: numeric, 808 unique values, 0 missing
V3636: numeric, 921 unique values, 0 missing
V3674: numeric, 693 unique values, 0 missing
V3701: numeric, 1089 unique values, 0 missing
V3760: numeric, 627 unique values, 0 missing
V3789: numeric, 741 unique values, 0 missing
V3797: numeric, 984 unique values, 0 missing
V3806: numeric, 702 unique values, 0 missing
V3831: numeric, 643 unique values, 0 missing
V3931: numeric, 907 unique values, 0 missing
V3932: numeric, 888 unique values, 0 missing
V3951: numeric, 874 unique values, 0 missing
V3959: numeric, 1163 unique values, 0 missing
V3969: numeric, 911 unique values, 0 missing
V3998: numeric, 401 unique values, 0 missing
V4003: numeric, 645 unique values, 0 missing
V4075: numeric, 1002 unique values, 0 missing
V4076: numeric, 892 unique values, 0 missing
V4079: numeric, 1080 unique values, 0 missing
V4099: numeric, 1762 unique values, 0 missing
V4103: numeric, 1771 unique values, 0 missing
V4132: numeric, 1783 unique values, 0 missing
V4160: numeric, 1774 unique values, 0 missing
V4175: numeric, 1775 unique values, 0 missing
V4253: numeric, 1779 unique values, 0 missing
V4272: numeric, 1767 unique values, 0 missing
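Two of the sampled columns, V2824 and V3333, report a single unique value, i.e. they are constant across all 2,000 rows, since uniform column sampling does not filter out uninformative features. The per-feature "unique values" and "missing" counts, plus a constant-column check, can be sketched in NumPy as below. This is not OpenML's implementation; it assumes the features are loaded into a float array with NaN marking missing entries and, as an assumption, counts distinct values among non-missing entries only. The function names are hypothetical.

```python
import numpy as np


def feature_summary(X: np.ndarray) -> list[tuple[int, int, int]]:
    """For each column, return (column index, distinct non-missing values, missing count)."""
    rows = []
    for j in range(X.shape[1]):
        col = X[:, j]
        missing = np.isnan(col)
        n_unique = len(np.unique(col[~missing]))
        rows.append((j, n_unique, int(missing.sum())))
    return rows


def constant_columns(X: np.ndarray) -> list[int]:
    """Indices of columns with at most one distinct non-missing value."""
    return [j for j, n_unique, _ in feature_summary(X) if n_unique <= 1]


# Toy matrix: column 1 is constant, column 2 has one missing entry
X = np.array([
    [1.0, 0.0, 2.5],
    [2.0, 0.0, np.nan],
    [3.0, 0.0, 4.0],
])
print(feature_summary(X))  # [(0, 3, 0), (1, 1, 0), (2, 2, 1)]
print(constant_columns(X))  # [1]
```

Dropping columns flagged this way before model fitting is a common, optional clean-up step; the dataset itself keeps them.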

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.62
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 75
Number of instances belonging to the most frequent class: 1500
Percentage of instances belonging to the least frequent class: 25
Number of instances belonging to the least frequent class: 500
Number of binary attributes: 1
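Most of the ratio-style properties above follow directly from the raw counts (2,000 rows split 1,500/500, 100 numeric attributes, 1 nominal attribute, 1 binary attribute). The sketch below recomputes them, rounded to two decimals as displayed; the function and key names are hypothetical, and the "average class difference" property is left out because its exact definition is not given here.

```python
def derived_properties(
    class_counts: list[int], n_numeric: int, n_nominal: int, n_binary: int
) -> dict[str, float]:
    """Recompute ratio-style dataset properties from raw counts (rounded to 2 decimals)."""
    n_instances = sum(class_counts)
    n_attributes = n_numeric + n_nominal
    return {
        "attributes_per_instance": round(n_attributes / n_instances, 2),  # 101 / 2000
        "pct_numeric": round(100 * n_numeric / n_attributes, 2),          # 100 / 101
        "pct_nominal": round(100 * n_nominal / n_attributes, 2),          # 1 / 101
        "pct_binary": round(100 * n_binary / n_attributes, 2),            # 1 / 101
        "pct_majority_class": round(100 * max(class_counts) / n_instances, 2),
        "pct_minority_class": round(100 * min(class_counts) / n_instances, 2),
    }


props = derived_properties([1500, 500], n_numeric=100, n_nominal=1, n_binary=1)
print(props)
```

The only binary attribute is the two-valued target, which is why the binary and nominal percentages coincide at 0.99.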

0 tasks
