riccardo_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
Subsampling of the dataset riccardo (41161) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )

        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
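The row-subsampling step in the source code above can be tried in isolation. The following is a minimal sketch, assuming scikit-learn, pandas, and NumPy are installed; the data is a synthetic stand-in (8,000 invented rows with a 75/25 class split), not the riccardo dataset itself, and the column names are placeholders. It shows that passing an integer `test_size` together with `stratify` to `train_test_split` yields a subset of exactly that many rows with the parent's class proportions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a parent dataset (invented values, not riccardo).
rng = np.random.default_rng(1)
n_rows = 8_000
x = pd.DataFrame(
    rng.normal(size=(n_rows, 5)),
    columns=[f"V{i}" for i in range(5)],  # placeholder column names
)
# Every fourth row gets class "1", so the split is 75% "0" / 25% "1".
y = pd.Series(np.where(np.arange(n_rows) % 4 == 0, "1", "0"), name="class")

# Same pattern as the listing: join features and target, then keep the
# "test" side of a stratified split as the subsample.
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=2_000,          # an int means an absolute number of rows
    stratify=data["class"],   # preserve the class proportions
    shuffle=True,
    random_state=1,
)

class_counts = subset["class"].value_counts().to_dict()
print(len(subset), class_counts)
```

With these synthetic proportions the subset holds 1,500 rows of the majority class and 500 of the minority class, the same 75/25 split reported in the properties of this dataset.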

101 features

class (target): nominal, 2 unique values, 0 missing
V84: numeric, 894 unique values, 0 missing
V117: numeric, 641 unique values, 0 missing
V147: numeric, 559 unique values, 0 missing
V170: numeric, 441 unique values, 0 missing
V233: numeric, 674 unique values, 0 missing
V263: numeric, 992 unique values, 0 missing
V267: numeric, 875 unique values, 0 missing
V362: numeric, 893 unique values, 0 missing
V392: numeric, 1 unique value, 0 missing
V494: numeric, 807 unique values, 0 missing
V497: numeric, 994 unique values, 0 missing
V524: numeric, 749 unique values, 0 missing
V526: numeric, 724 unique values, 0 missing
V567: numeric, 940 unique values, 0 missing
V606: numeric, 750 unique values, 0 missing
V636: numeric, 424 unique values, 0 missing
V684: numeric, 907 unique values, 0 missing
V862: numeric, 957 unique values, 0 missing
V920: numeric, 602 unique values, 0 missing
V1049: numeric, 423 unique values, 0 missing
V1083: numeric, 682 unique values, 0 missing
V1112: numeric, 842 unique values, 0 missing
V1115: numeric, 732 unique values, 0 missing
V1150: numeric, 876 unique values, 0 missing
V1178: numeric, 721 unique values, 0 missing
V1189: numeric, 341 unique values, 0 missing
V1246: numeric, 516 unique values, 0 missing
V1248: numeric, 868 unique values, 0 missing
V1282: numeric, 888 unique values, 0 missing
V1312: numeric, 764 unique values, 0 missing
V1373: numeric, 736 unique values, 0 missing
V1393: numeric, 938 unique values, 0 missing
V1477: numeric, 636 unique values, 0 missing
V1548: numeric, 629 unique values, 0 missing
V1574: numeric, 902 unique values, 0 missing
V1622: numeric, 518 unique values, 0 missing
V1706: numeric, 693 unique values, 0 missing
V1724: numeric, 958 unique values, 0 missing
V1782: numeric, 1040 unique values, 0 missing
V1793: numeric, 1129 unique values, 0 missing
V1810: numeric, 1104 unique values, 0 missing
V1825: numeric, 945 unique values, 0 missing
V1912: numeric, 534 unique values, 0 missing
V1918: numeric, 897 unique values, 0 missing
V1943: numeric, 767 unique values, 0 missing
V1956: numeric, 777 unique values, 0 missing
V1964: numeric, 962 unique values, 0 missing
V1986: numeric, 909 unique values, 0 missing
V2059: numeric, 924 unique values, 0 missing
V2114: numeric, 947 unique values, 0 missing
V2128: numeric, 942 unique values, 0 missing
V2149: numeric, 924 unique values, 0 missing
V2185: numeric, 1 unique value, 0 missing
V2192: numeric, 673 unique values, 0 missing
V2198: numeric, 982 unique values, 0 missing
V2259: numeric, 363 unique values, 0 missing
V2271: numeric, 850 unique values, 0 missing
V2301: numeric, 494 unique values, 0 missing
V2316: numeric, 533 unique values, 0 missing
V2342: numeric, 885 unique values, 0 missing
V2494: numeric, 1143 unique values, 0 missing
V2539: numeric, 785 unique values, 0 missing
V2616: numeric, 1023 unique values, 0 missing
V2658: numeric, 693 unique values, 0 missing
V2713: numeric, 1073 unique values, 0 missing
V2744: numeric, 681 unique values, 0 missing
V2890: numeric, 857 unique values, 0 missing
V3069: numeric, 419 unique values, 0 missing
V3079: numeric, 865 unique values, 0 missing
V3088: numeric, 989 unique values, 0 missing
V3171: numeric, 660 unique values, 0 missing
V3179: numeric, 797 unique values, 0 missing
V3180: numeric, 877 unique values, 0 missing
V3181: numeric, 1088 unique values, 0 missing
V3233: numeric, 993 unique values, 0 missing
V3296: numeric, 496 unique values, 0 missing
V3298: numeric, 629 unique values, 0 missing
V3312: numeric, 772 unique values, 0 missing
V3331: numeric, 740 unique values, 0 missing
V3446: numeric, 817 unique values, 0 missing
V3451: numeric, 1008 unique values, 0 missing
V3459: numeric, 1082 unique values, 0 missing
V3485: numeric, 1163 unique values, 0 missing
V3522: numeric, 1032 unique values, 0 missing
V3536: numeric, 1100 unique values, 0 missing
V3600: numeric, 879 unique values, 0 missing
V3650: numeric, 893 unique values, 0 missing
V3651: numeric, 570 unique values, 0 missing
V3656: numeric, 785 unique values, 0 missing
V3727: numeric, 884 unique values, 0 missing
V3827: numeric, 743 unique values, 0 missing
V3916: numeric, 923 unique values, 0 missing
V3932: numeric, 869 unique values, 0 missing
V3989: numeric, 642 unique values, 0 missing
V3992: numeric, 1092 unique values, 0 missing
V4084: numeric, 753 unique values, 0 missing
V4129: numeric, 1776 unique values, 0 missing
V4132: numeric, 1774 unique values, 0 missing
V4163: numeric, 1769 unique values, 0 missing
V4210: numeric, 1770 unique values, 0 missing
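Two of the listed features, V392 and V2185, have a single unique value and therefore carry no information for a classifier. A minimal sketch of how such constant columns might be detected with pandas follows; the frame is a toy example whose column names are borrowed from the feature list but whose values are invented for illustration.

```python
import pandas as pd

# Toy frame: column names taken from the feature list, values invented.
df = pd.DataFrame(
    {
        "V84": [0.1, 0.4, 0.3, 0.9],
        "V392": [0.0, 0.0, 0.0, 0.0],   # constant, like V392 in the listing
        "V2185": [1.0, 1.0, 1.0, 1.0],  # constant, like V2185 in the listing
        "class": ["0", "1", "0", "0"],
    }
)

features = df.drop(columns="class")

# Count distinct values per column and flag columns with at most one.
n_unique = features.nunique()
constant_columns = list(n_unique[n_unique <= 1].index)

# Keep only the columns that actually vary.
informative = features.drop(columns=constant_columns)
print(constant_columns)
```

Whether to drop such columns is a modelling choice; they are part of the published dataset and are only flagged here.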

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 75
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 1500
Percentage of instances belonging to the least frequent class: 25
Number of instances belonging to the least frequent class: 500
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.63
Percentage of missing values: 0
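Several of the properties above are simple ratios of the raw counts. As a quick consistency check, the sketch below recomputes them in plain Python from the reported counts; the variable names are illustrative, and rounding to two decimals is an assumption made to match the displayed values.

```python
# Raw counts as reported in the property list.
n_instances = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
majority_class_size = 1500
minority_class_size = 500

# Derived properties, rounded to two decimals to match the display.
dimensionality = round(n_attributes / n_instances, 2)      # attributes per instance
pct_numeric = round(100 * n_numeric / n_attributes, 2)     # % numeric attributes
pct_nominal = round(100 * n_nominal / n_attributes, 2)     # % nominal attributes
pct_majority = 100 * majority_class_size / n_instances     # % most frequent class
pct_minority = 100 * minority_class_size / n_instances     # % least frequent class

print(dimensionality, pct_numeric, pct_nominal, pct_majority, pct_minority)
```

The recomputed values agree with the listed ones: 0.05, 99.01, 0.99, 75, and 25.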

0 tasks
