guillermo_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset guillermo (41159) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
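The row-subsampling step in the source code relies on stratification to keep class proportions roughly intact. The sketch below illustrates that idea with the standard library only. It is not the author's method: it uses proportional allocation with largest-remainder rounding instead of scikit-learn's `train_test_split`, and the function name `stratified_subsample` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices for a class-proportional subsample.

    Illustrative only: proportional allocation with largest-remainder
    rounding, which is an assumption of this sketch and not how
    scikit-learn's train_test_split allocates rows.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    if n_rows >= total:
        return list(range(total))

    # Ideal (fractional) share of the sample for each class, rounded down.
    shares = {c: n_rows * len(idxs) / total for c, idxs in by_class.items()}
    counts = {c: int(share) for c, share in shares.items()}

    # Hand out rows lost to rounding down, largest fractional part first.
    leftover = n_rows - sum(counts.values())
    by_remainder = sorted(shares, key=lambda c: shares[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Sample without replacement inside each class.
    picked = []
    for c, idxs in by_class.items():
        picked.extend(rng.sample(idxs, counts[c]))
    return sorted(picked)


# Hypothetical labels: 20 rows with a 60/40 class split.
labels = ["a"] * 12 + ["b"] * 8
subset = stratified_subsample(labels, n_rows=10, seed=3)
sampled = [labels[i] for i in subset]
print(sampled.count("a"), sampled.count("b"))  # prints "6 4"
```

The seed changes which rows are drawn but not the per-class counts, so the 60/40 split of the toy labels carries over to the subsample.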

101 features

class (target): nominal, 2 unique values, 0 missing
V7: numeric, 614 unique values, 0 missing
V21: numeric, 1021 unique values, 0 missing
V130: numeric, 464 unique values, 0 missing
V138: numeric, 735 unique values, 0 missing
V166: numeric, 862 unique values, 0 missing
V183: numeric, 236 unique values, 0 missing
V327: numeric, 511 unique values, 0 missing
V360: numeric, 903 unique values, 0 missing
V388: numeric, 412 unique values, 0 missing
V396: numeric, 472 unique values, 0 missing
V440: numeric, 792 unique values, 0 missing
V480: numeric, 716 unique values, 0 missing
V591: numeric, 679 unique values, 0 missing
V673: numeric, 521 unique values, 0 missing
V731: numeric, 732 unique values, 0 missing
V754: numeric, 291 unique values, 0 missing
V762: numeric, 646 unique values, 0 missing
V810: numeric, 419 unique values, 0 missing
V884: numeric, 737 unique values, 0 missing
V936: numeric, 748 unique values, 0 missing
V965: numeric, 780 unique values, 0 missing
V995: numeric, 749 unique values, 0 missing
V1040: numeric, 983 unique values, 0 missing
V1059: numeric, 459 unique values, 0 missing
V1081: numeric, 445 unique values, 0 missing
V1083: numeric, 428 unique values, 0 missing
V1116: numeric, 649 unique values, 0 missing
V1203: numeric, 666 unique values, 0 missing
V1240: numeric, 320 unique values, 0 missing
V1250: numeric, 852 unique values, 0 missing
V1266: numeric, 237 unique values, 0 missing
V1274: numeric, 261 unique values, 0 missing
V1333: numeric, 619 unique values, 0 missing
V1354: numeric, 708 unique values, 0 missing
V1397: numeric, 709 unique values, 0 missing
V1398: numeric, 870 unique values, 0 missing
V1594: numeric, 726 unique values, 0 missing
V1651: numeric, 905 unique values, 0 missing
V1659: numeric, 854 unique values, 0 missing
V1693: numeric, 778 unique values, 0 missing
V1774: numeric, 687 unique values, 0 missing
V1819: numeric, 221 unique values, 0 missing
V1823: numeric, 143 unique values, 0 missing
V1837: numeric, 87 unique values, 0 missing
V1907: numeric, 614 unique values, 0 missing
V2004: numeric, 717 unique values, 0 missing
V2017: numeric, 770 unique values, 0 missing
V2018: numeric, 384 unique values, 0 missing
V2082: numeric, 645 unique values, 0 missing
V2181: numeric, 323 unique values, 0 missing
V2213: numeric, 585 unique values, 0 missing
V2218: numeric, 377 unique values, 0 missing
V2448: numeric, 573 unique values, 0 missing
V2479: numeric, 445 unique values, 0 missing
V2486: numeric, 724 unique values, 0 missing
V2548: numeric, 368 unique values, 0 missing
V2594: numeric, 406 unique values, 0 missing
V2595: numeric, 571 unique values, 0 missing
V2615: numeric, 170 unique values, 0 missing
V2690: numeric, 639 unique values, 0 missing
V2745: numeric, 667 unique values, 0 missing
V2753: numeric, 878 unique values, 0 missing
V2795: numeric, 478 unique values, 0 missing
V2814: numeric, 494 unique values, 0 missing
V2815: numeric, 974 unique values, 0 missing
V2816: numeric, 836 unique values, 0 missing
V2820: numeric, 667 unique values, 0 missing
V2913: numeric, 248 unique values, 0 missing
V2924: numeric, 204 unique values, 0 missing
V2948: numeric, 679 unique values, 0 missing
V3009: numeric, 794 unique values, 0 missing
V3087: numeric, 761 unique values, 0 missing
V3096: numeric, 924 unique values, 0 missing
V3119: numeric, 344 unique values, 0 missing
V3169: numeric, 471 unique values, 0 missing
V3199: numeric, 514 unique values, 0 missing
V3254: numeric, 781 unique values, 0 missing
V3288: numeric, 540 unique values, 0 missing
V3325: numeric, 640 unique values, 0 missing
V3367: numeric, 880 unique values, 0 missing
V3406: numeric, 563 unique values, 0 missing
V3514: numeric, 524 unique values, 0 missing
V3551: numeric, 950 unique values, 0 missing
V3606: numeric, 941 unique values, 0 missing
V3649: numeric, 548 unique values, 0 missing
V3654: numeric, 708 unique values, 0 missing
V3682: numeric, 512 unique values, 0 missing
V3746: numeric, 817 unique values, 0 missing
V3769: numeric, 420 unique values, 0 missing
V3787: numeric, 707 unique values, 0 missing
V3915: numeric, 632 unique values, 0 missing
V3958: numeric, 782 unique values, 0 missing
V3972: numeric, 846 unique values, 0 missing
V3978: numeric, 912 unique values, 0 missing
V4004: numeric, 577 unique values, 0 missing
V4044: numeric, 918 unique values, 0 missing
V4103: numeric, 1220 unique values, 0 missing
V4128: numeric, 1221 unique values, 0 missing
V4280: numeric, 1223 unique values, 0 missing
V4291: numeric, 1218 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Average class difference between consecutive instances: 0.52
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 60
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 1200
Percentage of instances belonging to the least frequent class: 40
Number of instances belonging to the least frequent class: 800
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
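Several of the properties above are simple ratios of the raw counts. As a quick consistency check, the sketch below recomputes them from the counts reported on this page. The dictionary keys and the `pct` helper are hypothetical names chosen for illustration, and rounding to two decimals is an assumption about how the page displays values.

```python
# Raw counts as reported in the properties list.
n_rows = 2000
n_attributes = 101  # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
majority = 1200
minority = 800


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "dimensionality": round(n_attributes / n_rows, 2),  # attributes per instance
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_majority": pct(majority, n_rows),
    "pct_minority": pct(minority, n_rows),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values (0.05, 99.01, 0.99, 60.0 and 40.0) agree with the listed properties.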

0 tasks
