fabert_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active; Format: ARFF; Publicly available; Visibility: public; Uploaded 17-11-2022 by Eddie Bergman
Subsample of the dataset fabert (41164) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method excerpt; the enclosing class and the Dataset type are not shown.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes (weighted by class frequency) if there are too many
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: this filters on `classes`, not `selected_classes`, so as
        # written the class selection has no effect on x and y.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        # NOTE: the `stratified` argument is not referenced; the split
        # below is always stratified on the target.
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
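The row-subsampling step in the code above passes an integer `test_size` to scikit-learn's `train_test_split`, so the "test" part contains exactly that many rows, while `stratify` keeps the class proportions close to those of the full data. A minimal sketch of that one step on synthetic data follows; the frame, column names, class labels, and sizes are illustrative assumptions and are not taken from fabert.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a larger dataset (hypothetical data, not fabert).
rng = np.random.default_rng(2)
n_rows = 10_000
data = pd.DataFrame(
    {
        "V1": rng.normal(size=n_rows),
        "V2": rng.normal(size=n_rows),
        # Imbalanced three-class target.
        "class": rng.choice(["a", "b", "c"], size=n_rows, p=[0.6, 0.3, 0.1]),
    }
)

# An integer test_size requests an absolute number of rows. The "test"
# part is kept as the subsample and the remainder is discarded.
_, subset = train_test_split(
    data,
    test_size=2_000,
    stratify=data["class"],
    shuffle=True,
    random_state=2,
)

# Compare class proportions before and after subsampling.
original_props = data["class"].value_counts(normalize=True).sort_index()
subset_props = subset["class"].value_counts(normalize=True).sort_index()
max_shift = (original_props - subset_props).abs().max()

print(len(subset))
print(max_shift)
```

Because the split is stratified, each class can be off by only a few rows, so the largest shift in class proportion stays very small relative to the subsample size.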

101 features

class (target): nominal, 7 unique values, 0 missing
V30: numeric, 10 unique values, 0 missing
V31: numeric, 11 unique values, 0 missing
V40: numeric, 30 unique values, 0 missing
V46: numeric, 41 unique values, 0 missing
V61: numeric, 72 unique values, 0 missing
V66: numeric, 18 unique values, 0 missing
V77: numeric, 15 unique values, 0 missing
V80: numeric, 3 unique values, 0 missing
V81: numeric, 25 unique values, 0 missing
V82: numeric, 16 unique values, 0 missing
V109: numeric, 13 unique values, 0 missing
V135: numeric, 27 unique values, 0 missing
V139: numeric, 31 unique values, 0 missing
V145: numeric, 11 unique values, 0 missing
V154: numeric, 11 unique values, 0 missing
V161: numeric, 11 unique values, 0 missing
V166: numeric, 9 unique values, 0 missing
V169: numeric, 25 unique values, 0 missing
V173: numeric, 11 unique values, 0 missing
V184: numeric, 32 unique values, 0 missing
V189: numeric, 19 unique values, 0 missing
V198: numeric, 14 unique values, 0 missing
V207: numeric, 25 unique values, 0 missing
V211: numeric, 5 unique values, 0 missing
V221: numeric, 8 unique values, 0 missing
V238: numeric, 3 unique values, 0 missing
V239: numeric, 19 unique values, 0 missing
V247: numeric, 8 unique values, 0 missing
V257: numeric, 30 unique values, 0 missing
V269: numeric, 5 unique values, 0 missing
V281: numeric, 11 unique values, 0 missing
V290: numeric, 11 unique values, 0 missing
V292: numeric, 10 unique values, 0 missing
V309: numeric, 6 unique values, 0 missing
V314: numeric, 43 unique values, 0 missing
V315: numeric, 57 unique values, 0 missing
V320: numeric, 49 unique values, 0 missing
V342: numeric, 30 unique values, 0 missing
V343: numeric, 27 unique values, 0 missing
V345: numeric, 12 unique values, 0 missing
V350: numeric, 16 unique values, 0 missing
V356: numeric, 3 unique values, 0 missing
V363: numeric, 15 unique values, 0 missing
V367: numeric, 17 unique values, 0 missing
V380: numeric, 40 unique values, 0 missing
V381: numeric, 3 unique values, 0 missing
V383: numeric, 13 unique values, 0 missing
V393: numeric, 46 unique values, 0 missing
V400: numeric, 4 unique values, 0 missing
V401: numeric, 26 unique values, 0 missing
V406: numeric, 4 unique values, 0 missing
V416: numeric, 25 unique values, 0 missing
V427: numeric, 32 unique values, 0 missing
V430: numeric, 46 unique values, 0 missing
V431: numeric, 42 unique values, 0 missing
V457: numeric, 9 unique values, 0 missing
V459: numeric, 10 unique values, 0 missing
V461: numeric, 3 unique values, 0 missing
V464: numeric, 11 unique values, 0 missing
V474: numeric, 38 unique values, 0 missing
V482: numeric, 12 unique values, 0 missing
V488: numeric, 44 unique values, 0 missing
V494: numeric, 7 unique values, 0 missing
V497: numeric, 12 unique values, 0 missing
V501: numeric, 13 unique values, 0 missing
V503: numeric, 7 unique values, 0 missing
V516: numeric, 41 unique values, 0 missing
V519: numeric, 38 unique values, 0 missing
V521: numeric, 11 unique values, 0 missing
V525: numeric, 28 unique values, 0 missing
V531: numeric, 47 unique values, 0 missing
V544: numeric, 10 unique values, 0 missing
V548: numeric, 16 unique values, 0 missing
V575: numeric, 24 unique values, 0 missing
V579: numeric, 6 unique values, 0 missing
V581: numeric, 8 unique values, 0 missing
V588: numeric, 33 unique values, 0 missing
V609: numeric, 51 unique values, 0 missing
V610: numeric, 21 unique values, 0 missing
V630: numeric, 26 unique values, 0 missing
V638: numeric, 2 unique values, 0 missing
V652: numeric, 20 unique values, 0 missing
V653: numeric, 6 unique values, 0 missing
V656: numeric, 5 unique values, 0 missing
V665: numeric, 5 unique values, 0 missing
V671: numeric, 23 unique values, 0 missing
V676: numeric, 15 unique values, 0 missing
V682: numeric, 13 unique values, 0 missing
V686: numeric, 4 unique values, 0 missing
V690: numeric, 2 unique values, 0 missing
V696: numeric, 11 unique values, 0 missing
V698: numeric, 24 unique values, 0 missing
V708: numeric, 10 unique values, 0 missing
V711: numeric, 13 unique values, 0 missing
V729: numeric, 28 unique values, 0 missing
V732: numeric, 7 unique values, 0 missing
V774: numeric, 6 unique values, 0 missing
V787: numeric, 23 unique values, 0 missing
V789: numeric, 4 unique values, 0 missing
V799: numeric, 38 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 7
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 23.4
Number of instances belonging to the most frequent class: 468
Percentage of instances belonging to the least frequent class: 6.1
Number of instances belonging to the least frequent class: 122
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.17
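Several of the properties above are simple ratios of the listed counts, which makes them easy to cross-check. The sketch below recomputes them from the counts on this page; the dictionary keys and the `pct` helper are hypothetical names chosen for illustration, not OpenML identifiers, and rounding to two decimals is an assumption made to match the displayed values.

```python
# Counts copied from the property list on this page.
n_instances = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
majority_class_size = 468
minority_class_size = 122


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical key names, used only for this illustration.
derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "pct_nominal_attributes": pct(n_nominal, n_attributes),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
    "pct_majority_class": pct(majority_class_size, n_instances),
    "pct_minority_class": pct(minority_class_size, n_instances),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed ones (99.01, 0.99, 0.05, 23.4, and 6.1), which suggests the percentages are expressed relative to all 101 attributes, target included.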

0 tasks
