jasmine_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True


active, ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by Eddie Bergman
Subsample of the dataset jasmine (41143) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# `Dataset` is the author's own container class (not shown here).
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)
    x = self.x
    y = self.y

    # Sample classes (weighted by class frequency) if there are too many
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as posted, this filters on `classes`, which keeps every row;
        # `selected_classes` is presumably what was intended.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        # The "test" split of size nrows_max is kept as the subsample
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
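To make the column and row steps of the description concrete, here is a minimal standalone sketch on toy data. The toy frame, its column names, and the sizes (8 columns, 500 rows, `ncols_max = 5`, `nrows_max = 100`) are assumptions chosen for illustration; they are not the jasmine data or the author's pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

seed = 1
rng = np.random.default_rng(seed)

# Toy data (hypothetical): 500 rows, 8 feature columns, balanced binary target.
x = pd.DataFrame(
    rng.normal(size=(500, 8)), columns=[f"V{i}" for i in range(1, 9)]
)
y = pd.Series(np.tile([0, 1], 250), name="class")

# Column step: draw column positions without replacement, then sort them
# so the kept columns stay in their original order.
ncols_max = 5
column_idxs = sorted(rng.choice(len(x.columns), size=ncols_max, replace=False))
x = x.iloc[:, column_idxs]

# Row step: request a "test" split of exactly nrows_max rows, stratified on
# the target, and keep that split as the subsample.
nrows_max = 100
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data["class"],
    shuffle=True,
    random_state=seed,
)

x_sub = subset.drop("class", axis="columns")
y_sub = subset["class"]

print(x_sub.shape)           # (100, 5)
print(y_sub.value_counts())  # 50 rows per class
```

Passing an integer `test_size` makes `train_test_split` return that many rows rather than a fraction, which is why the second return value can serve directly as the subsample.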

101 features

class (target): nominal, 2 unique values, 0 missing
V2: nominal, 2 unique values, 0 missing
V5: nominal, 2 unique values, 0 missing
V6: nominal, 2 unique values, 0 missing
V8: nominal, 2 unique values, 0 missing
V9: nominal, 2 unique values, 0 missing
V10: nominal, 2 unique values, 0 missing
V11: nominal, 2 unique values, 0 missing
V13: numeric, 108 unique values, 0 missing
V14: nominal, 2 unique values, 0 missing
V16: nominal, 2 unique values, 0 missing
V17: nominal, 2 unique values, 0 missing
V22: nominal, 2 unique values, 0 missing
V23: numeric, 98 unique values, 0 missing
V24: nominal, 2 unique values, 0 missing
V25: nominal, 2 unique values, 0 missing
V28: nominal, 2 unique values, 0 missing
V29: nominal, 2 unique values, 0 missing
V31: nominal, 2 unique values, 0 missing
V32: nominal, 2 unique values, 0 missing
V33: nominal, 2 unique values, 0 missing
V34: nominal, 2 unique values, 0 missing
V35: nominal, 2 unique values, 0 missing
V36: nominal, 2 unique values, 0 missing
V37: nominal, 2 unique values, 0 missing
V40: nominal, 2 unique values, 0 missing
V42: nominal, 2 unique values, 0 missing
V44: nominal, 2 unique values, 0 missing
V45: numeric, 14 unique values, 0 missing
V46: nominal, 2 unique values, 0 missing
V47: nominal, 2 unique values, 0 missing
V48: nominal, 2 unique values, 0 missing
V49: nominal, 2 unique values, 0 missing
V50: nominal, 2 unique values, 0 missing
V51: nominal, 2 unique values, 0 missing
V54: nominal, 2 unique values, 0 missing
V55: nominal, 2 unique values, 0 missing
V56: numeric, 1149 unique values, 0 missing
V57: nominal, 2 unique values, 0 missing
V58: nominal, 2 unique values, 0 missing
V59: numeric, 119 unique values, 0 missing
V64: nominal, 2 unique values, 0 missing
V65: nominal, 2 unique values, 0 missing
V66: nominal, 2 unique values, 0 missing
V67: nominal, 2 unique values, 0 missing
V69: nominal, 2 unique values, 0 missing
V70: nominal, 2 unique values, 0 missing
V71: nominal, 2 unique values, 0 missing
V72: nominal, 2 unique values, 0 missing
V74: nominal, 2 unique values, 0 missing
V75: nominal, 2 unique values, 0 missing
V76: nominal, 2 unique values, 0 missing
V77: nominal, 2 unique values, 0 missing
V78: nominal, 2 unique values, 0 missing
V79: nominal, 2 unique values, 0 missing
V80: nominal, 2 unique values, 0 missing
V81: nominal, 2 unique values, 0 missing
V82: nominal, 2 unique values, 0 missing
V83: nominal, 2 unique values, 0 missing
V84: nominal, 2 unique values, 0 missing
V85: nominal, 2 unique values, 0 missing
V86: nominal, 2 unique values, 0 missing
V87: nominal, 2 unique values, 0 missing
V88: nominal, 2 unique values, 0 missing
V89: nominal, 2 unique values, 0 missing
V91: nominal, 2 unique values, 0 missing
V92: nominal, 2 unique values, 0 missing
V93: nominal, 2 unique values, 0 missing
V94: nominal, 2 unique values, 0 missing
V96: nominal, 2 unique values, 0 missing
V98: nominal, 2 unique values, 0 missing
V99: nominal, 2 unique values, 0 missing
V101: nominal, 2 unique values, 0 missing
V102: nominal, 2 unique values, 0 missing
V104: nominal, 2 unique values, 0 missing
V105: nominal, 2 unique values, 0 missing
V106: nominal, 2 unique values, 0 missing
V107: nominal, 2 unique values, 0 missing
V110: nominal, 2 unique values, 0 missing
V112: nominal, 2 unique values, 0 missing
V113: nominal, 2 unique values, 0 missing
V114: nominal, 2 unique values, 0 missing
V117: nominal, 2 unique values, 0 missing
V119: nominal, 2 unique values, 0 missing
V120: nominal, 2 unique values, 0 missing
V122: nominal, 2 unique values, 0 missing
V123: nominal, 2 unique values, 0 missing
V124: nominal, 2 unique values, 0 missing
V128: nominal, 2 unique values, 0 missing
V129: nominal, 2 unique values, 0 missing
V131: numeric, 105 unique values, 0 missing
V132: nominal, 2 unique values, 0 missing
V133: nominal, 2 unique values, 0 missing
V134: nominal, 2 unique values, 0 missing
V136: nominal, 2 unique values, 0 missing
V137: nominal, 2 unique values, 0 missing
V139: nominal, 2 unique values, 0 missing
V140: nominal, 2 unique values, 0 missing
V141: nominal, 2 unique values, 0 missing
V142: nominal, 2 unique values, 0 missing
V143: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 6
Number of nominal attributes: 95
Percentage of instances belonging to the most frequent class: 50
Percentage of nominal attributes: 94.06
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 1000
Number of binary attributes: 95
Percentage of binary attributes: 94.06
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.49
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 5.94
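Several of the percentage and ratio properties follow directly from the counts listed alongside them. The short check below recomputes them from those counts; the variable names are illustrative, and rounding to two decimals is an assumption made to match the values shown on this page.

```python
# Counts as reported in the properties list.
n_instances = 2000
n_attributes = 101  # as reported; includes the target column
n_numeric = 6
n_nominal = 95
majority_class_size = 1000

# Derived properties, rounded to two decimals (assumed display precision).
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_majority = round(100 * majority_class_size / n_instances, 2)
attrs_per_instance = round(n_attributes / n_instances, 2)

print(pct_nominal)         # 94.06
print(pct_numeric)         # 5.94
print(pct_majority)        # 50.0
print(attrs_per_instance)  # 0.05
```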

0 tasks
