jasmine_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active · Format: ARFF · Publicly available · Visibility: public · Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset jasmine (41143) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,  # not referenced below: rows are always stratified on y
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes, weighted by their frequency
    # (jasmine has 2 classes, so this branch is not taken for this dataset)
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        # Note: `vcs` is ordered by count while `classes` is in order of
        # appearance, so these weights are not necessarily aligned with `classes`.
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present.
        # Note: this tests against `classes`, not `selected_classes`, so no rows
        # are dropped; `.iloc` with index labels also assumes a default RangeIndex.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify on the target; an integer test_size keeps exactly nrows_max rows
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    # (here the categorical mask is only subset to the kept columns)
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
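The class-subsampling branch in the code above draws its weights from `y.value_counts()` (ordered by count) while choosing from `y.unique()` (ordered by appearance), and then filters on `classes` instead of `selected_classes`. That branch is not taken for this binary dataset, but for readers reusing the code, a minimal sketch of what the class step and the stratified row step appear to intend might look like this. The helper names `subsample_classes` and `stratified_rows` are hypothetical and not part of the original code; only `train_test_split` is the same scikit-learn call the original uses.

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_classes(
    x: pd.DataFrame, y: pd.Series, nclasses_max: int, rng: np.random.Generator
) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: keep at most `nclasses_max` classes, weighted by frequency."""
    counts = y.value_counts()
    if len(counts) <= nclasses_max:
        return x, y  # nothing to drop (e.g. a binary target)
    # Draw from counts.index so each class is paired with its own frequency.
    selected = rng.choice(
        counts.index.to_numpy(),
        size=nclasses_max,
        replace=False,
        p=(counts / counts.sum()).to_numpy(),
    )
    mask = y.isin(selected)  # filter on the *selected* classes only
    # Boolean .loc works for any index, not just a default RangeIndex.
    return x.loc[mask], y.loc[mask]


def stratified_rows(
    x: pd.DataFrame, y: pd.Series, nrows_max: int, seed: int
) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: keep exactly `nrows_max` rows, preserving class proportions."""
    if len(x) <= nrows_max:
        return x, y
    # An integer test_size is an absolute row count; the "test" split is the subsample.
    _, x_sub, _, y_sub = train_test_split(
        x, y, test_size=nrows_max, stratify=y, shuffle=True, random_state=seed
    )
    return x_sub, y_sub


rng = np.random.default_rng(0)
y = pd.Series(["a"] * 600 + ["b"] * 300 + ["c"] * 100, name="class")
x = pd.DataFrame({"f": range(1000)})
x2, y2 = subsample_classes(x, y, nclasses_max=2, rng=rng)
x3, y3 = stratified_rows(x2, y2, nrows_max=100, seed=0)
```

Passing `x` and `y` to `train_test_split` separately avoids concatenating and re-splitting the target column, at the cost of diverging slightly from the original's `pd.concat` approach.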

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| V1 | nominal | 2 | 0 |
| V2 | nominal | 2 | 0 |
| V3 | nominal | 2 | 0 |
| V4 | nominal | 2 | 0 |
| V7 | nominal | 2 | 0 |
| V9 | nominal | 2 | 0 |
| V10 | nominal | 2 | 0 |
| V11 | nominal | 2 | 0 |
| V12 | nominal | 2 | 0 |
| V13 | numeric | 105 | 0 |
| V14 | nominal | 2 | 0 |
| V15 | nominal | 2 | 0 |
| V16 | nominal | 2 | 0 |
| V19 | nominal | 2 | 0 |
| V24 | nominal | 2 | 0 |
| V25 | nominal | 2 | 0 |
| V28 | nominal | 2 | 0 |
| V29 | nominal | 2 | 0 |
| V30 | nominal | 2 | 0 |
| V32 | nominal | 2 | 0 |
| V33 | nominal | 2 | 0 |
| V34 | nominal | 2 | 0 |
| V35 | nominal | 2 | 0 |
| V36 | nominal | 2 | 0 |
| V37 | nominal | 2 | 0 |
| V39 | nominal | 2 | 0 |
| V40 | nominal | 2 | 0 |
| V42 | nominal | 2 | 0 |
| V43 | numeric | 79 | 0 |
| V44 | nominal | 2 | 0 |
| V45 | numeric | 15 | 0 |
| V48 | nominal | 2 | 0 |
| V50 | nominal | 2 | 0 |
| V52 | nominal | 2 | 0 |
| V53 | nominal | 2 | 0 |
| V54 | nominal | 2 | 0 |
| V56 | numeric | 1149 | 0 |
| V58 | nominal | 2 | 0 |
| V59 | numeric | 119 | 0 |
| V60 | nominal | 2 | 0 |
| V61 | nominal | 2 | 0 |
| V63 | nominal | 2 | 0 |
| V64 | nominal | 2 | 0 |
| V65 | nominal | 2 | 0 |
| V66 | nominal | 2 | 0 |
| V67 | nominal | 2 | 0 |
| V68 | nominal | 2 | 0 |
| V70 | nominal | 2 | 0 |
| V72 | nominal | 2 | 0 |
| V73 | nominal | 2 | 0 |
| V74 | nominal | 2 | 0 |
| V75 | nominal | 2 | 0 |
| V76 | nominal | 2 | 0 |
| V78 | nominal | 2 | 0 |
| V80 | nominal | 2 | 0 |
| V81 | nominal | 2 | 0 |
| V82 | nominal | 2 | 0 |
| V83 | nominal | 2 | 0 |
| V84 | nominal | 2 | 0 |
| V85 | nominal | 2 | 0 |
| V86 | nominal | 2 | 0 |
| V87 | nominal | 2 | 0 |
| V88 | nominal | 2 | 0 |
| V89 | nominal | 2 | 0 |
| V90 | nominal | 2 | 0 |
| V92 | nominal | 2 | 0 |
| V93 | nominal | 2 | 0 |
| V94 | nominal | 2 | 0 |
| V96 | nominal | 2 | 0 |
| V97 | nominal | 2 | 0 |
| V98 | nominal | 2 | 0 |
| V99 | nominal | 2 | 0 |
| V100 | nominal | 2 | 0 |
| V101 | nominal | 2 | 0 |
| V102 | nominal | 2 | 0 |
| V103 | nominal | 2 | 0 |
| V104 | nominal | 2 | 0 |
| V105 | nominal | 2 | 0 |
| V106 | nominal | 2 | 0 |
| V107 | nominal | 2 | 0 |
| V112 | nominal | 2 | 0 |
| V113 | nominal | 2 | 0 |
| V114 | nominal | 2 | 0 |
| V115 | nominal | 2 | 0 |
| V116 | nominal | 2 | 0 |
| V118 | nominal | 2 | 0 |
| V119 | nominal | 2 | 0 |
| V120 | nominal | 2 | 0 |
| V122 | nominal | 2 | 0 |
| V124 | nominal | 2 | 0 |
| V126 | numeric | 107 | 0 |
| V130 | nominal | 2 | 0 |
| V132 | nominal | 2 | 0 |
| V133 | nominal | 2 | 0 |
| V135 | nominal | 2 | 0 |
| V136 | nominal | 2 | 0 |
| V137 | nominal | 2 | 0 |
| V139 | nominal | 2 | 0 |
| V141 | nominal | 2 | 0 |
| V143 | nominal | 2 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 6 |
| Number of nominal attributes | 95 |
| Percentage of binary attributes | 94.06 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0.52 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 0.05 |
| Percentage of numeric attributes | 5.94 |
| Percentage of instances belonging to the most frequent class | 50 |
| Percentage of nominal attributes | 94.06 |
| Number of instances belonging to the most frequent class | 1000 |
| Percentage of instances belonging to the least frequent class | 50 |
| Number of instances belonging to the least frequent class | 1000 |
| Number of binary attributes | 95 |
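Several of these properties are simple ratios of counts that appear on this page. As a quick consistency check, here is a sketch that recomputes them; the counts are copied by hand from the feature list and the property table (an assumption of this sketch, not values read from OpenML's API), and the percentages are taken over all 101 attributes including the target, which is what reproduces the listed values.

```python
# Counts copied by hand from the feature list and property table on this page.
n_instances = 2000
n_features = 101                      # includes the target `class`
n_numeric = 6                         # V13, V43, V45, V56, V59, V126
n_nominal = n_features - n_numeric    # 95, includes the target
n_binary = 95                         # every nominal column has 2 unique values
n_majority = 1000                     # instances in the most frequent class

dimensionality = n_features / n_instances      # attributes per instance
pct_numeric = 100 * n_numeric / n_features
pct_nominal = 100 * n_nominal / n_features
pct_binary = 100 * n_binary / n_features
pct_majority = 100 * n_majority / n_instances

print(f"{dimensionality:.2f} {pct_numeric:.2f} {pct_nominal:.2f} "
      f"{pct_binary:.2f} {pct_majority:.0f}")
# prints: 0.05 5.94 94.06 94.06 50
```

Because both classes hold exactly 1000 instances, the most and least frequent class percentages coincide at 50.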

0 tasks
