jasmine_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset jasmine (41143) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
# Imports required by this method excerpt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes (weighted by class frequency) if there are too many
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present.
        # Note: as written, this filters on `classes` (every class), not
        # on `selected_classes`, so no rows are dropped here.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
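To illustrate what the stratified row-sampling step in the code above aims for, here is a minimal standard-library sketch. It is not the `train_test_split` call used by the source code: it swaps in a simple per-class proportional draw. The function name `stratified_subsample` and the 3000-row balanced label list are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional subsample.

    Hypothetical helper: a simplified stand-in for stratified sampling,
    not the implementation used to build this dataset.
    """
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    chosen = []
    for label, idxs in sorted(by_class.items()):
        # Proportional allocation; rounding can miss n_rows by a few
        # rows when the class shares do not divide evenly.
        k = round(n_rows * len(idxs) / total)
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Hypothetical balanced binary target with 3000 rows
labels = [0] * 1500 + [1] * 1500
subset = stratified_subsample(labels, n_rows=2000, seed=4)
counts = Counter(labels[i] for i in subset)
print(len(subset), dict(counts))
```

With a balanced input, each class keeps its 50% share in the subsample, which matches the 1000/1000 class split reported in the properties of this dataset.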

101 features

class (target): nominal, 2 unique values, 0 missing
V4: nominal, 2 unique values, 0 missing
V5: nominal, 2 unique values, 0 missing
V9: nominal, 2 unique values, 0 missing
V10: nominal, 2 unique values, 0 missing
V11: nominal, 2 unique values, 0 missing
V13: numeric, 106 unique values, 0 missing
V14: nominal, 2 unique values, 0 missing
V15: nominal, 2 unique values, 0 missing
V16: nominal, 2 unique values, 0 missing
V18: nominal, 2 unique values, 0 missing
V19: nominal, 2 unique values, 0 missing
V21: nominal, 2 unique values, 0 missing
V22: nominal, 2 unique values, 0 missing
V23: numeric, 101 unique values, 0 missing
V24: nominal, 2 unique values, 0 missing
V25: nominal, 2 unique values, 0 missing
V26: nominal, 2 unique values, 0 missing
V27: nominal, 2 unique values, 0 missing
V29: nominal, 2 unique values, 0 missing
V31: nominal, 2 unique values, 0 missing
V33: nominal, 2 unique values, 0 missing
V35: nominal, 2 unique values, 0 missing
V36: nominal, 2 unique values, 0 missing
V37: nominal, 2 unique values, 0 missing
V39: nominal, 2 unique values, 0 missing
V42: nominal, 2 unique values, 0 missing
V44: nominal, 2 unique values, 0 missing
V46: nominal, 2 unique values, 0 missing
V47: nominal, 2 unique values, 0 missing
V48: nominal, 2 unique values, 0 missing
V49: nominal, 2 unique values, 0 missing
V50: nominal, 2 unique values, 0 missing
V52: nominal, 2 unique values, 0 missing
V53: nominal, 2 unique values, 0 missing
V54: nominal, 2 unique values, 0 missing
V55: nominal, 2 unique values, 0 missing
V56: numeric, 1158 unique values, 0 missing
V57: nominal, 2 unique values, 0 missing
V58: nominal, 2 unique values, 0 missing
V59: numeric, 119 unique values, 0 missing
V60: nominal, 2 unique values, 0 missing
V61: nominal, 2 unique values, 0 missing
V62: nominal, 2 unique values, 0 missing
V64: nominal, 1 unique value, 0 missing
V65: nominal, 2 unique values, 0 missing
V66: nominal, 2 unique values, 0 missing
V67: nominal, 2 unique values, 0 missing
V68: nominal, 2 unique values, 0 missing
V70: nominal, 2 unique values, 0 missing
V71: nominal, 2 unique values, 0 missing
V73: nominal, 2 unique values, 0 missing
V75: nominal, 2 unique values, 0 missing
V76: nominal, 2 unique values, 0 missing
V77: nominal, 2 unique values, 0 missing
V82: nominal, 2 unique values, 0 missing
V83: nominal, 2 unique values, 0 missing
V86: nominal, 2 unique values, 0 missing
V87: nominal, 2 unique values, 0 missing
V88: nominal, 2 unique values, 0 missing
V90: nominal, 2 unique values, 0 missing
V93: nominal, 2 unique values, 0 missing
V94: nominal, 2 unique values, 0 missing
V95: nominal, 2 unique values, 0 missing
V96: nominal, 2 unique values, 0 missing
V97: nominal, 2 unique values, 0 missing
V98: nominal, 2 unique values, 0 missing
V99: nominal, 2 unique values, 0 missing
V100: nominal, 2 unique values, 0 missing
V101: nominal, 2 unique values, 0 missing
V102: nominal, 2 unique values, 0 missing
V104: nominal, 2 unique values, 0 missing
V107: nominal, 2 unique values, 0 missing
V108: nominal, 2 unique values, 0 missing
V109: nominal, 2 unique values, 0 missing
V111: nominal, 2 unique values, 0 missing
V112: nominal, 2 unique values, 0 missing
V113: nominal, 2 unique values, 0 missing
V114: nominal, 2 unique values, 0 missing
V115: nominal, 2 unique values, 0 missing
V116: nominal, 2 unique values, 0 missing
V119: nominal, 2 unique values, 0 missing
V121: nominal, 2 unique values, 0 missing
V122: nominal, 2 unique values, 0 missing
V123: nominal, 2 unique values, 0 missing
V125: nominal, 2 unique values, 0 missing
V126: numeric, 110 unique values, 0 missing
V129: nominal, 2 unique values, 0 missing
V130: nominal, 2 unique values, 0 missing
V131: numeric, 101 unique values, 0 missing
V132: nominal, 2 unique values, 0 missing
V133: nominal, 2 unique values, 0 missing
V134: nominal, 2 unique values, 0 missing
V135: nominal, 2 unique values, 0 missing
V136: nominal, 2 unique values, 0 missing
V137: nominal, 2 unique values, 0 missing
V138: nominal, 2 unique values, 0 missing
V140: nominal, 2 unique values, 0 missing
V141: nominal, 2 unique values, 0 missing
V143: nominal, 2 unique values, 0 missing
V144: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 6
Number of nominal attributes: 95
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 5.94
Percentage of instances belonging to the most frequent class: 50
Percentage of nominal attributes: 94.06
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 1000
Number of binary attributes: 95
Percentage of binary attributes: 94.06
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.5
Percentage of missing values: 0
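
The ratio and percentage properties listed above follow directly from the raw counts. As a quick consistency check, the short sketch below recomputes them; the variable names are illustrative only, and rounding to two decimals is an assumption about how the page displays these values.

```python
# Raw counts as reported in the properties of this dataset
n_instances = 2000
n_attributes = 101
n_numeric = 6
n_nominal = 95
n_majority_class = 1000

# Derived properties, rounded to two decimals (assumed display precision)
dimensionality = round(n_attributes / n_instances, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_majority_class = round(100 * n_majority_class / n_instances, 2)

print(dimensionality, pct_numeric, pct_nominal, pct_majority_class)
```

The recomputed values agree with the listed 0.05, 5.94, 94.06, and 50.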

0 tasks
