cnae-9_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Active ARFF dataset, publicly available (visibility: public). Uploaded 17-11-2022 by Eddie Bergman.
0 likes, downloaded by 0 people, 0 total downloads, 0 issues, 0 downvotes.

Subsampling of the dataset cnae-9 (1468) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    # `self` exposes x, y, dataset and categorical_mask; `Dataset` is defined
    # elsewhere in the generating code base.
    # Note: `stratified` is never read; the row subsample below is always stratified.
    rng = np.random.default_rng(seed)
    x = self.x
    y = self.y

    # Sample classes with probability proportional to their frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            # Align the counts with the order of `classes` before normalising
            p=(vcs.loc[classes] / vcs.sum()).to_numpy(),
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x[mask]
        y = y[mask]

    # Uniformly sample columns if required, keeping their original order
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly: keep nrows_max rows with the class proportions of y
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Categorical flags of the retained columns
    # (categorical columns need converting to string for openml)
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
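
To experiment with the same three steps (class, column, row) outside the generating code base, here is a minimal standalone sketch that works on a plain pandas `DataFrame`/`Series` pair. The function name `subsample_xy` and its return value are hypothetical, not part of the original code; it filters rows on the sampled classes and returns the kept column positions so that a categorical mask could be sliced the same way.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_xy(
    x: pd.DataFrame,
    y: pd.Series,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
) -> tuple[pd.DataFrame, pd.Series, list[int]]:
    """Hypothetical standalone sketch of the subsampling steps above."""
    rng = np.random.default_rng(seed)

    # 1. Keep at most `nclasses_max` classes, drawn in proportion to their frequency.
    counts = y.value_counts()
    counts = counts[counts > 0]  # drop unused categorical levels, if any
    if len(counts) > nclasses_max:
        keep = rng.choice(
            counts.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(counts / counts.sum()).to_numpy(),
        )
        mask = y.isin(keep)
        x, y = x[mask], y[mask]

    # 2. Keep at most `ncols_max` columns, chosen uniformly, in their original order.
    col_idxs = list(range(x.shape[1]))
    if x.shape[1] > ncols_max:
        col_idxs = sorted(rng.choice(x.shape[1], size=ncols_max, replace=False).tolist())
        x = x.iloc[:, col_idxs]

    # 3. Keep at most `nrows_max` rows through a split stratified on the target.
    if len(x) > nrows_max:
        _, x, _, y = train_test_split(
            x, y, test_size=nrows_max, stratify=y, shuffle=True, random_state=seed
        )
    return x, y, col_idxs
```

With the counts on this page (1080 rows and 9 classes, both under the limits of 2000 rows and 10 classes), only the column step can have changed cnae-9; the gaps in the retained feature names (V16, V22, V27, ...) are consistent with that. Passing `x` and `y` separately to `train_test_split` keeps their indexes aligned without the concat/drop round trip.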

101 features

Class (target): nominal, 9 unique values, 0 missing
V16: numeric, 2 unique values, 0 missing
V22: numeric, 3 unique values, 0 missing
V27: numeric, 2 unique values, 0 missing
V33: numeric, 2 unique values, 0 missing
V47: numeric, 2 unique values, 0 missing
V50: numeric, 2 unique values, 0 missing
V53: numeric, 2 unique values, 0 missing
V67: numeric, 2 unique values, 0 missing
V75: numeric, 3 unique values, 0 missing
V94: numeric, 2 unique values, 0 missing
V96: numeric, 2 unique values, 0 missing
V98: numeric, 2 unique values, 0 missing
V106: numeric, 2 unique values, 0 missing
V110: numeric, 2 unique values, 0 missing
V127: numeric, 2 unique values, 0 missing
V131: numeric, 2 unique values, 0 missing
V162: numeric, 2 unique values, 0 missing
V181: numeric, 2 unique values, 0 missing
V191: numeric, 3 unique values, 0 missing
V199: numeric, 3 unique values, 0 missing
V209: numeric, 2 unique values, 0 missing
V211: numeric, 2 unique values, 0 missing
V220: numeric, 2 unique values, 0 missing
V225: numeric, 2 unique values, 0 missing
V238: numeric, 2 unique values, 0 missing
V239: numeric, 2 unique values, 0 missing
V240: numeric, 2 unique values, 0 missing
V258: numeric, 3 unique values, 0 missing
V263: numeric, 2 unique values, 0 missing
V292: numeric, 2 unique values, 0 missing
V300: numeric, 2 unique values, 0 missing
V304: numeric, 2 unique values, 0 missing
V308: numeric, 2 unique values, 0 missing
V316: numeric, 3 unique values, 0 missing
V320: numeric, 2 unique values, 0 missing
V326: numeric, 2 unique values, 0 missing
V345: numeric, 2 unique values, 0 missing
V349: numeric, 2 unique values, 0 missing
V355: numeric, 2 unique values, 0 missing
V358: numeric, 2 unique values, 0 missing
V359: numeric, 3 unique values, 0 missing
V375: numeric, 2 unique values, 0 missing
V383: numeric, 3 unique values, 0 missing
V384: numeric, 3 unique values, 0 missing
V388: numeric, 2 unique values, 0 missing
V390: numeric, 2 unique values, 0 missing
V400: numeric, 2 unique values, 0 missing
V420: numeric, 2 unique values, 0 missing
V423: numeric, 3 unique values, 0 missing
V426: numeric, 2 unique values, 0 missing
V433: numeric, 2 unique values, 0 missing
V435: numeric, 2 unique values, 0 missing
V439: numeric, 2 unique values, 0 missing
V440: numeric, 2 unique values, 0 missing
V467: numeric, 2 unique values, 0 missing
V493: numeric, 2 unique values, 0 missing
V498: numeric, 2 unique values, 0 missing
V500: numeric, 2 unique values, 0 missing
V507: numeric, 2 unique values, 0 missing
V513: numeric, 3 unique values, 0 missing
V538: numeric, 2 unique values, 0 missing
V572: numeric, 2 unique values, 0 missing
V574: numeric, 2 unique values, 0 missing
V586: numeric, 2 unique values, 0 missing
V587: numeric, 2 unique values, 0 missing
V591: numeric, 2 unique values, 0 missing
V598: numeric, 2 unique values, 0 missing
V599: numeric, 2 unique values, 0 missing
V604: numeric, 2 unique values, 0 missing
V619: numeric, 3 unique values, 0 missing
V628: numeric, 2 unique values, 0 missing
V638: numeric, 2 unique values, 0 missing
V639: numeric, 2 unique values, 0 missing
V640: numeric, 2 unique values, 0 missing
V642: numeric, 2 unique values, 0 missing
V647: numeric, 2 unique values, 0 missing
V653: numeric, 2 unique values, 0 missing
V667: numeric, 2 unique values, 0 missing
V673: numeric, 3 unique values, 0 missing
V679: numeric, 2 unique values, 0 missing
V702: numeric, 2 unique values, 0 missing
V711: numeric, 2 unique values, 0 missing
V717: numeric, 2 unique values, 0 missing
V723: numeric, 2 unique values, 0 missing
V725: numeric, 2 unique values, 0 missing
V727: numeric, 2 unique values, 0 missing
V751: numeric, 3 unique values, 0 missing
V760: numeric, 2 unique values, 0 missing
V771: numeric, 2 unique values, 0 missing
V776: numeric, 2 unique values, 0 missing
V786: numeric, 2 unique values, 0 missing
V787: numeric, 3 unique values, 0 missing
V789: numeric, 2 unique values, 0 missing
V792: numeric, 2 unique values, 0 missing
V801: numeric, 2 unique values, 0 missing
V812: numeric, 2 unique values, 0 missing
V827: numeric, 2 unique values, 0 missing
V829: numeric, 2 unique values, 0 missing
V835: numeric, 2 unique values, 0 missing
V840: numeric, 2 unique values, 0 missing
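
Each entry above gives a column's type, its number of distinct values and its number of missing values. A minimal sketch for producing a similar listing from a pandas `DataFrame` follows. `feature_summary` is a hypothetical helper; it guesses the type from the pandas dtype, whereas the types shown on this page presumably come from the dataset's ARFF attribute declarations.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def feature_summary(df: pd.DataFrame, target: str) -> list[str]:
    """One line per column (target first): name, type, distinct values, missing values."""
    lines = []
    for name in [target] + [c for c in df.columns if c != target]:
        col = df[name]
        # Approximation: numeric pandas dtypes -> "numeric", everything else -> "nominal"
        kind = "numeric" if is_numeric_dtype(col) else "nominal"
        label = f"{name} (target)" if name == target else name
        lines.append(
            f"{label}: {kind}, {col.nunique()} unique values, {col.isna().sum()} missing"
        )
    return lines
```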

19 properties

Number of instances (rows) of the dataset: 1080
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 9
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Number of instances belonging to the least frequent class: 120
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.09
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 11.11
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 120
Percentage of instances belonging to the least frequent class: 11.11
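
Several of these properties are simple ratios of the counts: 101 attributes / 1080 instances ≈ 0.09, 100 numeric of 101 attributes ≈ 99.01%, 1 nominal of 101 ≈ 0.99%, and 120 / 1080 ≈ 11.11% for both the most and the least frequent class, so the 9 classes are exactly balanced (9 × 120 = 1080). The sketch below recomputes a subset of them from a DataFrame. `basic_qualities` and its dictionary keys are hypothetical names, not OpenML's API, and it does not attempt the binary-attribute or consecutive-instance statistics.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def basic_qualities(df: pd.DataFrame, target: str) -> dict[str, float]:
    """Recompute a few dataset-level statistics (hypothetical helper)."""
    n_rows, n_cols = df.shape
    counts = df[target].value_counts()
    counts = counts[counts > 0]  # ignore unused categorical levels
    n_numeric = sum(is_numeric_dtype(df[c]) for c in df.columns)
    return {
        "n_instances": n_rows,
        "n_attributes": n_cols,
        "n_classes": len(counts),
        "n_missing_values": int(df.isna().sum().sum()),
        "n_instances_with_missing": int(df.isna().any(axis=1).sum()),
        "n_numeric": n_numeric,
        "n_nominal": n_cols - n_numeric,
        "attributes_per_instance": n_cols / n_rows,
        "pct_numeric": 100 * n_numeric / n_cols,
        "pct_nominal": 100 * (n_cols - n_numeric) / n_cols,
        "majority_class_size": int(counts.max()),
        "minority_class_size": int(counts.min()),
        "pct_majority_class": 100 * counts.max() / n_rows,
        "pct_minority_class": 100 * counts.min() / n_rows,
    }
```

On a synthetic frame with the same shape as this dataset (100 numeric columns, a 9-class target with 120 rows per class), rounding the ratios to two decimals gives 0.09, 99.01, 0.99 and 11.11, matching the values listed above.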

0 tasks