cnae-9_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active; format: ARFF; publicly available; visibility: public; uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset cnae-9 (1468) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
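Each of the three subsampling steps in the source code above is guarded by a size check, so not every step necessarily changes the data. The following is a minimal sketch, not part of the original code, that only evaluates those guards for a given dataset shape. The names `Shape` and `steps_applied` are hypothetical. The example shape uses the 1080 rows and 9 classes reported in the properties on this page; the figure of 856 source feature columns is an assumption (the feature names listed on this page run up to V855, so the source has at least that many).

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """Hypothetical container for the size of a dataset."""
    nrows: int
    ncols: int      # feature columns, excluding the target
    nclasses: int


def steps_applied(shape, nrows_max=2_000, ncols_max=100, nclasses_max=10):
    """Return the subsampling steps whose size guard would be true.

    Mirrors only the `if len(...) > ..._max` checks of the quoted
    `subsample` method; it does not sample anything itself.
    """
    steps = []
    if shape.nclasses > nclasses_max:
        steps.append("classes")
    if shape.ncols > ncols_max:
        steps.append("columns")
    if shape.nrows > nrows_max:
        steps.append("rows")
    return steps


# Assumed source shape: 1080 rows and 9 classes as reported on this page,
# 856 feature columns as an assumption about the original cnae-9 data.
source = Shape(nrows=1080, ncols=856, nclasses=9)
print(steps_applied(source))
```

Under these assumptions only the column guard fires, which would be consistent with this dataset keeping all 1080 rows and 9 classes while having 100 feature columns.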

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| Class (target) | nominal | 9 | 0 |
| V32 | numeric | 2 | 0 |
| V34 | numeric | 2 | 0 |
| V43 | numeric | 2 | 0 |
| V49 | numeric | 2 | 0 |
| V65 | numeric | 2 | 0 |
| V71 | numeric | 4 | 0 |
| V82 | numeric | 2 | 0 |
| V83 | numeric | 2 | 0 |
| V86 | numeric | 2 | 0 |
| V87 | numeric | 2 | 0 |
| V88 | numeric | 2 | 0 |
| V118 | numeric | 2 | 0 |
| V145 | numeric | 2 | 0 |
| V150 | numeric | 2 | 0 |
| V157 | numeric | 2 | 0 |
| V166 | numeric | 2 | 0 |
| V173 | numeric | 2 | 0 |
| V178 | numeric | 2 | 0 |
| V182 | numeric | 2 | 0 |
| V185 | numeric | 2 | 0 |
| V199 | numeric | 3 | 0 |
| V203 | numeric | 2 | 0 |
| V213 | numeric | 2 | 0 |
| V221 | numeric | 2 | 0 |
| V227 | numeric | 2 | 0 |
| V238 | numeric | 2 | 0 |
| V257 | numeric | 2 | 0 |
| V266 | numeric | 2 | 0 |
| V277 | numeric | 2 | 0 |
| V288 | numeric | 3 | 0 |
| V302 | numeric | 2 | 0 |
| V311 | numeric | 2 | 0 |
| V315 | numeric | 3 | 0 |
| V333 | numeric | 2 | 0 |
| V337 | numeric | 2 | 0 |
| V339 | numeric | 2 | 0 |
| V345 | numeric | 2 | 0 |
| V367 | numeric | 2 | 0 |
| V368 | numeric | 2 | 0 |
| V370 | numeric | 2 | 0 |
| V375 | numeric | 2 | 0 |
| V376 | numeric | 2 | 0 |
| V382 | numeric | 2 | 0 |
| V389 | numeric | 2 | 0 |
| V393 | numeric | 2 | 0 |
| V407 | numeric | 2 | 0 |
| V409 | numeric | 2 | 0 |
| V411 | numeric | 3 | 0 |
| V421 | numeric | 3 | 0 |
| V429 | numeric | 2 | 0 |
| V432 | numeric | 2 | 0 |
| V438 | numeric | 2 | 0 |
| V445 | numeric | 2 | 0 |
| V460 | numeric | 2 | 0 |
| V462 | numeric | 2 | 0 |
| V464 | numeric | 2 | 0 |
| V491 | numeric | 2 | 0 |
| V493 | numeric | 2 | 0 |
| V494 | numeric | 2 | 0 |
| V499 | numeric | 3 | 0 |
| V511 | numeric | 2 | 0 |
| V516 | numeric | 2 | 0 |
| V525 | numeric | 2 | 0 |
| V532 | numeric | 2 | 0 |
| V533 | numeric | 2 | 0 |
| V537 | numeric | 2 | 0 |
| V541 | numeric | 3 | 0 |
| V555 | numeric | 4 | 0 |
| V559 | numeric | 2 | 0 |
| V560 | numeric | 2 | 0 |
| V564 | numeric | 2 | 0 |
| V570 | numeric | 2 | 0 |
| V586 | numeric | 2 | 0 |
| V621 | numeric | 2 | 0 |
| V624 | numeric | 2 | 0 |
| V635 | numeric | 2 | 0 |
| V653 | numeric | 2 | 0 |
| V654 | numeric | 2 | 0 |
| V679 | numeric | 2 | 0 |
| V687 | numeric | 2 | 0 |
| V699 | numeric | 2 | 0 |
| V700 | numeric | 2 | 0 |
| V705 | numeric | 4 | 0 |
| V715 | numeric | 3 | 0 |
| V720 | numeric | 2 | 0 |
| V726 | numeric | 3 | 0 |
| V733 | numeric | 2 | 0 |
| V738 | numeric | 2 | 0 |
| V743 | numeric | 2 | 0 |
| V747 | numeric | 2 | 0 |
| V748 | numeric | 2 | 0 |
| V764 | numeric | 2 | 0 |
| V765 | numeric | 2 | 0 |
| V783 | numeric | 3 | 0 |
| V787 | numeric | 3 | 0 |
| V804 | numeric | 2 | 0 |
| V806 | numeric | 2 | 0 |
| V830 | numeric | 2 | 0 |
| V846 | numeric | 2 | 0 |
| V855 | numeric | 3 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 1080 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 9 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 0 |
| Percentage of instances having missing values. | 0 |
| Average class difference between consecutive instances. | 0 |
| Percentage of missing values. | 0 |
| Number of attributes divided by the number of instances. | 0.09 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of instances belonging to the most frequent class. | 11.11 |
| Percentage of nominal attributes. | 0.99 |
| Number of instances belonging to the most frequent class. | 120 |
| Percentage of instances belonging to the least frequent class. | 11.11 |
| Number of instances belonging to the least frequent class. | 120 |
| Number of binary attributes. | 0 |
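Several of the properties above are ratios of the reported counts. As a quick consistency check, the sketch below recomputes them from the counts listed on this page. The function name `derived_properties` and its dictionary keys are hypothetical, and rounding to two decimals is an assumption about how the page displays the values.

```python
def derived_properties(n_instances, n_attributes, n_numeric, n_nominal,
                       majority_size, minority_size):
    """Recompute the ratio-style properties from the raw counts."""
    return {
        "dimensionality": n_attributes / n_instances,
        "pct_numeric": 100 * n_numeric / n_attributes,
        "pct_nominal": 100 * n_nominal / n_attributes,
        "pct_majority_class": 100 * majority_size / n_instances,
        "pct_minority_class": 100 * minority_size / n_instances,
    }


# Counts taken from the properties listed on this page.
props = derived_properties(
    n_instances=1080,
    n_attributes=101,
    n_numeric=100,
    n_nominal=1,
    majority_size=120,
    minority_size=120,
)

for name, value in props.items():
    print(f"{name}: {value:.2f}")
```

Since the most and least frequent classes both have 120 instances and 9 × 120 = 1080, the reported counts describe a perfectly balanced class distribution.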
