cnae-9_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.

Subsampling of the dataset cnae-9 (1468) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10 and args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the author's own dataset class; ``Dataset`` is that container
# class, whose definition is not part of this description.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes with probability proportional to their frequency.
    # Taking the classes from value_counts() keeps them aligned with p.
    vcs = y.value_counts()
    classes = vcs.index
    if len(classes) > nclasses_max:
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs.to_numpy() / vcs.sum(),
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x.loc[mask]
        y = y.loc[mask]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify on the target only when requested
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical flags aligned with the retained columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
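For this dataset, only the column step of `subsample` changes the shape. The target has 9 classes, which does not exceed `nclasses_max=10`, so no class filtering happens. The result also has 1080 rows. That is fewer than `nrows_max=2000`, and a row split would have produced exactly 2000 rows, so no split happened either.

The sketch below restates the column step with Python's standard `random` module instead of NumPy. Because of that substitution it will not reproduce the exact columns listed on this page: the two generators produce different random streams. The helper name `sample_columns` and the toy column list are hypothetical.

```python
import random


def sample_columns(columns, categorical_mask, ncols_max, seed):
    """Keep at most ncols_max columns, chosen uniformly without replacement.

    The chosen positions are sorted so the retained columns keep their
    original relative order, and the categorical flags are indexed with the
    same positions so they stay aligned with the columns.
    """
    if len(columns) <= ncols_max:
        idxs = list(range(len(columns)))
    else:
        rng = random.Random(seed)
        idxs = sorted(rng.sample(range(len(columns)), ncols_max))
    kept_columns = [columns[i] for i in idxs]
    kept_mask = [categorical_mask[i] for i in idxs]
    return kept_columns, kept_mask


# Hypothetical source with 300 columns; every tenth one flagged categorical
cols = [f"V{i}" for i in range(1, 301)]
mask = [i % 10 == 0 for i in range(len(cols))]
kept, kept_mask = sample_columns(cols, mask, ncols_max=100, seed=4)
print(len(kept), kept[:3])
```

Sorting the sampled positions is what keeps the feature list in ascending order (V23, V30, V48, ...) rather than in draw order.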

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| Class (target) | nominal | 9 | 0 |
| V23 | numeric | 2 | 0 |
| V30 | numeric | 2 | 0 |
| V48 | numeric | 2 | 0 |
| V62 | numeric | 2 | 0 |
| V66 | numeric | 2 | 0 |
| V73 | numeric | 3 | 0 |
| V97 | numeric | 2 | 0 |
| V108 | numeric | 2 | 0 |
| V110 | numeric | 2 | 0 |
| V135 | numeric | 3 | 0 |
| V142 | numeric | 2 | 0 |
| V145 | numeric | 2 | 0 |
| V151 | numeric | 3 | 0 |
| V161 | numeric | 3 | 0 |
| V164 | numeric | 2 | 0 |
| V167 | numeric | 2 | 0 |
| V171 | numeric | 2 | 0 |
| V182 | numeric | 2 | 0 |
| V183 | numeric | 2 | 0 |
| V189 | numeric | 2 | 0 |
| V218 | numeric | 2 | 0 |
| V222 | numeric | 2 | 0 |
| V243 | numeric | 2 | 0 |
| V263 | numeric | 2 | 0 |
| V285 | numeric | 2 | 0 |
| V290 | numeric | 2 | 0 |
| V292 | numeric | 2 | 0 |
| V296 | numeric | 2 | 0 |
| V307 | numeric | 2 | 0 |
| V308 | numeric | 2 | 0 |
| V327 | numeric | 2 | 0 |
| V328 | numeric | 2 | 0 |
| V337 | numeric | 2 | 0 |
| V347 | numeric | 2 | 0 |
| V369 | numeric | 2 | 0 |
| V373 | numeric | 3 | 0 |
| V389 | numeric | 2 | 0 |
| V398 | numeric | 2 | 0 |
| V399 | numeric | 2 | 0 |
| V402 | numeric | 2 | 0 |
| V403 | numeric | 3 | 0 |
| V405 | numeric | 2 | 0 |
| V406 | numeric | 2 | 0 |
| V409 | numeric | 2 | 0 |
| V414 | numeric | 2 | 0 |
| V420 | numeric | 2 | 0 |
| V423 | numeric | 3 | 0 |
| V428 | numeric | 2 | 0 |
| V440 | numeric | 2 | 0 |
| V448 | numeric | 2 | 0 |
| V453 | numeric | 2 | 0 |
| V454 | numeric | 2 | 0 |
| V466 | numeric | 2 | 0 |
| V468 | numeric | 2 | 0 |
| V472 | numeric | 2 | 0 |
| V480 | numeric | 2 | 0 |
| V483 | numeric | 2 | 0 |
| V485 | numeric | 2 | 0 |
| V523 | numeric | 2 | 0 |
| V526 | numeric | 2 | 0 |
| V534 | numeric | 2 | 0 |
| V550 | numeric | 2 | 0 |
| V558 | numeric | 2 | 0 |
| V563 | numeric | 2 | 0 |
| V579 | numeric | 2 | 0 |
| V581 | numeric | 3 | 0 |
| V618 | numeric | 4 | 0 |
| V619 | numeric | 3 | 0 |
| V648 | numeric | 3 | 0 |
| V669 | numeric | 4 | 0 |
| V675 | numeric | 2 | 0 |
| V678 | numeric | 2 | 0 |
| V685 | numeric | 2 | 0 |
| V693 | numeric | 2 | 0 |
| V701 | numeric | 3 | 0 |
| V702 | numeric | 2 | 0 |
| V715 | numeric | 3 | 0 |
| V716 | numeric | 2 | 0 |
| V719 | numeric | 2 | 0 |
| V732 | numeric | 2 | 0 |
| V736 | numeric | 2 | 0 |
| V739 | numeric | 2 | 0 |
| V741 | numeric | 2 | 0 |
| V744 | numeric | 2 | 0 |
| V755 | numeric | 2 | 0 |
| V756 | numeric | 2 | 0 |
| V758 | numeric | 2 | 0 |
| V766 | numeric | 2 | 0 |
| V770 | numeric | 2 | 0 |
| V772 | numeric | 2 | 0 |
| V774 | numeric | 2 | 0 |
| V779 | numeric | 2 | 0 |
| V817 | numeric | 2 | 0 |
| V821 | numeric | 2 | 0 |
| V825 | numeric | 2 | 0 |
| V832 | numeric | 4 | 0 |
| V844 | numeric | 2 | 0 |
| V846 | numeric | 2 | 0 |
| V847 | numeric | 2 | 0 |
| V853 | numeric | 2 | 0 |
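Each row of the feature list reports a column's type, its number of distinct values and its number of missing values. Below is a minimal sketch of how those counts can be derived for a single column. It assumes, as an illustration, that missing entries are represented as `None`; in the ARFF file itself they would be written as `?`. The helper `summarize_feature` and the toy column are hypothetical.

```python
def summarize_feature(name, values, kind):
    """Summarize one column the way the feature list does: declared type,
    number of distinct non-missing values, and number of missing values
    (None marks a missing entry)."""
    present = [v for v in values if v is not None]
    return {
        "name": name,
        "type": kind,
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


# Toy column, not taken from this dataset
print(summarize_feature("V_example", [0, 1, 0, 0, 2, None], "numeric"))
# {'name': 'V_example', 'type': 'numeric', 'unique': 3, 'missing': 1}
```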

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 1080 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 9 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 100 |
| Number of nominal attributes | 1 |
| Percentage of instances belonging to the most frequent class | 11.11 |
| Percentage of nominal attributes | 0.99 |
| Number of instances belonging to the most frequent class | 120 |
| Percentage of instances belonging to the least frequent class | 11.11 |
| Number of instances belonging to the least frequent class | 120 |
| Number of binary attributes | 0 |
| Percentage of binary attributes | 0 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 0.09 |
| Percentage of numeric attributes | 99.01 |
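The percentage and ratio properties follow directly from the raw counts. The most and least frequent classes both hold 120 instances, so all 9 classes hold 120 each: 9 × 120 = 1080 rows, and 120 / 1080 = 11.11 %. Likewise, 1 nominal attribute out of 101 is 0.99 %, 100 numeric attributes out of 101 is 99.01 %, and 101 / 1080 ≈ 0.09. Below is a minimal sketch that recomputes these values. The function name, the dictionary keys and the placeholder class labels are hypothetical; they are not OpenML's own identifiers.

```python
def dataset_properties(class_counts, n_numeric, n_nominal):
    """Recompute the count-based properties from class sizes and column types.

    Percentages and the ratio are rounded to two decimals to match the page.
    """
    n_instances = sum(class_counts.values())
    n_attributes = n_numeric + n_nominal
    majority = max(class_counts.values())
    minority = min(class_counts.values())

    def pct(part, whole):
        return round(100 * part / whole, 2)

    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "classes": len(class_counts),
        "majority_class_size": majority,
        "majority_class_pct": pct(majority, n_instances),
        "minority_class_size": minority,
        "minority_class_pct": pct(minority, n_instances),
        "numeric_attr_pct": pct(n_numeric, n_attributes),
        "nominal_attr_pct": pct(n_nominal, n_attributes),
        "attributes_per_instance": round(n_attributes / n_instances, 2),
    }


# Nine classes of 120 instances each (labels are placeholders)
counts = {f"class_{k}": 120 for k in range(1, 10)}
props = dataset_properties(counts, n_numeric=100, n_nominal=1)
print(props)
```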

0 tasks