Data
dna_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset dna (40670) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
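The three stages in the source code above (class selection, column selection, stratified row selection) can be tried out on synthetic data with the standalone sketch below. It is not the uploader's code. The function name `subsample_frame` and the demo data are hypothetical. The sketch assumes that rows are filtered by the selected classes, and it swaps scikit-learn's `train_test_split` for a proportional per-class draw with pandas `groupby(...).sample`, so the row count lands near `nrows_max` rather than exactly on it.

```python
import numpy as np
import pandas as pd


def subsample_frame(x, y, seed, nrows_max=2_000, ncols_max=100, nclasses_max=10):
    """Hypothetical standalone sketch of the class / column / row stages."""
    rng = np.random.default_rng(seed)

    # 1. Classes: draw without replacement, weighted by class frequency.
    counts = y.value_counts()
    if len(counts) > nclasses_max:
        kept = rng.choice(
            counts.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(counts / counts.sum()).to_numpy(),
        )
        # Assumption: only rows whose label is among the selected classes stay.
        mask = y.isin(kept).to_numpy()
        x, y = x.loc[mask], y.loc[mask]

    # 2. Columns: uniform draw, kept in their original left-to-right order.
    if x.shape[1] > ncols_max:
        col_idxs = np.sort(rng.choice(x.shape[1], size=ncols_max, replace=False))
        x = x.iloc[:, col_idxs]

    # 3. Rows: proportional draw per class (stand-in for train_test_split).
    if len(x) > nrows_max:
        frac = nrows_max / len(x)
        picked = y.groupby(y).sample(frac=frac, random_state=seed).index
        x, y = x.loc[picked], y.loc[picked]

    return x, y


# Synthetic demo data (not the dna dataset): 1000 rows, 20 binary columns, 4 classes.
demo_rng = np.random.default_rng(0)
y_demo = pd.Series(np.repeat(["a", "b", "c", "d"], [400, 300, 200, 100]), name="class")
x_demo = pd.DataFrame(
    demo_rng.integers(0, 2, size=(1000, 20)),
    columns=[f"A{i}" for i in range(20)],
)

x_small, y_small = subsample_frame(
    x_demo, y_demo, seed=1, nrows_max=100, ncols_max=5, nclasses_max=3
)
print(x_small.shape, y_small.value_counts().to_dict())
```

Sorting the drawn column positions keeps the surviving columns in their original order, which matches the ascending attribute names (A2, A6, A7, ...) in the feature list of this dataset.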

101 features

class (target): nominal, 3 unique values, 0 missing
A2: nominal, 2 unique values, 0 missing
A6: nominal, 2 unique values, 0 missing
A7: nominal, 2 unique values, 0 missing
A8: nominal, 2 unique values, 0 missing
A9: nominal, 2 unique values, 0 missing
A12: nominal, 2 unique values, 0 missing
A13: nominal, 2 unique values, 0 missing
A14: nominal, 2 unique values, 0 missing
A15: nominal, 2 unique values, 0 missing
A16: nominal, 2 unique values, 0 missing
A22: nominal, 2 unique values, 0 missing
A24: nominal, 2 unique values, 0 missing
A25: nominal, 2 unique values, 0 missing
A26: nominal, 2 unique values, 0 missing
A28: nominal, 2 unique values, 0 missing
A31: nominal, 2 unique values, 0 missing
A33: nominal, 2 unique values, 0 missing
A34: nominal, 2 unique values, 0 missing
A35: nominal, 2 unique values, 0 missing
A37: nominal, 2 unique values, 0 missing
A38: nominal, 2 unique values, 0 missing
A39: nominal, 2 unique values, 0 missing
A41: nominal, 2 unique values, 0 missing
A43: nominal, 2 unique values, 0 missing
A44: nominal, 2 unique values, 0 missing
A46: nominal, 2 unique values, 0 missing
A48: nominal, 2 unique values, 0 missing
A50: nominal, 2 unique values, 0 missing
A53: nominal, 2 unique values, 0 missing
A54: nominal, 2 unique values, 0 missing
A55: nominal, 2 unique values, 0 missing
A58: nominal, 2 unique values, 0 missing
A59: nominal, 2 unique values, 0 missing
A61: nominal, 2 unique values, 0 missing
A62: nominal, 2 unique values, 0 missing
A65: nominal, 2 unique values, 0 missing
A66: nominal, 2 unique values, 0 missing
A71: nominal, 2 unique values, 0 missing
A72: nominal, 2 unique values, 0 missing
A73: nominal, 2 unique values, 0 missing
A76: nominal, 2 unique values, 0 missing
A77: nominal, 2 unique values, 0 missing
A79: nominal, 2 unique values, 0 missing
A82: nominal, 2 unique values, 0 missing
A83: nominal, 2 unique values, 0 missing
A85: nominal, 2 unique values, 0 missing
A86: nominal, 2 unique values, 0 missing
A87: nominal, 2 unique values, 0 missing
A88: nominal, 2 unique values, 0 missing
A90: nominal, 2 unique values, 0 missing
A91: nominal, 2 unique values, 0 missing
A95: nominal, 2 unique values, 0 missing
A96: nominal, 2 unique values, 0 missing
A97: nominal, 2 unique values, 0 missing
A98: nominal, 2 unique values, 0 missing
A99: nominal, 2 unique values, 0 missing
A103: nominal, 2 unique values, 0 missing
A105: nominal, 2 unique values, 0 missing
A107: nominal, 2 unique values, 0 missing
A110: nominal, 2 unique values, 0 missing
A111: nominal, 2 unique values, 0 missing
A112: nominal, 2 unique values, 0 missing
A114: nominal, 2 unique values, 0 missing
A116: nominal, 2 unique values, 0 missing
A117: nominal, 2 unique values, 0 missing
A120: nominal, 2 unique values, 0 missing
A121: nominal, 2 unique values, 0 missing
A123: nominal, 2 unique values, 0 missing
A124: nominal, 2 unique values, 0 missing
A125: nominal, 2 unique values, 0 missing
A128: nominal, 2 unique values, 0 missing
A129: nominal, 2 unique values, 0 missing
A130: nominal, 2 unique values, 0 missing
A132: nominal, 2 unique values, 0 missing
A134: nominal, 2 unique values, 0 missing
A135: nominal, 2 unique values, 0 missing
A136: nominal, 2 unique values, 0 missing
A137: nominal, 2 unique values, 0 missing
A138: nominal, 2 unique values, 0 missing
A139: nominal, 2 unique values, 0 missing
A140: nominal, 2 unique values, 0 missing
A142: nominal, 2 unique values, 0 missing
A144: nominal, 2 unique values, 0 missing
A145: nominal, 2 unique values, 0 missing
A147: nominal, 2 unique values, 0 missing
A149: nominal, 2 unique values, 0 missing
A156: nominal, 2 unique values, 0 missing
A157: nominal, 2 unique values, 0 missing
A160: nominal, 2 unique values, 0 missing
A162: nominal, 2 unique values, 0 missing
A163: nominal, 2 unique values, 0 missing
A166: nominal, 2 unique values, 0 missing
A168: nominal, 2 unique values, 0 missing
A169: nominal, 2 unique values, 0 missing
A170: nominal, 2 unique values, 0 missing
A171: nominal, 2 unique values, 0 missing
A172: nominal, 2 unique values, 0 missing
A174: nominal, 2 unique values, 0 missing
A175: nominal, 2 unique values, 0 missing
A176: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 3
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 101
Percentage of missing values: 0
Average class difference between consecutive instances: 0.39
Percentage of numeric attributes: 0
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 100
Percentage of instances belonging to the most frequent class: 51.9
Number of instances belonging to the most frequent class: 1038
Percentage of instances belonging to the least frequent class: 24
Number of instances belonging to the least frequent class: 480
Number of binary attributes: 100
Percentage of binary attributes: 99.01
Percentage of instances having missing values: 0
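Several of the properties above are ratios of the listed counts. The short check below recomputes them from the raw numbers on this page. The variable names are illustrative only, and the size of the third class is inferred from the totals (three target classes, 2000 rows) rather than stated on the page.

```python
# Raw counts copied from the properties listed on this page.
n_rows = 2000       # instances
n_cols = 101        # attributes, including the target
n_binary = 100      # binary attributes
n_majority = 1038   # instances in the most frequent class
n_minority = 480    # instances in the least frequent class

# Derived properties, rounded as on the page.
dimensionality = round(n_cols / n_rows, 2)        # attributes per instance
pct_binary = round(100 * n_binary / n_cols, 2)    # share of binary attributes
pct_majority = round(100 * n_majority / n_rows, 2)
pct_minority = round(100 * n_minority / n_rows, 2)

# Inferred: with three classes, the remaining class holds the leftover rows.
n_middle = n_rows - n_majority - n_minority

print(dimensionality, pct_binary, pct_majority, pct_minority, n_middle)
# prints: 0.05 99.01 51.9 24.0 482
```

The one non-binary attribute is the three-valued target `class`, which is why 100 of the 101 attributes are binary.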

0 tasks
