kr-vs-kp_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset kr-vs-kp (3) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
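
The row-reduction step in the source code relies on a stratified split, meaning each class keeps roughly its original share of the rows after subsampling. As an illustration only, the following standard-library sketch shows one way to do that with a largest-remainder allocation. It is not scikit-learn's implementation, and the function name `stratified_subsample` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n, seed=0):
    """Pick n row indices so each class keeps roughly its original share."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Largest-remainder allocation: floor each class quota, then give the
    # leftover slots to the classes with the largest fractional parts.
    quotas = {c: n * len(rows) / total for c, rows in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n - sum(counts.values())
    ranked = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in ranked[:leftover]:
        counts[c] += 1

    # Sample without replacement inside each class.
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, counts[c]))
    return sorted(chosen)


# Hypothetical toy labels (not the kr-vs-kp data): 60% "a", 40% "b".
labels = ["a"] * 600 + ["b"] * 400
subset = stratified_subsample(labels, n=100, seed=0)
class_counts = Counter(labels[i] for i in subset)
print(class_counts)
```

With the toy labels above, the 100 selected rows contain 60 of class "a" and 40 of class "b", matching the 60/40 split of the input.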

37 features

class (target): nominal, 2 unique values, 0 missing
bkblk: nominal, 2 unique values, 0 missing
bknwy: nominal, 2 unique values, 0 missing
bkon8: nominal, 2 unique values, 0 missing
bkona: nominal, 2 unique values, 0 missing
bkspr: nominal, 2 unique values, 0 missing
bkxbq: nominal, 2 unique values, 0 missing
bkxcr: nominal, 2 unique values, 0 missing
bkxwp: nominal, 2 unique values, 0 missing
blxwp: nominal, 2 unique values, 0 missing
bxqsq: nominal, 2 unique values, 0 missing
cntxt: nominal, 2 unique values, 0 missing
dsopp: nominal, 2 unique values, 0 missing
dwipd: nominal, 2 unique values, 0 missing
hdchk: nominal, 2 unique values, 0 missing
katri: nominal, 3 unique values, 0 missing
mulch: nominal, 2 unique values, 0 missing
qxmsq: nominal, 2 unique values, 0 missing
r2ar8: nominal, 2 unique values, 0 missing
reskd: nominal, 2 unique values, 0 missing
reskr: nominal, 2 unique values, 0 missing
rimmx: nominal, 2 unique values, 0 missing
rkxwp: nominal, 2 unique values, 0 missing
rxmsq: nominal, 2 unique values, 0 missing
simpl: nominal, 2 unique values, 0 missing
skach: nominal, 2 unique values, 0 missing
skewr: nominal, 2 unique values, 0 missing
skrxp: nominal, 2 unique values, 0 missing
spcop: nominal, 1 unique value, 0 missing
stlmt: nominal, 2 unique values, 0 missing
thrsk: nominal, 2 unique values, 0 missing
wkcti: nominal, 2 unique values, 0 missing
wkna8: nominal, 2 unique values, 0 missing
wknck: nominal, 2 unique values, 0 missing
wkovl: nominal, 2 unique values, 0 missing
wkpos: nominal, 2 unique values, 0 missing
wtoeg: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 37
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 37
Percentage of binary attributes: 94.59
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.48
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.02
Percentage of numeric attributes: 0
Percentage of instances belonging to the most frequent class: 52.2
Percentage of nominal attributes: 100
Number of instances belonging to the most frequent class: 1044
Percentage of instances belonging to the least frequent class: 47.8
Number of instances belonging to the least frequent class: 956
Number of binary attributes: 35
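
Several of the listed properties are ratios of the raw counts, so they can be re-derived as a consistency check. The short sketch below does this using only numbers copied from this page. The variable names are illustrative, and the rounding to two decimals is an assumption about how the page displays the values.

```python
# Counts copied from the feature and property listings on this page.
n_rows = 2000
n_attributes = 37
n_binary = 35
majority = 1044
minority = 956

# The two classes should account for every row.
assert majority + minority == n_rows

pct_binary = round(100 * n_binary / n_attributes, 2)
pct_majority = round(100 * majority / n_rows, 2)
pct_minority = round(100 * minority / n_rows, 2)

# Attributes divided by instances; the page lists this as 0.02.
dimensionality = n_attributes / n_rows

print(pct_binary, pct_majority, pct_minority, dimensionality)
```

The derived percentages agree with the listed 94.59, 52.2 and 47.8, and the unrounded attribute-to-instance ratio is 0.0185.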

0 tasks
