Data
APSFailure_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | License: Public Domain (CC0) | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman


Subsampling of the dataset APSFailure (41138) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
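The row-subsampling step in the source code above relies on a stratified split, so the class proportions of the parent dataset carry over to the 2000-row subset. The following is a minimal standard-library sketch of that idea, not the code that produced this dataset: `stratified_sample` is a hypothetical helper, its largest-remainder rounding is an assumption of the sketch rather than scikit-learn's exact allocation rule, and the toy labels are made up to mirror the 98.2% / 1.8% class split reported in the properties on this page.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_samples, seed=0):
    """Return sorted row indices of a sample that keeps class proportions.

    Hypothetical stand-in for a stratified split; the rounding rule
    (largest remainder) is an assumption of this sketch.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Ideal (fractional) share of the sample for each class.
    quotas = {c: n_samples * len(rows) / total for c, rows in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}

    # Hand out the rows lost to rounding down, largest remainder first.
    leftover = n_samples - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Draw the allotted number of rows from each class without replacement.
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, counts[c]))
    return sorted(chosen)


# Toy labels (made up): 98.2% "neg" and 1.8% "pos".
labels = ["neg"] * 9820 + ["pos"] * 180
sample = stratified_sample(labels, n_samples=2000, seed=0)
sample_counts = Counter(labels[i] for i in sample)
print(sample_counts)
```

On the toy labels, the sample holds 1964 "neg" and 36 "pos" rows, the same majority and minority counts this page lists for the subsampled dataset.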

101 features

| Feature | Type | Unique values | Missing values |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| aa_000 | numeric | 1575 | 0 |
| ab_000 | numeric | 6 | 1532 |
| ac_000 | numeric | 482 | 131 |
| ad_000 | numeric | 469 | 475 |
| af_000 | numeric | 36 | 105 |
| ag_002 | numeric | 92 | 24 |
| ag_003 | numeric | 389 | 24 |
| ag_004 | numeric | 1487 | 24 |
| ag_007 | numeric | 1408 | 24 |
| ag_008 | numeric | 1068 | 24 |
| ai_000 | numeric | 193 | 20 |
| aj_000 | numeric | 163 | 20 |
| ak_000 | numeric | 9 | 164 |
| ao_000 | numeric | 1965 | 20 |
| aq_000 | numeric | 1887 | 20 |
| av_000 | numeric | 677 | 104 |
| ay_000 | numeric | 14 | 24 |
| ay_005 | numeric | 900 | 24 |
| ay_006 | numeric | 1369 | 24 |
| ay_009 | numeric | 17 | 24 |
| az_001 | numeric | 1051 | 24 |
| az_002 | numeric | 1177 | 24 |
| az_003 | numeric | 1455 | 24 |
| az_004 | numeric | 1751 | 24 |
| az_005 | numeric | 1918 | 24 |
| az_006 | numeric | 1009 | 24 |
| az_007 | numeric | 197 | 24 |
| az_008 | numeric | 74 | 24 |
| ba_000 | numeric | 1929 | 25 |
| ba_001 | numeric | 1858 | 25 |
| ba_003 | numeric | 1702 | 25 |
| ba_004 | numeric | 1621 | 25 |
| ba_005 | numeric | 1561 | 25 |
| ba_008 | numeric | 739 | 25 |
| ba_009 | numeric | 425 | 25 |
| bb_000 | numeric | 1968 | 24 |
| bc_000 | numeric | 397 | 109 |
| be_000 | numeric | 653 | 105 |
| bi_000 | numeric | 1955 | 20 |
| bl_000 | numeric | 903 | 934 |
| bm_000 | numeric | 503 | 1348 |
| bn_000 | numeric | 344 | 1483 |
| bo_000 | numeric | 265 | 1557 |
| bp_000 | numeric | 216 | 1596 |
| bq_000 | numeric | 177 | 1628 |
| bs_000 | numeric | 1712 | 25 |
| bv_000 | numeric | 1966 | 26 |
| by_000 | numeric | 1518 | 16 |
| bz_000 | numeric | 1133 | 109 |
| ca_000 | numeric | 1762 | 168 |
| cb_000 | numeric | 1905 | 25 |
| cd_000 | numeric | 1 | 21 |
| ce_000 | numeric | 1409 | 104 |
| cg_000 | numeric | 187 | 475 |
| ch_000 | numeric | 1 | 475 |
| ci_000 | numeric | 1971 | 9 |
| cj_000 | numeric | 398 | 9 |
| cl_000 | numeric | 119 | 323 |
| cm_000 | numeric | 348 | 332 |
| cn_000 | numeric | 63 | 25 |
| cn_001 | numeric | 286 | 25 |
| cn_002 | numeric | 900 | 25 |
| cn_004 | numeric | 1914 | 25 |
| cn_005 | numeric | 1849 | 25 |
| cn_009 | numeric | 511 | 25 |
| cp_000 | numeric | 349 | 109 |
| cq_000 | numeric | 1966 | 26 |
| cr_000 | numeric | 4 | 1532 |
| cs_001 | numeric | 675 | 24 |
| cs_004 | numeric | 1813 | 24 |
| cs_005 | numeric | 1912 | 24 |
| cs_007 | numeric | 1598 | 24 |
| cs_008 | numeric | 277 | 24 |
| cs_009 | numeric | 10 | 24 |
| ct_000 | numeric | 596 | 425 |
| cv_000 | numeric | 1521 | 425 |
| da_000 | numeric | 17 | 425 |
| de_000 | numeric | 435 | 109 |
| dj_000 | numeric | 5 | 151 |
| dk_000 | numeric | 10 | 151 |
| dl_000 | numeric | 10 | 151 |
| dm_000 | numeric | 10 | 151 |
| dn_000 | numeric | 1619 | 26 |
| dp_000 | numeric | 1132 | 109 |
| dq_000 | numeric | 448 | 109 |
| dt_000 | numeric | 1321 | 109 |
| du_000 | numeric | 1512 | 109 |
| dz_000 | numeric | 6 | 109 |
| ea_000 | numeric | 15 | 109 |
| eb_000 | numeric | 1087 | 151 |
| ec_00 | numeric | 1550 | 350 |
| ed_000 | numeric | 847 | 323 |
| ee_000 | numeric | 1918 | 24 |
| ee_001 | numeric | 1890 | 24 |
| ee_002 | numeric | 1781 | 24 |
| ee_005 | numeric | 1659 | 24 |
| ee_006 | numeric | 1580 | 24 |
| ee_008 | numeric | 1200 | 24 |
| ee_009 | numeric | 686 | 24 |
| eg_000 | numeric | 8 | 109 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 19825 |
| Number of instances with at least one value missing. | 1974 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 0.99 |
| Percentage of instances having missing values. | 98.7 |
| Percentage of missing values. | 9.81 |
| Average class difference between consecutive instances. | 0.96 |
| Percentage of numeric attributes. | 99.01 |
| Number of attributes divided by the number of instances. | 0.05 |
| Percentage of nominal attributes. | 0.99 |
| Percentage of instances belonging to the most frequent class. | 98.2 |
| Number of instances belonging to the most frequent class. | 1964 |
| Percentage of instances belonging to the least frequent class. | 1.8 |
| Number of instances belonging to the least frequent class. | 36 |
| Number of binary attributes. | 1 |
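Several of the percentage and ratio properties above can be derived from the raw counts in the same list. The sketch below recomputes them as a consistency check. The variable names and the `pct` helper are illustrative, and the choice of denominators (for example, rows × columns including the target for the missing-value percentage) is an assumption that happens to reproduce the published figures, not a statement of how OpenML computes them.

```python
# Counts copied from the property list on this page.
n_rows = 2000
n_cols = 101  # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
n_binary = 1
n_missing_values = 19825
n_rows_with_missing = 1974
majority_count = 1964
minority_count = 36


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    # Assumed denominator: every cell in the table, target column included.
    "pct_missing_values": pct(n_missing_values, n_rows * n_cols),
    "pct_rows_with_missing": pct(n_rows_with_missing, n_rows),
    "pct_numeric_attributes": pct(n_numeric, n_cols),
    "pct_nominal_attributes": pct(n_nominal, n_cols),
    "pct_binary_attributes": pct(n_binary, n_cols),
    "pct_majority_class": pct(majority_count, n_rows),
    "pct_minority_class": pct(minority_count, n_rows),
    # Attributes divided by instances.
    "dimensionality": round(n_cols / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

Under these assumptions, each derived value matches the corresponding figure in the property list.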

0 tasks