OpenML
APSFailure_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset APSFailure (41138) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
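The row-subsampling step in the source code delegates to scikit-learn's `train_test_split` with `stratify=`, which keeps each class at roughly its original share of the rows. The sketch below illustrates the same idea in plain Python with a largest-remainder allocation. It is a stand-in for illustration only, not the routine that produced this dataset: the helper `stratified_sample` and the toy labels are hypothetical, and scikit-learn's exact allocation may differ.

```python
import random
from collections import defaultdict


def stratified_sample(labels, n, seed):
    """Return n sorted row indices, keeping each class near its original share."""
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Largest-remainder allocation: floor each class quota,
    # then give the leftover slots to the largest fractional parts
    quotas = {c: n * len(idxs) / total for c, idxs in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Sample without replacement inside each class
    picked = []
    for c, idxs in by_class.items():
        picked.extend(rng.sample(idxs, counts[c]))
    return sorted(picked)


# Hypothetical toy labels: 98% "neg", 2% "pos" (not the real APSFailure data)
labels = ["neg"] * 9800 + ["pos"] * 200
subset = stratified_sample(labels, n=2000, seed=3)
n_pos = sum(labels[i] == "pos" for i in subset)
print(len(subset), n_pos)
```

With these toy labels the minority class keeps its 2% share of the 2000 sampled rows, which is the property the stratified split is meant to preserve.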

101 features

| Feature | Type | Unique values | Missing values |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| aa_000 | numeric | 1603 | 0 |
| ac_000 | numeric | 509 | 114 |
| ad_000 | numeric | 499 | 506 |
| af_000 | numeric | 34 | 84 |
| ag_000 | numeric | 13 | 15 |
| ag_001 | numeric | 38 | 15 |
| ag_002 | numeric | 102 | 15 |
| ag_004 | numeric | 1529 | 15 |
| ag_006 | numeric | 1889 | 15 |
| ag_007 | numeric | 1434 | 15 |
| ah_000 | numeric | 1955 | 13 |
| ai_000 | numeric | 184 | 11 |
| an_000 | numeric | 1964 | 13 |
| ao_000 | numeric | 1964 | 10 |
| ar_000 | numeric | 18 | 92 |
| au_000 | numeric | 3 | 11 |
| av_000 | numeric | 703 | 84 |
| ay_000 | numeric | 19 | 15 |
| ay_001 | numeric | 34 | 15 |
| ay_002 | numeric | 34 | 15 |
| ay_003 | numeric | 37 | 15 |
| ay_005 | numeric | 987 | 15 |
| ay_007 | numeric | 1712 | 15 |
| ay_008 | numeric | 1773 | 15 |
| ay_009 | numeric | 26 | 15 |
| az_000 | numeric | 1430 | 15 |
| az_003 | numeric | 1537 | 15 |
| az_006 | numeric | 1052 | 15 |
| az_007 | numeric | 223 | 15 |
| az_009 | numeric | 26 | 15 |
| ba_001 | numeric | 1865 | 17 |
| ba_005 | numeric | 1597 | 17 |
| ba_006 | numeric | 1576 | 17 |
| ba_008 | numeric | 768 | 17 |
| bc_000 | numeric | 417 | 93 |
| bd_000 | numeric | 589 | 93 |
| bf_000 | numeric | 177 | 84 |
| bh_000 | numeric | 1700 | 13 |
| bi_000 | numeric | 1963 | 10 |
| bj_000 | numeric | 1933 | 10 |
| bk_000 | numeric | 1142 | 715 |
| bm_000 | numeric | 560 | 1281 |
| bn_000 | numeric | 377 | 1438 |
| bo_000 | numeric | 290 | 1516 |
| bp_000 | numeric | 244 | 1555 |
| bt_000 | numeric | 1968 | 7 |
| bv_000 | numeric | 1968 | 15 |
| bz_000 | numeric | 1198 | 92 |
| ca_000 | numeric | 1790 | 148 |
| cd_000 | numeric | 1 | 16 |
| cf_000 | numeric | 51 | 506 |
| ci_000 | numeric | 1962 | 10 |
| cj_000 | numeric | 379 | 10 |
| ck_000 | numeric | 1959 | 10 |
| cl_000 | numeric | 119 | 329 |
| cm_000 | numeric | 330 | 338 |
| cn_000 | numeric | 63 | 17 |
| cn_002 | numeric | 916 | 17 |
| cn_005 | numeric | 1850 | 17 |
| cn_007 | numeric | 1421 | 17 |
| cn_008 | numeric | 1174 | 17 |
| cn_009 | numeric | 509 | 17 |
| cp_000 | numeric | 367 | 92 |
| cs_007 | numeric | 1641 | 15 |
| cs_009 | numeric | 11 | 15 |
| cv_000 | numeric | 1484 | 472 |
| cx_000 | numeric | 1277 | 472 |
| cy_000 | numeric | 96 | 472 |
| cz_000 | numeric | 831 | 472 |
| da_000 | numeric | 18 | 472 |
| db_000 | numeric | 54 | 472 |
| dd_000 | numeric | 1131 | 84 |
| df_000 | numeric | 37 | 131 |
| dh_000 | numeric | 117 | 131 |
| di_000 | numeric | 288 | 131 |
| dj_000 | numeric | 8 | 131 |
| dk_000 | numeric | 11 | 131 |
| dl_000 | numeric | 8 | 131 |
| dm_000 | numeric | 9 | 131 |
| dn_000 | numeric | 1658 | 15 |
| dp_000 | numeric | 1173 | 93 |
| dq_000 | numeric | 398 | 93 |
| dr_000 | numeric | 367 | 93 |
| dt_000 | numeric | 1369 | 93 |
| du_000 | numeric | 1581 | 93 |
| dv_000 | numeric | 1628 | 93 |
| dx_000 | numeric | 601 | 92 |
| dy_000 | numeric | 404 | 92 |
| dz_000 | numeric | 3 | 92 |
| ea_000 | numeric | 14 | 92 |
| eb_000 | numeric | 1184 | 131 |
| ec_00 | numeric | 1542 | 357 |
| ed_000 | numeric | 855 | 329 |
| ee_000 | numeric | 1918 | 15 |
| ee_001 | numeric | 1901 | 15 |
| ee_002 | numeric | 1783 | 15 |
| ee_007 | numeric | 1519 | 15 |
| ee_009 | numeric | 761 | 15 |
| ef_000 | numeric | 4 | 92 |
| eg_000 | numeric | 6 | 92 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 15654 |
| Number of instances with at least one value missing. | 1885 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 0.99 |
| Percentage of instances having missing values. | 94.25 |
| Average class difference between consecutive instances. | 0.96 |
| Percentage of missing values. | 7.75 |
| Number of attributes divided by the number of instances. | 0.05 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of instances belonging to the most frequent class. | 98.2 |
| Percentage of nominal attributes. | 0.99 |
| Number of instances belonging to the most frequent class. | 1964 |
| Percentage of instances belonging to the least frequent class. | 1.8 |
| Number of instances belonging to the least frequent class. | 36 |
| Number of binary attributes. | 1 |
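
The percentage and ratio properties follow from the raw counts listed above. The short check below recomputes them, assuming each one is the plain ratio rounded to two decimals. That rounding rule is an assumption about how the figures were produced, although it reproduces every value shown. The variable names are illustrative.

```python
# Raw counts copied from the property list
n_rows = 2000
n_cols = 101
n_missing_values = 15654
n_rows_with_missing = 1885
n_numeric = 100
n_nominal = 1
n_binary = 1
n_majority = 1964
n_minority = 36


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_binary_attributes": pct(n_binary, n_cols),
    "pct_instances_with_missing": pct(n_rows_with_missing, n_rows),
    # Missing values are counted over all cells (rows x columns)
    "pct_missing_values": pct(n_missing_values, n_rows * n_cols),
    "dimensionality": round(n_cols / n_rows, 2),
    "pct_numeric_attributes": pct(n_numeric, n_cols),
    "pct_nominal_attributes": pct(n_nominal, n_cols),
    "pct_majority_class": pct(n_majority, n_rows),
    "pct_minority_class": pct(n_minority, n_rows),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class counts sum to the 2000 rows, so the 98.2% / 1.8% split describes the whole subsample.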

0 tasks