APSFailure_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

active ARFF Public Domain (CC0) Visibility: public Uploaded 17-11-2022 by Eddie Bergman


Subsampling of the dataset APSFailure (41138) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )

        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
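The row-subsampling step in the source code relies on scikit-learn's `train_test_split` with an integer `test_size`, keeping the "test" part as the subset. The following is a minimal sketch of that step in isolation, run on synthetic stand-in data rather than APSFailure; the column names `f0`..`f2`, the 2% positive rate, and the table size are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical, NOT the APSFailure data):
# 10,000 rows with a 2% minority class.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(10_000, 3)), columns=["f0", "f1", "f2"])
y = pd.Series(["pos"] * 200 + ["neg"] * 9_800, name="class")

nrows_max = 2_000
data = pd.concat((x, y), axis="columns")

# An integer test_size requests exactly that many rows in the second split;
# stratify= keeps the class proportions of the full table in the subset.
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data["class"],
    shuffle=True,
    random_state=1,
)

counts = subset["class"].value_counts()
print(len(subset), counts.to_dict())
```

In this sketch the subset keeps the 2% minority share of the synthetic table, which is the same mechanism that preserves the class imbalance reported in the properties of this dataset.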

101 features

class (target): nominal, 2 unique values, 0 missing
ac_000: numeric, 489 unique values, 115 missing
af_000: numeric, 42 unique values, 87 missing
ag_001: numeric, 25 unique values, 19 missing
ag_003: numeric, 415 unique values, 19 missing
ag_004: numeric, 1525 unique values, 19 missing
ag_005: numeric, 1886 unique values, 19 missing
ag_006: numeric, 1878 unique values, 19 missing
ag_007: numeric, 1389 unique values, 19 missing
ag_009: numeric, 552 unique values, 19 missing
ak_000: numeric, 8 unique values, 145 missing
al_000: numeric, 682 unique values, 24 missing
am_0: numeric, 690 unique values, 23 missing
an_000: numeric, 1956 unique values, 24 missing
ap_000: numeric, 1944 unique values, 24 missing
at_000: numeric, 206 unique values, 23 missing
av_000: numeric, 691 unique values, 87 missing
ax_000: numeric, 458 unique values, 87 missing
ay_001: numeric, 42 unique values, 19 missing
ay_002: numeric, 44 unique values, 19 missing
ay_003: numeric, 46 unique values, 19 missing
ay_004: numeric, 77 unique values, 19 missing
ay_006: numeric, 1394 unique values, 19 missing
ay_008: numeric, 1761 unique values, 19 missing
ay_009: numeric, 24 unique values, 19 missing
az_000: numeric, 1404 unique values, 19 missing
az_001: numeric, 1060 unique values, 19 missing
az_004: numeric, 1767 unique values, 19 missing
az_006: numeric, 997 unique values, 19 missing
az_008: numeric, 75 unique values, 19 missing
ba_002: numeric, 1766 unique values, 18 missing
ba_003: numeric, 1718 unique values, 18 missing
ba_004: numeric, 1649 unique values, 18 missing
ba_005: numeric, 1564 unique values, 18 missing
ba_009: numeric, 429 unique values, 18 missing
bb_000: numeric, 1957 unique values, 27 missing
bc_000: numeric, 395 unique values, 97 missing
bg_000: numeric, 1936 unique values, 24 missing
bh_000: numeric, 1675 unique values, 24 missing
bi_000: numeric, 1941 unique values, 22 missing
bj_000: numeric, 1917 unique values, 22 missing
bm_000: numeric, 526 unique values, 1321 missing
bq_000: numeric, 182 unique values, 1628 missing
br_000: numeric, 171 unique values, 1639 missing
bt_000: numeric, 1970 unique values, 9 missing
bv_000: numeric, 1953 unique values, 31 missing
bx_000: numeric, 1900 unique values, 95 missing
by_000: numeric, 1525 unique values, 11 missing
bz_000: numeric, 1158 unique values, 97 missing
cc_000: numeric, 1843 unique values, 95 missing
cd_000: numeric, 1 unique value, 27 missing
cf_000: numeric, 52 unique values, 471 missing
cg_000: numeric, 180 unique values, 471 missing
ch_000: numeric, 1 unique value, 471 missing
ci_000: numeric, 1960 unique values, 11 missing
ck_000: numeric, 1959 unique values, 11 missing
cl_000: numeric, 129 unique values, 318 missing
cn_001: numeric, 305 unique values, 18 missing
cn_003: numeric, 1838 unique values, 18 missing
cn_005: numeric, 1847 unique values, 18 missing
cn_008: numeric, 1134 unique values, 18 missing
cn_009: numeric, 496 unique values, 18 missing
co_000: numeric, 320 unique values, 471 missing
cp_000: numeric, 367 unique values, 97 missing
cr_000: numeric, 4 unique values, 1565 missing
cs_000: numeric, 1492 unique values, 18 missing
cs_002: numeric, 1513 unique values, 18 missing
cs_003: numeric, 1752 unique values, 18 missing
cs_004: numeric, 1815 unique values, 18 missing
cs_005: numeric, 1906 unique values, 18 missing
cs_006: numeric, 1903 unique values, 18 missing
cs_009: numeric, 11 unique values, 18 missing
cu_000: numeric, 685 unique values, 431 missing
cv_000: numeric, 1500 unique values, 431 missing
cx_000: numeric, 1289 unique values, 431 missing
cz_000: numeric, 796 unique values, 431 missing
db_000: numeric, 49 unique values, 431 missing
dc_000: numeric, 1495 unique values, 431 missing
df_000: numeric, 31 unique values, 133 missing
dg_000: numeric, 60 unique values, 133 missing
di_000: numeric, 299 unique values, 133 missing
dj_000: numeric, 10 unique values, 133 missing
dk_000: numeric, 18 unique values, 133 missing
dm_000: numeric, 8 unique values, 133 missing
dq_000: numeric, 423 unique values, 97 missing
ds_000: numeric, 1564 unique values, 97 missing
dt_000: numeric, 1332 unique values, 97 missing
dv_000: numeric, 1607 unique values, 97 missing
dx_000: numeric, 585 unique values, 97 missing
dy_000: numeric, 398 unique values, 97 missing
ea_000: numeric, 18 unique values, 97 missing
eb_000: numeric, 1146 unique values, 133 missing
ec_00: numeric, 1553 unique values, 333 missing
ed_000: numeric, 818 unique values, 318 missing
ee_000: numeric, 1907 unique values, 19 missing
ee_001: numeric, 1893 unique values, 19 missing
ee_003: numeric, 1651 unique values, 19 missing
ee_007: numeric, 1500 unique values, 19 missing
ee_008: numeric, 1241 unique values, 19 missing
ee_009: numeric, 720 unique values, 19 missing
ef_000: numeric, 4 unique values, 97 missing
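A per-feature summary like the one above (distinct values and missing count per column) can be recomputed after loading the data into a pandas DataFrame. The sketch below uses a tiny hypothetical frame with made-up columns `a`, `b`, `c` rather than the real features. It also flags columns with at most one distinct value, the situation reported above for `cd_000` and `ch_000`.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame standing in for the real feature table.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 2.0, np.nan],
        "b": [0.0, 0.0, 0.0, 0.0],
        "c": [np.nan, np.nan, 5.0, 7.0],
    }
)

summary = pd.DataFrame(
    {
        "unique_values": df.nunique(),  # NaN is not counted as a value
        "missing": df.isna().sum(),     # number of NaN cells per column
    }
)

# Columns with a single distinct value carry no information for a model.
constant_columns = list(summary.index[summary["unique_values"] <= 1])

print(summary)
print(constant_columns)
```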

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 15419
Number of instances with at least one value missing: 1984
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 99.2
Percentage of missing values: 7.63
Average class difference between consecutive instances: 0.96
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 98.2
Number of instances belonging to the most frequent class: 1964
Percentage of instances belonging to the least frequent class: 1.8
Number of instances belonging to the least frequent class: 36
Number of binary attributes: 1
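The percentage and ratio properties can be cross-checked from the raw counts listed alongside them. The short sketch below does that arithmetic. It assumes that the missing-value percentage is taken over all 2000 × 101 cells, target column included. That denominator is an assumption that happens to reproduce the reported 7.63, not something the page states. The helper `pct` is hypothetical.

```python
# Raw counts as reported in the properties list.
n_rows = 2000
n_cols = 101            # 100 numeric features + 1 nominal target
n_missing = 15419
n_rows_with_missing = 1984
n_majority = 1964
n_minority = 36
n_numeric = 100
n_nominal = 1


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals (hypothetical helper)."""
    return round(100 * part / whole, 2)


# Assumption: missing values are counted over every cell, target included.
pct_missing = pct(n_missing, n_rows * n_cols)
pct_rows_with_missing = pct(n_rows_with_missing, n_rows)
pct_majority = pct(n_majority, n_rows)
pct_minority = pct(n_minority, n_rows)
pct_numeric = pct(n_numeric, n_cols)
pct_nominal = pct(n_nominal, n_cols)
dimensionality = round(n_cols / n_rows, 2)  # attributes per instance

print(pct_missing, pct_rows_with_missing, pct_majority, pct_minority)
print(pct_numeric, pct_nominal, dimensionality)
```

The results agree with the listed values, and the 98.2 / 1.8 class split shows how imbalanced the target is.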

0 tasks
