KDDCup09-Upselling_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset KDDCup09-Upselling (43072) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
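The row-subsampling step in the source code relies on scikit-learn's `train_test_split` with an integer `test_size` and a `stratify` argument. The following is a minimal, standalone sketch of that one step, not the original pipeline. The data is synthetic, and the names `f0`..`f4`, `target`, and the 9,265 / 735 class split over 10,000 rows are assumptions chosen only to mimic the class imbalance reported for this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 10,000 rows, 5 numeric features,
# and a binary target with an imbalanced class distribution.
rng = np.random.default_rng(0)
n_rows = 10_000
x = pd.DataFrame(
    rng.normal(size=(n_rows, 5)),
    columns=[f"f{i}" for i in range(5)],
)
y = pd.Series(np.where(np.arange(n_rows) < 9_265, 0, 1), name="target")

# Join features and target so that both are subsampled together.
data = pd.concat((x, y), axis="columns")

# With an integer test_size, the second returned part has exactly that many
# rows; stratify keeps the class proportions close to those of the full data.
_, subset = train_test_split(
    data,
    test_size=2_000,
    stratify=data["target"],
    shuffle=True,
    random_state=0,
)

x_sub = subset.drop("target", axis="columns")
y_sub = subset["target"]

print(len(subset))
print(y.value_counts(normalize=True).round(4).to_dict())
print(y_sub.value_counts(normalize=True).round(4).to_dict())
```

Discarding the first return value and keeping the "test" part is what turns a train/test splitter into a fixed-size stratified subsampler.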

101 features

upselling (target, nominal): 2 unique values, 0 missing
Var41 (numeric): 1 unique value, 0 missing
Var81 (numeric): 2 unique values, 0 missing
Var124 (numeric): 2 unique values, 0 missing
Var247 (numeric): 2 unique values, 0 missing
Var331 (numeric): 2 unique values, 0 missing
Var424 (numeric): 2 unique values, 0 missing
Var501 (numeric): 10 unique values, 0 missing
Var610 (numeric): 1 unique value, 0 missing
Var726 (numeric): 2 unique values, 0 missing
Var1088 (numeric): 1 unique value, 0 missing
Var1118 (numeric): 2 unique values, 0 missing
Var1197 (numeric): 2 unique values, 0 missing
Var1255 (numeric): 1 unique value, 0 missing
Var1329 (numeric): 31 unique values, 0 missing
Var1849 (numeric): 2 unique values, 0 missing
Var2013 (numeric): 10 unique values, 0 missing
Var2601 (numeric): 1 unique value, 0 missing
Var2611 (numeric): 17 unique values, 0 missing
Var3390 (numeric): 1320 unique values, 194 missing
Var3797 (numeric): 19 unique values, 0 missing
Var3828 (numeric): 1 unique value, 0 missing
Var3938 (numeric): 3 unique values, 0 missing
Var4001 (numeric): 1566 unique values, 0 missing
Var4118 (numeric): 1 unique value, 0 missing
Var4456 (numeric): 2 unique values, 0 missing
Var4566 (numeric): 295 unique values, 0 missing
Var4623 (numeric): 4 unique values, 0 missing
Var4800 (numeric): 1 unique value, 0 missing
Var4896 (numeric): 2 unique values, 0 missing
Var5040 (numeric): 2 unique values, 0 missing
Var5335 (numeric): 2 unique values, 0 missing
Var5594 (numeric): 7 unique values, 0 missing
Var5630 (numeric): 2 unique values, 0 missing
Var5650 (numeric): 1 unique value, 0 missing
Var5709 (numeric): 35 unique values, 0 missing
Var5792 (numeric): 6 unique values, 0 missing
Var5842 (numeric): 3 unique values, 0 missing
Var5854 (numeric): 1 unique value, 0 missing
Var5995 (numeric): 2 unique values, 0 missing
Var6284 (numeric): 2 unique values, 0 missing
Var6302 (numeric): 1 unique value, 1941 missing
Var6857 (numeric): 2 unique values, 0 missing
Var7152 (numeric): 2 unique values, 0 missing
Var7240 (numeric): 11 unique values, 0 missing
Var7475 (numeric): 2 unique values, 0 missing
Var7524 (numeric): 4 unique values, 0 missing
Var7581 (numeric): 2 unique values, 0 missing
Var7819 (numeric): 60 unique values, 0 missing
Var7827 (numeric): 1 unique value, 0 missing
Var7922 (numeric): 1 unique value, 0 missing
Var8049 (numeric): 2 unique values, 0 missing
Var8071 (numeric): 68 unique values, 0 missing
Var8234 (numeric): 1 unique value, 0 missing
Var8313 (numeric): 1 unique value, 0 missing
Var8521 (numeric): 1 unique value, 0 missing
Var8622 (numeric): 2 unique values, 0 missing
Var8862 (numeric): 2 unique values, 0 missing
Var9002 (numeric): 2 unique values, 0 missing
Var9154 (numeric): 7 unique values, 0 missing
Var9299 (numeric): 2 unique values, 0 missing
Var9384 (numeric): 1 unique value, 0 missing
Var9444 (numeric): 1600 unique values, 0 missing
Var9625 (numeric): 24 unique values, 0 missing
Var9634 (numeric): 2 unique values, 0 missing
Var9681 (numeric): 2 unique values, 0 missing
Var9961 (numeric): 1 unique value, 0 missing
Var9973 (numeric): 1 unique value, 0 missing
Var10027 (numeric): 9 unique values, 0 missing
Var10202 (numeric): 7 unique values, 0 missing
Var10248 (numeric): 2 unique values, 0 missing
Var10480 (numeric): 5 unique values, 0 missing
Var10654 (numeric): 1747 unique values, 0 missing
Var10709 (numeric): 2 unique values, 0 missing
Var10726 (numeric): 5 unique values, 0 missing
Var10743 (numeric): 10 unique values, 0 missing
Var10825 (numeric): 65 unique values, 0 missing
Var10838 (numeric): 460 unique values, 0 missing
Var11343 (numeric): 1 unique value, 0 missing
Var11360 (numeric): 6 unique values, 0 missing
Var11365 (numeric): 2 unique values, 0 missing
Var11980 (numeric): 2 unique values, 0 missing
Var12067 (numeric): 2 unique values, 0 missing
Var12115 (numeric): 1 unique value, 0 missing
Var12429 (numeric): 61 unique values, 0 missing
Var12511 (numeric): 1 unique value, 0 missing
Var12578 (numeric): 2 unique values, 0 missing
Var12594 (numeric): 3 unique values, 0 missing
Var12613 (numeric): 2 unique values, 0 missing
Var12735 (numeric): 2 unique values, 0 missing
Var12828 (numeric): 2 unique values, 0 missing
Var13033 (numeric): 2 unique values, 0 missing
Var13258 (numeric): 3 unique values, 0 missing
Var13283 (numeric): 1 unique value, 0 missing
Var13546 (numeric): 1 unique value, 0 missing
Var13885 (numeric): 1 unique value, 0 missing
Var13924 (numeric): 1 unique value, 0 missing
Var14145 (numeric): 1 unique value, 0 missing
Var14411 (numeric): 64 unique values, 0 missing
Var14601 (numeric): 7 unique values, 0 missing
Var14874 (nominal): 2 unique values, 1435 missing
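
Only three features in the list above report missing values. As a quick cross-check, the sketch below sums their counts and expresses each as a share of the 2000 rows. The dictionary is transcribed by hand from the feature list and the variable names are illustrative.

```python
# Missing-value counts copied from the feature list (all other features: 0).
n_rows = 2000
missing_per_feature = {
    "Var3390": 194,
    "Var6302": 1941,
    "Var14874": 1435,
}

# Dataset-wide total of missing cells implied by the per-feature counts.
total_missing = sum(missing_per_feature.values())

# Share of rows with a missing value, per feature.
for name, n_missing in missing_per_feature.items():
    print(f"{name}: {n_missing / n_rows:.1%} of rows missing")

print("total missing values:", total_missing)
```

The total agrees with the "Number of missing values in the dataset" property reported for this dataset (3570).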

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 3570
Number of instances with at least one value missing: 2000
Number of numeric attributes: 99
Number of nominal attributes: 2
Percentage of binary attributes: 1.98
Percentage of instances having missing values: 100
Average class difference between consecutive instances: 0.87
Percentage of missing values: 1.77
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 98.02
Percentage of instances belonging to the most frequent class: 92.65
Percentage of nominal attributes: 1.98
Number of instances belonging to the most frequent class: 1853
Percentage of instances belonging to the least frequent class: 7.35
Number of instances belonging to the least frequent class: 147
Number of binary attributes: 2
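
Most of the percentage properties follow directly from the raw counts. The sketch below recomputes them as a sanity check. The formulas are assumptions inferred from the property descriptions (for example, that the percentage of missing values is taken over all rows times all columns), not taken from OpenML's implementation.

```python
# Raw counts as reported in the properties list.
n_rows = 2000
n_cols = 101
n_missing = 3570
n_numeric = 99
n_nominal = 2
n_binary = 2
n_majority = 1853
n_minority = 147


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_missing_values": pct(n_missing, n_rows * n_cols),
    "pct_numeric_attributes": pct(n_numeric, n_cols),
    "pct_nominal_attributes": pct(n_nominal, n_cols),
    "pct_binary_attributes": pct(n_binary, n_cols),
    "pct_majority_class": pct(n_majority, n_rows),
    "pct_minority_class": pct(n_minority, n_rows),
    "attributes_per_instance": round(n_cols / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

Under these assumed formulas the recomputed values match the listed properties to two decimals, and the two class counts sum to the number of rows.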

0 tasks
