KDDCup09-Upselling_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

active, ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset KDDCup09-Upselling (43072) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
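The row-sampling step in the source code passes `stratify=` to `train_test_split`, so the class proportions of the subset should match those of the full data. As a rough illustration of that idea only, the standard-library sketch below allocates rows to each class proportionally (with largest-remainder rounding) and then samples within each class. It is not scikit-learn's implementation, and `stratified_sample` and the toy labels are hypothetical, introduced here for the example.

```python
import random
from collections import defaultdict


def stratified_sample(labels, n, seed):
    """Return sorted row indices of a size-n sample that preserves class proportions.

    Hypothetical helper for illustration; not part of the original pipeline.
    """
    total = len(labels)
    if n > total:
        raise ValueError("n exceeds the number of rows")

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Proportional allocation: take the floor of each class quota first.
    quotas = {c: n * len(rows) // total for c, rows in by_class.items()}

    # Hand out the remaining slots to the classes with the largest remainders.
    leftover = n - sum(quotas.values())
    by_remainder = sorted(
        by_class, key=lambda c: (n * len(by_class[c])) % total, reverse=True
    )
    for c in by_remainder[:leftover]:
        quotas[c] += 1

    # Sample without replacement inside each class.
    rng = random.Random(seed)
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, quotas[c]))
    return sorted(chosen)


# Toy data (assumption): 900 "neg" rows followed by 100 "pos" rows.
labels = ["neg"] * 900 + ["pos"] * 100
picked = stratified_sample(labels, n=200, seed=3)
n_pos = sum(1 for i in picked if labels[i] == "pos")
print(len(picked), n_pos)
```

With a 90/10 class split and a sample of 200 rows, the allocation keeps the same 90/10 ratio in the sample. Fixing the seed makes the selection reproducible, which mirrors the role of `random_state=seed` in the source code.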

101 features

upselling (target): nominal, 2 unique values, 0 missing
Var23: numeric, 2 unique values, 0 missing
Var72: numeric, 1 unique value, 0 missing
Var454: numeric, 11 unique values, 0 missing
Var488: numeric, 3 unique values, 0 missing
Var587: numeric, 2 unique values, 0 missing
Var643: numeric, 2 unique values, 0 missing
Var1145: numeric, 2 unique values, 0 missing
Var1272: numeric, 1 unique value, 0 missing
Var1355: numeric, 2 unique values, 0 missing
Var1399: numeric, 2 unique values, 0 missing
Var1529: numeric, 1 unique value, 0 missing
Var1689: numeric, 2 unique values, 0 missing
Var2072: numeric, 2 unique values, 0 missing
Var2372: numeric, 2 unique values, 0 missing
Var2569: numeric, 1 unique value, 0 missing
Var2662: numeric, 1 unique value, 0 missing
Var2691: numeric, 4 unique values, 0 missing
Var2833: numeric, 1 unique value, 0 missing
Var3087: numeric, 5 unique values, 0 missing
Var3261: numeric, 1938 unique values, 0 missing
Var3358: numeric, 4 unique values, 0 missing
Var3513: numeric, 1 unique value, 0 missing
Var3641: numeric, 1 unique value, 0 missing
Var3695: numeric, 73 unique values, 0 missing
Var3777: numeric, 5 unique values, 0 missing
Var3788: numeric, 6 unique values, 0 missing
Var3930: numeric, 1 unique value, 0 missing
Var4224: numeric, 2 unique values, 0 missing
Var4353: numeric, 1 unique value, 0 missing
Var4361: numeric, 2 unique values, 0 missing
Var4439: numeric, 2 unique values, 0 missing
Var4444: numeric, 1 unique value, 0 missing
Var4671: numeric, 1 unique value, 0 missing
Var4756: numeric, 1 unique value, 0 missing
Var4859: numeric, 2 unique values, 0 missing
Var4930: numeric, 1351 unique values, 0 missing
Var5573: numeric, 1 unique value, 0 missing
Var5739: numeric, 9 unique values, 0 missing
Var5782: numeric, 1 unique value, 0 missing
Var5810: numeric, 1 unique value, 0 missing
Var5880: numeric, 12 unique values, 0 missing
Var6240: numeric, 2 unique values, 0 missing
Var6393: numeric, 2 unique values, 0 missing
Var6396: numeric, 2 unique values, 0 missing
Var6427: numeric, 2 unique values, 0 missing
Var6712: numeric, 2 unique values, 0 missing
Var7013: numeric, 35 unique values, 0 missing
Var7062: numeric, 2 unique values, 0 missing
Var7109: numeric, 3 unique values, 0 missing
Var7263: numeric, 2 unique values, 0 missing
Var7675: numeric, 1 unique value, 0 missing
Var7738: numeric, 2 unique values, 0 missing
Var7749: numeric, 1 unique value, 0 missing
Var8636: numeric, 1 unique value, 0 missing
Var8706: numeric, 1 unique value, 0 missing
Var8716: numeric, 2 unique values, 0 missing
Var8855: numeric, 3 unique values, 0 missing
Var9060: numeric, 2 unique values, 0 missing
Var9063: numeric, 2 unique values, 0 missing
Var9218: numeric, 5 unique values, 0 missing
Var9387: numeric, 2 unique values, 0 missing
Var9638: numeric, 2 unique values, 0 missing
Var9665: numeric, 1 unique value, 0 missing
Var9743: numeric, 3 unique values, 0 missing
Var9795: numeric, 1 unique value, 0 missing
Var9804: numeric, 1 unique value, 0 missing
Var9835: numeric, 2 unique values, 0 missing
Var9857: numeric, 1 unique value, 0 missing
Var9895: numeric, 2 unique values, 0 missing
Var10180: numeric, 1 unique value, 0 missing
Var10261: numeric, 2 unique values, 0 missing
Var10347: numeric, 3 unique values, 0 missing
Var10522: numeric, 1 unique value, 0 missing
Var10760: numeric, 2 unique values, 0 missing
Var10902: numeric, 2 unique values, 0 missing
Var10959: numeric, 2 unique values, 0 missing
Var11051: numeric, 550 unique values, 0 missing
Var11241: numeric, 2 unique values, 0 missing
Var11316: numeric, 1 unique value, 0 missing
Var11506: numeric, 1 unique value, 0 missing
Var11682: numeric, 1 unique value, 0 missing
Var11886: numeric, 2 unique values, 0 missing
Var12033: numeric, 2 unique values, 0 missing
Var12232: numeric, 7 unique values, 0 missing
Var12373: numeric, 1 unique value, 0 missing
Var12564: numeric, 1 unique value, 0 missing
Var12683: numeric, 2 unique values, 0 missing
Var12894: numeric, 2 unique values, 0 missing
Var12923: numeric, 2 unique values, 0 missing
Var13108: numeric, 2 unique values, 0 missing
Var13184: numeric, 1 unique value, 0 missing
Var13267: numeric, 1554 unique values, 0 missing
Var13629: numeric, 2 unique values, 0 missing
Var13825: numeric, 7 unique values, 0 missing
Var13876: numeric, 118 unique values, 0 missing
Var13959: numeric, 7 unique values, 0 missing
Var14039: numeric, 1 unique value, 0 missing
Var14212: numeric, 5 unique values, 0 missing
Var14279: numeric, 2 unique values, 0 missing
Var14480: numeric, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.86
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 92.65
Number of instances belonging to the most frequent class: 1853
Percentage of instances belonging to the least frequent class: 7.35
Number of instances belonging to the least frequent class: 147
Number of binary attributes: 1
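
Several of the properties above follow from a handful of raw counts. The short sketch below recomputes them as a consistency check, assuming the listed values are rounded to two decimals. The variable names and the `derived` dictionary are hypothetical, chosen for this example.

```python
# Raw counts copied from the properties listed above.
n_rows = 2000
n_attributes = 101  # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
n_binary = 1
majority_count = 1853
minority_count = 147


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "majority_class_pct": pct(majority_count, n_rows),
    "minority_class_pct": pct(minority_count, n_rows),
    "numeric_attributes_pct": pct(n_numeric, n_attributes),
    "nominal_attributes_pct": pct(n_nominal, n_attributes),
    "binary_attributes_pct": pct(n_binary, n_attributes),
    "attributes_per_instance": round(n_attributes / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class counts also sum to the number of rows (1853 + 147 = 2000), which is consistent with a target that has exactly two distinct values.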

0 tasks
