OpenML
amazon-commerce-reviews_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active, Format: ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by Eddie Bergman

Subsample of the dataset amazon-commerce-reviews (1457), created with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Excerpt: a method (note `self`); the enclosing class is not shown.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: the mask uses `classes`, not `selected_classes`, so every row is kept.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
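The class-selection step in the code above samples `selected_classes` but builds the row mask from `classes`, which would explain why the target below still has 50 unique values even though args.nclasses=10. The following is a minimal sketch on toy data, not the uploader's code, that contrasts the two masks. The toy series, the choice of two classes, and the names `mask_all` and `mask_selected` are assumptions made for illustration. Unlike the original, it also aligns the sampling weights with the order of `classes`.

```python
import numpy as np
import pandas as pd

# Toy target with 5 classes; a hypothetical stand-in for the real `y`.
y = pd.Series(list("aabbbccddde"), name="Class")

rng = np.random.default_rng(0)
classes = y.unique()

# Class frequencies, reordered to match `classes` so the weights line up.
counts = y.value_counts().reindex(classes)
selected_classes = rng.choice(
    classes,
    size=2,
    replace=False,
    p=(counts / counts.sum()).to_numpy(),
)

# Mask built from all classes: every row passes, so nothing is filtered.
mask_all = y.isin(classes)
# Mask built from the sampled classes: only rows of those classes pass.
mask_selected = y.isin(selected_classes)

y_all = y[mask_all]
y_selected = y[mask_selected]

print(y_all.nunique(), y_selected.nunique())
```

With the first mask the number of distinct classes is unchanged; with the second it drops to the requested number.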

101 features

Class (target): nominal, 50 unique values, 0 missing
V28: numeric, 9 unique values, 0 missing
V54: numeric, 6 unique values, 0 missing
V83: numeric, 8 unique values, 0 missing
V220: numeric, 4 unique values, 0 missing
V281: numeric, 4 unique values, 0 missing
V333: numeric, 4 unique values, 0 missing
V485: numeric, 3 unique values, 0 missing
V521: numeric, 4 unique values, 0 missing
V585: numeric, 4 unique values, 0 missing
V727: numeric, 6 unique values, 0 missing
V792: numeric, 5 unique values, 0 missing
V798: numeric, 3 unique values, 0 missing
V839: numeric, 4 unique values, 0 missing
V886: numeric, 5 unique values, 0 missing
V1234: numeric, 3 unique values, 0 missing
V1344: numeric, 4 unique values, 0 missing
V1502: numeric, 3 unique values, 0 missing
V1742: numeric, 2 unique values, 0 missing
V2266: numeric, 2 unique values, 0 missing
V2306: numeric, 2 unique values, 0 missing
V2390: numeric, 3 unique values, 0 missing
V2537: numeric, 2 unique values, 0 missing
V2555: numeric, 3 unique values, 0 missing
V2633: numeric, 2 unique values, 0 missing
V2747: numeric, 3 unique values, 0 missing
V2973: numeric, 3 unique values, 0 missing
V3088: numeric, 3 unique values, 0 missing
V3160: numeric, 2 unique values, 0 missing
V3208: numeric, 3 unique values, 0 missing
V3273: numeric, 2 unique values, 0 missing
V3358: numeric, 2 unique values, 0 missing
V3369: numeric, 2 unique values, 0 missing
V3564: numeric, 6 unique values, 0 missing
V3736: numeric, 5 unique values, 0 missing
V3764: numeric, 4 unique values, 0 missing
V3772: numeric, 4 unique values, 0 missing
V3811: numeric, 4 unique values, 0 missing
V3868: numeric, 3 unique values, 0 missing
V3905: numeric, 3 unique values, 0 missing
V4001: numeric, 3 unique values, 0 missing
V4194: numeric, 3 unique values, 0 missing
V4210: numeric, 3 unique values, 0 missing
V4501: numeric, 2 unique values, 0 missing
V4579: numeric, 2 unique values, 0 missing
V4772: numeric, 4 unique values, 0 missing
V4837: numeric, 2 unique values, 0 missing
V5028: numeric, 2 unique values, 0 missing
V5218: numeric, 2 unique values, 0 missing
V5228: numeric, 2 unique values, 0 missing
V5292: numeric, 3 unique values, 0 missing
V5370: numeric, 3 unique values, 0 missing
V5492: numeric, 2 unique values, 0 missing
V5694: numeric, 2 unique values, 0 missing
V5732: numeric, 3 unique values, 0 missing
V5759: numeric, 2 unique values, 0 missing
V5923: numeric, 2 unique values, 0 missing
V6111: numeric, 2 unique values, 0 missing
V6217: numeric, 2 unique values, 0 missing
V6426: numeric, 2 unique values, 0 missing
V6466: numeric, 2 unique values, 0 missing
V6644: numeric, 24 unique values, 0 missing
V6657: numeric, 19 unique values, 0 missing
V6702: numeric, 18 unique values, 0 missing
V6706: numeric, 17 unique values, 0 missing
V6813: numeric, 13 unique values, 0 missing
V6845: numeric, 14 unique values, 0 missing
V7000: numeric, 11 unique values, 0 missing
V7052: numeric, 11 unique values, 0 missing
V7124: numeric, 10 unique values, 0 missing
V7156: numeric, 13 unique values, 0 missing
V7170: numeric, 9 unique values, 0 missing
V7178: numeric, 9 unique values, 0 missing
V7231: numeric, 14 unique values, 0 missing
V7580: numeric, 6 unique values, 0 missing
V7584: numeric, 8 unique values, 0 missing
V7588: numeric, 5 unique values, 0 missing
V7670: numeric, 9 unique values, 0 missing
V7857: numeric, 6 unique values, 0 missing
V7906: numeric, 10 unique values, 0 missing
V7961: numeric, 9 unique values, 0 missing
V7998: numeric, 7 unique values, 0 missing
V8079: numeric, 5 unique values, 0 missing
V8310: numeric, 6 unique values, 0 missing
V8354: numeric, 6 unique values, 0 missing
V8391: numeric, 4 unique values, 0 missing
V8412: numeric, 7 unique values, 0 missing
V8494: numeric, 6 unique values, 0 missing
V8558: numeric, 14 unique values, 0 missing
V8595: numeric, 7 unique values, 0 missing
V8704: numeric, 5 unique values, 0 missing
V8753: numeric, 7 unique values, 0 missing
V8856: numeric, 6 unique values, 0 missing
V8878: numeric, 5 unique values, 0 missing
V8946: numeric, 5 unique values, 0 missing
V9302: numeric, 4 unique values, 0 missing
V9444: numeric, 6 unique values, 0 missing
V9746: numeric, 9 unique values, 0 missing
V9907: numeric, 5 unique values, 0 missing
V9970: numeric, 6 unique values, 0 missing
V9981: numeric, 5 unique values, 0 missing
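
A per-feature summary like the one above (type, number of unique values, missing count) can be reproduced with pandas once the data is in a DataFrame. This is a minimal sketch assuming a toy frame; the values and the three-column layout are hypothetical and only the column names echo the list above.

```python
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values).
df = pd.DataFrame(
    {
        "V28": [0, 1, 1, 2],
        "V54": [0, 0, 3, 3],
        "Class": pd.Categorical(["a", "b", "a", "b"]),
    }
)

# One row per feature: storage type, distinct values, missing entries.
summary = pd.DataFrame(
    {
        "dtype": df.dtypes.astype(str),
        "unique": df.nunique(),
        "missing": df.isna().sum(),
    }
)

print(summary)
```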

19 properties

Number of instances (rows) of the dataset: 1500
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 50
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of instances belonging to the most frequent class: 2
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 30
Percentage of instances belonging to the least frequent class: 2
Number of instances belonging to the least frequent class: 30
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.97
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.07
Percentage of numeric attributes: 99.01
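
Several of the properties above are simple ratios of the raw counts. The short sketch below recomputes them from the listed numbers as a consistency check; the variable names are chosen here for illustration and are not OpenML identifiers.

```python
# Raw counts taken from the property list above.
n_instances = 1500
n_attributes = 101
n_numeric = 100
n_nominal = 1
n_classes = 50
majority_size = 30
minority_size = 30

# Derived ratios and percentages.
dimensionality = n_attributes / n_instances
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes
pct_majority = 100 * majority_size / n_instances
pct_minority = 100 * minority_size / n_instances

print(round(dimensionality, 2))  # 0.07
print(round(pct_numeric, 2))     # 99.01
print(round(pct_nominal, 2))     # 0.99
print(round(pct_majority, 2))    # 2.0
print(round(pct_minority, 2))    # 2.0

# The largest and smallest classes both have 30 rows, so the classes are balanced.
print(n_classes * majority_size == n_instances)  # True
```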

0 tasks