Higgs_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset Higgs (44129) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
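To illustrate what the stratified row subsampling step does, here is a minimal standard-library sketch. It is not the code that generated this dataset (that code uses scikit-learn's `train_test_split` with `stratify=`); the function `stratified_subsample` and the toy `labels` list are hypothetical and only show the idea of sampling each class in proportion to its frequency.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices, sampling each class in proportion to its frequency.

    Hypothetical helper for illustration only; not the generator's implementation.
    """
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    picked = []
    for label in sorted(by_class):
        members = by_class[label]
        # Proportional allocation; with uneven classes, rounding can shift
        # the total by a row or so relative to n_rows
        quota = round(n_rows * len(members) / total)
        picked.extend(rng.sample(members, min(quota, len(members))))
    return sorted(picked)


# Toy stand-in for a balanced binary target (assumed data, not the Higgs labels)
labels = [0, 1] * 5000
idxs = stratified_subsample(labels, n_rows=2000, seed=2)
counts = {c: sum(1 for i in idxs if labels[i] == c) for c in (0, 1)}
print(len(idxs), counts)  # 2000 {0: 1000, 1: 1000}
```

With a balanced binary target, a 2000-row stratified subsample keeps 1000 rows per class, which is consistent with the class counts reported for this dataset.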

25 features

target (target): nominal, 2 unique values, 0 missing
lepton_pT: numeric, 1773 unique values, 0 missing
lepton_eta: numeric, 1553 unique values, 0 missing
lepton_phi: numeric, 1716 unique values, 0 missing
missing_energy_magnitude: numeric, 1998 unique values, 0 missing
missing_energy_phi: numeric, 1997 unique values, 0 missing
jet_1_pt: numeric, 1870 unique values, 0 missing
jet_1_eta: numeric, 1512 unique values, 0 missing
jet_1_phi: numeric, 1695 unique values, 0 missing
jet_2_pt: numeric, 1828 unique values, 0 missing
jet_2_eta: numeric, 1538 unique values, 0 missing
jet_2_phi: numeric, 1689 unique values, 0 missing
jet_3_pt: numeric, 1770 unique values, 0 missing
jet_3_eta: numeric, 1578 unique values, 0 missing
jet_3_phi: numeric, 1738 unique values, 0 missing
jet_4_pt: numeric, 1674 unique values, 0 missing
jet_4_eta: numeric, 1620 unique values, 0 missing
jet_4_phi: numeric, 1712 unique values, 0 missing
m_jj: numeric, 1996 unique values, 0 missing
m_jjj: numeric, 1981 unique values, 0 missing
m_lv: numeric, 1922 unique values, 0 missing
m_jlv: numeric, 1990 unique values, 0 missing
m_bb: numeric, 1987 unique values, 0 missing
m_wbb: numeric, 1994 unique values, 0 missing
m_wwbb: numeric, 1991 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 25
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 24
Number of nominal attributes: 1
Number of instances belonging to the least frequent class: 1000
Number of binary attributes: 1
Percentage of binary attributes: 4
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.5
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 96
Percentage of instances belonging to the most frequent class: 50
Percentage of nominal attributes: 4
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
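
The percentage and ratio properties above follow directly from the listed counts. The short sketch below recomputes them as a sanity check; the variable names and the `pct` helper are illustrative and not OpenML identifiers, and the two-decimal rounding of the attribute-to-instance ratio is an assumption about how the page displays it.

```python
# Counts as listed in the properties above
n_rows = 2000
n_features = 25
n_numeric = 24
n_nominal = 1
n_binary = 1
majority_count = 1000
minority_count = 1000


def pct(part, whole):
    """Percentage of `whole` represented by `part` (illustrative helper)."""
    return 100 * part / whole


derived = {
    "dimensionality": n_features / n_rows,          # attributes per instance
    "pct_numeric": pct(n_numeric, n_features),
    "pct_nominal": pct(n_nominal, n_features),
    "pct_binary": pct(n_binary, n_features),
    "pct_majority": pct(majority_count, n_rows),
    "pct_minority": pct(minority_count, n_rows),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The unrounded attribute-to-instance ratio is 25 / 2000 = 0.0125, which appears as 0.01 when shown to two decimals.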

0 tasks
