MiniBooNE_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.

Subsampling of the dataset MiniBooNE (44128) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# `Dataset` is a container class from the generating script (not shown);
# `subsample` is written as a method on such an object.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> "Dataset":
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by their frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            vcs.index.to_numpy(),  # classes in the same order as their counts
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x.loc[mask]
        y = y.loc[mask]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Keep nrows_max rows, stratified on the target when requested
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical-mask entries of the retained columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
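
For this particular dataset only the row-subsampling step changes anything: the target has 2 classes (at most `nclasses_max=10`) and there are 50 features (at most `ncols_max=100`). The sketch below reproduces that step on a synthetic stand-in rather than the real MiniBooNE data; the 10,000-row frame, its random values, and its exact 50/50 label balance are assumptions for illustration only. As in the generator, the held-out "test" side of `train_test_split` is used as the subsample, and stratifying on the target keeps the class proportions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the source data: 10,000 rows, 50 random numeric
# features named like the real ones, and a perfectly balanced binary target.
rng = np.random.default_rng(0)
n_rows = 10_000
x = pd.DataFrame(
    rng.normal(size=(n_rows, 50)),
    columns=[f"ParticleID_{i}" for i in range(50)],
)
y = pd.Series(np.repeat([0, 1], n_rows // 2), name="signal")

# Same call pattern as the generator: the "test" split is the subsample.
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=2_000,
    stratify=data["signal"],
    shuffle=True,
    random_state=1,
)

print(subset.shape)
print(subset["signal"].value_counts().to_dict())
```

With a 5,000/5,000 source, the 2,000-row subsample has 51 columns and exactly 1,000 rows per class. Because stratification preserves class proportions up to rounding, the 1,000/1,000 split of this dataset implies that the rows being sampled were close to evenly balanced.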

51 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| signal (target) | nominal | 2 | 0 |
| ParticleID_0 | numeric | 1991 | 0 |
| ParticleID_1 | numeric | 1987 | 0 |
| ParticleID_2 | numeric | 1994 | 0 |
| ParticleID_3 | numeric | 1980 | 0 |
| ParticleID_4 | numeric | 1412 | 0 |
| ParticleID_5 | numeric | 1683 | 0 |
| ParticleID_6 | numeric | 1983 | 0 |
| ParticleID_7 | numeric | 1986 | 0 |
| ParticleID_8 | numeric | 1977 | 0 |
| ParticleID_9 | numeric | 1984 | 0 |
| ParticleID_10 | numeric | 1972 | 0 |
| ParticleID_11 | numeric | 1992 | 0 |
| ParticleID_12 | numeric | 1992 | 0 |
| ParticleID_13 | numeric | 1964 | 0 |
| ParticleID_14 | numeric | 1987 | 0 |
| ParticleID_15 | numeric | 1990 | 0 |
| ParticleID_16 | numeric | 1988 | 0 |
| ParticleID_17 | numeric | 1993 | 0 |
| ParticleID_18 | numeric | 1889 | 0 |
| ParticleID_19 | numeric | 1989 | 0 |
| ParticleID_20 | numeric | 1995 | 0 |
| ParticleID_21 | numeric | 1878 | 0 |
| ParticleID_22 | numeric | 1993 | 0 |
| ParticleID_23 | numeric | 1988 | 0 |
| ParticleID_24 | numeric | 1971 | 0 |
| ParticleID_25 | numeric | 1994 | 0 |
| ParticleID_26 | numeric | 1943 | 0 |
| ParticleID_27 | numeric | 1967 | 0 |
| ParticleID_28 | numeric | 1983 | 0 |
| ParticleID_29 | numeric | 1992 | 0 |
| ParticleID_30 | numeric | 1991 | 0 |
| ParticleID_31 | numeric | 1985 | 0 |
| ParticleID_32 | numeric | 1992 | 0 |
| ParticleID_33 | numeric | 1988 | 0 |
| ParticleID_34 | numeric | 1990 | 0 |
| ParticleID_35 | numeric | 1988 | 0 |
| ParticleID_36 | numeric | 1995 | 0 |
| ParticleID_37 | numeric | 1992 | 0 |
| ParticleID_38 | numeric | 1977 | 0 |
| ParticleID_39 | numeric | 1991 | 0 |
| ParticleID_40 | numeric | 1977 | 0 |
| ParticleID_41 | numeric | 1991 | 0 |
| ParticleID_42 | numeric | 1990 | 0 |
| ParticleID_43 | numeric | 1992 | 0 |
| ParticleID_44 | numeric | 646 | 0 |
| ParticleID_45 | numeric | 1988 | 0 |
| ParticleID_46 | numeric | 1994 | 0 |
| ParticleID_47 | numeric | 1992 | 0 |
| ParticleID_48 | numeric | 1994 | 0 |
| ParticleID_49 | numeric | 1988 | 0 |
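
Each row of the feature table is a per-column summary: a type, a count of distinct non-missing values, and a count of missing values. The function below is a minimal pandas sketch of that kind of summary, not OpenML's own implementation; the name `summarize_features`, the rule "numeric dtype means numeric, anything else means nominal", and the tiny example frame are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def summarize_features(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: per-column type, distinct-value and missing counts."""
    rows = []
    for name in df.columns:
        col = df[name]
        # Assumption: numeric dtypes are "numeric", everything else "nominal".
        kind = "numeric" if pd.api.types.is_numeric_dtype(col) else "nominal"
        label = f"{name} (target)" if name == target else name
        rows.append(
            {
                "feature": label,
                "type": kind,
                "unique": int(col.nunique(dropna=True)),  # NaN not counted
                "missing": int(col.isna().sum()),
            }
        )
    return pd.DataFrame(rows)


# Tiny hypothetical frame, just to exercise the function.
df = pd.DataFrame(
    {
        "signal": pd.Categorical(["yes", "no", "yes", "no"]),
        "ParticleID_0": [0.1, 0.2, 0.2, np.nan],
    }
)
print(summarize_features(df, target="signal"))
```

A column such as ParticleID_44, with only 646 distinct values across 2,000 rows, would show up here the same way: a low `unique` count with `missing` still at 0.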

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 51 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 50 |
| Number of nominal attributes | 1 |
| Percentage of binary attributes | 1.96 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0.49 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 0.03 |
| Percentage of numeric attributes | 98.04 |
| Percentage of instances belonging to the most frequent class | 50 |
| Percentage of nominal attributes | 1.96 |
| Number of instances belonging to the most frequent class | 1000 |
| Percentage of instances belonging to the least frequent class | 50 |
| Number of instances belonging to the least frequent class | 1000 |
| Number of binary attributes | 1 |
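
Most of the ratio properties follow directly from the raw counts in the same table. The arithmetic below is only a check of how those numbers relate, not how OpenML computes its dataset qualities; it shows, for example, that the listed 0.03 is 51 / 2000 = 0.0255 rounded to two decimals. The "average class difference" value depends on row order and cannot be recovered from counts alone, so it is left out.

```python
import math

# Raw counts as listed in the properties table.
n_instances = 2000
n_attributes = 51  # 50 numeric features + the nominal target
n_numeric = 50
n_nominal = 1
n_binary = 1  # the two-valued target
majority_count = 1000
minority_count = 1000

# Derived properties: percentages are relative to the attribute or row count.
pct_binary = 100 * n_binary / n_attributes
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes
dimensionality = n_attributes / n_instances
pct_majority = 100 * majority_count / n_instances
pct_minority = 100 * minority_count / n_instances

print(f"{pct_binary:.2f} {pct_numeric:.2f} {pct_nominal:.2f}")  # 1.96 98.04 1.96
print(f"{dimensionality:.4f} {pct_majority:.0f} {pct_minority:.0f}")  # 0.0255 50 50
```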
