MiniBooNE_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | License: Public Domain (CC0) | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman

Subsampling of the dataset MiniBooNE (44128) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
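With 2 target values and 51 attributes listed on this page, the class and column limits (10 and 100) are presumably not exceeded, so the row-subsampling branch is likely the only step that changed the data. A minimal sketch of that step on toy data follows. The frame `data`, its column names, and the 600/400 class split are made up for illustration and are not taken from MiniBooNE.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the parent dataset (hypothetical values, not MiniBooNE).
rng = np.random.default_rng(4)
data = pd.DataFrame(
    {
        "feature": rng.normal(size=1_000),
        "signal": ["a"] * 600 + ["b"] * 400,
    }
)

nrows_max = 100

# An integer test_size makes the "test" part the subsample: it holds
# exactly nrows_max rows, drawn so that class proportions are preserved.
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data["signal"],
    shuffle=True,
    random_state=4,
)

x = subset.drop("signal", axis="columns")
y = subset["signal"]
counts = y.value_counts().to_dict()
print(len(subset), counts)
```

The training part of the split is discarded, which is why the source code assigns it to `_`.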

51 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| signal (target) | nominal | 2 | 0 |
| ParticleID_0 | numeric | 1989 | 0 |
| ParticleID_1 | numeric | 1984 | 0 |
| ParticleID_2 | numeric | 1990 | 0 |
| ParticleID_3 | numeric | 1972 | 0 |
| ParticleID_4 | numeric | 1393 | 0 |
| ParticleID_5 | numeric | 1675 | 0 |
| ParticleID_6 | numeric | 1983 | 0 |
| ParticleID_7 | numeric | 1985 | 0 |
| ParticleID_8 | numeric | 1959 | 0 |
| ParticleID_9 | numeric | 1979 | 0 |
| ParticleID_10 | numeric | 1959 | 0 |
| ParticleID_11 | numeric | 1990 | 0 |
| ParticleID_12 | numeric | 1989 | 0 |
| ParticleID_13 | numeric | 1971 | 0 |
| ParticleID_14 | numeric | 1986 | 0 |
| ParticleID_15 | numeric | 1989 | 0 |
| ParticleID_16 | numeric | 1993 | 0 |
| ParticleID_17 | numeric | 1988 | 0 |
| ParticleID_18 | numeric | 1877 | 0 |
| ParticleID_19 | numeric | 1987 | 0 |
| ParticleID_20 | numeric | 1993 | 0 |
| ParticleID_21 | numeric | 1879 | 0 |
| ParticleID_22 | numeric | 1992 | 0 |
| ParticleID_23 | numeric | 1989 | 0 |
| ParticleID_24 | numeric | 1969 | 0 |
| ParticleID_25 | numeric | 1993 | 0 |
| ParticleID_26 | numeric | 1938 | 0 |
| ParticleID_27 | numeric | 1977 | 0 |
| ParticleID_28 | numeric | 1982 | 0 |
| ParticleID_29 | numeric | 1992 | 0 |
| ParticleID_30 | numeric | 1989 | 0 |
| ParticleID_31 | numeric | 1983 | 0 |
| ParticleID_32 | numeric | 1989 | 0 |
| ParticleID_33 | numeric | 1983 | 0 |
| ParticleID_34 | numeric | 1980 | 0 |
| ParticleID_35 | numeric | 1988 | 0 |
| ParticleID_36 | numeric | 1993 | 0 |
| ParticleID_37 | numeric | 1989 | 0 |
| ParticleID_38 | numeric | 1972 | 0 |
| ParticleID_39 | numeric | 1986 | 0 |
| ParticleID_40 | numeric | 1976 | 0 |
| ParticleID_41 | numeric | 1993 | 0 |
| ParticleID_42 | numeric | 1993 | 0 |
| ParticleID_43 | numeric | 1990 | 0 |
| ParticleID_44 | numeric | 633 | 0 |
| ParticleID_45 | numeric | 1985 | 0 |
| ParticleID_46 | numeric | 1992 | 0 |
| ParticleID_47 | numeric | 1991 | 0 |
| ParticleID_48 | numeric | 1993 | 0 |
| ParticleID_49 | numeric | 1981 | 0 |
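
The per-feature summary above (type, number of unique values, number of missing values) can be recomputed with pandas once the data is in a DataFrame. The sketch below uses a small made-up frame. Its values are placeholders, the "numeric"/"nominal" mapping is an assumption about how the page labels dtypes, and loading the real ARFF file is left out.

```python
import pandas as pd

# Tiny made-up frame standing in for the dataset (not real MiniBooNE values).
df = pd.DataFrame(
    {
        "ParticleID_0": [0.1, 0.2, 0.2, 0.4],
        "ParticleID_1": [1.0, 2.0, 3.0, 4.0],
        "signal": pd.Categorical(["True", "False", "True", "False"]),
    }
)

summary = pd.DataFrame(
    {
        # Label number dtypes "numeric" and everything else "nominal".
        "type": df.dtypes.map(
            lambda dtype: "numeric"
            if pd.api.types.is_numeric_dtype(dtype)
            else "nominal"
        ),
        # Distinct non-missing values per column.
        "unique_values": df.nunique(),
        # Missing cells per column.
        "missing": df.isna().sum(),
    }
)
print(summary)
```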

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 51 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 50 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 1.96 |
| Percentage of instances having missing values. | 0 |
| Percentage of missing values. | 0 |
| Average class difference between consecutive instances. | 0.5 |
| Percentage of numeric attributes. | 98.04 |
| Number of attributes divided by the number of instances. | 0.03 |
| Percentage of nominal attributes. | 1.96 |
| Percentage of instances belonging to the most frequent class. | 50 |
| Number of instances belonging to the most frequent class. | 1000 |
| Percentage of instances belonging to the least frequent class. | 50 |
| Number of instances belonging to the least frequent class. | 1000 |
| Number of binary attributes. | 1 |
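
Several of the derived properties follow from the raw counts by simple arithmetic. The sketch below recomputes them from the counts listed on this page. The formulas are inferred from the property descriptions rather than taken from the site's implementation, and the variable names are illustrative.

```python
# Counts as listed on this page.
n_instances = 2000
n_attributes = 51
n_numeric = 50
n_nominal = 1
n_binary = 1
majority_class_size = 1000
minority_class_size = 1000

# Percentages are relative to the total number of attributes or instances.
pct_binary = 100 * n_binary / n_attributes
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes
pct_majority = 100 * majority_class_size / n_instances
pct_minority = 100 * minority_class_size / n_instances

# Attributes per instance.
dimensionality = n_attributes / n_instances

print(f"{pct_binary:.2f} {pct_numeric:.2f} {pct_nominal:.2f}")
print(pct_majority, pct_minority, dimensionality)
```

The attribute-to-instance ratio works out to 0.0255, which is consistent with the listed 0.03 if the page rounds to two decimals.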

0 tasks
