OpenML
gina_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset gina (41158) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
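The three steps in the description (class selection, column selection, stratified row selection) can be tried outside the original `Dataset` class with a minimal standalone sketch like the one below. The function name `subsample_frame` and its tuple return value are hypothetical and not part of the original code. One assumption is made deliberately: the sketch filters rows on the classes that were actually drawn, whereas the listing above filters on all of `classes`. For a two-class dataset such as gina, the class step is skipped either way, so the difference does not affect this subsample.

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_frame(
    x: pd.DataFrame,
    y: pd.Series,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical standalone variant of the subsampling steps described above."""
    rng = np.random.default_rng(seed)

    # Step 1: keep at most nclasses_max classes, drawn in proportion to frequency.
    counts = y.value_counts()
    if len(counts) > nclasses_max:
        selected = rng.choice(
            counts.index.to_numpy(),
            size=nclasses_max,
            replace=False,
            p=(counts / counts.sum()).to_numpy(),
        )
        # Assumption: filter on the drawn classes, not on every class.
        mask = y.isin(selected)
        x, y = x.loc[mask], y.loc[mask]

    # Step 2: keep at most ncols_max columns, preserving their original order.
    if x.shape[1] > ncols_max:
        col_idxs = np.sort(rng.choice(x.shape[1], size=ncols_max, replace=False))
        x = x.iloc[:, col_idxs]

    # Step 3: keep at most nrows_max rows; the "test" part of a stratified
    # split is used as the subsample, as in the listing above.
    if len(x) > nrows_max:
        _, x, _, y = train_test_split(
            x,
            y,
            test_size=nrows_max,
            stratify=y,
            shuffle=True,
            random_state=seed,
        )

    return x, y
```

Called on a frame with more than 2000 rows and more than 100 columns, this returns at most 2000 rows and 100 columns while keeping the class proportions close to those of the input.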

101 features

class (target): nominal, 2 unique values, 0 missing
V19: numeric, 229 unique values, 0 missing
V25: numeric, 112 unique values, 0 missing
V31: numeric, 218 unique values, 0 missing
V38: numeric, 227 unique values, 0 missing
V53: numeric, 225 unique values, 0 missing
V57: numeric, 237 unique values, 0 missing
V60: numeric, 224 unique values, 0 missing
V77: numeric, 185 unique values, 0 missing
V85: numeric, 32 unique values, 0 missing
V108: numeric, 221 unique values, 0 missing
V109: numeric, 186 unique values, 0 missing
V112: numeric, 93 unique values, 0 missing
V113: numeric, 227 unique values, 0 missing
V122: numeric, 243 unique values, 0 missing
V127: numeric, 229 unique values, 0 missing
V144: numeric, 236 unique values, 0 missing
V150: numeric, 118 unique values, 0 missing
V185: numeric, 152 unique values, 0 missing
V206: numeric, 132 unique values, 0 missing
V220: numeric, 44 unique values, 0 missing
V228: numeric, 231 unique values, 0 missing
V239: numeric, 116 unique values, 0 missing
V242: numeric, 235 unique values, 0 missing
V250: numeric, 233 unique values, 0 missing
V257: numeric, 234 unique values, 0 missing
V271: numeric, 198 unique values, 0 missing
V273: numeric, 102 unique values, 0 missing
V274: numeric, 27 unique values, 0 missing
V275: numeric, 83 unique values, 0 missing
V296: numeric, 214 unique values, 0 missing
V300: numeric, 66 unique values, 0 missing
V331: numeric, 170 unique values, 0 missing
V341: numeric, 237 unique values, 0 missing
V347: numeric, 237 unique values, 0 missing
V350: numeric, 241 unique values, 0 missing
V363: numeric, 241 unique values, 0 missing
V366: numeric, 216 unique values, 0 missing
V374: numeric, 236 unique values, 0 missing
V393: numeric, 60 unique values, 0 missing
V398: numeric, 192 unique values, 0 missing
V403: numeric, 223 unique values, 0 missing
V407: numeric, 231 unique values, 0 missing
V410: numeric, 189 unique values, 0 missing
V413: numeric, 229 unique values, 0 missing
V427: numeric, 66 unique values, 0 missing
V435: numeric, 124 unique values, 0 missing
V436: numeric, 228 unique values, 0 missing
V445: numeric, 172 unique values, 0 missing
V447: numeric, 34 unique values, 0 missing
V456: numeric, 42 unique values, 0 missing
V457: numeric, 239 unique values, 0 missing
V481: numeric, 99 unique values, 0 missing
V482: numeric, 206 unique values, 0 missing
V489: numeric, 227 unique values, 0 missing
V491: numeric, 43 unique values, 0 missing
V493: numeric, 187 unique values, 0 missing
V501: numeric, 149 unique values, 0 missing
V529: numeric, 150 unique values, 0 missing
V559: numeric, 239 unique values, 0 missing
V567: numeric, 238 unique values, 0 missing
V572: numeric, 216 unique values, 0 missing
V577: numeric, 66 unique values, 0 missing
V584: numeric, 32 unique values, 0 missing
V611: numeric, 228 unique values, 0 missing
V649: numeric, 239 unique values, 0 missing
V660: numeric, 217 unique values, 0 missing
V669: numeric, 224 unique values, 0 missing
V673: numeric, 127 unique values, 0 missing
V680: numeric, 34 unique values, 0 missing
V685: numeric, 236 unique values, 0 missing
V689: numeric, 57 unique values, 0 missing
V709: numeric, 196 unique values, 0 missing
V722: numeric, 232 unique values, 0 missing
V728: numeric, 102 unique values, 0 missing
V729: numeric, 108 unique values, 0 missing
V732: numeric, 42 unique values, 0 missing
V734: numeric, 35 unique values, 0 missing
V749: numeric, 225 unique values, 0 missing
V766: numeric, 198 unique values, 0 missing
V770: numeric, 225 unique values, 0 missing
V772: numeric, 96 unique values, 0 missing
V796: numeric, 148 unique values, 0 missing
V807: numeric, 210 unique values, 0 missing
V814: numeric, 66 unique values, 0 missing
V820: numeric, 160 unique values, 0 missing
V826: numeric, 57 unique values, 0 missing
V831: numeric, 231 unique values, 0 missing
V833: numeric, 243 unique values, 0 missing
V856: numeric, 230 unique values, 0 missing
V865: numeric, 231 unique values, 0 missing
V883: numeric, 142 unique values, 0 missing
V885: numeric, 231 unique values, 0 missing
V895: numeric, 38 unique values, 0 missing
V901: numeric, 151 unique values, 0 missing
V903: numeric, 91 unique values, 0 missing
V926: numeric, 31 unique values, 0 missing
V927: numeric, 235 unique values, 0 missing
V946: numeric, 238 unique values, 0 missing
V947: numeric, 23 unique values, 0 missing
V966: numeric, 44 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Average class difference between consecutive instances: 0.51
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 50.85
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 1017
Percentage of instances belonging to the least frequent class: 49.15
Number of instances belonging to the least frequent class: 983
Number of binary attributes: 1
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
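The ratio and percentage properties above follow directly from the raw counts. As a quick consistency check, the sketch below recomputes them from the counts reported on this page. The dictionary keys and the helper `pct` are illustrative names chosen here, not OpenML's own property identifiers.

```python
# Raw counts as reported in the property list above.
n_rows = 2000
n_attributes = 101  # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
n_binary = 1
majority_count = 1017
minority_count = 983


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical key names; the values are recomputed, not copied.
derived = {
    "attributes_per_instance": round(n_attributes / n_rows, 2),
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_binary": pct(n_binary, n_attributes),
    "pct_majority_class": pct(majority_count, n_rows),
    "pct_minority_class": pct(minority_count, n_rows),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class counts also sum to the number of rows (1017 + 983 = 2000), which is consistent with a binary target and no missing labels.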

0 tasks
