volkert_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

active, ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by Eddie Bergman

Subsampling of the dataset volkert (41166) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )

        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
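
To make the column and row steps of the source code above easier to follow, here is a minimal, self-contained sketch on toy data. It is not the code that produced this dataset: the toy frame, the column names, the seed, and the caps of 4 columns and 200 rows are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

seed = 0  # hypothetical seed for this toy example
rng = np.random.default_rng(seed)

# Toy data (hypothetical): 1,000 rows, 8 feature columns, 3 imbalanced classes.
x = pd.DataFrame(
    rng.normal(size=(1000, 8)),
    columns=[f"V{i}" for i in range(1, 9)],
)
y = pd.Series(np.repeat(["a", "b", "c"], [600, 300, 100]), name="target")

# Column step: draw column positions without replacement, then sort them so
# the kept columns retain their original relative order.
ncols_max = 4
column_idxs = sorted(rng.choice(len(x.columns), size=ncols_max, replace=False))
x = x.iloc[:, column_idxs]

# Row step: an integer test_size asks for exactly that many rows in the
# second split; stratify keeps class proportions close to the full data.
nrows_max = 200
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data["target"],
    shuffle=True,
    random_state=seed,
)

print(list(subset.columns))
print(subset["target"].value_counts().to_dict())
```

Sorting the sampled positions is what keeps the surviving feature names in ascending order, as in the feature list of this dataset.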

101 features

Feature | Type | Unique values | Missing
class (target) | nominal | 10 | 0
V5 | numeric | 1 | 0
V7 | numeric | 1 | 0
V8 | numeric | 1 | 0
V12 | numeric | 1 | 0
V15 | numeric | 1 | 0
V17 | numeric | 1 | 0
V18 | numeric | 804 | 0
V19 | numeric | 1 | 0
V21 | numeric | 1 | 0
V22 | numeric | 1 | 0
V23 | numeric | 1 | 0
V24 | numeric | 1 | 0
V26 | numeric | 1 | 0
V28 | numeric | 1 | 0
V30 | numeric | 1 | 0
V32 | numeric | 1 | 0
V35 | numeric | 1 | 0
V37 | numeric | 782 | 0
V39 | numeric | 905 | 0
V41 | numeric | 749 | 0
V42 | numeric | 653 | 0
V43 | numeric | 602 | 0
V46 | numeric | 395 | 0
V49 | numeric | 318 | 0
V50 | numeric | 256 | 0
V51 | numeric | 231 | 0
V55 | numeric | 151 | 0
V56 | numeric | 144 | 0
V59 | numeric | 110 | 0
V62 | numeric | 93 | 0
V65 | numeric | 76 | 0
V66 | numeric | 60 | 0
V68 | numeric | 148 | 0
V69 | numeric | 169 | 0
V74 | numeric | 847 | 0
V75 | numeric | 871 | 0
V76 | numeric | 823 | 0
V78 | numeric | 655 | 0
V79 | numeric | 583 | 0
V80 | numeric | 502 | 0
V81 | numeric | 414 | 0
V83 | numeric | 264 | 0
V84 | numeric | 167 | 0
V85 | numeric | 1852 | 0
V86 | numeric | 1854 | 0
V87 | numeric | 1859 | 0
V89 | numeric | 1889 | 0
V93 | numeric | 1615 | 0
V95 | numeric | 1561 | 0
V100 | numeric | 1439 | 0
V101 | numeric | 1706 | 0
V102 | numeric | 1733 | 0
V104 | numeric | 1733 | 0
V106 | numeric | 1881 | 0
V108 | numeric | 1881 | 0
V109 | numeric | 439 | 0
V111 | numeric | 312 | 0
V113 | numeric | 300 | 0
V114 | numeric | 303 | 0
V115 | numeric | 292 | 0
V116 | numeric | 311 | 0
V117 | numeric | 287 | 0
V119 | numeric | 371 | 0
V120 | numeric | 279 | 0
V122 | numeric | 273 | 0
V125 | numeric | 276 | 0
V126 | numeric | 376 | 0
V127 | numeric | 342 | 0
V131 | numeric | 287 | 0
V132 | numeric | 288 | 0
V134 | numeric | 362 | 0
V135 | numeric | 306 | 0
V136 | numeric | 292 | 0
V137 | numeric | 319 | 0
V138 | numeric | 327 | 0
V141 | numeric | 370 | 0
V142 | numeric | 382 | 0
V143 | numeric | 383 | 0
V144 | numeric | 488 | 0
V145 | numeric | 438 | 0
V149 | numeric | 347 | 0
V150 | numeric | 344 | 0
V151 | numeric | 327 | 0
V154 | numeric | 316 | 0
V155 | numeric | 377 | 0
V156 | numeric | 300 | 0
V158 | numeric | 287 | 0
V159 | numeric | 277 | 0
V160 | numeric | 279 | 0
V161 | numeric | 283 | 0
V166 | numeric | 270 | 0
V167 | numeric | 284 | 0
V168 | numeric | 280 | 0
V170 | numeric | 354 | 0
V171 | numeric | 289 | 0
V172 | numeric | 285 | 0
V173 | numeric | 303 | 0
V178 | numeric | 319 | 0
V179 | numeric | 291 | 0
V180 | numeric | 488 | 0
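
Sixteen of the numeric features listed above have a single unique value, so they are constant within this subsample. The following is a minimal sketch of how such columns might be detected and dropped, assuming the data has already been loaded into a pandas DataFrame. The toy frame and its values are hypothetical and only borrow a few feature names from the list.

```python
import pandas as pd

# Toy frame (hypothetical values); V5 and V7 are constant, V18 is not.
df = pd.DataFrame({
    "V5": [0.0, 0.0, 0.0, 0.0],
    "V7": [1.0, 1.0, 1.0, 1.0],
    "V18": [0.1, 0.4, 0.4, 0.9],
})

# nunique() counts distinct non-missing values per column.
unique_counts = df.nunique()
constant_columns = unique_counts[unique_counts <= 1].index.tolist()

# Keep only the columns that vary across rows.
informative = df.drop(columns=constant_columns)

print(constant_columns)
print(list(informative.columns))
```

Whether to drop constant columns depends on the downstream model; they carry no information within this subsample but may vary in the full volkert dataset.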

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.14
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 21.95
Number of instances belonging to the most frequent class: 439
Percentage of instances belonging to the least frequent class: 2.35
Number of instances belonging to the least frequent class: 47
Number of binary attributes: 0
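
Several of the properties above are simple ratios of the listed counts. As a quick consistency check, this short sketch recomputes them from the raw counts. The variable names and the `pct` helper are hypothetical; only the numbers come from the property list.

```python
# Counts copied from the property list above.
n_instances = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
majority_count = 439
minority_count = 47


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


majority_pct = pct(majority_count, n_instances)
minority_pct = pct(minority_count, n_instances)
numeric_pct = pct(n_numeric, n_attributes)
nominal_pct = pct(n_nominal, n_attributes)
dimensionality = round(n_attributes / n_instances, 2)

print(majority_pct, minority_pct, numeric_pct, nominal_pct, dimensionality)
```

The recomputed values agree with the listed percentages of 21.95, 2.35, 99.01, and 0.99, and with the attribute-to-instance ratio of 0.05.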

0 tasks
