robert_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset robert (41165) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
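The row-subsampling step in the source code relies on scikit-learn's `train_test_split` with `stratify=`. As a rough illustration of what class-proportional (stratified) sampling does, here is a minimal standard-library sketch. It is not the author's implementation: the function name `stratified_subsample`, the largest-remainder rounding, and the use of `random.Random` are all assumptions made for this example, and it will not reproduce the exact rows selected for this dataset.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional sample of size n_rows.

    Hypothetical helper for illustration only; the dataset itself was
    produced with sklearn.model_selection.train_test_split.
    """
    if n_rows > len(labels):
        raise ValueError("n_rows exceeds the number of available rows")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal (fractional) share of the sample for each class.
    total = len(labels)
    shares = {c: len(rows) * n_rows / total for c, rows in by_class.items()}

    # Largest-remainder rounding so the per-class quotas sum to exactly n_rows.
    quotas = {c: int(s) for c, s in shares.items()}
    leftover = n_rows - sum(quotas.values())
    by_remainder = sorted(shares, key=lambda c: shares[c] - quotas[c], reverse=True)
    for c in by_remainder[:leftover]:
        quotas[c] += 1

    # Draw each class's quota without replacement.
    picked = []
    for c, rows in by_class.items():
        picked.extend(rng.sample(rows, quotas[c]))
    return sorted(picked)
```

Called as `stratified_subsample(labels, n_rows=2000, seed=1)` on a list of class labels, it returns 2000 distinct row indices whose class proportions match the input as closely as integer counts allow.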

101 features

class (target): nominal, 10 unique values, 0 missing
V142: numeric, 621 unique values, 0 missing
V197: numeric, 254 unique values, 0 missing
V248: numeric, 255 unique values, 0 missing
V285: numeric, 543 unique values, 0 missing
V390: numeric, 615 unique values, 0 missing
V444: numeric, 579 unique values, 0 missing
V448: numeric, 652 unique values, 0 missing
V611: numeric, 568 unique values, 0 missing
V659: numeric, 683 unique values, 0 missing
V831: numeric, 579 unique values, 0 missing
V836: numeric, 254 unique values, 0 missing
V884: numeric, 254 unique values, 0 missing
V888: numeric, 617 unique values, 0 missing
V957: numeric, 641 unique values, 0 missing
V1025: numeric, 515 unique values, 0 missing
V1065: numeric, 649 unique values, 0 missing
V1150: numeric, 550 unique values, 0 missing
V1453: numeric, 634 unique values, 0 missing
V1544: numeric, 566 unique values, 0 missing
V1772: numeric, 552 unique values, 0 missing
V1829: numeric, 627 unique values, 0 missing
V1870: numeric, 554 unique values, 0 missing
V1873: numeric, 255 unique values, 0 missing
V1944: numeric, 693 unique values, 0 missing
V1982: numeric, 627 unique values, 0 missing
V2004: numeric, 521 unique values, 0 missing
V2097: numeric, 252 unique values, 0 missing
V2099: numeric, 700 unique values, 0 missing
V2162: numeric, 649 unique values, 0 missing
V2218: numeric, 560 unique values, 0 missing
V2310: numeric, 736 unique values, 0 missing
V2350: numeric, 549 unique values, 0 missing
V2478: numeric, 626 unique values, 0 missing
V2602: numeric, 649 unique values, 0 missing
V2643: numeric, 253 unique values, 0 missing
V2735: numeric, 534 unique values, 0 missing
V2877: numeric, 255 unique values, 0 missing
V2912: numeric, 252 unique values, 0 missing
V3011: numeric, 661 unique values, 0 missing
V3016: numeric, 513 unique values, 0 missing
V3043: numeric, 711 unique values, 0 missing
V3066: numeric, 256 unique values, 0 missing
V3227: numeric, 566 unique values, 0 missing
V3235: numeric, 586 unique values, 0 missing
V3266: numeric, 625 unique values, 0 missing
V3283: numeric, 692 unique values, 0 missing
V3298: numeric, 578 unique values, 0 missing
V3361: numeric, 724 unique values, 0 missing
V3468: numeric, 534 unique values, 0 missing
V3560: numeric, 604 unique values, 0 missing
V3586: numeric, 622 unique values, 0 missing
V3635: numeric, 550 unique values, 0 missing
V3665: numeric, 538 unique values, 0 missing
V3676: numeric, 504 unique values, 0 missing
V3697: numeric, 254 unique values, 0 missing
V3794: numeric, 614 unique values, 0 missing
V3834: numeric, 557 unique values, 0 missing
V3872: numeric, 489 unique values, 0 missing
V3913: numeric, 631 unique values, 0 missing
V3924: numeric, 497 unique values, 0 missing
V4183: numeric, 254 unique values, 0 missing
V4261: numeric, 519 unique values, 0 missing
V4396: numeric, 513 unique values, 0 missing
V4468: numeric, 650 unique values, 0 missing
V4583: numeric, 736 unique values, 0 missing
V4607: numeric, 523 unique values, 0 missing
V4846: numeric, 674 unique values, 0 missing
V5156: numeric, 632 unique values, 0 missing
V5184: numeric, 644 unique values, 0 missing
V5196: numeric, 554 unique values, 0 missing
V5356: numeric, 556 unique values, 0 missing
V5360: numeric, 600 unique values, 0 missing
V5364: numeric, 252 unique values, 0 missing
V5367: numeric, 553 unique values, 0 missing
V5419: numeric, 657 unique values, 0 missing
V5532: numeric, 585 unique values, 0 missing
V5542: numeric, 615 unique values, 0 missing
V5568: numeric, 625 unique values, 0 missing
V5620: numeric, 251 unique values, 0 missing
V5782: numeric, 675 unique values, 0 missing
V5826: numeric, 462 unique values, 0 missing
V5849: numeric, 642 unique values, 0 missing
V5889: numeric, 553 unique values, 0 missing
V5902: numeric, 568 unique values, 0 missing
V5969: numeric, 750 unique values, 0 missing
V6039: numeric, 613 unique values, 0 missing
V6126: numeric, 559 unique values, 0 missing
V6164: numeric, 640 unique values, 0 missing
V6180: numeric, 249 unique values, 0 missing
V6260: numeric, 548 unique values, 0 missing
V6451: numeric, 618 unique values, 0 missing
V6579: numeric, 256 unique values, 0 missing
V6618: numeric, 662 unique values, 0 missing
V6744: numeric, 501 unique values, 0 missing
V6753: numeric, 251 unique values, 0 missing
V6876: numeric, 662 unique values, 0 missing
V6945: numeric, 520 unique values, 0 missing
V6969: numeric, 724 unique values, 0 missing
V7011: numeric, 485 unique values, 0 missing
V7058: numeric, 687 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.1
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 10.45
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 209
Percentage of instances belonging to the least frequent class: 9.55
Number of instances belonging to the least frequent class: 191
Number of binary attributes: 0
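
The ratio and percentage properties listed here can be cross-checked against the raw counts. The short sketch below recomputes them. The dictionary keys are hypothetical names chosen for this example, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Raw counts copied from the property list.
n_rows = 2000
n_attributes = 101  # 100 numeric features + 1 nominal target
n_numeric = 100
n_nominal = 1
majority_count = 209
minority_count = 191


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "attributes_per_instance": round(n_attributes / n_rows, 2),
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_majority_class": pct(majority_count, n_rows),
    "pct_minority_class": pct(minority_count, n_rows),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

Under these assumptions the recomputed values agree with the listed ones: 0.05, 99.01, 0.99, 10.45 and 9.55.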

0 tasks
