robert_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

active, ARFF, Publicly available, Visibility: public, uploaded 17-11-2022 by Eddie Bergman

Subsampling of the dataset robert (41165) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the author's dataset wrapper class; `Dataset` itself is not shown here.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by class frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as written, this filters on `classes` (every class), not
        # `selected_classes`, so no rows are dropped by this step.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        # NOTE: the `stratified` argument is not consulted; the split is
        # always stratified on the target.
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
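The row-subsampling step in the source code above relies on `train_test_split` with an integer `test_size` and a `stratify` argument. The following is a minimal sketch of that step in isolation, run on synthetic stand-in data rather than on robert itself; the column names `f0`..`f3`, the 5,000-row size, and the class probabilities are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic stand-in data (an assumption, not the robert dataset):
# 5,000 rows, 4 numeric features, 10 mildly imbalanced classes.
n_rows = 5_000
class_probs = np.linspace(1, 2, 10)
class_probs = class_probs / class_probs.sum()
x = pd.DataFrame(
    rng.normal(size=(n_rows, 4)), columns=[f"f{i}" for i in range(4)]
)
y = pd.Series(rng.choice(10, size=n_rows, p=class_probs), name="class")

data = pd.concat((x, y), axis="columns")

# An integer test_size requests exactly that many rows in the second split;
# stratify keeps each class's share close to its share in the full data.
_, subset = train_test_split(
    data,
    test_size=2_000,
    stratify=data["class"],
    shuffle=True,
    random_state=3,
)

# Compare per-class shares before and after subsampling.
full_share = y.value_counts(normalize=True).sort_index()
sub_share = subset["class"].value_counts(normalize=True).sort_index()
max_gap = (full_share - sub_share).abs().max()
print(len(subset), round(float(max_gap), 4))
```

Taking the second element of the returned pair treats the "test" split as the subsample, which is why the subsample size is passed as `test_size`.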

101 features

class (target), nominal, 10 unique values, 0 missing
V11, numeric, 629 unique values, 0 missing
V34, numeric, 473 unique values, 0 missing
V218, numeric, 590 unique values, 0 missing
V233, numeric, 573 unique values, 0 missing
V281, numeric, 519 unique values, 0 missing
V309, numeric, 549 unique values, 0 missing
V550, numeric, 610 unique values, 0 missing
V609, numeric, 552 unique values, 0 missing
V651, numeric, 757 unique values, 0 missing
V670, numeric, 599 unique values, 0 missing
V737, numeric, 574 unique values, 0 missing
V810, numeric, 570 unique values, 0 missing
V996, numeric, 634 unique values, 0 missing
V1137, numeric, 731 unique values, 0 missing
V1233, numeric, 610 unique values, 0 missing
V1275, numeric, 252 unique values, 0 missing
V1289, numeric, 598 unique values, 0 missing
V1362, numeric, 630 unique values, 0 missing
V1486, numeric, 540 unique values, 0 missing
V1571, numeric, 652 unique values, 0 missing
V1618, numeric, 246 unique values, 0 missing
V1683, numeric, 634 unique values, 0 missing
V1751, numeric, 538 unique values, 0 missing
V1779, numeric, 718 unique values, 0 missing
V1817, numeric, 254 unique values, 0 missing
V1822, numeric, 626 unique values, 0 missing
V1884, numeric, 479 unique values, 0 missing
V2028, numeric, 662 unique values, 0 missing
V2091, numeric, 552 unique values, 0 missing
V2100, numeric, 520 unique values, 0 missing
V2133, numeric, 614 unique values, 0 missing
V2140, numeric, 616 unique values, 0 missing
V2245, numeric, 650 unique values, 0 missing
V2284, numeric, 597 unique values, 0 missing
V2343, numeric, 522 unique values, 0 missing
V2363, numeric, 557 unique values, 0 missing
V2681, numeric, 622 unique values, 0 missing
V2768, numeric, 559 unique values, 0 missing
V2786, numeric, 604 unique values, 0 missing
V2787, numeric, 553 unique values, 0 missing
V2837, numeric, 662 unique values, 0 missing
V2994, numeric, 550 unique values, 0 missing
V3069, numeric, 720 unique values, 0 missing
V3081, numeric, 611 unique values, 0 missing
V3082, numeric, 610 unique values, 0 missing
V3220, numeric, 608 unique values, 0 missing
V3372, numeric, 637 unique values, 0 missing
V3396, numeric, 574 unique values, 0 missing
V3408, numeric, 529 unique values, 0 missing
V3497, numeric, 718 unique values, 0 missing
V3682, numeric, 600 unique values, 0 missing
V3722, numeric, 583 unique values, 0 missing
V3729, numeric, 548 unique values, 0 missing
V4139, numeric, 623 unique values, 0 missing
V4183, numeric, 255 unique values, 0 missing
V4186, numeric, 485 unique values, 0 missing
V4271, numeric, 553 unique values, 0 missing
V4361, numeric, 562 unique values, 0 missing
V4362, numeric, 249 unique values, 0 missing
V4419, numeric, 253 unique values, 0 missing
V4520, numeric, 617 unique values, 0 missing
V4629, numeric, 537 unique values, 0 missing
V4642, numeric, 253 unique values, 0 missing
V4694, numeric, 254 unique values, 0 missing
V4725, numeric, 549 unique values, 0 missing
V4726, numeric, 541 unique values, 0 missing
V4734, numeric, 643 unique values, 0 missing
V4739, numeric, 543 unique values, 0 missing
V4749, numeric, 667 unique values, 0 missing
V4907, numeric, 577 unique values, 0 missing
V4921, numeric, 641 unique values, 0 missing
V4970, numeric, 611 unique values, 0 missing
V5062, numeric, 639 unique values, 0 missing
V5184, numeric, 625 unique values, 0 missing
V5229, numeric, 702 unique values, 0 missing
V5261, numeric, 517 unique values, 0 missing
V5323, numeric, 575 unique values, 0 missing
V5397, numeric, 567 unique values, 0 missing
V5457, numeric, 551 unique values, 0 missing
V5534, numeric, 476 unique values, 0 missing
V5608, numeric, 624 unique values, 0 missing
V5694, numeric, 733 unique values, 0 missing
V5763, numeric, 647 unique values, 0 missing
V5895, numeric, 506 unique values, 0 missing
V5961, numeric, 612 unique values, 0 missing
V6053, numeric, 646 unique values, 0 missing
V6117, numeric, 550 unique values, 0 missing
V6178, numeric, 682 unique values, 0 missing
V6206, numeric, 622 unique values, 0 missing
V6320, numeric, 558 unique values, 0 missing
V6324, numeric, 721 unique values, 0 missing
V6376, numeric, 722 unique values, 0 missing
V6569, numeric, 622 unique values, 0 missing
V6653, numeric, 653 unique values, 0 missing
V6677, numeric, 249 unique values, 0 missing
V6704, numeric, 618 unique values, 0 missing
V6745, numeric, 618 unique values, 0 missing
V6821, numeric, 665 unique values, 0 missing
V6882, numeric, 611 unique values, 0 missing
V6955, numeric, 668 unique values, 0 missing
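
The feature names are listed in ascending order of their original position, which is consistent with the column-sampling step in the source code: indices are drawn without replacement and then sorted before selection. A minimal sketch of that step follows. The total column count of 7,200 and the `V1`..`V7200` naming are assumptions for illustration, and the draw is not expected to reproduce the exact columns listed here, since the original generator had a different state when columns were sampled.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed size and naming of the parent dataset's feature set (hypothetical).
n_total_columns = 7_200
column_names = [f"V{i + 1}" for i in range(n_total_columns)]

# Draw 100 distinct column positions uniformly at random.
picked = rng.choice(n_total_columns, size=100, replace=False)

# Sorting restores the original left-to-right column order.
sorted_idxs = sorted(int(i) for i in picked)
selected = [column_names[i] for i in sorted_idxs]

print(selected[:5])
```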

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.1
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 10.4
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 208
Percentage of instances belonging to the least frequent class: 9.6
Number of instances belonging to the least frequent class: 192
Number of binary attributes: 0
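
Several of the properties above are simple ratios of the listed counts. As a quick consistency check, the sketch below recomputes them from the reported figures; the variable names are illustrative and not part of any OpenML API.

```python
# Counts as reported in the property list.
n_instances = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
majority_class_size = 208
minority_class_size = 192

# Attributes per instance.
dimensionality = n_attributes / n_instances

# Attribute-type shares, in percent.
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes

# Class shares, in percent.
pct_majority = 100 * majority_class_size / n_instances
pct_minority = 100 * minority_class_size / n_instances

print(round(dimensionality, 2), round(pct_numeric, 2), round(pct_nominal, 2))
print(round(pct_majority, 2), round(pct_minority, 2))
```

With ten classes ranging from 192 to 208 instances each, the subsample is close to perfectly balanced (200 per class would be exact).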
