dilbert_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True


active, ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset dilbert (41163) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")

        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )

        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
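The row-subsampling step above delegates to scikit-learn's `train_test_split` with a `stratify` argument, which keeps class proportions roughly intact in the subset. To make that idea concrete, here is a minimal standard-library sketch of proportional stratified sampling. It is not the code used to build this dataset and will not reproduce scikit-learn's exact selection; the function name `stratified_sample` and the largest-remainder rounding are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_samples, seed):
    """Return sorted row indices, drawing from each class in proportion to its size.

    Hypothetical helper for illustration; assumes n_samples <= len(labels).
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Each class gets the floor of its proportional share; leftover slots go
    # to the classes with the largest fractional remainders.
    shares = {c: n_samples * len(rows) / total for c, rows in by_class.items()}
    quota = {c: int(share) for c, share in shares.items()}
    leftover = n_samples - sum(quota.values())
    by_remainder = sorted(shares, key=lambda c: shares[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, quota[c]))
    return sorted(chosen)


# Toy labels with a 60/30/10 split; a sample of 20 keeps the same ratio.
labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
picked = stratified_sample(labels, n_samples=20, seed=1)
print(Counter(labels[i] for i in picked))
```

With the toy labels, the sample holds 12, 6 and 2 rows of classes `a`, `b` and `c`, mirroring the original proportions. Which rows are picked within each class depends on the seed.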

101 features

class (target): nominal, 5 unique values, 0 missing
V39: numeric, 1994 unique values, 0 missing
V53: numeric, 1998 unique values, 0 missing
V67: numeric, 1997 unique values, 0 missing
V79: numeric, 1997 unique values, 0 missing
V109: numeric, 1997 unique values, 0 missing
V121: numeric, 1996 unique values, 0 missing
V124: numeric, 1996 unique values, 0 missing
V165: numeric, 1995 unique values, 0 missing
V180: numeric, 1995 unique values, 0 missing
V228: numeric, 1998 unique values, 0 missing
V240: numeric, 1998 unique values, 0 missing
V241: numeric, 1997 unique values, 0 missing
V260: numeric, 1997 unique values, 0 missing
V275: numeric, 1998 unique values, 0 missing
V296: numeric, 2000 unique values, 0 missing
V315: numeric, 1998 unique values, 0 missing
V395: numeric, 1998 unique values, 0 missing
V427: numeric, 1997 unique values, 0 missing
V476: numeric, 1998 unique values, 0 missing
V493: numeric, 1997 unique values, 0 missing
V509: numeric, 1999 unique values, 0 missing
V518: numeric, 1998 unique values, 0 missing
V523: numeric, 1996 unique values, 0 missing
V542: numeric, 1997 unique values, 0 missing
V546: numeric, 1996 unique values, 0 missing
V573: numeric, 2000 unique values, 0 missing
V576: numeric, 2000 unique values, 0 missing
V586: numeric, 1998 unique values, 0 missing
V596: numeric, 1996 unique values, 0 missing
V632: numeric, 1998 unique values, 0 missing
V636: numeric, 1999 unique values, 0 missing
V686: numeric, 1994 unique values, 0 missing
V715: numeric, 1999 unique values, 0 missing
V729: numeric, 1999 unique values, 0 missing
V742: numeric, 1998 unique values, 0 missing
V781: numeric, 1998 unique values, 0 missing
V785: numeric, 1997 unique values, 0 missing
V810: numeric, 1998 unique values, 0 missing
V826: numeric, 1997 unique values, 0 missing
V835: numeric, 1998 unique values, 0 missing
V844: numeric, 1993 unique values, 0 missing
V873: numeric, 1993 unique values, 0 missing
V877: numeric, 1995 unique values, 0 missing
V897: numeric, 1991 unique values, 0 missing
V900: numeric, 1994 unique values, 0 missing
V906: numeric, 1994 unique values, 0 missing
V909: numeric, 1993 unique values, 0 missing
V945: numeric, 1993 unique values, 0 missing
V970: numeric, 1997 unique values, 0 missing
V974: numeric, 1997 unique values, 0 missing
V1013: numeric, 1993 unique values, 0 missing
V1015: numeric, 1991 unique values, 0 missing
V1019: numeric, 1998 unique values, 0 missing
V1036: numeric, 1994 unique values, 0 missing
V1045: numeric, 1992 unique values, 0 missing
V1055: numeric, 1996 unique values, 0 missing
V1058: numeric, 1996 unique values, 0 missing
V1090: numeric, 1998 unique values, 0 missing
V1158: numeric, 1996 unique values, 0 missing
V1178: numeric, 1997 unique values, 0 missing
V1208: numeric, 1998 unique values, 0 missing
V1226: numeric, 1998 unique values, 0 missing
V1235: numeric, 1997 unique values, 0 missing
V1272: numeric, 1998 unique values, 0 missing
V1343: numeric, 2000 unique values, 0 missing
V1415: numeric, 1999 unique values, 0 missing
V1420: numeric, 1993 unique values, 0 missing
V1421: numeric, 1996 unique values, 0 missing
V1438: numeric, 1995 unique values, 0 missing
V1449: numeric, 1990 unique values, 0 missing
V1458: numeric, 1995 unique values, 0 missing
V1461: numeric, 2000 unique values, 0 missing
V1504: numeric, 1997 unique values, 0 missing
V1521: numeric, 1999 unique values, 0 missing
V1524: numeric, 1999 unique values, 0 missing
V1528: numeric, 1996 unique values, 0 missing
V1529: numeric, 1997 unique values, 0 missing
V1570: numeric, 1995 unique values, 0 missing
V1574: numeric, 1994 unique values, 0 missing
V1585: numeric, 1998 unique values, 0 missing
V1599: numeric, 1992 unique values, 0 missing
V1612: numeric, 1995 unique values, 0 missing
V1640: numeric, 1991 unique values, 0 missing
V1661: numeric, 1996 unique values, 0 missing
V1663: numeric, 1994 unique values, 0 missing
V1672: numeric, 1991 unique values, 0 missing
V1692: numeric, 1995 unique values, 0 missing
V1725: numeric, 1997 unique values, 0 missing
V1751: numeric, 2000 unique values, 0 missing
V1809: numeric, 1994 unique values, 0 missing
V1810: numeric, 1993 unique values, 0 missing
V1811: numeric, 1998 unique values, 0 missing
V1876: numeric, 1998 unique values, 0 missing
V1889: numeric, 1997 unique values, 0 missing
V1902: numeric, 1998 unique values, 0 missing
V1911: numeric, 1998 unique values, 0 missing
V1939: numeric, 1994 unique values, 0 missing
V1957: numeric, 1998 unique values, 0 missing
V1964: numeric, 1996 unique values, 0 missing
V1972: numeric, 1998 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 5
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.2
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 99.01
Percentage of instances belonging to the most frequent class: 20.5
Percentage of nominal attributes: 0.99
Number of instances belonging to the most frequent class: 410
Percentage of instances belonging to the least frequent class: 19.15
Number of instances belonging to the least frequent class: 383
Number of binary attributes: 0
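The ratio and percentage properties above can be recomputed from the raw counts listed alongside them. The following is a small sketch of that arithmetic, assuming the displayed values are rounded to two decimals. The variable names and the `pct` helper are made up for illustration and are not OpenML identifiers.

```python
# Raw counts as reported in the properties above.
n_instances = 2000
n_attributes = 101   # 100 numeric features plus the nominal target
n_numeric = 100
n_nominal = 1
majority_count = 410
minority_count = 383


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "attributes_per_instance": round(n_attributes / n_instances, 2),
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_majority_class": pct(majority_count, n_instances),
    "pct_minority_class": pct(minority_count, n_instances),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values (0.05, 99.01, 0.99, 20.5 and 19.15) agree with the listed properties, and the narrow gap between the majority and minority class shares indicates that the five classes are close to balanced.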

0 tasks
