dilbert_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
Subsampling of the dataset dilbert (41163) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
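The row-sampling step in the source code relies on a stratified split so that the 2000 retained rows keep roughly the class proportions of the parent dataset. The following is a minimal, dependency-free sketch of that idea, not the code used to generate this dataset. The function name `stratified_subsample` and the largest-remainder allocation of per-class quotas are assumptions made for illustration; scikit-learn's `train_test_split` may allocate rows to classes differently.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional sample of n_rows rows.

    Hypothetical helper for illustration only.
    """
    if n_rows >= len(labels):
        # Nothing to subsample: keep every row.
        return list(range(len(labels)))

    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Largest-remainder allocation: floor each class quota, then give the
    # leftover rows to the classes with the largest fractional parts.
    total = len(labels)
    exact = {c: n_rows * len(ix) / total for c, ix in by_class.items()}
    quota = {c: int(q) for c, q in exact.items()}
    leftover = n_rows - sum(quota.values())
    ranked = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in ranked[:leftover]:
        quota[c] += 1

    # Draw each class's quota without replacement.
    chosen = []
    for c, ix in by_class.items():
        chosen.extend(rng.sample(ix, quota[c]))
    return sorted(chosen)


# Usage: a 60/40 class split is preserved when sampling 10 of 100 rows.
labels = ["a"] * 60 + ["b"] * 40
picked = stratified_subsample(labels, n_rows=10, seed=0)
picked_labels = [labels[i] for i in picked]
```

Fixing the seed makes the selection reproducible, which mirrors the role of `seed` and `random_state` in the original method.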

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 5 | 0 |
| V6 | numeric | 1997 | 0 |
| V11 | numeric | 1998 | 0 |
| V17 | numeric | 1994 | 0 |
| V32 | numeric | 1994 | 0 |
| V43 | numeric | 1996 | 0 |
| V55 | numeric | 1999 | 0 |
| V65 | numeric | 1999 | 0 |
| V79 | numeric | 1997 | 0 |
| V97 | numeric | 1997 | 0 |
| V144 | numeric | 1996 | 0 |
| V145 | numeric | 1995 | 0 |
| V156 | numeric | 1994 | 0 |
| V168 | numeric | 1997 | 0 |
| V173 | numeric | 2000 | 0 |
| V242 | numeric | 1999 | 0 |
| V266 | numeric | 1996 | 0 |
| V335 | numeric | 1995 | 0 |
| V340 | numeric | 1996 | 0 |
| V453 | numeric | 1998 | 0 |
| V502 | numeric | 2000 | 0 |
| V505 | numeric | 2000 | 0 |
| V514 | numeric | 1995 | 0 |
| V527 | numeric | 2000 | 0 |
| V533 | numeric | 1999 | 0 |
| V581 | numeric | 1998 | 0 |
| V587 | numeric | 2000 | 0 |
| V612 | numeric | 1999 | 0 |
| V639 | numeric | 1998 | 0 |
| V654 | numeric | 1994 | 0 |
| V672 | numeric | 1997 | 0 |
| V709 | numeric | 1998 | 0 |
| V740 | numeric | 1999 | 0 |
| V743 | numeric | 1997 | 0 |
| V749 | numeric | 1999 | 0 |
| V754 | numeric | 1998 | 0 |
| V759 | numeric | 1997 | 0 |
| V764 | numeric | 1997 | 0 |
| V780 | numeric | 1996 | 0 |
| V783 | numeric | 1997 | 0 |
| V821 | numeric | 1997 | 0 |
| V835 | numeric | 1997 | 0 |
| V901 | numeric | 1989 | 0 |
| V933 | numeric | 1994 | 0 |
| V960 | numeric | 1993 | 0 |
| V964 | numeric | 1991 | 0 |
| V973 | numeric | 1993 | 0 |
| V1003 | numeric | 1997 | 0 |
| V1024 | numeric | 1993 | 0 |
| V1035 | numeric | 1994 | 0 |
| V1043 | numeric | 1990 | 0 |
| V1049 | numeric | 1996 | 0 |
| V1052 | numeric | 1996 | 0 |
| V1069 | numeric | 2000 | 0 |
| V1075 | numeric | 1994 | 0 |
| V1133 | numeric | 1995 | 0 |
| V1139 | numeric | 1998 | 0 |
| V1162 | numeric | 1997 | 0 |
| V1181 | numeric | 1998 | 0 |
| V1201 | numeric | 1998 | 0 |
| V1212 | numeric | 1998 | 0 |
| V1213 | numeric | 1997 | 0 |
| V1242 | numeric | 1999 | 0 |
| V1244 | numeric | 1996 | 0 |
| V1261 | numeric | 1998 | 0 |
| V1275 | numeric | 1997 | 0 |
| V1291 | numeric | 1997 | 0 |
| V1306 | numeric | 1988 | 0 |
| V1333 | numeric | 1994 | 0 |
| V1343 | numeric | 2000 | 0 |
| V1351 | numeric | 1994 | 0 |
| V1382 | numeric | 1999 | 0 |
| V1398 | numeric | 1998 | 0 |
| V1409 | numeric | 1999 | 0 |
| V1420 | numeric | 1991 | 0 |
| V1425 | numeric | 1995 | 0 |
| V1429 | numeric | 2000 | 0 |
| V1476 | numeric | 1992 | 0 |
| V1491 | numeric | 1996 | 0 |
| V1513 | numeric | 1995 | 0 |
| V1554 | numeric | 1997 | 0 |
| V1569 | numeric | 1990 | 0 |
| V1574 | numeric | 1996 | 0 |
| V1618 | numeric | 1993 | 0 |
| V1635 | numeric | 1993 | 0 |
| V1648 | numeric | 1992 | 0 |
| V1652 | numeric | 1996 | 0 |
| V1665 | numeric | 1993 | 0 |
| V1666 | numeric | 1992 | 0 |
| V1670 | numeric | 1993 | 0 |
| V1720 | numeric | 1996 | 0 |
| V1746 | numeric | 2000 | 0 |
| V1758 | numeric | 1998 | 0 |
| V1774 | numeric | 1991 | 0 |
| V1796 | numeric | 1999 | 0 |
| V1848 | numeric | 1997 | 0 |
| V1859 | numeric | 1999 | 0 |
| V1862 | numeric | 1992 | 0 |
| V1919 | numeric | 1998 | 0 |
| V1949 | numeric | 1995 | 0 |
| V1975 | numeric | 1997 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 5 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 100 |
| Number of nominal attributes | 1 |
| Percentage of binary attributes | 0 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0.2 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 0.05 |
| Percentage of numeric attributes | 99.01 |
| Percentage of instances belonging to the most frequent class | 20.5 |
| Percentage of nominal attributes | 0.99 |
| Number of instances belonging to the most frequent class | 410 |
| Percentage of instances belonging to the least frequent class | 19.1 |
| Number of instances belonging to the least frequent class | 382 |
| Number of binary attributes | 0 |
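Several of the properties above are simple ratios of the raw counts listed alongside them. As a quick consistency check, the short sketch below recomputes them from those counts. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Raw counts as listed in the properties above.
n_rows = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
majority_count = 410
minority_count = 382

# Derived ratios, rounded to two decimals (assumed display precision).
dimensionality = round(n_attributes / n_rows, 2)          # attributes per instance
pct_numeric = round(100 * n_numeric / n_attributes, 2)    # % numeric attributes
pct_nominal = round(100 * n_nominal / n_attributes, 2)    # % nominal attributes
pct_majority = round(100 * majority_count / n_rows, 2)    # % rows in largest class
pct_minority = round(100 * minority_count / n_rows, 2)    # % rows in smallest class

print(dimensionality, pct_numeric, pct_nominal, pct_majority, pct_minority)
```

The recomputed values agree with the listed 0.05, 99.01, 0.99, 20.5 and 19.1. The narrow gap between the largest and smallest class (410 versus 382 rows) indicates that the five classes are close to balanced.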

0 tasks
