dilbert_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active, Format: ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset dilbert (41163) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
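To make the stratified row-subsampling step more concrete, here is a minimal standard-library sketch of the idea. It is not the code that produced this dataset: the source above delegates to scikit-learn's `train_test_split` with `stratify=`, whereas this sketch uses a largest-remainder quota per class, which is an assumption chosen for illustration. The function name `stratified_subsample` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_samples, seed):
    """Return sorted row indices so each class keeps roughly its original share.

    Hypothetical helper for illustration; assumes n_samples <= len(labels).
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Largest-remainder allocation: floor every class quota, then give the
    # leftover slots to the classes with the largest fractional parts.
    exact = {c: n_samples * len(ix) / total for c, ix in by_class.items()}
    quota = {c: int(e) for c, e in exact.items()}
    leftover = n_samples - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    # Draw each class's quota without replacement.
    chosen = []
    for c, ix in by_class.items():
        chosen.extend(rng.sample(ix, quota[c]))
    return sorted(chosen)


# Toy labels (not the real dilbert class counts): 1000 rows, three classes.
labels = ["a"] * 500 + ["b"] * 300 + ["c"] * 200
subset = stratified_subsample(labels, n_samples=100, seed=4)
counts = Counter(labels[i] for i in subset)
print(dict(counts))
```

With these toy labels the class shares of the subsample match the shares of the full label list, which is the property the `stratify=` argument is meant to preserve.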

101 features

class (target): nominal, 5 unique values, 0 missing
V54: numeric, 1998 unique values, 0 missing
V70: numeric, 1998 unique values, 0 missing
V117: numeric, 1999 unique values, 0 missing
V155: numeric, 1997 unique values, 0 missing
V156: numeric, 1999 unique values, 0 missing
V176: numeric, 1998 unique values, 0 missing
V228: numeric, 1999 unique values, 0 missing
V256: numeric, 1997 unique values, 0 missing
V260: numeric, 1994 unique values, 0 missing
V269: numeric, 1999 unique values, 0 missing
V324: numeric, 1997 unique values, 0 missing
V335: numeric, 1999 unique values, 0 missing
V341: numeric, 1998 unique values, 0 missing
V345: numeric, 1994 unique values, 0 missing
V357: numeric, 1998 unique values, 0 missing
V379: numeric, 2000 unique values, 0 missing
V394: numeric, 1999 unique values, 0 missing
V405: numeric, 1999 unique values, 0 missing
V422: numeric, 1997 unique values, 0 missing
V430: numeric, 1998 unique values, 0 missing
V439: numeric, 1997 unique values, 0 missing
V442: numeric, 1999 unique values, 0 missing
V528: numeric, 1998 unique values, 0 missing
V542: numeric, 1998 unique values, 0 missing
V590: numeric, 2000 unique values, 0 missing
V650: numeric, 1999 unique values, 0 missing
V686: numeric, 1998 unique values, 0 missing
V712: numeric, 1998 unique values, 0 missing
V715: numeric, 1998 unique values, 0 missing
V720: numeric, 1998 unique values, 0 missing
V721: numeric, 1997 unique values, 0 missing
V728: numeric, 1997 unique values, 0 missing
V765: numeric, 2000 unique values, 0 missing
V794: numeric, 1996 unique values, 0 missing
V830: numeric, 1998 unique values, 0 missing
V866: numeric, 1988 unique values, 0 missing
V900: numeric, 1994 unique values, 0 missing
V915: numeric, 1993 unique values, 0 missing
V919: numeric, 1991 unique values, 0 missing
V942: numeric, 1994 unique values, 0 missing
V953: numeric, 1998 unique values, 0 missing
V964: numeric, 1988 unique values, 0 missing
V971: numeric, 1993 unique values, 0 missing
V973: numeric, 1992 unique values, 0 missing
V974: numeric, 1996 unique values, 0 missing
V978: numeric, 1994 unique values, 0 missing
V984: numeric, 1991 unique values, 0 missing
V1016: numeric, 1987 unique values, 0 missing
V1025: numeric, 1994 unique values, 0 missing
V1045: numeric, 1991 unique values, 0 missing
V1052: numeric, 1996 unique values, 0 missing
V1069: numeric, 1996 unique values, 0 missing
V1110: numeric, 1996 unique values, 0 missing
V1113: numeric, 1994 unique values, 0 missing
V1131: numeric, 1992 unique values, 0 missing
V1142: numeric, 1998 unique values, 0 missing
V1146: numeric, 1997 unique values, 0 missing
V1161: numeric, 1993 unique values, 0 missing
V1182: numeric, 1998 unique values, 0 missing
V1200: numeric, 1993 unique values, 0 missing
V1260: numeric, 1997 unique values, 0 missing
V1296: numeric, 1997 unique values, 0 missing
V1315: numeric, 1997 unique values, 0 missing
V1336: numeric, 1996 unique values, 0 missing
V1369: numeric, 1997 unique values, 0 missing
V1373: numeric, 1996 unique values, 0 missing
V1381: numeric, 1996 unique values, 0 missing
V1417: numeric, 1996 unique values, 0 missing
V1522: numeric, 1998 unique values, 0 missing
V1529: numeric, 1999 unique values, 0 missing
V1535: numeric, 1989 unique values, 0 missing
V1605: numeric, 1993 unique values, 0 missing
V1671: numeric, 1990 unique values, 0 missing
V1672: numeric, 1992 unique values, 0 missing
V1678: numeric, 1990 unique values, 0 missing
V1728: numeric, 1996 unique values, 0 missing
V1730: numeric, 1993 unique values, 0 missing
V1735: numeric, 1998 unique values, 0 missing
V1767: numeric, 1992 unique values, 0 missing
V1792: numeric, 1997 unique values, 0 missing
V1794: numeric, 1992 unique values, 0 missing
V1796: numeric, 1999 unique values, 0 missing
V1799: numeric, 1994 unique values, 0 missing
V1833: numeric, 2000 unique values, 0 missing
V1834: numeric, 1993 unique values, 0 missing
V1847: numeric, 1999 unique values, 0 missing
V1851: numeric, 1993 unique values, 0 missing
V1861: numeric, 1998 unique values, 0 missing
V1874: numeric, 1988 unique values, 0 missing
V1875: numeric, 1989 unique values, 0 missing
V1900: numeric, 1999 unique values, 0 missing
V1922: numeric, 1989 unique values, 0 missing
V1929: numeric, 1992 unique values, 0 missing
V1939: numeric, 1998 unique values, 0 missing
V1946: numeric, 1997 unique values, 0 missing
V1965: numeric, 1992 unique values, 0 missing
V1974: numeric, 1996 unique values, 0 missing
V1985: numeric, 1998 unique values, 0 missing
V1995: numeric, 1997 unique values, 0 missing
V1998: numeric, 1994 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 5
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.19
Percentage of numeric attributes: 99.01
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 0.99
Percentage of instances belonging to the most frequent class: 20.5
Number of instances belonging to the most frequent class: 410
Percentage of instances belonging to the least frequent class: 19.1
Number of instances belonging to the least frequent class: 382
Number of binary attributes: 0
Percentage of binary attributes: 0
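Several of the properties above are simple ratios of the listed counts. The short sketch below recomputes them as a sanity check; the variable names are illustrative and the rounding to two decimals is an assumption about how the page displays the values.

```python
# Counts copied from the property list above.
n_rows = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
majority_count = 410
minority_count = 382

# Derived properties (rounding to two decimals is an assumption).
dimensionality = round(n_attributes / n_rows, 2)       # attributes per instance
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_majority = round(100 * majority_count / n_rows, 2)
pct_minority = round(100 * minority_count / n_rows, 2)

print(dimensionality, pct_numeric, pct_nominal, pct_majority, pct_minority)
```

The recomputed values agree with the listed ones (0.05, 99.01, 0.99, 20.5 and 19.1). With 5 classes over 2000 rows, a perfectly balanced split would put 400 rows (20%) in each class, so the subsample is close to balanced.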

0 tasks
