christine_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active; Format: ARFF; Publicly available; Visibility: public; Uploaded 17-11-2022 by Eddie Bergman


Subsampling of the dataset christine (41142) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
# Method excerpt; the Dataset class is defined elsewhere and not shown here.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes, weighted by class frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as written, this filters on `classes` (every class), not on
        # `selected_classes`, so no rows are actually dropped here.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
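The row-subsampling step in the source code above relies on scikit-learn's `train_test_split` with `stratify=...`. To illustrate the idea of a class-proportional sample without that dependency, here is a minimal standard-library sketch. It is not scikit-learn's implementation and will not reproduce the exact rows of this dataset; the function name `stratified_sample` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(labels, n, seed):
    """Return sorted row indices of a class-proportional sample of size n.

    Hypothetical helper for illustration only; not the uploader's code.
    """
    if not 0 < n <= len(labels):
        raise ValueError("n must be between 1 and len(labels)")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Proportional allocation per class, rounded down first.
    total = len(labels)
    quotas = {c: n * len(rows) / total for c, rows in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}

    # Give the rows lost to rounding to the largest fractional remainders.
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Draw the allotted number of rows from each class without replacement.
    picked = []
    for c, rows in by_class.items():
        picked.extend(rng.sample(rows, counts[c]))
    return sorted(picked)


# Toy, perfectly balanced two-class labels (made up for this example).
labels = ["0"] * 3000 + ["1"] * 3000
subset = stratified_sample(labels, n=2000, seed=2)
```

With balanced toy labels like these, the sample keeps the 50/50 class split, which matches the class balance reported in the properties of this dataset.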

101 features

class (target, nominal): 2 unique values, 0 missing
V63 (numeric): 256 unique values, 0 missing
V64 (numeric): 465 unique values, 0 missing
V86 (numeric): 464 unique values, 0 missing
V93 (numeric): 270 unique values, 0 missing
V125 (numeric): 24 unique values, 0 missing
V142 (numeric): 504 unique values, 0 missing
V158 (numeric): 51 unique values, 0 missing
V165 (numeric): 457 unique values, 0 missing
V167 (numeric): 290 unique values, 0 missing
V169 (numeric): 392 unique values, 0 missing
V171 (numeric): 197 unique values, 0 missing
V235 (numeric): 269 unique values, 0 missing
V292 (numeric): 381 unique values, 0 missing
V296 (numeric): 286 unique values, 0 missing
V314 (numeric): 429 unique values, 0 missing
V323 (numeric): 98 unique values, 0 missing
V345 (numeric): 353 unique values, 0 missing
V353 (numeric): 423 unique values, 0 missing
V356 (numeric): 583 unique values, 0 missing
V403 (numeric): 265 unique values, 0 missing
V406 (numeric): 88 unique values, 0 missing
V423 (numeric): 244 unique values, 0 missing
V428 (numeric): 308 unique values, 0 missing
V460 (numeric): 57 unique values, 0 missing
V476 (numeric): 442 unique values, 0 missing
V505 (numeric): 518 unique values, 0 missing
V518 (numeric): 419 unique values, 0 missing
V526 (numeric): 343 unique values, 0 missing
V546 (nominal): 1 unique value, 0 missing
V552 (numeric): 327 unique values, 0 missing
V599 (numeric): 353 unique values, 0 missing
V617 (numeric): 137 unique values, 0 missing
V638 (numeric): 30 unique values, 0 missing
V654 (numeric): 378 unique values, 0 missing
V663 (numeric): 423 unique values, 0 missing
V676 (numeric): 518 unique values, 0 missing
V697 (numeric): 442 unique values, 0 missing
V709 (numeric): 501 unique values, 0 missing
V713 (numeric): 268 unique values, 0 missing
V719 (numeric): 450 unique values, 0 missing
V722 (numeric): 395 unique values, 0 missing
V730 (numeric): 273 unique values, 0 missing
V749 (numeric): 592 unique values, 0 missing
V761 (numeric): 385 unique values, 0 missing
V787 (numeric): 486 unique values, 0 missing
V801 (numeric): 169 unique values, 0 missing
V808 (numeric): 41 unique values, 0 missing
V809 (numeric): 427 unique values, 0 missing
V820 (numeric): 371 unique values, 0 missing
V832 (numeric): 367 unique values, 0 missing
V837 (numeric): 462 unique values, 0 missing
V853 (numeric): 473 unique values, 0 missing
V867 (numeric): 377 unique values, 0 missing
V877 (numeric): 380 unique values, 0 missing
V906 (numeric): 399 unique values, 0 missing
V916 (numeric): 62 unique values, 0 missing
V928 (numeric): 247 unique values, 0 missing
V954 (numeric): 497 unique values, 0 missing
V956 (numeric): 362 unique values, 0 missing
V957 (numeric): 87 unique values, 0 missing
V993 (numeric): 430 unique values, 0 missing
V995 (numeric): 527 unique values, 0 missing
V1023 (numeric): 511 unique values, 0 missing
V1030 (numeric): 434 unique values, 0 missing
V1035 (numeric): 72 unique values, 0 missing
V1047 (numeric): 395 unique values, 0 missing
V1061 (numeric): 51 unique values, 0 missing
V1074 (numeric): 350 unique values, 0 missing
V1090 (numeric): 410 unique values, 0 missing
V1095 (numeric): 464 unique values, 0 missing
V1105 (numeric): 310 unique values, 0 missing
V1116 (numeric): 329 unique values, 0 missing
V1127 (numeric): 212 unique values, 0 missing
V1128 (numeric): 429 unique values, 0 missing
V1170 (numeric): 134 unique values, 0 missing
V1229 (nominal): 1 unique value, 0 missing
V1253 (numeric): 387 unique values, 0 missing
V1256 (nominal): 1 unique value, 0 missing
V1259 (numeric): 392 unique values, 0 missing
V1265 (numeric): 487 unique values, 0 missing
V1288 (numeric): 251 unique values, 0 missing
V1346 (numeric): 453 unique values, 0 missing
V1363 (numeric): 465 unique values, 0 missing
V1364 (numeric): 356 unique values, 0 missing
V1366 (numeric): 197 unique values, 0 missing
V1389 (numeric): 236 unique values, 0 missing
V1392 (numeric): 242 unique values, 0 missing
V1410 (numeric): 380 unique values, 0 missing
V1416 (numeric): 345 unique values, 0 missing
V1441 (numeric): 478 unique values, 0 missing
V1444 (numeric): 339 unique values, 0 missing
V1467 (numeric): 81 unique values, 0 missing
V1468 (numeric): 525 unique values, 0 missing
V1480 (numeric): 409 unique values, 0 missing
V1519 (numeric): 419 unique values, 0 missing
V1530 (numeric): 439 unique values, 0 missing
V1538 (numeric): 235 unique values, 0 missing
V1548 (numeric): 130 unique values, 0 missing
V1613 (numeric): 467 unique values, 0 missing
V1615 (numeric): 416 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 97
Number of nominal attributes: 4
Percentage of binary attributes: 0.99
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.52
Percentage of numeric attributes: 96.04
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 3.96
Percentage of instances belonging to the most frequent class: 50
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 1000
Number of binary attributes: 1
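Several of the properties above are simple ratios of the listed counts. As a sanity check, the following sketch recomputes them from the raw counts on this page. The dictionary keys and the `pct` helper are hypothetical names chosen for this example, not OpenML identifiers, and rounding to two decimals is an assumption based on how the values are displayed.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Raw counts as listed in the properties of this dataset.
n_rows = 2000
n_cols = 101
n_numeric = 97
n_nominal = 4
n_binary = 1
n_majority = 1000
n_minority = 1000

derived = {
    "pct_numeric": pct(n_numeric, n_cols),
    "pct_nominal": pct(n_nominal, n_cols),
    "pct_binary": pct(n_binary, n_cols),
    "pct_majority_class": pct(n_majority, n_rows),
    "pct_minority_class": pct(n_minority, n_rows),
    # Attributes divided by instances.
    "attrs_per_instance": round(n_cols / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed ones: 96.04, 3.96 and 0.99 for the attribute percentages, 50 for both class percentages, and 0.05 for attributes per instance.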

0 tasks
