KDDCup99_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active | Format: ARFF | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman
Subsampling of the dataset KDDCup99 (42746) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
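The row-reduction step in the listing relies on scikit-learn's `train_test_split` with `stratify=...` to keep class proportions roughly intact. As a rough illustration of that idea only, here is a minimal standard-library sketch. It is not the code that produced this dataset: the function name `stratified_subsample`, the largest-remainder allocation, and the toy labels are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_samples, seed):
    """Return sorted row indices of a class-proportional sample (hypothetical helper)."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal fractional quota per class, then its integer part.
    total = len(labels)
    quotas = {c: n_samples * len(idxs) / total for c, idxs in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}

    # Largest-remainder rule: hand leftover slots to the biggest fractions.
    leftover = n_samples - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Draw the allotted number of rows from each class without replacement.
    chosen = []
    for c, idxs in by_class.items():
        chosen.extend(rng.sample(idxs, counts[c]))
    return sorted(chosen)


# Toy labels (made up for the example): 60% / 30% / 10% class split.
labels = ["normal"] * 600 + ["smurf"] * 300 + ["neptune"] * 100
picked = stratified_subsample(labels, n_samples=100, seed=2)
picked_counts = Counter(labels[i] for i in picked)
print(picked_counts)  # 60 normal, 30 smurf, 10 neptune
```

Fixing the seed makes the draw repeatable, which mirrors how the listing passes `random_state=seed` to the split.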

42 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| target (target) | nominal | 8 | 0 |
| duration | numeric | 32 | 0 |
| protocol_type | nominal | 3 | 0 |
| service | nominal | 24 | 0 |
| flag | nominal | 5 | 0 |
| src_bytes | numeric | 218 | 0 |
| dst_bytes | numeric | 272 | 0 |
| land | nominal | 1 | 0 |
| wrong_fragment | numeric | 1 | 0 |
| urgent | numeric | 1 | 0 |
| hot | numeric | 4 | 0 |
| num_failed_logins | numeric | 2 | 0 |
| logged_in | nominal | 2 | 0 |
| num_compromised | numeric | 2 | 0 |
| root_shell | nominal | 1 | 0 |
| su_attempted | nominal | 1 | 0 |
| num_root | numeric | 2 | 0 |
| num_file_creations | numeric | 1 | 0 |
| num_shells | numeric | 1 | 0 |
| num_access_files | numeric | 1 | 0 |
| num_outbound_cmds | numeric | 1 | 0 |
| is_host_login | nominal | 1 | 0 |
| is_guest_login | nominal | 2 | 0 |
| count | numeric | 247 | 0 |
| srv_count | numeric | 95 | 0 |
| serror_rate | numeric | 8 | 0 |
| srv_serror_rate | numeric | 5 | 0 |
| rerror_rate | numeric | 9 | 0 |
| srv_rerror_rate | numeric | 5 | 0 |
| same_srv_rate | numeric | 31 | 0 |
| diff_srv_rate | numeric | 16 | 0 |
| srv_diff_host_rate | numeric | 36 | 0 |
| dst_host_count | numeric | 129 | 0 |
| dst_host_srv_count | numeric | 124 | 0 |
| dst_host_same_srv_rate | numeric | 64 | 0 |
| dst_host_diff_srv_rate | numeric | 39 | 0 |
| dst_host_same_src_port_rate | numeric | 46 | 0 |
| dst_host_srv_diff_host_rate | numeric | 24 | 0 |
| dst_host_serror_rate | numeric | 12 | 0 |
| dst_host_srv_serror_rate | numeric | 3 | 0 |
| dst_host_rerror_rate | numeric | 20 | 0 |
| dst_host_srv_rerror_rate | numeric | 20 | 0 |
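Several features in this subsample take a single value, so they carry no information for a model trained on it. The sketch below transcribes the unique-value counts from the feature list into a dictionary and picks out the constant and two-valued columns. The variable names are illustrative, and treating "two unique values" as the definition of a binary attribute is an assumption rather than a documented OpenML rule.

```python
# Unique-value counts per feature, transcribed from the feature list above.
unique_counts = {
    "target": 8, "duration": 32, "protocol_type": 3, "service": 24, "flag": 5,
    "src_bytes": 218, "dst_bytes": 272, "land": 1, "wrong_fragment": 1,
    "urgent": 1, "hot": 4, "num_failed_logins": 2, "logged_in": 2,
    "num_compromised": 2, "root_shell": 1, "su_attempted": 1, "num_root": 2,
    "num_file_creations": 1, "num_shells": 1, "num_access_files": 1,
    "num_outbound_cmds": 1, "is_host_login": 1, "is_guest_login": 2,
    "count": 247, "srv_count": 95, "serror_rate": 8, "srv_serror_rate": 5,
    "rerror_rate": 9, "srv_rerror_rate": 5, "same_srv_rate": 31,
    "diff_srv_rate": 16, "srv_diff_host_rate": 36, "dst_host_count": 129,
    "dst_host_srv_count": 124, "dst_host_same_srv_rate": 64,
    "dst_host_diff_srv_rate": 39, "dst_host_same_src_port_rate": 46,
    "dst_host_srv_diff_host_rate": 24, "dst_host_serror_rate": 12,
    "dst_host_srv_serror_rate": 3, "dst_host_rerror_rate": 20,
    "dst_host_srv_rerror_rate": 20,
}

# Columns with one unique value are constant in this subsample.
constant_features = [name for name, n in unique_counts.items() if n == 1]

# Columns with exactly two unique values (assumed here to mean "binary").
two_valued_features = [name for name, n in unique_counts.items() if n == 2]

print(len(unique_counts))        # 42
print(len(constant_features))    # 10
print(len(two_valued_features))  # 5
```

A column that is constant here may still vary in the full KDDCup99 data, since the counts describe only these 2000 rows.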

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 42 |
| Number of distinct values of the target attribute (if it is nominal). | 8 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 32 |
| Number of nominal attributes. | 10 |
| Percentage of binary attributes. | 11.9 |
| Percentage of instances having missing values. | 0 |
| Average class difference between consecutive instances. | 0.42 |
| Percentage of missing values. | 0 |
| Number of attributes divided by the number of instances. | 0.02 |
| Percentage of numeric attributes. | 76.19 |
| Percentage of instances belonging to the most frequent class. | 57.35 |
| Percentage of nominal attributes. | 23.81 |
| Number of instances belonging to the most frequent class. | 1147 |
| Percentage of instances belonging to the least frequent class. | 0.05 |
| Number of instances belonging to the least frequent class. | 1 |
| Number of binary attributes. | 5 |
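The percentage and ratio properties follow from the raw counts listed with them. As a quick consistency check, the sketch below recomputes them in plain Python. The helper `pct` and the dictionary keys are illustrative names, not OpenML identifiers, and rounding to two decimals is an assumption about how the page displays values.

```python
# Raw counts taken from the properties above.
n_instances = 2000
n_attributes = 42
n_numeric = 32
n_nominal = 10
n_binary = 5
majority_class_size = 1147
minority_class_size = 1


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),        # 76.19
    "pct_nominal_attributes": pct(n_nominal, n_attributes),        # 23.81
    "pct_binary_attributes": pct(n_binary, n_attributes),          # 11.9
    "pct_majority_class": pct(majority_class_size, n_instances),   # 57.35
    "pct_minority_class": pct(minority_class_size, n_instances),   # 0.05
    "attributes_per_instance": round(n_attributes / n_instances, 2),  # 0.02
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the reported ones, and the numeric and nominal counts sum to the 42 attributes.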

0 tasks
