KDDCup99_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
Subsampling of the dataset KDDCup99 (42746) with seed=1 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
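The class-sampling step in the source code draws `selected_classes` but then filters rows with `y.isin(classes)`, which keeps every class. The following is a minimal standard-library sketch of what filtering by the sampled subset might look like. It is not the uploader's code: the helper names `sample_classes` and `filter_rows` are hypothetical, the frequency-weighted draw without replacement is only roughly analogous to `rng.choice(..., replace=False, p=...)`, and it will not reproduce NumPy's random stream. Whether filtering by the subset was the original intent is an assumption.

```python
import random
from collections import Counter


def sample_classes(labels, nclasses_max, seed):
    """Pick up to nclasses_max distinct classes, weighted by class frequency.

    Hypothetical helper: sequential weighted draws without replacement.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) <= nclasses_max:
        # Nothing to subsample: keep every class.
        return set(counts)

    remaining = dict(counts)
    selected = set()
    while len(selected) < nclasses_max:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        selected.add(pick)
        del remaining[pick]  # without replacement
    return selected


def filter_rows(rows, labels, selected):
    """Keep only the rows whose label is in the selected class subset."""
    kept = [(row, label) for row, label in zip(rows, labels) if label in selected]
    return [row for row, _ in kept], [label for _, label in kept]


# Toy labels for illustration only (not taken from KDDCup99).
labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
rows = list(range(len(labels)))

selected = sample_classes(labels, nclasses_max=2, seed=1)
kept_rows, kept_labels = filter_rows(rows, labels, selected)
```

For this dataset the distinction probably had no effect: the target has 8 distinct values, which is below `nclasses_max=10`, so the class-sampling branch would not have been entered.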

42 features

target (target): nominal, 8 unique values, 0 missing
duration: numeric, 25 unique values, 0 missing
protocol_type: nominal, 3 unique values, 0 missing
service: nominal, 28 unique values, 0 missing
flag: nominal, 6 unique values, 0 missing
src_bytes: numeric, 209 unique values, 0 missing
dst_bytes: numeric, 274 unique values, 0 missing
land: nominal, 1 unique value, 0 missing
wrong_fragment: numeric, 1 unique value, 0 missing
urgent: numeric, 1 unique value, 0 missing
hot: numeric, 4 unique values, 0 missing
num_failed_logins: numeric, 1 unique value, 0 missing
logged_in: nominal, 2 unique values, 0 missing
num_compromised: numeric, 2 unique values, 0 missing
root_shell: nominal, 1 unique value, 0 missing
su_attempted: nominal, 1 unique value, 0 missing
num_root: numeric, 2 unique values, 0 missing
num_file_creations: numeric, 2 unique values, 0 missing
num_shells: numeric, 1 unique value, 0 missing
num_access_files: numeric, 2 unique values, 0 missing
num_outbound_cmds: numeric, 1 unique value, 0 missing
is_host_login: nominal, 1 unique value, 0 missing
is_guest_login: nominal, 2 unique values, 0 missing
count: numeric, 245 unique values, 0 missing
srv_count: numeric, 95 unique values, 0 missing
serror_rate: numeric, 8 unique values, 0 missing
srv_serror_rate: numeric, 4 unique values, 0 missing
rerror_rate: numeric, 7 unique values, 0 missing
srv_rerror_rate: numeric, 4 unique values, 0 missing
same_srv_rate: numeric, 35 unique values, 0 missing
diff_srv_rate: numeric, 16 unique values, 0 missing
srv_diff_host_rate: numeric, 35 unique values, 0 missing
dst_host_count: numeric, 127 unique values, 0 missing
dst_host_srv_count: numeric, 113 unique values, 0 missing
dst_host_same_srv_rate: numeric, 69 unique values, 0 missing
dst_host_diff_srv_rate: numeric, 35 unique values, 0 missing
dst_host_same_src_port_rate: numeric, 41 unique values, 0 missing
dst_host_srv_diff_host_rate: numeric, 23 unique values, 0 missing
dst_host_serror_rate: numeric, 15 unique values, 0 missing
dst_host_srv_serror_rate: numeric, 8 unique values, 0 missing
dst_host_rerror_rate: numeric, 16 unique values, 0 missing
dst_host_srv_rerror_rate: numeric, 18 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 42
Number of distinct values of the target attribute (if it is nominal): 8
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 32
Number of nominal attributes: 10
Number of attributes divided by the number of instances: 0.02
Percentage of numeric attributes: 76.19
Percentage of instances belonging to the most frequent class: 57.35
Percentage of nominal attributes: 23.81
Number of instances belonging to the most frequent class: 1147
Percentage of instances belonging to the least frequent class: 0.05
Number of instances belonging to the least frequent class: 1
Number of binary attributes: 5
Percentage of binary attributes: 11.9
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.42
Percentage of missing values: 0
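The ratio and percentage properties listed above follow from the raw counts. As a quick sanity check, the short sketch below recomputes them. The variable names and the `pct` helper are illustrative only, and rounding to two decimals is an assumption about how the page displays values.

```python
# Raw counts copied from the property list.
n_instances = 2000
n_attributes = 42
n_numeric = 32
n_nominal = 10
n_binary = 5
majority_class_size = 1147
minority_class_size = 1


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "attributes / instances": round(n_attributes / n_instances, 2),
    "% numeric attributes": pct(n_numeric, n_attributes),
    "% nominal attributes": pct(n_nominal, n_attributes),
    "% binary attributes": pct(n_binary, n_attributes),
    "% instances in most frequent class": pct(majority_class_size, n_instances),
    "% instances in least frequent class": pct(minority_class_size, n_instances),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The numeric and nominal counts also sum to the 42 attributes reported, which matches the feature list.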

0 tasks
