KDDCup99_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset KDDCup99 (42746) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
# Imports implied by the names used below
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by class frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as written this tests membership in `classes` (every label),
        # not `selected_classes`, so no rows are dropped at this step.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        # NOTE: the `stratified` argument is not consulted here; the split
        # always stratifies on the target.
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
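The class-selection step in the source code above seems intended to keep only rows whose label is among the sampled classes. The following is a minimal sketch of that apparent intent, not the code that produced this dataset. The function name `select_classes` and the toy labels are hypothetical. The sketch assumes the filter should use the sampled classes, and it takes the class labels from `value_counts()` so that each label lines up with its own probability.

```python
import numpy as np
import pandas as pd


def select_classes(y: pd.Series, nclasses_max: int, seed: int) -> pd.Series:
    """Return a boolean mask keeping rows whose label is among the sampled classes."""
    rng = np.random.default_rng(seed)
    vcs = y.value_counts()  # index holds the class labels, values hold the counts
    if len(vcs) <= nclasses_max:
        # Nothing to drop: keep every row
        return pd.Series(True, index=y.index)
    # Sample classes without replacement, weighted by class frequency.
    # Labels and probabilities both come from `vcs`, so they are aligned.
    selected = rng.choice(
        vcs.index.to_numpy(),
        size=nclasses_max,
        replace=False,
        p=(vcs / vcs.sum()).to_numpy(),
    )
    return y.isin(selected)


# Toy labels (hypothetical), four classes reduced to two
y = pd.Series(["normal"] * 6 + ["smurf"] * 3 + ["neptune"] * 2 + ["back"])
mask = select_classes(y, nclasses_max=2, seed=4)
kept = y[mask]
print(kept.value_counts())
```

Returning a boolean mask rather than index positions lets the same mask filter both the features and the target without depending on the index being a default integer range.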

42 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| target (target) | nominal | 8 | 0 |
| duration | numeric | 35 | 0 |
| protocol_type | nominal | 3 | 0 |
| service | nominal | 26 | 0 |
| flag | nominal | 6 | 0 |
| src_bytes | numeric | 208 | 0 |
| dst_bytes | numeric | 265 | 0 |
| land | nominal | 1 | 0 |
| wrong_fragment | numeric | 1 | 0 |
| urgent | numeric | 1 | 0 |
| hot | numeric | 4 | 0 |
| num_failed_logins | numeric | 1 | 0 |
| logged_in | nominal | 2 | 0 |
| num_compromised | numeric | 2 | 0 |
| root_shell | nominal | 1 | 0 |
| su_attempted | nominal | 1 | 0 |
| num_root | numeric | 3 | 0 |
| num_file_creations | numeric | 3 | 0 |
| num_shells | numeric | 1 | 0 |
| num_access_files | numeric | 2 | 0 |
| num_outbound_cmds | numeric | 1 | 0 |
| is_host_login | nominal | 1 | 0 |
| is_guest_login | nominal | 2 | 0 |
| count | numeric | 244 | 0 |
| srv_count | numeric | 103 | 0 |
| serror_rate | numeric | 10 | 0 |
| srv_serror_rate | numeric | 5 | 0 |
| rerror_rate | numeric | 10 | 0 |
| srv_rerror_rate | numeric | 8 | 0 |
| same_srv_rate | numeric | 32 | 0 |
| diff_srv_rate | numeric | 18 | 0 |
| srv_diff_host_rate | numeric | 34 | 0 |
| dst_host_count | numeric | 134 | 0 |
| dst_host_srv_count | numeric | 118 | 0 |
| dst_host_same_srv_rate | numeric | 62 | 0 |
| dst_host_diff_srv_rate | numeric | 37 | 0 |
| dst_host_same_src_port_rate | numeric | 37 | 0 |
| dst_host_srv_diff_host_rate | numeric | 24 | 0 |
| dst_host_serror_rate | numeric | 12 | 0 |
| dst_host_srv_serror_rate | numeric | 4 | 0 |
| dst_host_rerror_rate | numeric | 13 | 0 |
| dst_host_srv_rerror_rate | numeric | 21 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 42 |
| Number of distinct values of the target attribute (if it is nominal) | 8 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 32 |
| Number of nominal attributes | 10 |
| Percentage of instances belonging to the most frequent class | 57.35 |
| Percentage of nominal attributes | 23.81 |
| Number of instances belonging to the most frequent class | 1147 |
| Percentage of instances belonging to the least frequent class | 0.05 |
| Number of instances belonging to the least frequent class | 1 |
| Number of binary attributes | 5 |
| Percentage of binary attributes | 11.9 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0.42 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 0.02 |
| Percentage of numeric attributes | 76.19 |
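The percentage and ratio properties listed above follow directly from the listed counts. As a quick consistency check, the sketch below recomputes them from those counts. The variable names and the `pct` helper are illustrative only, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Counts taken from the properties listed above
n_rows = 2000
n_cols = 42
n_numeric = 32
n_nominal = 10
n_binary = 5
majority_count = 1147
minority_count = 1


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_majority_class": pct(majority_count, n_rows),  # 57.35
    "pct_minority_class": pct(minority_count, n_rows),  # 0.05
    "pct_nominal_attrs": pct(n_nominal, n_cols),        # 23.81
    "pct_binary_attrs": pct(n_binary, n_cols),          # 11.9
    "pct_numeric_attrs": pct(n_numeric, n_cols),        # 76.19
    "attrs_per_instance": round(n_cols / n_rows, 2),    # 0.02
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The numeric and nominal counts also sum to the total attribute count (32 + 10 = 42), which matches the 42 features listed.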

0 tasks
