Internet-Advertisements_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.
Subsampling of the dataset Internet-Advertisements (40978) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as written, this filters on `classes` (every class),
        # not on `selected_classes`, so no rows are dropped here.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
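The row-sampling step in the source code draws a class-proportional subset through scikit-learn's `train_test_split`. As a rough illustration of that idea only, here is a minimal standard-library sketch. The helper `stratified_subsample` and the toy label list are hypothetical, and the rounding rule (largest remainder) is an assumption rather than scikit-learn's exact algorithm.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional subsample.

    Hypothetical helper for illustration; assumes n_rows <= len(labels).
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Proportional quota per class, rounded down.
    quotas = {c: (len(rows) * n_rows) // total for c, rows in by_class.items()}

    # Give the rows lost to rounding to the classes with the largest remainders.
    leftover = n_rows - sum(quotas.values())
    by_remainder = sorted(
        by_class,
        key=lambda c: (len(by_class[c]) * n_rows) % total,
        reverse=True,
    )
    for c in by_remainder[:leftover]:
        quotas[c] += 1

    # Sample each class without replacement.
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, quotas[c]))
    return sorted(chosen)


# Toy labels with an 86% / 14% class split (not the real data).
labels = ["nonad"] * 860 + ["ad"] * 140
subset = stratified_subsample(labels, n_rows=200, seed=2)
counts = Counter(labels[i] for i in subset)
```

With these toy labels the subset keeps the same 86/14 class ratio as the input, which is the property stratification is meant to preserve.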

101 features

class (target): nominal, 2 unique values, 0 missing
url.ads.switchboard.com: nominal, 2 unique values, 0 missing
url.keith.dumble: nominal, 2 unique values, 0 missing
url.ucsd.edu: nominal, 2 unique values, 0 missing
url.geoguideii.nh: nominal, 2 unique values, 0 missing
url.derived: nominal, 2 unique values, 0 missing
url.time: nominal, 2 unique values, 0 missing
url.pharmacy.gif: nominal, 2 unique values, 0 missing
url.forums: nominal, 2 unique values, 0 missing
url.images.go2net.com: nominal, 2 unique values, 0 missing
url.users.aol.com: nominal, 2 unique values, 0 missing
url.www.cqi.com: nominal, 2 unique values, 0 missing
url.claw1.gif: nominal, 2 unique values, 0 missing
url.ball: nominal, 2 unique values, 0 missing
url.htm.images: nominal, 2 unique values, 0 missing
url.buttons: nominal, 2 unique values, 0 missing
url.bull.gif: nominal, 2 unique values, 0 missing
url.www.copymat.com: nominal, 2 unique values, 0 missing
url.tkaine: nominal, 2 unique values, 0 missing
url.lovisa1: nominal, 2 unique values, 0 missing
url.uk: nominal, 2 unique values, 0 missing
url.nh: nominal, 2 unique values, 0 missing
url.kandinsky.t: nominal, 2 unique values, 0 missing
url.uk.gif: nominal, 2 unique values, 0 missing
url.web.ukonline.co.uk: nominal, 2 unique values, 0 missing
url.tour: nominal, 2 unique values, 0 missing
origurl.adamspharmacy.com: nominal, 2 unique values, 0 missing
origurl.web.ukonline.co.uk: nominal, 2 unique values, 0 missing
origurl.www.news.observer.com: nominal, 2 unique values, 0 missing
origurl.target: nominal, 2 unique values, 0 missing
origurl.index.htm: nominal, 2 unique values, 0 missing
origurl.www.monmouth.com: nominal, 2 unique values, 0 missing
origurl.www.cob.sjsu.edu: nominal, 2 unique values, 0 missing
origurl.kandin: nominal, 2 unique values, 0 missing
origurl.kushmerick: nominal, 2 unique values, 0 missing
origurl.sj: nominal, 2 unique values, 0 missing
origurl.links: nominal, 2 unique values, 0 missing
origurl.00.html: nominal, 2 unique values, 0 missing
origurl.leonora.html: nominal, 2 unique values, 0 missing
origurl.lofts: nominal, 2 unique values, 0 missing
origurl.5.hpe: nominal, 2 unique values, 0 missing
origurl.yosemite.4301: nominal, 2 unique values, 0 missing
origurl.sjsu.edu: nominal, 2 unique values, 0 missing
origurl.inwap.com: nominal, 2 unique values, 0 missing
origurl.lycos.co.uk: nominal, 2 unique values, 0 missing
origurl.ran: nominal, 2 unique values, 0 missing
origurl.internauts.ca: nominal, 2 unique values, 0 missing
origurl.coltrane.htm: nominal, 2 unique values, 0 missing
origurl.www.psnw.com: nominal, 2 unique values, 0 missing
origurl.tkaine.mogwhi: nominal, 2 unique values, 0 missing
origurl.2647.chopin: nominal, 2 unique values, 0 missing
origurl.polypkem: nominal, 2 unique values, 0 missing
origurl.soho: nominal, 2 unique values, 0 missing
origurl.dreamn.com: nominal, 2 unique values, 0 missing
origurl.alley.6750: nominal, 2 unique values, 0 missing
origurl.charlie.html: nominal, 2 unique values, 0 missing
origurl.simstory: nominal, 2 unique values, 0 missing
origurl.hpe.10: nominal, 2 unique values, 0 missing
origurl.3727.turbo: nominal, 2 unique values, 0 missing
origurl.messier.html: nominal, 2 unique values, 0 missing
origurl.mandypaul.main: nominal, 2 unique values, 0 missing
origurl.square.chapel: nominal, 2 unique values, 0 missing
origurl.txt: nominal, 2 unique values, 0 missing
origurl.fbox.vt.edu: nominal, 2 unique values, 0 missing
ancurl.express.scripts.com: nominal, 2 unique values, 0 missing
ancurl.www.internauts.ca: nominal, 2 unique values, 0 missing
ancurl.ran.org: nominal, 2 unique values, 0 missing
ancurl.sendform: nominal, 2 unique values, 0 missing
ancurl.msn: nominal, 2 unique values, 0 missing
ancurl.monmouth.com: nominal, 2 unique values, 0 missing
ancurl.www.ibitexas.com: nominal, 2 unique values, 0 missing
ancurl.clickid: nominal, 2 unique values, 0 missing
ancurl.home.netscape.com: nominal, 2 unique values, 0 missing
ancurl.redirect.dll: nominal, 2 unique values, 0 missing
ancurl.www.mei.co.jp: nominal, 2 unique values, 0 missing
ancurl.comprod: nominal, 2 unique values, 0 missing
ancurl.kat001: nominal, 2 unique values, 0 missing
ancurl.page: nominal, 2 unique values, 0 missing
ancurl.lspace.org: nominal, 2 unique values, 0 missing
ancurl.about: nominal, 2 unique values, 0 missing
ancurl.526.redirect: nominal, 2 unique values, 0 missing
ancurl.lg.gif: nominal, 2 unique values, 0 missing
ancurl.www.wco.com: nominal, 2 unique values, 0 missing
ancurl.valley.2647: nominal, 2 unique values, 0 missing
ancurl.any: nominal, 2 unique values, 0 missing
ancurl.pointe: nominal, 2 unique values, 0 missing
ancurl.yahoo.co.uk: nominal, 2 unique values, 0 missing
ancurl.sj: nominal, 2 unique values, 0 missing
ancurl.mc: nominal, 2 unique values, 0 missing
ancurl.www.2meta.com: nominal, 2 unique values, 0 missing
ancurl.ring.claw: nominal, 2 unique values, 0 missing
ancurl.2f: nominal, 2 unique values, 0 missing
ancurl.dll.type: nominal, 2 unique values, 0 missing
ancurl.web.ukonline.co.uk: nominal, 2 unique values, 0 missing
alt.take.a: nominal, 2 unique values, 0 missing
alt.now: nominal, 2 unique values, 0 missing
alt.send.this: nominal, 2 unique values, 0 missing
alt.links: nominal, 2 unique values, 0 missing
alt.email.me: nominal, 2 unique values, 0 missing
alt.logo: nominal, 2 unique values, 0 missing
caption.my: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 101
Percentage of missing values: 0
Average class difference between consecutive instances: 0.76
Percentage of numeric attributes: 0
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 100
Percentage of instances belonging to the most frequent class: 86
Number of instances belonging to the most frequent class: 1720
Percentage of instances belonging to the least frequent class: 14
Number of instances belonging to the least frequent class: 280
Number of binary attributes: 101
Percentage of binary attributes: 100
Percentage of instances having missing values: 0
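
Several of the properties listed above are derived from the raw counts, so they can be cross-checked with simple arithmetic. The sketch below does that using only the numbers shown on this page. The function name `derived_properties` is hypothetical, and rounding the attribute-to-instance ratio to two decimals is an assumption about how the page displays it.

```python
def derived_properties(n_rows, n_cols, majority_count, minority_count):
    """Recompute the ratio-style properties from the raw counts."""
    return {
        "majority_pct": 100 * majority_count / n_rows,
        "minority_pct": 100 * minority_count / n_rows,
        # Attributes divided by instances (assumed to be shown rounded).
        "dimensionality": round(n_cols / n_rows, 2),
    }


# Counts as listed in the properties section.
props = derived_properties(
    n_rows=2000,
    n_cols=101,
    majority_count=1720,
    minority_count=280,
)
```

The two class counts also sum to the 2000 instances, which is consistent with a target that has exactly two distinct values.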

0 tasks