Internet-Advertisements_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by Eddie Bergman


Subsampling of the dataset Internet-Advertisements (40978) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
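The row-subsampling step in the source code relies on a stratified split, which keeps class proportions roughly the same in the subset as in the full data. The sketch below illustrates that idea with the standard library only. It is not the scikit-learn implementation used above: `stratified_sample` is a hypothetical helper, the proportional-allocation rule is an assumption made for the illustration, and the labels are synthetic rather than taken from the parent dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_samples, seed):
    """Pick n_samples row indices so class proportions roughly match `labels`.

    Hypothetical helper for illustration; not scikit-learn's algorithm.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Proportional allocation, rounded down; any leftover slots go to the
    # classes with the largest fractional remainders.
    total = len(labels)
    quotas = {c: n_samples * len(ix) / total for c, ix in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n_samples - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Sample without replacement inside each class.
    chosen = []
    for c, ix in by_class.items():
        chosen.extend(rng.sample(ix, counts[c]))
    return sorted(chosen)


# Synthetic labels with a 14% / 86% split (assumed sizes, not the real data).
labels = ["minority"] * 420 + ["majority"] * 2580
subset = stratified_sample(labels, n_samples=2000, seed=4)
class_counts = Counter(labels[i] for i in subset)
print(class_counts["minority"], class_counts["majority"])
```

With these synthetic inputs the subset keeps the 14% / 86% balance, which is the property a stratified split is meant to preserve.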

101 features

class (target): nominal, 2 unique values, 0 missing
url.pool.images: nominal, 2 unique values, 0 missing
url.infoserver.etl.vt.edu: nominal, 2 unique values, 0 missing
url.charlie: nominal, 2 unique values, 0 missing
url.derived: nominal, 2 unique values, 0 missing
url.home: nominal, 2 unique values, 0 missing
url.www.ran.org: nominal, 2 unique values, 0 missing
url.sjsu.edu: nominal, 2 unique values, 0 missing
url.gra: nominal, 2 unique values, 0 missing
url.www.express.scripts.com: nominal, 2 unique values, 0 missing
url.www.finest.tm.fr: nominal, 2 unique values, 0 missing
url.users: nominal, 2 unique values, 0 missing
url.geoguideii.send: nominal, 2 unique values, 0 missing
url.w.gif: nominal, 2 unique values, 0 missing
url.aol.com: nominal, 2 unique values, 0 missing
url.ball: nominal, 2 unique values, 0 missing
url.logo.b: nominal, 2 unique values, 0 missing
url.static.wired.com: nominal, 2 unique values, 0 missing
url.runofsite.any: nominal, 2 unique values, 0 missing
url.esi.image: nominal, 2 unique values, 0 missing
url.baons: nominal, 2 unique values, 0 missing
url.pool: nominal, 2 unique values, 0 missing
url.top: nominal, 2 unique values, 0 missing
url.ran: nominal, 2 unique values, 0 missing
url.sunsetstrip: nominal, 2 unique values, 0 missing
url.geoguideii: nominal, 2 unique values, 0 missing
origurl.alley: nominal, 2 unique values, 0 missing
origurl.pointe: nominal, 2 unique values, 0 missing
origurl.kattsida: nominal, 2 unique values, 0 missing
origurl.sunsite.unc.edu: nominal, 2 unique values, 0 missing
origurl.ubc.ca: nominal, 2 unique values, 0 missing
origurl.plato: nominal, 2 unique values, 0 missing
origurl.www.likesbooks.com: nominal, 2 unique values, 0 missing
origurl.vault.3440: nominal, 2 unique values, 0 missing
origurl.search: nominal, 2 unique values, 0 missing
origurl.biopic.htm: nominal, 2 unique values, 0 missing
origurl.news: nominal, 2 unique values, 0 missing
origurl.larx.com: nominal, 2 unique values, 0 missing
origurl.paper.1998: nominal, 2 unique values, 0 missing
origurl.keith: nominal, 2 unique values, 0 missing
origurl.wosc: nominal, 2 unique values, 0 missing
origurl.mindlink.net: nominal, 2 unique values, 0 missing
origurl.kerouac.htm: nominal, 2 unique values, 0 missing
origurl.martnet.com: nominal, 2 unique values, 0 missing
origurl.msheryl: nominal, 2 unique values, 0 missing
origurl.vault: nominal, 2 unique values, 0 missing
origurl.polypkem.index: nominal, 2 unique values, 0 missing
origurl.8078.home: nominal, 2 unique values, 0 missing
origurl.library.pitcairn: nominal, 2 unique values, 0 missing
origurl.athens: nominal, 2 unique values, 0 missing
origurl.yosemite: nominal, 2 unique values, 0 missing
origurl.bordeaux.actup: nominal, 2 unique values, 0 missing
origurl.www.wednet.com: nominal, 2 unique values, 0 missing
origurl.www.access.ch: nominal, 2 unique values, 0 missing
origurl.www.express.scripts.com: nominal, 2 unique values, 0 missing
origurl.arvann.pages: nominal, 2 unique values, 0 missing
origurl.football: nominal, 2 unique values, 0 missing
origurl.hollywood.9662: nominal, 2 unique values, 0 missing
origurl.truluck.com: nominal, 2 unique values, 0 missing
origurl.6712.cats: nominal, 2 unique values, 0 missing
origurl.fr.bordeaux: nominal, 2 unique values, 0 missing
ancurl.truluck.com: nominal, 2 unique values, 0 missing
ancurl.home.htm: nominal, 2 unique values, 0 missing
ancurl.csuhayward.edu: nominal, 2 unique values, 0 missing
ancurl.news: nominal, 2 unique values, 0 missing
ancurl.www.ibitexas.com: nominal, 2 unique values, 0 missing
ancurl.ng.spacedesc: nominal, 2 unique values, 0 missing
ancurl.yahoo: nominal, 2 unique values, 0 missing
ancurl.www.inwap.com: nominal, 2 unique values, 0 missing
ancurl.4.mem: nominal, 2 unique values, 0 missing
ancurl.links: nominal, 2 unique values, 0 missing
ancurl.exe: nominal, 2 unique values, 0 missing
ancurl.bin.acc: nominal, 2 unique values, 0 missing
ancurl.link.pics: nominal, 2 unique values, 0 missing
ancurl.ads.redirect: nominal, 2 unique values, 0 missing
ancurl.www.readersndex: nominal, 2 unique values, 0 missing
ancurl.www.theinternetadvantage.com: nominal, 2 unique values, 0 missing
ancurl.art: nominal, 2 unique values, 0 missing
ancurl.april.fools: nominal, 2 unique values, 0 missing
ancurl.uk: nominal, 2 unique values, 0 missing
ancurl.st: nominal, 2 unique values, 0 missing
ancurl.www.2meta.com: nominal, 2 unique values, 0 missing
ancurl.midnight: nominal, 2 unique values, 0 missing
ancurl.hotwired.com: nominal, 2 unique values, 0 missing
ancurl.mcet.edu: nominal, 2 unique values, 0 missing
ancurl.clawring: nominal, 2 unique values, 0 missing
ancurl.www.thejeep.com: nominal, 2 unique values, 0 missing
ancurl.services: nominal, 2 unique values, 0 missing
ancurl.type.click: nominal, 2 unique values, 0 missing
alt.and: nominal, 2 unique values, 0 missing
alt.all: nominal, 2 unique values, 0 missing
alt.our: nominal, 2 unique values, 0 missing
alt.visit.our: nominal, 2 unique values, 0 missing
alt.romance: nominal, 2 unique values, 0 missing
alt.new: nominal, 2 unique values, 0 missing
alt.amazon: nominal, 2 unique values, 0 missing
alt.home.page: nominal, 2 unique values, 0 missing
alt.my.guestbook: nominal, 2 unique values, 0 missing
alt.graphic: nominal, 2 unique values, 0 missing
caption.here.for: nominal, 2 unique values, 0 missing
caption.click: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 101
Number of instances belonging to the least frequent class: 280
Number of binary attributes: 101
Percentage of binary attributes: 100
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.75
Percentage of numeric attributes: 0
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 100
Percentage of instances belonging to the most frequent class: 86
Number of instances belonging to the most frequent class: 1720
Percentage of instances belonging to the least frequent class: 14
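The percentage and ratio properties follow directly from the listed counts. As a quick check, the short sketch below recomputes them. The variable names are chosen here for illustration, and rounding the ratio to two decimals is an assumption about how the page displays it.

```python
# Counts taken from the properties listed above.
n_rows = 2000
n_columns = 101
least_frequent = 280
most_frequent = 1720

# Class percentages as shares of all rows.
least_frequent_pct = 100 * least_frequent / n_rows
most_frequent_pct = 100 * most_frequent / n_rows

# Attributes divided by instances (0.0505 before rounding).
dimensionality = n_columns / n_rows

print(least_frequent_pct, most_frequent_pct, round(dimensionality, 2))
# prints: 14.0 86.0 0.05
```

The two class counts also sum to the 2000 rows, consistent with a target that has exactly two distinct values.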

0 tasks
