Internet-Advertisements_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset Internet-Advertisements (40978) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
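The row-subsampling step in the source code above relies on a stratified split, so the class proportions of the parent dataset carry over to the 2000-row subset. The following is a minimal standard-library sketch of that idea only. It is not the code that produced this dataset: the function name `stratified_subsample`, the largest-remainder rounding, and the toy labels `"a"` and `"b"` are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional subsample.

    Hypothetical helper: a stand-in for a stratified split, written
    with the standard library only.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    if n_rows >= total:
        # Nothing to subsample: keep every row.
        return list(range(total))

    # Largest-remainder allocation: floor each proportional quota, then
    # hand the remaining slots to the classes with the largest fractions,
    # so the quotas sum to exactly n_rows.
    exact = {c: len(idxs) * n_rows / total for c, idxs in by_class.items()}
    quota = {c: int(v) for c, v in exact.items()}
    leftover = n_rows - sum(quota.values())
    by_fraction = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_fraction[:leftover]:
        quota[c] += 1

    # Draw each class's quota without replacement.
    chosen = []
    for c, idxs in by_class.items():
        chosen.extend(rng.sample(idxs, quota[c]))
    return sorted(chosen)


# Toy labels (assumed, not taken from the dataset): an 86% / 14% split.
toy_labels = ["a"] * 860 + ["b"] * 140
picked = stratified_subsample(toy_labels, n_rows=200, seed=1)
picked_counts = Counter(toy_labels[i] for i in picked)
```

With these toy labels the subset keeps the 86% / 14% ratio, which is the property the stratified split is meant to preserve.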

101 features

class (target): nominal, 2 unique values, 0 missing
url.meadows.9196: nominal, 2 unique values, 0 missing
url.image.navigate: nominal, 2 unique values, 0 missing
url.ibitexas.com: nominal, 2 unique values, 0 missing
url.keith.dumble: nominal, 2 unique values, 0 missing
url.cjackson.kandinsky: nominal, 2 unique values, 0 missing
url.catringn.gif: nominal, 2 unique values, 0 missing
url.doubleclick.net: nominal, 2 unique values, 0 missing
url.victoria.pharmacy: nominal, 2 unique values, 0 missing
url.geoguideii.pages: nominal, 2 unique values, 0 missing
url.www.ireland.today.ie: nominal, 2 unique values, 0 missing
url.grn.bull: nominal, 2 unique values, 0 missing
url.red.ball: nominal, 2 unique values, 0 missing
url.ads.msn.com: nominal, 2 unique values, 0 missing
url.vt.edu: nominal, 2 unique values, 0 missing
url.www.larx.com: nominal, 2 unique values, 0 missing
url.autotown.com: nominal, 2 unique values, 0 missing
url.about: nominal, 2 unique values, 0 missing
url.static.wired.com: nominal, 2 unique values, 0 missing
url.slake.com: nominal, 2 unique values, 0 missing
url.valley.2539: nominal, 2 unique values, 0 missing
url.set: nominal, 2 unique values, 0 missing
url.ie: nominal, 2 unique values, 0 missing
url.dumble: nominal, 2 unique values, 0 missing
url.kandinsky.t: nominal, 2 unique values, 0 missing
url.toyotaofroswell.com: nominal, 2 unique values, 0 missing
url.ixfolder.gif: nominal, 2 unique values, 0 missing
url.falcon.sonic.net: nominal, 2 unique values, 0 missing
url.pix.b: nominal, 2 unique values, 0 missing
url.www.thejeep.com: nominal, 2 unique values, 0 missing
url.graphics: nominal, 2 unique values, 0 missing
origurl.www.innotts.co.uk: nominal, 2 unique values, 0 missing
origurl..ion.0: nominal, 2 unique values, 0 missing
origurl.valley.2647: nominal, 2 unique values, 0 missing
origurl.plato: nominal, 2 unique values, 0 missing
origurl.cgidir.dll: nominal, 2 unique values, 0 missing
origurl.valley: nominal, 2 unique values, 0 missing
origurl.ics.ikenobo: nominal, 2 unique values, 0 missing
origurl.kushmerick: nominal, 2 unique values, 0 missing
origurl.heartland.5309: nominal, 2 unique values, 0 missing
origurl.kandinsky: nominal, 2 unique values, 0 missing
origurl.horn.html: nominal, 2 unique values, 0 missing
origurl.corridor.4590: nominal, 2 unique values, 0 missing
origurl.aprilfoolsday: nominal, 2 unique values, 0 missing
origurl.express.scripts.com: nominal, 2 unique values, 0 missing
origurl.news: nominal, 2 unique values, 0 missing
origurl.times: nominal, 2 unique values, 0 missing
origurl.4010.index: nominal, 2 unique values, 0 missing
origurl.hpe: nominal, 2 unique values, 0 missing
origurl.kerouac.htm: nominal, 2 unique values, 0 missing
origurl.internauts.ca: nominal, 2 unique values, 0 missing
origurl.toyotaofroswell.com: nominal, 2 unique values, 0 missing
origurl.www.internauts.ca: nominal, 2 unique values, 0 missing
origurl.lovisa1: nominal, 2 unique values, 0 missing
origurl.tm.fr: nominal, 2 unique values, 0 missing
origurl.bossintl.com: nominal, 2 unique values, 0 missing
origurl.kitty.and: nominal, 2 unique values, 0 missing
origurl.dreamn.com: nominal, 2 unique values, 0 missing
origurl.plains: nominal, 2 unique values, 0 missing
origurl.story: nominal, 2 unique values, 0 missing
origurl.ikenobo: nominal, 2 unique values, 0 missing
origurl.meadows: nominal, 2 unique values, 0 missing
origurl.stinky: nominal, 2 unique values, 0 missing
origurl.mcp.htm: nominal, 2 unique values, 0 missing
origurl.home: nominal, 2 unique values, 0 missing
ancurl.jump: nominal, 2 unique values, 0 missing
ancurl.FlowSoft.com: nominal, 2 unique values, 0 missing
ancurl.bridalinfo: nominal, 2 unique values, 0 missing
ancurl.unc.edu: nominal, 2 unique values, 0 missing
ancurl.forums: nominal, 2 unique values, 0 missing
ancurl.e.html: nominal, 2 unique values, 0 missing
ancurl.excite.468x60: nominal, 2 unique values, 0 missing
ancurl.schwab: nominal, 2 unique values, 0 missing
ancurl.a.uk: nominal, 2 unique values, 0 missing
ancurl.romancebooks: nominal, 2 unique values, 0 missing
ancurl.francois: nominal, 2 unique values, 0 missing
ancurl.pratchett: nominal, 2 unique values, 0 missing
ancurl.desc: nominal, 2 unique values, 0 missing
ancurl.forum.8078: nominal, 2 unique values, 0 missing
ancurl.netscape.com: nominal, 2 unique values, 0 missing
ancurl.readersndex: nominal, 2 unique values, 0 missing
ancurl.event.ng: nominal, 2 unique values, 0 missing
ancurl.www.geocities.com: nominal, 2 unique values, 0 missing
ancurl.spu.edu: nominal, 2 unique values, 0 missing
ancurl.area: nominal, 2 unique values, 0 missing
ancurl.www.wco.com: nominal, 2 unique values, 0 missing
ancurl.top: nominal, 2 unique values, 0 missing
ancurl.www.news.observer.com: nominal, 2 unique values, 0 missing
ancurl.sv: nominal, 2 unique values, 0 missing
ancurl.id: nominal, 2 unique values, 0 missing
ancurl.plains: nominal, 2 unique values, 0 missing
ancurl.entry: nominal, 2 unique values, 0 missing
ancurl.click: nominal, 2 unique values, 0 missing
ancurl.microsoft.com: nominal, 2 unique values, 0 missing
alt.visit.the: nominal, 2 unique values, 0 missing
alt.pages.like: nominal, 2 unique values, 0 missing
alt.bytes: nominal, 2 unique values, 0 missing
alt.club: nominal, 2 unique values, 0 missing
alt.take: nominal, 2 unique values, 0 missing
alt.graphic: nominal, 2 unique values, 0 missing
caption.you: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 101
Percentage of instances belonging to the least frequent class: 14
Number of instances belonging to the least frequent class: 280
Number of binary attributes: 101
Percentage of binary attributes: 100
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.76
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 0
Percentage of instances belonging to the most frequent class: 86
Percentage of nominal attributes: 100
Number of instances belonging to the most frequent class: 1720
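Several of the properties above are derived from the raw counts, so they can be cross-checked with simple arithmetic. The sketch below recomputes the class percentages and the attributes-per-instance ratio from the listed counts. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays the ratio.

```python
# Counts copied from the property list above.
n_instances = 2000
n_attributes = 101
minority_count = 280
majority_count = 1720

# Class shares as percentages of all instances.
minority_pct = 100 * minority_count / n_instances
majority_pct = 100 * majority_count / n_instances

# Attributes divided by instances; 101 / 2000 = 0.0505.
dimensionality = n_attributes / n_instances
dimensionality_shown = round(dimensionality, 2)

print(minority_pct, majority_pct, dimensionality_shown)
```

The two class counts sum to the number of instances, and the recomputed values agree with the listed 14, 86 and 0.05.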

0 tasks
