Internet-Advertisements_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by Eddie Bergman.


Subsampling of the dataset Internet-Advertisements (40978) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
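The row-sampling step in the source code above keeps class proportions while cutting the data down to `nrows_max` rows. The sketch below illustrates that idea with the standard library only. It is a stand-in for the `train_test_split(..., stratify=...)` call, not the code that produced this dataset. The function name `stratified_subsample`, the largest-remainder allocation, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional sample of size n_rows.

    Hypothetical helper: approximates a stratified split, not scikit-learn's
    exact algorithm.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Largest-remainder allocation so the per-class counts sum to n_rows.
    quotas = {c: n_rows * len(ix) / total for c, ix in by_class.items()}
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n_rows - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Sample without replacement inside each class.
    chosen = []
    for c, ix in by_class.items():
        chosen.extend(rng.sample(ix, counts[c]))
    return sorted(chosen)


# Toy labels (not the real data): an 86% / 14% class split over 10,000 rows.
labels = ["majority"] * 8600 + ["minority"] * 1400
idxs = stratified_subsample(labels, n_rows=2000, seed=0)
sample_counts = Counter(labels[i] for i in idxs)
print(sample_counts)
```

With these toy labels the sample keeps the 86/14 split exactly, because both per-class quotas are whole numbers. With other proportions the largest-remainder step decides which classes receive the extra rows.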

101 features

class (target): nominal, 2 unique values, 0 missing
url.images.buttons: nominal, 2 unique values, 0 missing
url.oso: nominal, 2 unique values, 0 missing
url.tkaine.kats: nominal, 2 unique values, 0 missing
url.clawnext.gif: nominal, 2 unique values, 0 missing
url.area51: nominal, 2 unique values, 0 missing
url.carousel.org: nominal, 2 unique values, 0 missing
url.www.yahoo.co.uk: nominal, 2 unique values, 0 missing
url.ads.switchboard.com: nominal, 2 unique values, 0 missing
url.home.gif: nominal, 2 unique values, 0 missing
url.cjackson: nominal, 2 unique values, 0 missing
url.labyrinth.9439: nominal, 2 unique values, 0 missing
url.home: nominal, 2 unique values, 0 missing
url.geoguideii.email: nominal, 2 unique values, 0 missing
url.www.afn.org: nominal, 2 unique values, 0 missing
url.icons: nominal, 2 unique values, 0 missing
url.pages.b: nominal, 2 unique values, 0 missing
url.geoguideii.send: nominal, 2 unique values, 0 missing
url.paul.spu.edu: nominal, 2 unique values, 0 missing
url.hydrogeologist.imgs: nominal, 2 unique values, 0 missing
url.counter: nominal, 2 unique values, 0 missing
url.ie: nominal, 2 unique values, 0 missing
url.gc: nominal, 2 unique values, 0 missing
url.ran: nominal, 2 unique values, 0 missing
url.ukonline.co.uk: nominal, 2 unique values, 0 missing
url.inwap.com: nominal, 2 unique values, 0 missing
url.pix: nominal, 2 unique values, 0 missing
origurl.meadows.9196: nominal, 2 unique values, 0 missing
origurl.chapel: nominal, 2 unique values, 0 missing
origurl.jun: nominal, 2 unique values, 0 missing
origurl.pharmacy: nominal, 2 unique values, 0 missing
origurl.kattsida: nominal, 2 unique values, 0 missing
origurl.pages.catscats: nominal, 2 unique values, 0 missing
origurl.carousel.org: nominal, 2 unique values, 0 missing
origurl.sox: nominal, 2 unique values, 0 missing
origurl.leonora: nominal, 2 unique values, 0 missing
origurl.www.carousel.org: nominal, 2 unique values, 0 missing
origurl.unc.edu: nominal, 2 unique values, 0 missing
origurl.bordeaux: nominal, 2 unique values, 0 missing
origurl.sonic.net: nominal, 2 unique values, 0 missing
origurl.meadows.3727: nominal, 2 unique values, 0 missing
origurl.7735.mcp: nominal, 2 unique values, 0 missing
origurl.20.timeout: nominal, 2 unique values, 0 missing
origurl.wosc: nominal, 2 unique values, 0 missing
origurl.general.kushmerick: nominal, 2 unique values, 0 missing
origurl.cybermog: nominal, 2 unique values, 0 missing
origurl.peace: nominal, 2 unique values, 0 missing
origurl.polypkem: nominal, 2 unique values, 0 missing
origurl.malek.kandin: nominal, 2 unique values, 0 missing
origurl.forum.8078: nominal, 2 unique values, 0 missing
origurl.kbell: nominal, 2 unique values, 0 missing
origurl.alley.6750: nominal, 2 unique values, 0 missing
origurl.mogwhi.htm: nominal, 2 unique values, 0 missing
origurl.3727.turbo: nominal, 2 unique values, 0 missing
origurl.msheryl.djang: nominal, 2 unique values, 0 missing
origurl.shtml: nominal, 2 unique values, 0 missing
origurl.www.geocities.com: nominal, 2 unique values, 0 missing
origurl.square: nominal, 2 unique values, 0 missing
origurl.pad: nominal, 2 unique values, 0 missing
origurl.www.pacific.net.sg: nominal, 2 unique values, 0 missing
origurl.2539.index: nominal, 2 unique values, 0 missing
ancurl.redirect: nominal, 2 unique values, 0 missing
ancurl.josefina3: nominal, 2 unique values, 0 missing
ancurl.jump: nominal, 2 unique values, 0 missing
ancurl.relocate.ad: nominal, 2 unique values, 0 missing
ancurl.468x60: nominal, 2 unique values, 0 missing
ancurl.n.a: nominal, 2 unique values, 0 missing
ancurl.cjackson.kandinsky: nominal, 2 unique values, 0 missing
ancurl.adclick.exe: nominal, 2 unique values, 0 missing
ancurl.www.mei.co.jp: nominal, 2 unique values, 0 missing
ancurl.www.lycos.co.uk: nominal, 2 unique values, 0 missing
ancurl.www.ran.org: nominal, 2 unique values, 0 missing
ancurl.d.ukie: nominal, 2 unique values, 0 missing
ancurl.forums: nominal, 2 unique values, 0 missing
ancurl.name: nominal, 2 unique values, 0 missing
ancurl.1.d: nominal, 2 unique values, 0 missing
ancurl.geoguide.tour: nominal, 2 unique values, 0 missing
ancurl.accessus.net: nominal, 2 unique values, 0 missing
ancurl.amazon.com: nominal, 2 unique values, 0 missing
ancurl.www.pacificrim.net: nominal, 2 unique values, 0 missing
ancurl.click.profileid: nominal, 2 unique values, 0 missing
ancurl.emailme: nominal, 2 unique values, 0 missing
ancurl.www.carousel.org: nominal, 2 unique values, 0 missing
ancurl.adcount: nominal, 2 unique values, 0 missing
ancurl.dejay: nominal, 2 unique values, 0 missing
ancurl.area: nominal, 2 unique values, 0 missing
ancurl.www.theinternetadvantage.com: nominal, 2 unique values, 0 missing
ancurl.asp: nominal, 2 unique values, 0 missing
ancurl.ans: nominal, 2 unique values, 0 missing
ancurl.slagen: nominal, 2 unique values, 0 missing
ancurl.pl: nominal, 2 unique values, 0 missing
ancurl.dumble: nominal, 2 unique values, 0 missing
ancurl.ora.com: nominal, 2 unique values, 0 missing
ancurl.heartland: nominal, 2 unique values, 0 missing
alt.and: nominal, 2 unique values, 0 missing
alt.our: nominal, 2 unique values, 0 missing
alt.more: nominal, 2 unique values, 0 missing
alt.at: nominal, 2 unique values, 0 missing
alt.banner: nominal, 2 unique values, 0 missing
alt.free: nominal, 2 unique values, 0 missing
alt.the.kat: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 101
Percentage of binary attributes: 100
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.76
Percentage of numeric attributes: 0
Number of attributes divided by the number of instances: 0.05
Percentage of nominal attributes: 100
Percentage of instances belonging to the most frequent class: 86
Number of instances belonging to the most frequent class: 1720
Percentage of instances belonging to the least frequent class: 14
Number of instances belonging to the least frequent class: 280
Number of binary attributes: 101
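Several of the properties above follow from the raw counts. As a quick consistency check, the sketch below recomputes the class percentages and the attribute-to-instance ratio from the listed numbers. The variable names are illustrative and are not OpenML's property identifiers.

```python
# Counts taken from the property list above.
n_instances = 2000
n_attributes = 101
majority_count = 1720
minority_count = 280

# The two classes should account for every row.
assert majority_count + minority_count == n_instances

majority_pct = 100 * majority_count / n_instances
minority_pct = 100 * minority_count / n_instances
dimensionality = n_attributes / n_instances  # attributes per instance

print(f"most frequent class:  {majority_pct:.0f}%")
print(f"least frequent class: {minority_pct:.0f}%")
print(f"attributes / instances: {dimensionality:.2f}")
```

The ratio is 0.0505 before rounding, which the page reports as 0.05.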

0 tasks
