Internet-Advertisements_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Active ARFF dataset, publicly available. Uploaded 17-11-2022 by Eddie Bergman.

Subsampling of the dataset Internet-Advertisements (40978) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of a `Dataset` class (not shown) that holds `dataset`, `x`, `y`
# and `categorical_mask`.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> "Dataset":
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes with probability proportional to their frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            vcs.index,  # same order as the probabilities below
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the rows whose class is one of the selected classes
        idxs = y.index[y.isin(selected_classes)]
        x = x.loc[idxs]
        y = y.loc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly. Note: `stratified` is never consulted, so
        # the split is always stratified on the target.
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical mask aligned with the selected columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
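The row-subsampling step in the code above delegates to scikit-learn's `train_test_split` with `stratify=`. The sketch below reimplements the same idea with only the standard library so that the per-class allocation is visible. It is an illustration, not the code that produced this dataset: the helper name `stratified_subsample` and its largest-remainder rounding rule are assumptions, and scikit-learn may hand a leftover row to a different class.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n, seed):
    """Pick n row indices whose class mix matches `labels` (hypothetical helper).

    Each class gets its proportional share of n, rounded down; rows left
    over go to the classes with the largest fractional remainders. Rows
    are then drawn without replacement inside each class.
    """
    if not 0 < n <= len(labels):
        raise ValueError("n must be between 1 and len(labels)")
    rng = random.Random(seed)
    counts = Counter(labels)
    total = len(labels)

    # Proportional allocation with largest-remainder rounding.
    exact = {c: n * k / total for c, k in counts.items()}
    alloc = {c: int(share) for c, share in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1

    # Bucket row indices by class, then sample within each bucket.
    rows_by_class = defaultdict(list)
    for i, label in enumerate(labels):
        rows_by_class[label].append(i)
    chosen = []
    for c, k in alloc.items():
        chosen.extend(rng.sample(rows_by_class[c], k))
    rng.shuffle(chosen)  # avoid returning rows grouped by class
    return chosen


# Toy labels with the same 86% / 14% split as this dataset's target.
toy_labels = ["nonad"] * 860 + ["ad"] * 140
toy_idx = stratified_subsample(toy_labels, 500, seed=3)
print(sorted(Counter(toy_labels[i] for i in toy_idx).items()))
# [('ad', 70), ('nonad', 430)]
```

Because each class is sampled separately, the 86/14 class ratio survives the reduction exactly whenever the proportional shares are whole numbers, which is what stratification is meant to guarantee.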

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| aratio | numeric | 553 | 0 |
| url.hydrogeologist | nominal | 2 | 0 |
| url.www.FlowSoft.com | nominal | 2 | 0 |
| url.csuhayward.edu | nominal | 2 | 0 |
| url.romancebooks.pix | nominal | 2 | 0 |
| url.images.geoguideii | nominal | 2 | 0 |
| url.library.pitcairn | nominal | 2 | 0 |
| url.pawbutton.gif | nominal | 2 | 0 |
| url.geoguideii.pages | nominal | 2 | 0 |
| url.users.aol.com | nominal | 2 | 0 |
| url.www.martnet.com | nominal | 2 | 0 |
| url.polypkem | nominal | 2 | 0 |
| url.gifs | nominal | 2 | 0 |
| url.geoguideii.send | nominal | 2 | 0 |
| url.rank | nominal | 2 | 0 |
| url.aol.com | nominal | 2 | 0 |
| url.go2net.ads | nominal | 2 | 0 |
| url.www.mcs.csuhayward.edu | nominal | 2 | 0 |
| url.tkaine | nominal | 2 | 0 |
| url.image | nominal | 2 | 0 |
| url.heartland.valley | nominal | 2 | 0 |
| url.button | nominal | 2 | 0 |
| url.link | nominal | 2 | 0 |
| url.nh | nominal | 2 | 0 |
| url.bishop.red | nominal | 2 | 0 |
| url.alley.6750 | nominal | 2 | 0 |
| url.gort.ucsd.edu | nominal | 2 | 0 |
| url.inwap.com | nominal | 2 | 0 |
| url.www.cob.sjsu.edu | nominal | 2 | 0 |
| url.small | nominal | 2 | 0 |
| origurl.target..ion | nominal | 2 | 0 |
| origurl.0.rpp | nominal | 2 | 0 |
| origurl.users.aol.com | nominal | 2 | 0 |
| origurl.lemoyne.edu | nominal | 2 | 0 |
| origurl.www.monmouth.com | nominal | 2 | 0 |
| origurl.carousel.org | nominal | 2 | 0 |
| origurl.labyrinth.9439 | nominal | 2 | 0 |
| origurl.www.icsi.com | nominal | 2 | 0 |
| origurl.victoria.pharmacy | nominal | 2 | 0 |
| origurl.www.toyotaofroswell.com | nominal | 2 | 0 |
| origurl.messier | nominal | 2 | 0 |
| origurl.heartland.5309 | nominal | 2 | 0 |
| origurl.home.netscape.com | nominal | 2 | 0 |
| origurl.ics | nominal | 2 | 0 |
| origurl.4010.index | nominal | 2 | 0 |
| origurl.inwap.com | nominal | 2 | 0 |
| origurl.www.mei.co.jp | nominal | 2 | 0 |
| origurl.kerouac.htm | nominal | 2 | 0 |
| origurl.heartland.meadows | nominal | 2 | 0 |
| origurl.lovisa1 | nominal | 2 | 0 |
| origurl.athens | nominal | 2 | 0 |
| origurl.123greetings.com | nominal | 2 | 0 |
| origurl.pad.htm | nominal | 2 | 0 |
| origurl.zueri.ch | nominal | 2 | 0 |
| origurl.www.MeissnerChevrolet.autotown.com | nominal | 2 | 0 |
| origurl.fr.bordeaux | nominal | 2 | 0 |
| origurl.paws.pad | nominal | 2 | 0 |
| origurl.pad | nominal | 2 | 0 |
| ancurl.clawring.htm | nominal | 2 | 0 |
| ancurl.redirect | nominal | 2 | 0 |
| ancurl.cnet | nominal | 2 | 0 |
| ancurl.pics | nominal | 2 | 0 |
| ancurl.www.FlowSoft.com | nominal | 2 | 0 |
| ancurl.www.monmouth.com | nominal | 2 | 0 |
| ancurl.heartland.pointe | nominal | 2 | 0 |
| ancurl.redir | nominal | 2 | 0 |
| ancurl.pacific.net.sg | nominal | 2 | 0 |
| ancurl.adclick | nominal | 2 | 0 |
| ancurl.any.time | nominal | 2 | 0 |
| ancurl.doubleclick.net | nominal | 2 | 0 |
| ancurl.ukie | nominal | 2 | 0 |
| ancurl.time | nominal | 2 | 0 |
| ancurl.image.http | nominal | 2 | 0 |
| ancurl.name | nominal | 2 | 0 |
| ancurl.sjsu.edu | nominal | 2 | 0 |
| ancurl.geoguide.tour | nominal | 2 | 0 |
| ancurl.romancebooks | nominal | 2 | 0 |
| ancurl.links | nominal | 2 | 0 |
| ancurl.athens.forum | nominal | 2 | 0 |
| ancurl.autotown.com | nominal | 2 | 0 |
| ancurl.esi | nominal | 2 | 0 |
| ancurl.cats | nominal | 2 | 0 |
| ancurl.gallery | nominal | 2 | 0 |
| ancurl.bin | nominal | 2 | 0 |
| ancurl.corridor | nominal | 2 | 0 |
| ancurl.pacificrim.net | nominal | 2 | 0 |
| ancurl.www.news.observer.com | nominal | 2 | 0 |
| ancurl.members.aol.com | nominal | 2 | 0 |
| ancurl.ng | nominal | 2 | 0 |
| ancurl.rnd | nominal | 2 | 0 |
| ancurl.com.home | nominal | 2 | 0 |
| ancurl.plx | nominal | 2 | 0 |
| ancurl.www.thejeep.com | nominal | 2 | 0 |
| ancurl.www.sanjosesabercats.com | nominal | 2 | 0 |
| ancurl.services | nominal | 2 | 0 |
| ancurl.profileid | nominal | 2 | 0 |
| alt.to | nominal | 2 | 0 |
| alt.by | nominal | 2 | 0 |
| alt.rank.my | nominal | 2 | 0 |
| alt.from | nominal | 2 | 0 |
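Apart from the target `class` and the numeric `aratio`, every feature name in the feature list above is a prefix (`url`, `origurl`, `ancurl` or `alt`) followed by a dot and a term. Tallying columns per prefix is a quick sanity check on a list like this one. The helper `count_by_prefix` below is a hypothetical illustration and is run on only a few of the names above.

```python
from collections import Counter


def count_by_prefix(feature_names):
    """Tally feature names by the text before the first '.'.

    Names without a dot (e.g. 'aratio', 'class') are counted under their
    full name. Hypothetical helper for illustration only.
    """
    return Counter(name.split(".", 1)[0] for name in feature_names)


sample = [
    "class",
    "aratio",
    "url.hydrogeologist",
    "url.www.FlowSoft.com",
    "origurl.target..ion",
    "ancurl.doubleclick.net",
    "ancurl.adclick",
    "alt.to",
]
print(sorted(count_by_prefix(sample).items()))
# [('alt', 1), ('ancurl', 2), ('aratio', 1), ('class', 1), ('origurl', 1), ('url', 2)]
```

Applied to all 101 names in the list, the tally is url 29, origurl 28, ancurl 38 and alt 4, plus the single `aratio` and `class` columns. That is 100 nominal columns and 1 numeric column, which agrees with the attribute counts reported under the properties.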

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 1 |
| Number of nominal attributes | 100 |
| Number of instances belonging to the most frequent class | 1720 |
| Percentage of instances belonging to the least frequent class | 14 |
| Number of instances belonging to the least frequent class | 280 |
| Number of binary attributes | 100 |
| Percentage of binary attributes | 99.01 |
| Percentage of instances having missing values | 0 |
| Average class difference between consecutive instances | 0.76 |
| Percentage of missing values | 0 |
| Number of attributes divided by the number of instances | 0.05 |
| Percentage of numeric attributes | 0.99 |
| Percentage of instances belonging to the most frequent class | 86 |
| Percentage of nominal attributes | 99.01 |
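Several of these properties are plain ratios of the counts listed with them. The sketch below recomputes them from the raw counts (2000 instances, 101 attributes, 1 numeric and 100 nominal attributes, 100 of them binary, 1720 and 280 instances in the two classes) as a consistency check. The dictionary keys are names chosen for this illustration, not OpenML's internal quality identifiers.

```python
# Raw counts as reported in the properties above.
n_instances = 2000
n_attributes = 101
n_numeric = 1
n_nominal = 100
n_binary = 100
majority, minority = 1720, 280

# The two classes and the two attribute types must account for everything.
assert majority + minority == n_instances
assert n_numeric + n_nominal == n_attributes

# Illustrative names; exact values in the comments, displayed values after "->".
props = {
    "pct_majority_class": 100 * majority / n_instances,        # 86.0 -> 86
    "pct_minority_class": 100 * minority / n_instances,        # 14.0 -> 14
    "pct_binary_attributes": 100 * n_binary / n_attributes,    # 99.0099... -> 99.01
    "pct_numeric_attributes": 100 * n_numeric / n_attributes,  # 0.990099... -> 0.99
    "pct_nominal_attributes": 100 * n_nominal / n_attributes,  # 99.0099... -> 99.01
    "attributes_per_instance": n_attributes / n_instances,     # 0.0505 -> 0.05
}
for name, value in props.items():
    print(f"{name}: {value:.4f}")
```

All six ratios agree with the displayed values once rounded, which suggests the page simply truncates them to two decimals or fewer.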

0 tasks