Are-Two-Sentences-of-the-Same-Topic


Status: active
Format: ARFF
License: CC BY-NC-SA 4.0
Visibility: public
Uploaded 24-03-2022 by Dustin Carrion
  • Computer Systems Transportation
Do two sentences come from the same article? We randomly sampled sentences from across Wikipedia. Some sentences came from the same article, others did not.

Sentences from the Same Article

These two sentences are from the same article:
  • There were 2,788 housing units at an average density of 4 per squaremile (2/km).
  • It is also home to the Oklahoma State Reformatory, located in Granite.

So are these:
  • Monument of the Judiciary Citadel of Salerno, near the Colle Bellara.
  • The La Carnale Castle got his name from a medieval battle against the Arabs and is part of a sport complex (with pool, tennis courts and hockey).

As are these:
  • The idea of Haar measure is to take a sort of limit of as becomes smaller to make it additive on all pairs of disjoint compact sets, though it first has to be normalized so that the limit is not just infinity.
  • When left and right Haar measures differ, the right measure is usually preferred as a prior distribution.

Sentences from Different Articles

These two sentences are from different articles:
  • US Open womens doubles champion France Ranked world No.
  • The average household size was 2.72 and the average family size was 3.19.

As are these:
  • The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s.
  • Males had a median income of 25,625 versus 20,515 for females.

These are also different:
  • Meanwhile, Western foods which are rich in fat, salt, sugar, and refined starches are also imported into countries.
  • According to the United States Census Bureau, the CDP has a total area of , of which is land and (3.61) is water.

Disclaimer

Please note that we attempted to remove any sampled data that includes controversial words or hate speech. However, such language is present in Wikipedia, so some such material may be present in this dataset. Due to the size of this dataset, it was not possible to have a human being audit each sentence.

3 features

  • id (ignore): numeric, 129156 unique values, 0 missing
  • sent1: string, 102604 unique values, 0 missing
  • sent2: string, 102694 unique values, 0 missing
  • same_source: numeric, 2 unique values, 0 missing
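As a rough illustration of how rows with these columns might be read, here is a minimal Python sketch. It assumes a hypothetical CSV export of the data (the page itself only lists ARFF) and assumes that `same_source` is encoded as 1 for "same article" and 0 for "different articles"; neither assumption is confirmed by the page. The two sample rows reuse sentences quoted in the description, and the helper name `load_pairs` is made up for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export using the four columns listed above.
# Assumption: same_source is 1 for "same article" and 0 otherwise.
SAMPLE = (
    'id,sent1,sent2,same_source\n'
    '0,"There were 2,788 housing units at an average density of 4 per squaremile (2/km).",'
    '"It is also home to the Oklahoma State Reformatory, located in Granite.",1\n'
    '1,"US Open womens doubles champion France Ranked world No.",'
    '"The average household size was 2.72 and the average family size was 3.19.",0\n'
)


def load_pairs(text):
    """Parse CSV text into a list of dicts, dropping the ignored id column."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "sent1": record["sent1"],
                "sent2": record["sent2"],
                "same_source": int(record["same_source"]),
            }
        )
    return rows


pairs = load_pairs(SAMPLE)
label_counts = Counter(row["same_source"] for row in pairs)
print(len(pairs), dict(label_counts))
```

Dropping `id` mirrors the "(ignore)" note on that column; the label counts give a quick check of class balance before any modelling.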

19 properties

  • Number of instances (rows) of the dataset: 129156
  • Number of attributes (columns) of the dataset: 3
  • Number of distinct values of the target attribute (if it is nominal): (no value shown)
  • Number of missing values in the dataset: 0
  • Number of instances with at least one value missing: 0
  • Number of numeric attributes: 1
  • Number of nominal attributes: 0
  • Number of attributes divided by the number of instances: 0
  • Percentage of numeric attributes: 33.33
  • Percentage of instances belonging to the most frequent class: (no value shown)
  • Percentage of nominal attributes: 0
  • Number of instances belonging to the most frequent class: (no value shown)
  • Percentage of instances belonging to the least frequent class: (no value shown)
  • Number of instances belonging to the least frequent class: (no value shown)
  • Number of binary attributes: 0
  • Percentage of binary attributes: 0
  • Percentage of instances having missing values: 0
  • Average class difference between consecutive instances: (no value shown)
  • Percentage of missing values: 0
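Two of the derived properties can be checked by hand from the counts above. The short Python sketch below recomputes them; the variable names are illustrative, and the reading that the reported ratio of 0 is a rounded value is an assumption, not something the page states.

```python
# Counts taken from the property list above.
n_instances = 129156
n_attributes = 3
n_numeric = 1

# Attributes divided by instances: a very small number,
# which would display as 0 if rounded to two decimals (assumption).
ratio = n_attributes / n_instances

# Share of numeric attributes, as a percentage.
pct_numeric = 100 * n_numeric / n_attributes

print(round(ratio, 2), round(pct_numeric, 2))
```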

0 tasks
