AfriSenti

Status: active. Format: ARFF. Visibility: public (publicly available). Uploaded 16-05-2023 by Idris Abdulmumin.

We introduce AfriSenti, a collection of 14 sentiment datasets comprising 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families, annotated by native speakers. The data was used in SemEval 2023 Task 12, the first Afro-centric SemEval shared task. We hope AfriSenti enables new work on under-represented languages. The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023.

4 features

label (target): string, 3 unique values, 0 missing
language: nominal, 14 unique values, 0 missing
split: nominal, 3 unique values, 0 missing
text: string, 107887 unique values, 0 missing
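
As a rough illustration of how the four features fit together, the sketch below models one row and selects the rows for a given language and split. The feature names come from the list above; the label values, split names, language spellings, and example texts are assumptions made up for this sketch and may not match the values stored in the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One dataset row, using the four feature names listed above."""
    label: str      # target; 3 unique values in the real data
    language: str   # 14 unique values in the real data
    split: str      # 3 unique values in the real data
    text: str       # tweet text


# Hypothetical toy rows: every value here is invented for illustration.
ROWS = [
    Row("positive", "hausa", "train", "example tweet one"),
    Row("negative", "hausa", "test", "example tweet two"),
    Row("neutral", "igbo", "train", "example tweet three"),
    Row("negative", "hausa", "train", "example tweet four"),
]


def select(rows, language, split):
    """Return the rows that match both the language and the split."""
    return [r for r in rows if r.language == language and r.split == split]


hausa_train = select(ROWS, "hausa", "train")
print(len(hausa_train))
```

Because `language` and `split` are nominal columns, this kind of exact-match filtering is usually the first step before training or evaluating a per-language model.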

19 properties

Number of instances (rows) of the dataset: 111720
Number of attributes (columns) of the dataset: 4
Number of distinct values of the target attribute (if it is nominal): 3
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 2
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 1
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 0
Percentage of instances belonging to the most frequent class: 35.04
Percentage of nominal attributes: 50
Number of instances belonging to the most frequent class: 39151
Percentage of instances belonging to the least frequent class: 32.23
Number of instances belonging to the least frequent class: 36005
Number of binary attributes: 0
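
The percentage properties follow directly from the listed counts. The short check below recomputes them; the constant and function names are chosen for this sketch, and the size of the remaining (middle) class is not listed on the page but is derived here by subtraction, assuming exactly three classes and no missing labels as stated above.

```python
# Counts copied from the property list above.
N_INSTANCES = 111720
N_ATTRIBUTES = 4
N_NOMINAL = 2
MAJORITY_COUNT = 39151
MINORITY_COUNT = 36005


def pct(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Derived value (not listed on the page): the third class takes the remainder.
middle_count = N_INSTANCES - MAJORITY_COUNT - MINORITY_COUNT

majority_pct = pct(MAJORITY_COUNT, N_INSTANCES)
minority_pct = pct(MINORITY_COUNT, N_INSTANCES)
middle_pct = pct(middle_count, N_INSTANCES)
nominal_pct = pct(N_NOMINAL, N_ATTRIBUTES)

# Attributes per instance is tiny, which is why the page shows it as 0.
attrs_per_instance = N_ATTRIBUTES / N_INSTANCES

print(majority_pct, middle_pct, minority_pct, nominal_pct)
```

The three class shares are within about three percentage points of each other, so the label distribution is close to balanced.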

0 tasks
