Phishing_Email_Dataset

Status: active; Format: ARFF; License: Attribution-ShareAlike (CC BY-SA); Visibility: public; Uploaded 31-05-2024 by Iwo Godzwon


### Description:
The dataset named "phishing_email.csv" comprises email contents that have been classified as phishing or legitimate. Each row in the dataset is one email entry with two fields: `text_combined` and `label`. The `text_combined` field holds the entire content of an email, which may include the subject, body, and any embedded URLs, while the `label` field classifies the email as phishing (`1`) or legitimate (`0`).

### Attribute Description:
- `text_combined`: Contains the full text of an email as a single string, combining the subject, the body, and possibly URLs. The text is unstructured, potentially lengthy, and may exhibit a wide range of natural language features, including informal language, technical terminology, and various linguistic structures. Examples of content range from technical support emails, linguistic textbook descriptions, and corporate summaries of energy market negotiations to phishing schemes that pretend to offer financial opportunities.
- `label`: A binary indicator, with `1` representing a phishing email and `0` a legitimate email. This classification serves as the dataset's target variable for predictive modeling tasks aimed at identifying phishing attempts.

### Use Case:
This dataset can support cybersecurity efforts, particularly the development of machine learning models that detect phishing attempts and filter them from legitimate email. Researchers and developers can use the varied content of the emails to train models that pick up the patterns indicative of phishing. Linguistic analysts may also find the dataset useful for studying language use in fraudulent versus legitimate emails, potentially uncovering linguistic markers that are characteristic of phishing attempts. Moreover, organizations focused on strengthening their email security protocols can use insights derived from this dataset to better educate their employees on recognizing and handling suspicious emails, ultimately reducing the risk of phishing attacks.
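
As a rough illustration of the modeling use case, the sketch below trains a bag-of-words Naive Bayes classifier with add-one smoothing on rows shaped like (`text_combined`, `label`). The four training emails, the tokenizer, and the function names are all hypothetical and were invented for this example; this is one simple baseline under those assumptions, not a method prescribed by the dataset.

```python
import math
import re
from collections import Counter

# Hypothetical toy emails that mimic the (text_combined, label) schema;
# they are invented for illustration and are not rows from the dataset.
TRAIN = [
    ("verify your account now to claim your prize money", 1),
    ("urgent click this link to confirm your bank password", 1),
    ("meeting notes attached for the energy market review", 0),
    ("please find the syllabus for the linguistics course", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(rows):
    """Count tokens per class and documents per class."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in rows:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the higher log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(predict(model, "confirm your password to claim the prize"))
print(predict(model, "notes for the linguistics meeting"))
```

On the real data, the toy list would be replaced by rows read from the file, with a held-out split to estimate how well the baseline generalizes.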

2 features

- text_combined (string): 82078 unique values, 0 missing
- label (nominal): 2 unique values, 0 missing
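
The two-column schema above can be checked while loading. The following is a minimal sketch that assumes the data is available as a CSV export with a header row of `text_combined,label` (the page lists the format as ARFF, so this layout is an assumption); the sample rows and the `load_emails` helper are hypothetical.

```python
import csv
import io

# Hypothetical two-row sample in the assumed CSV layout; a real file would
# be opened with open("phishing_email.csv", newline="", encoding="utf-8").
SAMPLE_CSV = io.StringIO(
    "text_combined,label\n"
    '"claim your prize now, click here",1\n'
    '"agenda for the weekly project meeting",0\n'
)


def load_emails(handle):
    """Read rows, checking the two expected columns and the 0/1 label domain."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["text_combined", "label"]:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for record_no, row in enumerate(reader, start=1):
        text, label = row["text_combined"], row["label"]
        # Reject empty text and any label outside the documented {0, 1}.
        if not text or label not in {"0", "1"}:
            raise ValueError(f"bad record {record_no}")
        rows.append((text, int(label)))
    return rows


emails = load_emails(SAMPLE_CSV)
label_counts = {0: 0, 1: 0}
for _, label in emails:
    label_counts[label] += 1
print(label_counts)
```

Very long emails may exceed the `csv` module's default field size limit; `csv.field_size_limit` can raise it if the reader reports an oversized field.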

19 properties

- Number of instances (rows) of the dataset: 82486
- Number of attributes (columns) of the dataset: 2
- Number of distinct values of the target attribute (if it is nominal): (no value shown)
- Number of missing values in the dataset: 0
- Number of instances with at least one value missing: 0
- Number of numeric attributes: 0
- Number of nominal attributes: 1
- Percentage of instances belonging to the most frequent class: (no value shown)
- Percentage of nominal attributes: 50
- Number of instances belonging to the most frequent class: (no value shown)
- Percentage of instances belonging to the least frequent class: (no value shown)
- Number of instances belonging to the least frequent class: (no value shown)
- Number of binary attributes: 1
- Percentage of binary attributes: 50
- Percentage of instances having missing values: 0
- Average class difference between consecutive instances: (no value shown)
- Percentage of missing values: 0
- Number of attributes divided by the number of instances: 0
- Percentage of numeric attributes: 0
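
Taken at face value, the figures on this page imply repeated email texts: 82486 instances against 82078 unique `text_combined` values leaves 408 rows whose text already appears in another row. The sketch below shows one way such repeats could be dropped before training; the sample rows and the `drop_duplicate_texts` helper are hypothetical, and keeping the first occurrence is an assumed policy rather than anything the dataset specifies.

```python
# Hypothetical rows in (text_combined, label) form, invented for illustration.
rows = [
    ("win a free cruise today", 1),
    ("quarterly report attached", 0),
    ("win a free cruise today", 1),
]


def drop_duplicate_texts(rows):
    """Keep the first row for each distinct text, preserving input order."""
    seen = set()
    unique = []
    for text, label in rows:
        if text in seen:
            continue  # a row with this exact text was already kept
        seen.add(text)
        unique.append((text, label))
    return unique


unique_rows = drop_duplicate_texts(rows)
surplus = len(rows) - len(unique_rows)
print(surplus)

# The same subtraction applied to the figures reported on this page.
print(82486 - 82078)  # 408
```

Removing repeats before splitting the data helps keep identical emails from landing in both the training and evaluation sets.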

0 tasks
