OpenML

Phishing_Email_Dataset

Status: active. Format: ARFF. License: Attribution-ShareAlike (CC BY-SA). Visibility: public. Uploaded 31-05-2024 by Iwo Godzwon.

### Description

The dataset "phishing_email.csv" contains email contents classified as either phishing or legitimate. Each row is one email entry with two fields: `text_combined` and `label`. The `text_combined` field holds the entire content of an email, which may include the subject, body, and any embedded URLs, while the `label` field classifies the email as phishing (`1`) or legitimate (`0`).

### Attribute Description

- text_combined: The full text of an email, combining the subject, the body, and possibly URLs. The text is unstructured, potentially lengthy, and may exhibit a wide range of natural language features, including informal language, technical terminology, and various linguistic structures. Examples range from technical support emails, linguistic textbook descriptions, and corporate summaries of energy market negotiations to phishing schemes pretending to offer financial opportunities.
- label: A binary indicator, with `1` representing a phishing email and `0` a legitimate email. It is the dataset's target variable for predictive modeling tasks aimed at identifying phishing attempts.

### Use Case

This dataset can support cybersecurity work, particularly the development of machine learning models that detect phishing attempts and filter them out of legitimate email traffic. Researchers and developers can use the varied content of the emails to train models that pick up the patterns indicative of phishing. Linguistic analysts may also find the dataset useful for studying language use in fraudulent versus legitimate emails, potentially uncovering linguistic markers characteristic of phishing attempts.

Organizations strengthening their email security protocols can use insights derived from this dataset to better educate their employees on recognizing and handling suspicious emails, reducing the risk of phishing attacks.
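As a minimal sketch of a first look at the two fields described above, the following reads rows with a `text_combined` and a `label` column, then counts emails and URL mentions per class. It uses only the Python standard library. The helper names (`load_rows`, `summarize`), the URL regular expression, and the sample rows are assumptions made for illustration; the sample rows are invented and are not taken from the dataset.

```python
import csv
import io
import re
from collections import Counter

# Rough URL matcher (an assumption for this sketch, not part of the dataset).
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def load_rows(handle):
    """Yield (text, label) pairs from a CSV with text_combined and label columns."""
    for row in csv.DictReader(handle):
        yield row["text_combined"], int(row["label"])


def summarize(rows):
    """Count emails and URL mentions per label (1 = phishing, 0 = legitimate)."""
    emails = Counter()
    urls = Counter()
    for text, label in rows:
        emails[label] += 1
        urls[label] += len(URL_PATTERN.findall(text))
    return emails, urls


# Invented sample rows for illustration only; not real dataset content.
SAMPLE_CSV = """text_combined,label
"quarterly report attached please review before the meeting",0
"verify your account now at http://example.com/login to claim your prize",1
"lunch on friday? see www.example.org for the menu",0
"""

emails, urls = summarize(load_rows(io.StringIO(SAMPLE_CSV)))
print("emails per label:", dict(emails))
print("URLs per label:", dict(urls))
```

To run the same summary on a local copy of the data, assuming it has been exported as a CSV file with these two column names, pass an open file instead of the in-memory sample, for example `open("phishing_email.csv", newline="", encoding="utf-8")`. Very long email bodies may require raising the parser limit with `csv.field_size_limit`.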