OpenML

Employee-Turnover-at-TECHCO

Status: active. Format: ARFF. License: Attribution 4.0 International (CC BY 4.0). Visibility: public. Uploaded 23-03-2022 by Onur Yildirim.

Context

These are simulated data based on employee turnover data from a real technology company in India (we refer to this company by the pseudonym 'TECHCO'). The data can be used to analyze drivers of turnover at TECHCO. The original dataset was analyzed in the paper Machine Learning for Pattern Discovery in Management Research (an SSRN version is also available). For privacy reasons, this publicly offered dataset is simulated from the original data. Together with the accompanying Python and R code on Kaggle, it is meant to help readers learn how to implement the ML techniques in the paper. The data and code demonstrate how ML can help discover nonlinear and interactive patterns between variables that might otherwise go unnoticed.

Content

This dataset includes 1,191 entry-level employees who were quasi-randomly deployed to one of TECHCO's nine geographically dispersed production centers in 2007. The data are structured as a panel, with one observation for each month that an individual is employed at the company, for up to 40 months. In total, the data include 34,453 observations from 1,191 employees. The dependent variable, Turnover, indicates whether the employee left or stayed during that time period.

Objectives

The objective in the original paper was to explore patterns in the data that would help us learn more about the drivers of employee turnover. Another objective could be to find the best predictive model for estimating when a specific employee will leave.
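To make the panel layout concrete, here is a minimal sketch that collapses employee-month rows into one record per employee. It runs on a handful of invented rows, not on the real file. The helper name summarize_panel and the "Left"/"Stayed" labels are assumptions for illustration: the listing only states that turnover is a string with two unique values, so the actual labels should be checked after loading the data.

```python
from collections import defaultdict

# Toy rows that mimic the panel layout described above: one row per employee
# per month. All values are invented; the "Left"/"Stayed" labels are an
# assumption, since the listing only says turnover has two string values.
rows = [
    {"emp_id": 1, "time": 1, "turnover": "Stayed"},
    {"emp_id": 1, "time": 2, "turnover": "Stayed"},
    {"emp_id": 1, "time": 3, "turnover": "Left"},
    {"emp_id": 2, "time": 1, "turnover": "Stayed"},
    {"emp_id": 2, "time": 2, "turnover": "Stayed"},
]


def summarize_panel(rows, left_label="Left"):
    """Collapse employee-month rows into one record per employee."""
    by_employee = defaultdict(list)
    for row in rows:
        by_employee[row["emp_id"]].append(row)

    summary = {}
    for emp_id, emp_rows in by_employee.items():
        summary[emp_id] = {
            # Number of monthly observations recorded for this employee.
            "months_observed": len(emp_rows),
            # Latest month index at which the employee appears.
            "last_month": max(r["time"] for r in emp_rows),
            # True if any monthly row carries the "left" label.
            "left": any(r["turnover"] == left_label for r in emp_rows),
        }
    return summary


summary = summarize_panel(rows)
for emp_id, info in sorted(summary.items()):
    print(emp_id, info)
```

A per-employee view like this is one possible starting point for the "when will an employee leave" objective, since it pairs an observed duration with a leave/stay indicator.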

10 features

Feature	Type	Unique values	Missing values
turnover (target)	string	2	0
time	numeric	39	0
training_score	numeric	1190	0
logical_score	numeric	18	0
verbal_score	numeric	25	0
avg_literacy	numeric	1190	0
location_age	numeric	21	0
distance	numeric	1087	0
similar_language	numeric	941	0
is_male	numeric	2	0
emp_id (ignore)	numeric	1191	0
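Because each employee contributes many monthly rows, a row-level random split would put the same person in both the training and the test set. The sketch below shows one way to avoid that by splitting on emp_id, and it keeps emp_id out of the predictor list, which is how we read the "(ignore)" marker in the listing. The function split_by_employee, the toy rows, and the 25% test fraction are hypothetical choices for illustration, not part of the dataset or the paper.

```python
import random

# Predictor columns taken from the feature listing; emp_id is used only for
# grouping and turnover is the target, so neither appears here.
FEATURES = [
    "time", "training_score", "logical_score", "verbal_score",
    "avg_literacy", "location_age", "distance", "similar_language", "is_male",
]
TARGET = "turnover"


def split_by_employee(rows, test_fraction=0.25, seed=0):
    """Split rows so that every employee falls entirely in train or in test."""
    ids = sorted({row["emp_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [row for row in rows if row["emp_id"] not in test_ids]
    test = [row for row in rows if row["emp_id"] in test_ids]
    return train, test


# Invented toy panel: four employees with three monthly rows each.
rows = [
    {"emp_id": emp_id, "time": month}
    for emp_id in (1, 2, 3, 4)
    for month in (1, 2, 3)
]

train, test = split_by_employee(rows)
train_ids = {row["emp_id"] for row in train}
test_ids = {row["emp_id"] for row in test}
print("train employees:", sorted(train_ids))
print("test employees:", sorted(test_ids))
```

The same idea is available in common ML libraries as a group-aware splitter; the hand-written version here only serves to show the logic.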