Context
These are simulated data based on employee turnover records from a real technology company in India (referred to here by the pseudonym 'TECHCO'). They can be used to analyze drivers of turnover at TECHCO. The original dataset was analyzed in the paper Machine Learning for Pattern Discovery in Management Research (SSRN version here). For privacy reasons, the publicly offered dataset is simulated from the original data. Along with the accompanying Python Kaggle code and R Kaggle code, this dataset will help readers learn how to implement the ML techniques in the paper. The data and code demonstrate how ML can be useful for discovering nonlinear and interactive patterns between variables that might otherwise go unnoticed.
Content
This dataset covers 1,191 entry-level employees who were quasi-randomly deployed to one of TECHCO's nine geographically dispersed production centers in 2007. The data are structured as a panel with one observation for each month that an individual is employed at the company, for up to 40 months. In total, the data include 34,453 observations from 1,191 employees. The dependent variable, Turnover, indicates whether the employee left or stayed during that month.
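To make the panel layout concrete, here is a minimal Python sketch that builds a toy employee-month panel and summarizes its shape. The field names (`emp_id`, `month`, `turnover`) and the toy tenures are assumptions made for illustration, not the actual variable names or values in the TECHCO file. The sketch also assumes turnover is coded 1 only in the month an employee leaves and 0 otherwise, which should be checked against the real data.

```python
from collections import defaultdict

MAX_MONTHS = 40  # observation window stated in the dataset description


def build_panel(tenures):
    """Expand {emp_id: (months_observed, left)} into employee-month rows."""
    rows = []
    for emp_id, (months, left) in tenures.items():
        months = min(months, MAX_MONTHS)
        for month in range(1, months + 1):
            # Assumed coding: 1 only in the final month of an employee who left.
            turnover = 1 if (left and month == months) else 0
            rows.append({"emp_id": emp_id, "month": month, "turnover": turnover})
    return rows


def summarize(rows):
    """Count employees, observations, leavers, and the longest observed spell."""
    per_emp = defaultdict(list)
    for row in rows:
        per_emp[row["emp_id"]].append(row)
    return {
        "n_employees": len(per_emp),
        "n_observations": len(rows),
        "n_leavers": sum(row["turnover"] for row in rows),
        "max_months": max(len(v) for v in per_emp.values()),
    }


# Hypothetical toy data: employee id -> (months observed, left the company?)
toy_tenures = {"E1": (3, True), "E2": (40, False), "E3": (12, True)}
panel = build_panel(toy_tenures)
summary = summarize(panel)
print(summary)
```

Run against the real file, a summary of this kind should report 1,191 employees and 34,453 observations, with no employee observed for more than 40 months.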
Objectives
The objective in the original paper was to explore patterns in the data that would help us learn more about the drivers of employee turnover. Another objective could be to find the best predictive model to estimate when a specific employee will leave.
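For the predictive objective, one point worth considering is that each employee contributes many monthly rows, so a row-level random split can place the same person in both the training and the test set. The sketch below shows one possible way to split by employee instead, together with a naive baseline rate to compare models against. It is an illustration under stated assumptions, not the method used in the paper: the field names (`emp_id`, `month`, `turnover`), the helper functions, and the toy panel are all hypothetical.

```python
import random


def split_by_employee(rows, test_fraction=0.25, seed=0):
    """Split employee-month rows so each employee lands in exactly one set."""
    emp_ids = sorted({row["emp_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(emp_ids)
    n_test = max(1, round(len(emp_ids) * test_fraction))
    held_out = set(emp_ids[:n_test])
    train = [r for r in rows if r["emp_id"] not in held_out]
    test = [r for r in rows if r["emp_id"] in held_out]
    return train, test


def baseline_hazard(rows):
    """Share of employee-months that end in turnover (a naive benchmark)."""
    return sum(r["turnover"] for r in rows) / len(rows)


# Hypothetical toy panel: 8 employees observed for 5 months each;
# even-numbered employees are assumed to leave in month 5.
toy_rows = [
    {"emp_id": e, "month": m, "turnover": int(e % 2 == 0 and m == 5)}
    for e in range(1, 9)
    for m in range(1, 6)
]

train_rows, test_rows = split_by_employee(toy_rows)
train_ids = {r["emp_id"] for r in train_rows}
test_ids = {r["emp_id"] for r in test_rows}
print(len(train_ids), len(test_ids), baseline_hazard(train_rows))
```

A candidate model would then be fitted on `train_rows` only and judged on `test_rows`, with the baseline rate serving as the figure it needs to beat.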