Employee-Turnover-at-TECHCO

Status: active. Format: ARFF. License: Attribution 4.0 International (CC BY 4.0). Visibility: public. Uploaded 23-03-2022 by Onur Yildirim.
  • Computer Systems
  • Machine Learning


Context
These are simulated data based on employee turnover records from a real technology company in India, referred to here by the pseudonym 'TECHCO'. The data can be used to analyze drivers of turnover at TECHCO. The original dataset was analyzed in the paper Machine Learning for Pattern Discovery in Management Research (also available on SSRN). For privacy reasons, this public dataset is simulated from the original data. Together with the accompanying Python Kaggle code and R Kaggle code, it helps readers learn how to implement the ML techniques in the paper. The data and code demonstrate how ML can uncover nonlinear and interactive patterns between variables that might otherwise go unnoticed.

Content
This dataset covers 1,191 entry-level employees who were quasi-randomly deployed to one of TECHCO's nine geographically dispersed production centers in 2007. The data are structured as a panel, with one observation for each month that an individual is employed at the company, for up to 40 months. In total, the data include 34,453 observations from 1,191 employees. The dependent variable, Turnover, indicates whether the employee left or stayed during that time period.

Objectives
The objective in the original paper was to explore patterns in the data that reveal more about the drivers of employee turnover. Another objective could be to find the best predictive model for estimating when a specific employee will leave.
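Because the data form a panel with many monthly rows per employee, a predictive model is usually evaluated on employees it has not seen during training. The following is a minimal Python sketch of such a group-wise split, not the method used in the paper or the Kaggle code. The helper name split_by_employee, the toy rows, and the label strings "Left" and "Stayed" are assumptions for illustration; only the column names emp_id, time, and turnover come from the dataset.

```python
import random
from collections import defaultdict


def split_by_employee(rows, test_fraction=0.2, seed=0):
    """Split panel rows so that all months of one employee land on the same side."""
    by_emp = defaultdict(list)
    for row in rows:
        by_emp[row["emp_id"]].append(row)

    # Shuffle employee ids (not rows) so no employee is split across sets.
    emp_ids = sorted(by_emp)
    random.Random(seed).shuffle(emp_ids)

    n_test = max(1, round(len(emp_ids) * test_fraction))
    test_ids = set(emp_ids[:n_test])

    train = [r for e in emp_ids if e not in test_ids for r in by_emp[e]]
    test = [r for e in emp_ids if e in test_ids for r in by_emp[e]]
    return train, test


# Toy panel with made-up values: employee id -> number of months observed.
# The labels "Left" / "Stayed" are assumed placeholders, not the real strings.
toy_tenure = {1: 3, 2: 5, 3: 2, 4: 4, 5: 1}
toy_rows = [
    {"emp_id": emp, "time": month,
     "turnover": "Left" if month == months else "Stayed"}
    for emp, months in toy_tenure.items()
    for month in range(1, months + 1)
]

train_rows, test_rows = split_by_employee(toy_rows, test_fraction=0.4, seed=1)
```

Splitting on emp_id rather than on individual rows avoids leaking an employee's earlier months into the training set while later months of the same employee sit in the test set.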

10 features

turnover (target): string, 2 unique values, 0 missing
time: numeric, 39 unique values, 0 missing
training_score: numeric, 1190 unique values, 0 missing
logical_score: numeric, 18 unique values, 0 missing
verbal_score: numeric, 25 unique values, 0 missing
avg_literacy: numeric, 1190 unique values, 0 missing
location_age: numeric, 21 unique values, 0 missing
distance: numeric, 1087 unique values, 0 missing
similar_language: numeric, 941 unique values, 0 missing
is_male: numeric, 2 unique values, 0 missing
emp_id (ignore): numeric, 1191 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 34452
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 9
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 90
Percentage of instances belonging to the most frequent class: 98.57
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: 33958
Percentage of instances belonging to the least frequent class: 1.43
Number of instances belonging to the least frequent class: 494
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 1
Percentage of missing values: 0
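The class counts above imply a heavy imbalance, so plain accuracy is a weak yardstick. The Python sketch below works through the arithmetic using only the counts listed in the properties. The keys "majority" and "minority" are placeholders because the page does not say which target string is which, and the inverse-frequency weighting shown is one common heuristic, not something prescribed by the dataset.

```python
# Class counts as listed in the dataset properties.
counts = {"majority": 33958, "minority": 494}
n_total = sum(counts.values())  # 34452 rows

# Accuracy of a model that always predicts the most frequent class.
baseline_accuracy = counts["majority"] / n_total


def balanced_weights(class_counts):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * c) for label, c in class_counts.items()}


weights = balanced_weights(counts)

print(f"baseline accuracy: {100 * baseline_accuracy:.2f}%")
print(f"minority weight:   {weights['minority']:.2f}")
print(f"majority weight:   {weights['majority']:.3f}")
```

Any model has to beat the always-majority baseline of about 98.57% accuracy to be informative, which is why metrics that account for the rare class are often preferred for data like this.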
