This project uses the “IBM HR Analytics Employee Attrition & Performance” dataset from Kaggle, which offers a comprehensive look at factors that might influence an employee's decision to leave the company. The dataset contains 16 numerical and 19 categorical variables, ranging from demographic details such as age and gender to job-specific information such as role and travel frequency, alongside performance indicators. The analysis is carried out in two programming languages, R and Julia.
https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset/data
The business question is which factors in IBM's employee records most affect employee attrition. This project aims to uncover the relationship between employees' personal information and performance records and their attrition status. The analysis can help us develop strategies to better understand employee performance and potentially reduce overall attrition rates, and IBM HR could use this dataset to find patterns in employee attrition. We plan to use logistic regression, a statistical method that estimates the probability of a binary outcome based on one or more predictor variables, to find the variables that most affect employee attrition. In addition, classification models (KNN classifier, LDA, neural network classifier, and multinomial classifier models) will assign employees to 'attrited' or 'not attrited' categories based on their characteristics.
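For reference, the logistic regression model described above can be written out in its standard textbook form. The notation is assumed here for illustration and is not taken from the project: Y = 1 denotes an employee who left, x_1, ..., x_p are the predictor variables, and \beta_0, ..., \beta_p are the coefficients to be estimated.

```latex
\begin{align}
P(Y = 1 \mid x_1, \dots, x_p)
  &= \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)\bigr)} \\
\log \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)}
  &= \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
\end{align}
```

In this form, \exp(\beta_j) is the multiplicative change in the odds of attrition for a one-unit increase in x_j with the other predictors held fixed. Comparing coefficient magnitudes is therefore one way to judge which variables matter most, provided the predictors are on comparable scales.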