Using data mining methods, this paper presents a new means of identifying the profiles of freshmen likely to face major difficulties in completing their first academic year. We aim at early detection of potential failure using student data available at registration, i.e. school records and environmental factors, with a view to timely and efficient remediation and/or study reorientation. We adapt three data mining methods, namely random forest, logistic regression and artificial neural network algorithms. We design algorithms to increase the accuracy of the prediction when some classes are of major interest. These algorithms are context-independent and can be used in different fields. They rely on a dynamic split of the observations into subclasses during the training process, so as to maximize an accuracy criterion. Four classes are thus built: high risk of failure, risk of failure, expected success, and high probability of success. Real data pertaining to undergraduates at the University of Liège (Belgium) illustrate our methodology. With our approach, we are able to identify with a high level of confidence (90%) a subset of 12.2% of students facing a very high risk of failure, almost four times as many as are identified with a non-dynamic approach. By testing several confidence levels, our approach makes it possible to rank students by level of risk, and a sensitivity analysis allows us to find out why some students are likely to encounter difficulties.
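To make the four-class idea concrete, here is a minimal Python sketch. It is not the authors' algorithm: the paper splits observations dynamically during training, whereas this sketch only applies cut-offs after the fact to failure probabilities that some classifier has already produced. The function names (`pick_threshold`, `risk_class`), the 0.5 boundary between the two middle classes, and the toy data are all assumptions made for illustration.

```python
from typing import List, Optional


def pick_threshold(scores: List[float], positive: List[bool],
                   confidence: float) -> Optional[float]:
    """Lowest cut-off t such that, among observations with score >= t,
    the observed share of positives is at least `confidence`.
    Returns None when no cut-off reaches that level."""
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, positive) if s >= t]
        if sum(flagged) / len(flagged) >= confidence:
            best = t  # later (lower) cut-offs overwrite earlier ones
    return best


def risk_class(p_fail: float, t_fail: Optional[float],
               t_success: Optional[float]) -> str:
    """Map one predicted failure probability to one of four classes."""
    if t_fail is not None and p_fail >= t_fail:
        return "high risk of failure"
    if t_success is not None and (1 - p_fail) >= t_success:
        return "high probability of success"
    # Assumed boundary between the two intermediate classes.
    return "risk of failure" if p_fail >= 0.5 else "expected success"


# Hypothetical validation data: predicted failure probability and outcome.
p_fail = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
failed = [True, True, True, False, True, False, True, False, False, False]

t_fail = pick_threshold(p_fail, failed, confidence=0.9)
t_success = pick_threshold([1 - p for p in p_fail],
                           [not f for f in failed], confidence=0.9)
classes = [risk_class(p, t_fail, t_success) for p in p_fail]
```

Raising or lowering `confidence` changes how many students fall into the two extreme classes, which is one simple way to rank students by level of risk.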
Disciplines :
Quantitative methods in economics & management
Author, co-author :
Hoffait, Anne-Sophie ; Université de Liège > HEC Liège : UER > Statistique appliquée à la gestion et à l'économie
Schyns, Michael ; Université de Liège > HEC Liège : UER > UER Opérations : Informatique de gestion
Language :
English
Title :
Early detection of university students with potential difficulties