2021-12: Diversity Enhanced Active Learning with Strictly Proper Scoring Rules
Date and Time: 12 Dec 2021, Friday, 10-11 am AEST
Title: Diversity Enhanced Active Learning with Strictly Proper Scoring Rules
Abstract: We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log probability or negative mean square error, which we call Bayesian Estimate of Mean Proper Scores (BEMPS). We also prove convergence results borrowing techniques used with MOCU. In order to allow better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm, which encourages diversity in the vector of expected changes in scores for unlabelled data. To allow high-performance text classifiers, we combine ensembling and dynamic validation set construction on pre-trained language models. Extensive experimental evaluation then explores how these different acquisition functions perform. The results show that the use of mean square error and log probability with BEMPS yields robust acquisition functions, which consistently outperform the others tested.
Presenter’s Bio: Dr Lan DU is currently a senior lecturer in the Faculty of Information Technology, Monash University, where he is also the faculty’s director of postgraduate programs. His research interests lie at the intersection of machine learning and natural language processing and their applications, particularly in public health. He and his research team have been working closely with medical experts on developing cutting-edge NLP technologies for AI-enabled healthcare. He has published more than 60 research papers in top-tier conferences and journals, including NeurIPS, ICML, ACL, AAAI, IJCAI and TPAMI, and has attracted millions of dollars in funding from both government and industry, including a Google-funded project focused on natural language understanding, 3 NHMRC grants, and an MRFF grant. He is also a member of the editorial board of the Machine Learning journal, a senior program committee member of AAAI, and a technical program committee member for many conferences in both machine learning and NLP.