Title: A Scoring Scheme for Online Feature Selection: Simulating Model Performance Without Retraining
Authors: Debarka Sengupta; Sanghamitra Bandyopadhyay; Debajyoti Sinha
Number of pages: 9
Year of publication: 2017
Language: English
Abstract:
Increasing the number of features increases the complexity of a model even if the additional feature does not improve its decision-making capacity. Irrelevant features may also cause overfitting and reduce interpretability of the concerned model. It is, therefore, important that the features are optimally selected before a model is built. In the case of online learning, new instances are periodically discovered, and the respective model is tactically retrained as required. Similarly, there are many real-life situations where hundreds of new features are discovered periodically, and the existing model needs to be retrained or tested for its performance improvement. Supervised selection of feature subset usually requires creation of multiple suboptimal models, thus incurring time-intensive computations. Unsupervised selections, although faster, largely rely on some subjective definition of feature relevance. In this paper, we introduce a score that accurately determines the importance of the features. The proposed score is appropriate for online feature selection scenarios for its low time complexity and ability to interpret performance improvement of the current model after the addition of a new feature, without invoking a retraining.