SAMPLE SIZE EFFECTS ON ML CLASSIFICATION ACCURACY

Authors

  • Roidar Khan

Keywords:

Machine Learning, Classification Algorithms, Sample Size, Predictive Performance, Accuracy

Abstract

The performance of machine learning classification models is strongly influenced by training dataset size. This study analyzes how varying sample sizes affect five popular classifiers: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Naïve Bayes. Using simulated datasets ranging from 50 to 5,000 samples, models were evaluated on Accuracy, Precision, Recall, and F1-score. Results show that all models improve with more data, but their sensitivity to sample size differs. Logistic Regression and SVM perform consistently well across sizes, while Naïve Bayes excels even with limited data. Decision Trees are unstable on small datasets but improve notably with larger samples. Random Forests improve gradually, achieving competitive results at scale. These insights guide practitioners in choosing algorithms based on data availability, highlighting the need to match model complexity to dataset size for optimal performance.
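The experimental protocol the abstract describes can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the feature count, train/test split ratio, and intermediate sample sizes are assumptions, since the abstract specifies only the endpoints (50 and 5,000 samples), the five classifiers, and the four metrics.

```python
# Hypothetical sketch of the study's protocol: train five classifiers on
# simulated datasets of increasing size and record classification metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score

# Endpoints (50, 5000) come from the abstract; the middle size is assumed.
SAMPLE_SIZES = [50, 500, 5000]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
}

results = {}
for n in SAMPLE_SIZES:
    # Simulated binary classification data; 10 features is an assumption.
    X, y = make_classification(n_samples=n, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[(name, n)] = {
            "accuracy": accuracy_score(y_te, pred),
            "f1": f1_score(y_te, pred),
        }
```

Inspecting `results` across sample sizes would reproduce the kind of comparison the abstract reports, e.g. whether Decision Tree scores fluctuate at n=50 while Logistic Regression stays stable.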


Published

2025-03-31