Notebook Nine | Repository

Ensemble Methods

Andrea Leone
University of Trento
January 2022


eXtreme Gradient Boosting Classifier


XGBoost is a widely used regularizing gradient boosting framework (it has featured in many winning machine-learning competition entries). Its features include proportional shrinking of leaf nodes, automatic feature selection, and parallel, distributed computation. This implementation is configured for multiclass classification with a softmax objective, which outputs the predicted probability that each data point belongs to each class.
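To make the softmax objective more concrete, here is a minimal stdlib-only sketch of what one boosting round computes for a single class. It is not XGBoost's code. Raw per-class scores (margins) are turned into probabilities with softmax, the cross-entropy gradient and hessian are taken, and a leaf's weight is set to the L2-regularized optimum -G / (H + lambda), then scaled by the learning rate eta, which shrinks each new tree's contribution. The function names are hypothetical. `reg_lambda=1.0` and `eta=0.3` mirror XGBoost's documented defaults, and XGBoost's own hessian may differ by a constant factor, so the numbers are only illustrative.

```python
import math


def softmax(margins):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in margins]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_grad_hess(margins, label):
    """Per-class gradient and diagonal hessian of the softmax cross-entropy loss
    for one data point whose true class index is `label`."""
    p = softmax(margins)
    grad = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
    hess = [p_k * (1.0 - p_k) for p_k in p]
    return grad, hess


def leaf_weight(grads, hesses, reg_lambda=1.0, eta=0.3):
    """L2-regularized optimal leaf weight -G / (H + lambda), shrunk by eta.
    `grads` and `hesses` hold one class's statistics for the points in the leaf."""
    G, H = sum(grads), sum(hesses)
    return -eta * G / (H + reg_lambda)


# Usage: a leaf holding two class-0 points, starting from all-zero margins.
g_a, h_a = softmax_grad_hess([0.0, 0.0, 0.0], label=0)
g_b, h_b = softmax_grad_hess([0.0, 0.0, 0.0], label=0)
print(leaf_weight([g_a[0], g_b[0]], [h_a[0], h_b[0]]))  # about 0.277
```

With all margins at zero, every class starts at probability 1/3. The leaf's positive weight (3.6 / 13, about 0.277) nudges class 0's score upward, and a larger `reg_lambda` or smaller `eta` would make that step smaller.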

score board — XGBC
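
cm_d = confusion-matrix diagonal (correct predictions per class); LOF = Local Outlier Factor; IF = Isolation Forest

pipeline         accuracy  precision recall     cm_d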

en_core_web_lg   .74225352 .73888784 .73142552  193 216 118
en_core_web_lg   .74126984 .73736670 .73244905  179 179 109  without outliers (pm=LOF)
en_core_web_lg   .75657894 .74321071 .73360243  116 164  65  without outliers (pm=IF)

en_core_web_trf  .62905500 .61890454 .60664763  182 197  67
en_core_web_trf  .63758389 .61466636 .60319474  129 195  56  without outliers (pm=LOF)
en_core_web_trf  .65838509 .63576423 .62916671   72 105  35  without outliers (pm=IF)
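The `cm_d` column lets the accuracy column be checked by hand. Accuracy is the confusion-matrix trace divided by the number of test points. For the first XGBC row, 193 + 216 + 118 = 527 correct predictions, and 527 / 710 reproduces .74225352. The test-set size of 710 is inferred from that division and is not stated in the notebook. The sketch below uses hypothetical helper names and only the standard library. It also computes macro-averaged precision and recall, but whether the notebook's precision and recall are macro-averaged is an assumption.

```python
def accuracy_from_diagonal(diagonal, n_samples):
    """Accuracy = correctly classified points (the confusion-matrix trace) / all test points."""
    return sum(diagonal) / n_samples


def macro_precision_recall(cm):
    """Macro-averaged precision and recall from a square confusion matrix,
    where cm[i][j] counts test points of true class i predicted as class j."""
    k = len(cm)
    precisions, recalls = [], []
    for c in range(k):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(k))  # column sum: everything predicted as c
        actual_c = sum(cm[c])                          # row sum: everything truly in c
        precisions.append(tp / predicted_c if predicted_c else 0.0)
        recalls.append(tp / actual_c if actual_c else 0.0)
    # Macro average: every class counts equally, regardless of its size.
    return sum(precisions) / k, sum(recalls) / k


# First XGBC row: cm_d = 193 216 118; n = 710 is inferred, not stated in the notebook.
print(accuracy_from_diagonal([193, 216, 118], 710))

# Hypothetical 2-class matrix, only to exercise the precision/recall helper.
toy_cm = [[2, 1],
          [0, 3]]
print(macro_precision_recall(toy_cm))
```

The printed scores appear to be truncated rather than rounded. For example, 345 / 456 = .7565789..., which the table shows as .75657894. Compare them with a small tolerance rather than by rounding.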


Decision Tree Classifier


score board — DTC

pipeline         accuracy  precision recall     cm_d

en_core_web_lg   .56478873 .55439725 .55301604  147 171  83
en_core_web_lg   .58412698 .57744174 .57580514  132 153  83  without outliers (pm=LOF)
en_core_web_lg   .63377192 .60823434 .60714087  101 140  48  without outliers (pm=IF)

en_core_web_trf  .45839210 .44928940 .44936768  135 122  68
en_core_web_trf  .44295302 .42325117 .42290729   87 133  44  without outliers (pm=LOF)
en_core_web_trf  .43788819 .42152348 .42188823   48  68  25  without outliers (pm=IF)


Random Forest Classifier


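The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm, in order to improve generalisability and robustness over a single estimator. A random forest applies this idea to decision trees. Each tree is trained on a bootstrap sample of the training set and considers only a random subset of features at each split, and the forest's prediction aggregates the predictions of its trees.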
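The idea of combining several base estimators can be shown without any library. The following is a deliberately simplified bagging sketch in the spirit of a random forest. It is not scikit-learn's `RandomForestClassifier`. Each base estimator is a one-split decision stump, fit on a bootstrap sample using a single randomly chosen feature. A real random forest grows full trees and re-samples features at every split. The ensemble then predicts by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split decision stump on one feature: choose the threshold that
    misclassifies the fewest training points, predicting the majority label on each side."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        left_lab = Counter(left).most_common(1)[0][0]  # left is never empty: t is a value in X
        right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
        errors = sum(l != left_lab for l in left) + sum(l != right_lab for l in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_lab, right_lab)
    _, t, left_lab, right_lab = best
    return lambda row: left_lab if row[feature] <= t else right_lab


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample n rows with replacement
        feature = rng.randrange(d)                  # random feature choice
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return models


def predict(models, row):
    """Hard (majority) vote over the base estimators."""
    votes = Counter(m(row) for m in models)
    return votes.most_common(1)[0][0]


# Usage on hypothetical toy data: two well-separated classes.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [8.0, 8.0], [9.0, 9.0], [10.0, 10.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_bagged_stumps(X, y)
print(predict(forest, [0.0, 0.0]), predict(forest, [10.0, 10.0]))
```

A stump trained on an unlucky bootstrap sample (for example, one containing a single class) can be wrong on its own. The vote across many differently-sampled stumps is what makes the ensemble robust. Majority voting is the "hard" variant. Averaging predicted class probabilities ("soft" voting) is the other common choice.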

score board — RFC

pipeline         accuracy  precision recall     cm_d

en_core_web_lg   .71971830 .71761997 .70200640  186 224 101
en_core_web_lg   .72222222 .72061624 .70609495  179 185  91  without outliers (pm=LOF)
en_core_web_lg   .74561403 .73590551 .71329278  118 166  56  without outliers (pm=IF)

en_core_web_trf  .62764456 .63577654 .60276742  180 205  60
en_core_web_trf  .58053691 .56517999 .53353591  121 192  33  without outliers (pm=LOF)
en_core_web_trf  .61801242 .58141088 .56230695   75 108  16  without outliers (pm=IF)