class: clear, center, middle
background-image: url(images/stacking-icon.jpg)
background-position: center
background-size: cover

<br><br><br>
.font300.white[Model Stacking & AutoML]

---
# Introduction

.pull-left[

.center.bold.font120[Thoughts]

- Original concept formalized by Leo Breiman [
<i class="ai ai-google-scholar faa-tada animated-hover "></i>
](http://statistics.berkeley.edu/sites/default/files/tech-reports/367.pdf)
- Theoretically formalized in 2007 as ___Super Learners___ [
<i class="ai ai-google-scholar faa-tada animated-hover "></i>
](https://www.degruyter.com/view/j/sagmb.2007.6.issue-1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml) where the authors...
   - proved that super learners will learn the optimal combination of supplied base learners and will typically perform as well as or better than any of the individual base learners.
- Many winning entries in prediction competitions are built with super learners

]

--

.pull-right[

.center.bold.font120[Overview]

- Basic idea
- Stacking existing models
- Stacking a grid search
- Auto machine learning search

]

---
# Prereqs

.red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 1]

.scrollable90[

.pull-left[

.center.bold.font120[Packages]

```r
library(recipes)
library(h2o)

h2o.init(max_mem_size = "5g")
```

]

.pull-right[

.center.bold.font120[Data]

```r
# ames data
ames <- AmesHousing::make_ames()

# split data
set.seed(123)
split <- rsample::initial_split(ames, strata = "Sale_Price")
ames_train <- rsample::training(split)
ames_test <- rsample::testing(split)

# make sure we have consistent categorical levels
blueprint <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_other(all_nominal(), threshold = .005)

# create training & test sets
train_h2o <- prep(blueprint, training = ames_train, retain = TRUE) %>%
  juice() %>%
  as.h2o()

test_h2o <- prep(blueprint, training = ames_train) %>%
  bake(new_data = ames_test) %>%
  as.h2o()

# get names of response and features
Y <- "Sale_Price"
X <- setdiff(names(ames_train), Y)
```

]
]

---
class: clear, center, middle, inverse

.font300.white[Basic Idea]

---
# Common ensemble methods

<br>
.font110[

* Ensemble machine learning methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms.

* Combining multiple predictors is not new
  - Bagging
  - Random forests
  - Gradient boosting

* However, these ensemble approaches combine common weak base learning algorithms (e.g., decision trees)

* Stacking, on the other hand, is designed to .bold.blue[ensemble a diverse group of strong learners.]

]

---
# Super learner algorithm

Part 1: Set up the ensemble

- Specify a list of _L_ base learners (each with a specific set of model parameters).
- Specify a metalearning algorithm. This can be any one of the algorithms discussed in the previous chapters, but it is most often a regularized regression.

---
# Super learner algorithm

.opacity[Part 1: Set up the ensemble]

Part 2: Train the ensemble

- Train each of the _L_ base learners on the training set.
- Perform k-fold cross-validation on each of these learners and collect the cross-validated predicted values from each of the _L_ algorithms (you must use the same k folds for each base learner). These predicted values represent `\(p_1, \dots, p_L\)`.
- The _N_ cross-validated predicted values from each of the _L_ algorithms can be combined to form a new `\(N \times L\)` matrix (represented by _Z_). This matrix, along with the original response vector (_y_), is called the "level-one" data. (_N_ = number of rows in the training set.)

<img src="images/stacking-equation.png" width="495" style="display: block; margin: auto;" />

- Train the metalearning algorithm on the level-one data ( `\(y = f(Z)\)` ). The "ensemble model" consists of the _L_ base learning models and the metalearning model, which can then be used to generate predictions on a test set. (A by-hand sketch follows shortly.)

---
# Super learner algorithm

.opacity[Part 1: Set up the ensemble]

.opacity[Part 2: Train the ensemble]

Part 3: Predict on new data

- To generate ensemble predictions, first generate predictions from the base learners.
- Feed those predictions into the metalearner to generate the ensemble prediction.

<br><br><br><br>

--

.center.bold.font90[_Stacking rarely does worse than selecting the single best base learner on the training data. .blue[The biggest gains usually come from stacking base learners whose predicted values have high variability and are uncorrelated with one another.] The more similar the predicted values, the less advantage there is in stacking._]
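---
# Super learner algorithm

Part 2 in miniature: a minimal by-hand sketch of building the level-one data, using two toy `lm()` base learners on simulated data. Everything here is illustrative (the rest of this module uses h2o instead):

```r
# toy data and folds; the SAME folds must be used for every base learner
set.seed(123)
df <- data.frame(x = runif(500))
df$y <- sin(2 * pi * df$x) + rnorm(500, sd = .2)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(df)))

# collect the cross-validated predictions p_1, ..., p_L (here L = 2)
Z <- data.frame(p1 = rep(NA_real_, nrow(df)), p2 = rep(NA_real_, nrow(df)))
for (i in 1:k) {
  fit1 <- lm(y ~ x, data = df[folds != i, ])           # base learner 1
  fit2 <- lm(y ~ poly(x, 5), data = df[folds != i, ])  # base learner 2
  Z$p1[folds == i] <- predict(fit1, df[folds == i, ])
  Z$p2[folds == i] <- predict(fit2, df[folds == i, ])
}

# level-one data = Z plus the original response; the metalearner fits y = f(Z)
meta <- lm(y ~ ., data = cbind(Z, y = df$y))
coef(meta)
```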
---
# Package implementation 📦

<br>
.font110[

* [h2o](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html): My go-to package for stacking and AutoML. Provides three approaches for model stacking.

* [SuperLearner](https://github.com/ecpolley/SuperLearner): original implementation, works with `caret` and many other algorithm packages. Worth exploring.

* [subsemble](https://github.com/ledell/subsemble): developed by Erin LeDell, who is now one of the developers of __h2o__. Maintained for backward compatibility but not under forward development.

* [caretEnsemble](https://github.com/zachmayer/caretEnsemble): implements a bootstrapped (rather than cross-validated) version of stacking. The bootstrapped version trains faster, since bootstrapping (with a train/test split) is a fraction of the work of k-fold cross-validation; however, ensemble performance suffers as a result of this shortcut.

]

---
class: clear, center, middle, inverse

.font300.white[Stacking Existing Models]

.white[_Train first, stack later_]

---
# Stacking existing models

.pull-left.font90[

Say we found the optimal hyperparameters that provide the best predictive accuracy for each of the following:

1. Regularized regression base learner
2. Random forest base learner
3. Stochastic GBM base learner
4. XGBoost base learner

To stack them later we need to do a few specific things:

1. All models must be trained on the same training set.
2. All models must be trained with the same number of CV folds.
3. All models must use the same fold assignment to ensure the same observations are used (`fold_assignment = "Modulo"`).
4. The cross-validated predictions from all of the models must be preserved (`keep_cross_validation_predictions = TRUE`).

]

---
# Stacking existing models

.scrollable90[

.pull-left.font90[

Say we found the optimal hyperparameters that provide the best predictive accuracy for each of the following:

1. Regularized regression base learner
2. Random forest base learner
3. Stochastic GBM base learner
4. XGBoost base learner

To stack them later we need to do a few specific things:

1. All models must be trained on the same training set.
2. All models must be trained with the same number of CV folds.
3. All models must use the same fold assignment to ensure the same observations are used (`fold_assignment = "Modulo"`).
4. The cross-validated predictions from all of the models must be preserved (`keep_cross_validation_predictions = TRUE`).

]

.pull-right[

.center.bold.font90[
<i class="fas fa-exclamation-triangle faa-FALSE animated " style=" color:red;"></i>
This code takes ~12 min
<i class="fas fa-exclamation-triangle faa-FALSE animated " style=" color:red;"></i>
]

```r
# Train & cross-validate a GLM model
best_glm <- h2o.glm(
  x = X,
  y = Y,
  training_frame = train_h2o,
  alpha = .1,
  remove_collinear_columns = TRUE,
* nfolds = 10,
* fold_assignment = "Modulo",
* keep_cross_validation_predictions = TRUE,
  seed = 123
)

h2o.rmse(best_glm, xval = TRUE)
## [1] 35638.96

# Train & cross-validate a RF model
best_rf <- h2o.randomForest(
  x = X,
  y = Y,
  training_frame = train_h2o,
  ntrees = 1000,
  mtries = 20,
  max_depth = 30,
  min_rows = 1,
  sample_rate = 0.8,
* nfolds = 10,
* fold_assignment = "Modulo",
* keep_cross_validation_predictions = TRUE,
  seed = 123,
  stopping_rounds = 50,
  stopping_metric = "RMSE",
  stopping_tolerance = 0
)

h2o.rmse(best_rf, xval = TRUE)
## [1] 24103.8

# Train & cross-validate a GBM model
best_gbm <- h2o.gbm(
  x = X,
  y = Y,
  training_frame = train_h2o,
  ntrees = 5000,
  learn_rate = 0.01,
  max_depth = 7,
  min_rows = 5,
  sample_rate = 0.8,
* nfolds = 10,
* fold_assignment = "Modulo",
* keep_cross_validation_predictions = TRUE,
  seed = 123,
  stopping_rounds = 50,
  stopping_metric = "RMSE",
  stopping_tolerance = 0
)

h2o.rmse(best_gbm, xval = TRUE)
## [1] 21747.52

# Train & cross-validate an XGBoost model
best_xgb <- h2o.xgboost(
  x = X,
  y = Y,
  training_frame = train_h2o,
  ntrees = 5000,
  learn_rate = 0.05,
  max_depth = 3,
  min_rows = 3,
  sample_rate = 0.8,
  categorical_encoding = "Enum",
* nfolds = 10,
* fold_assignment = "Modulo",
* keep_cross_validation_predictions = TRUE,
  seed = 123,
  stopping_rounds = 50,
  stopping_metric = "RMSE",
  stopping_tolerance = 0
)

h2o.rmse(best_xgb, xval = TRUE)
## [1] 20936.64
```

]
]

---
# Stacking existing models

.pull-left[

* Use `h2o.stackedEnsemble()` to stack these models

* We can use many different metalearning algorithms ("superlearners")
  - `glm`: regularized linear regression
  - `drf`: random forest
  - `gbm`: gradient boosting machine
  - `deeplearning`: neural network

]

.pull-right[

```r
# Train a stacked tree ensemble
ensemble_tree <- h2o.stackedEnsemble(
  x = X,
  y = Y,
  training_frame = train_h2o,
  model_id = "my_tree_ensemble",
  base_models = list(best_glm, best_rf, best_gbm, best_xgb),
* metalearner_algorithm = "drf"
)
```

]

---
# Stacking existing models

.scrollable90[

.pull-left[

* Use `h2o.stackedEnsemble()` to stack these models

* We can use many different metalearning algorithms ("superlearners")
  - `glm`: regularized linear regression
  - `drf`: random forest
  - `gbm`: gradient boosting machine
  - `deeplearning`: neural network

* Results illustrate a slight improvement

* We're restricted in how much improvement stacking can provide because the base learners' predictions are highly correlated (a metalearner tuning sketch follows on the next slide)

]

.pull-right[

```r
# base learners
get_rmse <- function(model) {
  results <- h2o.performance(model, newdata = test_h2o)
  results@metrics$RMSE
}
list(best_glm, best_rf, best_gbm, best_xgb) %>%
  purrr::map_dbl(get_rmse)
## [1] 30024.67 23075.24 20859.92 21391.20

# stacked ensemble (drf metalearner)
results_tree <- h2o.performance(ensemble_tree, newdata = test_h2o)
results_tree@metrics$RMSE
## [1] 20664.56
```

```r
data.frame(
  GLM_pred = as.vector(h2o.getFrame(best_glm@model$cross_validation_holdout_predictions_frame_id$name)),
  RF_pred = as.vector(h2o.getFrame(best_rf@model$cross_validation_holdout_predictions_frame_id$name)),
  GBM_pred = as.vector(h2o.getFrame(best_gbm@model$cross_validation_holdout_predictions_frame_id$name)),
  XGB_pred = as.vector(h2o.getFrame(best_xgb@model$cross_validation_holdout_predictions_frame_id$name))
) %>% cor()
##           GLM_pred   RF_pred  GBM_pred  XGB_pred
## GLM_pred 1.0000000 0.9390229 0.9291982 0.9345048
## RF_pred  0.9390229 1.0000000 0.9920349 0.9821944
## GBM_pred 0.9291982 0.9920349 1.0000000 0.9854160
## XGB_pred 0.9345048 0.9821944 0.9854160 1.0000000
```

]
]
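---
# Stacking existing models

h2o's default metalearner is a GLM, but the metalearner itself can also be tuned. Depending on your h2o release, `metalearner_params` lets you pass tuning parameters through to the metalearning algorithm; the following is a minimal sketch under that assumption (`my_tuned_ensemble` and the parameter values are illustrative, not a definitive recipe):

```r
# Sketch: same base models, but a tuned GBM metalearner
# (`metalearner_params` is assumed available in your h2o version)
ensemble_tuned <- h2o.stackedEnsemble(
  x = X,
  y = Y,
  training_frame = train_h2o,
  model_id = "my_tuned_ensemble",
  base_models = list(best_glm, best_rf, best_gbm, best_xgb),
  metalearner_algorithm = "gbm",
  metalearner_params = list(ntrees = 100, max_depth = 3)
)

h2o.performance(ensemble_tuned, newdata = test_h2o)
```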
---
class: clear, center, middle, inverse

.font300.white[Stacking a Grid Search]

---
# Stacking a grid search

.scrollable90[

.pull-left[

* We can also stack multiple models generated from the same base learner

* Certain tuning parameters allow us to find unique patterns within the data

* By stacking the results of a grid search, we can capitalize on the benefits of each of the models in our grid search to create a metamodel

* For example, the following performs a random grid search across a wide range of hyperparameter settings. We set the search to stop after 25 models have run.

]

.pull-right[

```r
# GBM hyperparameters
hyper_grid <- list(
  max_depth = c(1, 3, 5),
  min_rows = c(1, 5, 10),
  learn_rate = c(0.01, 0.05, 0.1),
  learn_rate_annealing = c(.99, 1),
  sample_rate = c(.5, .75, 1),
  col_sample_rate = c(.8, .9, 1)
)

# random grid search criteria
search_criteria <- list(
  strategy = "RandomDiscrete",
  max_models = 25
)

# build random grid search
random_grid <- h2o.grid(
  algorithm = "gbm",
  grid_id = "gbm_grid",
  x = X,
  y = Y,
  training_frame = train_h2o,
  hyper_params = hyper_grid,
  search_criteria = search_criteria,
  ntrees = 5000,
  stopping_metric = "RMSE",
  stopping_rounds = 10,
  stopping_tolerance = 0,
  nfolds = 10,
  fold_assignment = "Modulo",
  keep_cross_validation_predictions = TRUE,
  seed = 123
)
```

]]

---
# Stacking a grid search

If we look at the grid search models, we see that the cross-validated RMSEs range from 20,756 to 57,826. A sketch of stacking only the top grid models follows on the next slide.

.scrollable90[

```r
# collect the results and sort by our model performance metric of choice
random_grid_perf <- h2o.getGrid(
  grid_id = "gbm_grid",
  sort_by = "rmse"
)
random_grid_perf
## H2O Grid Details
## ================
## 
## Grid ID: gbm_grid
## Used hyper parameters:
##   - col_sample_rate
##   - learn_rate
##   - learn_rate_annealing
##   - max_depth
##   - min_rows
##   - sample_rate
## Number of models: 25
## Number of failed models: 0
## 
## Hyper-Parameter Search Summary: ordered by increasing rmse
##   col_sample_rate learn_rate learn_rate_annealing max_depth min_rows sample_rate         model_ids               rmse
## 1             0.9       0.01                  1.0         3      1.0         1.0 gbm_grid_model_20  20756.16775065606
## 2             0.9       0.01                  1.0         5      1.0        0.75  gbm_grid_model_2 21188.696088824694
## 3             0.9        0.1                  1.0         3      1.0        0.75  gbm_grid_model_5 21203.753908665003
## 4             0.8       0.01                  1.0         5      5.0         1.0 gbm_grid_model_16 21704.257699437963
## 5             1.0        0.1                 0.99         3      1.0         1.0 gbm_grid_model_17 21710.275753497197
## 
## ---
##    col_sample_rate learn_rate learn_rate_annealing max_depth min_rows sample_rate         model_ids               rmse
## 20             1.0       0.01                  1.0         1     10.0        0.75 gbm_grid_model_11 26164.879525289896
## 21             0.8       0.01                 0.99         3      1.0        0.75 gbm_grid_model_15  44805.63843296435
## 22             1.0       0.01                 0.99         3     10.0         1.0 gbm_grid_model_18 44854.611500840605
## 23             0.8       0.01                 0.99         1     10.0         1.0 gbm_grid_model_21 57797.874642563846
## 24             0.9       0.01                 0.99         1     10.0        0.75 gbm_grid_model_10  57809.60302408739
## 25             0.8       0.01                 0.99         1      5.0        0.75  gbm_grid_model_4  57826.30370545089
```

]
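---
# Stacking a grid search

We also don't have to stack the entire grid. Below is a minimal sketch of stacking only the ten best-performing grid models; the `ensemble_gbm_grid_top10` id and the choice of ten are illustrative, and whether this helps depends on how correlated those models' predictions are:

```r
# model IDs come back sorted by increasing RMSE from random_grid_perf above
top_ids <- random_grid_perf@model_ids[1:10]

# stack only the top 10 grid models
ensemble_top <- h2o.stackedEnsemble(
  x = X,
  y = Y,
  training_frame = train_h2o,
  model_id = "ensemble_gbm_grid_top10",
  base_models = top_ids,
  metalearner_algorithm = "gbm"
)

h2o.performance(ensemble_top, newdata = test_h2o)
```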
---
# Stacking a grid search

.scrollable90[

.pull-left[

Single best model applied to our test set

```r
# Grab the model_id for the top model, chosen by validation error
best_model_id <- random_grid_perf@model_ids[[1]]
best_model <- h2o.getModel(best_model_id)
h2o.performance(best_model, newdata = test_h2o)
## H2ORegressionMetrics: gbm
## 
## MSE:  466551295
*## RMSE:  21599.8
## MAE:  13697.78
## RMSLE:  0.1090604
## Mean Residual Deviance :  466551295
```

]

.pull-right[

Metalearner of our grid search applied to our test set

```r
# Train a stacked ensemble using the GBM grid
ensemble <- h2o.stackedEnsemble(
  x = X,
  y = Y,
  training_frame = train_h2o,
  model_id = "ensemble_gbm_grid",
  base_models = random_grid@model_ids,
  metalearner_algorithm = "gbm"
)

# Eval ensemble performance on a test set
h2o.performance(ensemble, newdata = test_h2o)
## H2ORegressionMetrics: stackedensemble
## 
## MSE:  469579433
*## RMSE:  21669.78
## MAE:  13499.93
## RMSLE:  0.1061244
## Mean Residual Deviance :  469579433
```

]]

.center.bold.blue[We see little benefit here, likely because our base learners' predictions are highly correlated.]

---
class: clear, center, middle, inverse

.font300.white[Auto ML Search]

---
# Auto ML search

.pull-left[

* Perform an automated search across
  - multiple base learners
  - multiple hyperparameter settings

* Frees up your time

* Multiple players
  - DataRobot, a current leader in AutoML (R and Python interfaces are available).
  - H2O Driverless AI, another leader in AutoML (R and Python interfaces available).
  - auto-sklearn, an automated machine learning toolkit and a drop-in replacement for scikit-learn estimators (Python).

]

.pull-right[

<img src="images/autoML-cartoon.png" width="1500" style="display: block; margin: auto;" />

]

---
# Auto ML search

.scrollable90[

.pull-left[

* `h2o.automl()` performs automated grid search and cross-validation on
  - GLMs
  - RF
  - GBM
  - XGBoost
  - Deep Learning
  - Stacked results

* By default it will search for 1 hour, but you can adjust the search to run for a specified
  - amount of time
  - number of models
  - and you can control each model's stopping tolerance

* This Auto ML search:
  - assessed 80 models in 2 hours
  - produced good results, but not as good as our previous models
  - use AutoML to point you in promising directions, but don't rely on it as the crème de la crème

]

.pull-right[

```r
auto_ml <- h2o.automl(
  x = X,
  y = Y,
  training_frame = train_h2o,
  nfolds = 5,
  max_runtime_secs = 60*120, # 2 hours!
  max_models = 50,
  keep_cross_validation_predictions = TRUE,
  sort_metric = "RMSE",
  seed = 123,
  stopping_rounds = 50,
  stopping_metric = "RMSE",
  stopping_tolerance = 0
)

# assess the leader board
# get top model: auto_ml@leader
auto_ml@leaderboard %>% as.data.frame()
##                                               model_id mean_residual_deviance      rmse         mse      mae     rmsle
## 1                     XGBoost_1_AutoML_20190220_084553              494171636  22229.97   494171636 13822.87 0.1278834
## 2            GBM_grid_1_AutoML_20190220_084553_model_1              503430766  22437.26   503430766 13767.88 0.1253053
## 3            GBM_grid_1_AutoML_20190220_084553_model_3              518817574  22777.57   518817574 13858.60 0.1290617
## 4                         GBM_2_AutoML_20190220_084553              519183630  22785.60   519183630 14224.31 0.1280925
## 5                         GBM_3_AutoML_20190220_084553              535163213  23133.59   535163213 14220.94 0.1291349
## 6                         GBM_4_AutoML_20190220_084553              537565303  23185.45   537565303 14217.37 0.1288148
## 7                     XGBoost_2_AutoML_20190220_084553              538225152  23199.68   538225152 14219.07 0.1309783
## 8                     XGBoost_1_AutoML_20190220_075753              539692588  23231.28   539692588 14579.98 0.1336797
## 9                         GBM_1_AutoML_20190220_084553              544128711  23326.57   544128711 14261.71 0.1303192
## 10           GBM_grid_1_AutoML_20190220_075753_model_2              544308592  23330.42   544308592 14096.86 0.1285264
## 11                    XGBoost_3_AutoML_20190220_084553              551086335  23475.23   551086335 14281.41 0.1329799
## 12       XGBoost_grid_1_AutoML_20190220_084553_model_3              554604150  23550.04   554604150 14450.80 0.1323416
## 13      XGBoost_grid_1_AutoML_20190220_075753_model_15              558894714  23640.95   558894714 14623.96 0.1371709
## 14       XGBoost_grid_1_AutoML_20190220_084553_model_8              559164350  23646.66   559164350 14471.03 0.1352374
## 15       XGBoost_grid_1_AutoML_20190220_084553_model_6              560854881  23682.37   560854881 14659.27 0.1389242
## 16      XGBoost_grid_1_AutoML_20190220_084553_model_12              561541549  23696.87   561541549 14517.76 0.1319543
## 17                        GBM_1_AutoML_20190220_075753              564388242  23756.86   564388242 14553.10 0.1325419
## 18       XGBoost_grid_1_AutoML_20190220_084553_model_7              566710386  23805.68   566710386 14221.74 0.1317049
## 19       XGBoost_grid_1_AutoML_20190220_075753_model_6              569618396  23866.68   569618396 14791.46 0.1396311
## 20           GBM_grid_1_AutoML_20190220_084553_model_2              573869609  23955.58   573869609 13675.43 0.1259909
## 21                        GBM_2_AutoML_20190220_075753              575416124  23987.83   575416124 14514.53 0.1309590
## 22       XGBoost_grid_1_AutoML_20190220_075753_model_8              590222063  24294.49   590222063 14677.83 0.1371388
## 23                        GBM_3_AutoML_20190220_075753              591347916  24317.65   591347916 14548.26 0.1317904
## 24                        GBM_5_AutoML_20190220_084553              592751585  24346.49   592751585 15167.43 0.1354277
## 25       XGBoost_grid_1_AutoML_20190220_084553_model_9              593645832  24364.85   593645832 14409.93 0.1347822
## 26                        GBM_4_AutoML_20190220_075753              593842616  24368.89   593842616 14539.17 0.1312543
## 27      XGBoost_grid_1_AutoML_20190220_075753_model_14              595847877  24410.00   595847877 15102.16 0.1412699
## 28                    XGBoost_2_AutoML_20190220_075753              597183621  24437.34   597183621 14774.69 0.1350752
## 29                    XGBoost_3_AutoML_20190220_075753              618364439  24866.93   618364439 14627.21 0.1362630
## 30      XGBoost_grid_1_AutoML_20190220_075753_model_12              618668364  24873.04   618668364 14962.68 0.1343198
## 31       XGBoost_grid_1_AutoML_20190220_075753_model_7              623752476  24975.04   623752476 15002.37 0.1376980
## 32      XGBoost_grid_1_AutoML_20190220_075753_model_13              625612063  25012.24   625612063 15250.60 0.1410862
## 33       XGBoost_grid_1_AutoML_20190220_075753_model_2              631028689  25120.28   631028689 15311.83 0.1373562
## 34       XGBoost_grid_1_AutoML_20190220_075753_model_9              639355245  25285.47   639355245 14589.69 0.1368971
## 35       XGBoost_grid_1_AutoML_20190220_084553_model_4              644708584  25391.11   644708584 15545.27 0.1455432
## 36                        DRF_1_AutoML_20190220_084553              651375000  25522.05   651375000 15286.66 0.1413668
## 37  DeepLearning_grid_1_AutoML_20190220_084553_model_1              652899521  25551.90   652899521 14924.34 0.1686744
## 38       XGBoost_grid_1_AutoML_20190220_084553_model_1              666911878  25824.64   666911878 15785.05 0.1448420
## 39      XGBoost_grid_1_AutoML_20190220_084553_model_11              669644295  25877.49   669644295 15973.13 0.1460775
## 40       XGBoost_grid_1_AutoML_20190220_075753_model_5              670032753  25884.99   670032753 15827.29 0.1444645
## 41       XGBoost_grid_1_AutoML_20190220_084553_model_5              673877787  25959.16   673877787 15890.51 0.1442607
## 42      XGBoost_grid_1_AutoML_20190220_075753_model_11              674002846  25961.56   674002846 15994.60 0.1459860
## 43       XGBoost_grid_1_AutoML_20190220_075753_model_1              679742125  26071.86   679742125 15878.26 0.1455838
## 44  DeepLearning_grid_1_AutoML_20190220_075753_model_2              683895191  26151.39   683895191 15371.53 0.1378527
## 45       XGBoost_grid_1_AutoML_20190220_084553_model_2              693546595  26335.27   693546595 16835.52 0.1434459
## 46       XGBoost_grid_1_AutoML_20190220_075753_model_3              712715861  26696.74   712715861 15755.10 0.1408707
## 47                        XRT_1_AutoML_20190220_084553              715885103  26756.03   715885103 16134.13 0.1506908
## 48      XGBoost_grid_1_AutoML_20190220_075753_model_16              716643971  26770.21   716643971 15652.50 0.1360591
## 49                        DRF_1_AutoML_20190220_075753              718590439  26806.54   718590439 16441.92 0.1470079
## 50  DeepLearning_grid_1_AutoML_20190220_075753_model_3              720915166  26849.86   720915166 14959.75 0.1501482
## 51  DeepLearning_grid_1_AutoML_20190220_075753_model_6              753480062  27449.59   753480062 15026.75 0.1344080
## 52           GBM_grid_1_AutoML_20190220_075753_model_1              754219556  27463.06   754219556 18038.62 0.1519872
## 53  DeepLearning_grid_1_AutoML_20190220_084553_model_2              755131890  27479.66   755131890 16007.82 0.1397099
## 54           GBM_grid_1_AutoML_20190220_075753_model_7              764119160  27642.71   764119160 18469.07 0.1565801
## 55                        GBM_5_AutoML_20190220_075753              774950440  27837.93   774950440 16288.80 0.1435441
## 56  DeepLearning_grid_1_AutoML_20190220_075753_model_5              776388379  27863.75   776388379 15576.71 0.1358005
## 57               DeepLearning_1_AutoML_20190220_084553              805890899  28388.22   805890899 15329.34 0.1428664
## 58  DeepLearning_grid_1_AutoML_20190220_075753_model_1              813465684  28521.32   813465684 14933.25 0.1478771
## 59           GBM_grid_1_AutoML_20190220_075753_model_4              843074198  29035.74   843074198 18032.57 0.1638802
## 60      XGBoost_grid_1_AutoML_20190220_084553_model_10              858324912  29297.18   858324912 16341.85 0.1386949
## 61                        XRT_1_AutoML_20190220_075753              873499767  29555.03   873499767 17548.96 0.1595296
## 62               DeepLearning_1_AutoML_20190220_075753              874287980  29568.36   874287980 16888.83 0.1502274
## 63           GBM_grid_1_AutoML_20190220_084553_model_4              957689400  30946.56   957689400 19817.20 0.1785158
## 64  DeepLearning_grid_1_AutoML_20190220_075753_model_4             1080202776  32866.44  1080202776 19627.77 0.1761614
## 65           GBM_grid_1_AutoML_20190220_084553_model_5             1154050429  33971.32  1154050429 22031.69 0.1813508
## 66           GBM_grid_1_AutoML_20190220_075753_model_8             1189517985  34489.39  1189517985 23847.28        NA
## 67  DeepLearning_grid_1_AutoML_20190220_084553_model_3             1338954960  36591.73  1338954960 26890.09 0.2390289
## 68           GBM_grid_1_AutoML_20190220_075753_model_6             1344509600  36667.56  1344509600 26568.92 0.2250109
## 69      XGBoost_grid_1_AutoML_20190220_084553_model_13             1633478745  40416.32  1633478745 30366.55 0.2074675
## 70           GBM_grid_1_AutoML_20190220_075753_model_9             2279530122  47744.43  2279530122 34931.67        NA
## 71    StackedEnsemble_AllModels_AutoML_20190220_084553             2485686253  49856.66  2485686253 35106.78 0.2722184
## 72    StackedEnsemble_AllModels_AutoML_20190220_075753             3496013158  59127.09  3496013158 42268.32 0.3167804
## 73 StackedEnsemble_BestOfFamily_AutoML_20190220_084553             5885176593  76714.90  5885176593 55812.19 0.4030877
## 74 StackedEnsemble_BestOfFamily_AutoML_20190220_075753             5890317586  76748.40  5890317586 55813.43 0.4032120
## 75           GBM_grid_1_AutoML_20190220_075753_model_5             6156796280  78465.26  6156796280 57096.84 0.4119697
## 76           GBM_grid_1_AutoML_20190220_075753_model_3             6167799467  78535.34  6167799467 57115.15 0.4121815
## 77           GLM_grid_1_AutoML_20190220_075753_model_1             6445574649  80284.34  6445574649 58500.33 0.4212456
## 78           GLM_grid_1_AutoML_20190220_084553_model_1             6445574649  80284.34  6445574649 58500.33 0.4212456
## 79       XGBoost_grid_1_AutoML_20190220_075753_model_4             8567249418  92559.44  8567249418 48730.19 5.1997830
## 80      XGBoost_grid_1_AutoML_20190220_075753_model_10            15721368661 125384.88 15721368661 81968.35 7.3465085
```

]]

---
# Summary

<br>
.pull-left[

* Multiple approaches for model stacking
  - Stacking existing models
  - Stacking a grid search

* Multiple implementations for model stacking
  - H2O
  - DataRobot
  - auto-sklearn

* Auto ML frees up your time but .bold.blue[is not a panacea]!

]

.pull-right[

<img src="images/automl-panacea.jpg" width="1707" style="display: block; margin: auto;" />

]

---
class: clear, center, middle
background-image: url(https://slideplayer.com/slide/8645270/26/images/8/Good+vrs.+Bad+No+question+is+a+bad+question.+Well%2C+for+teachers+this+is+not+the+case%21%21.jpg)
background-size: contain

---
# Back home

<br><br><br><br>

[.center[
<i class="fas fa-home fa-10x faa-FALSE animated "></i>
]](https://github.com/uc-r/Advanced-R)

.center[https://github.com/uc-r/Advanced-R]