class: clear, center, middle background-image: url(https://educationalresearchtechniques.files.wordpress.com/2014/08/1.jpg?w=624) background-position: center background-size: cover <br><br><br><br><br><br><br><br><br><br><br><br><br> .font200.bold[Regression & Cousins] --- # Introduction .pull-left[ .center.bold.font120[Thoughts] - a fundamental analytic method - still widely used - basic approaches have large assumptions - serves as a foundation to many extension methods ] -- .pull-right[ .center.bold.font120[Overview] - Ordinary Least Squares - Principal Component Regression - Partial Least Squares Regression - Regularized Regression - Multivariate Adaptive Regression Splines ] --- # Prereqs .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 1]

.pull-left[

.center.bold.font120[Packages]

```r
library(dplyr)
library(ggplot2)
library(rsample)
library(recipes)
library(vip)
library(caret)
```

]

.pull-right[

.center.bold.font120[Data]

```r
# ames data
ames <- AmesHousing::make_ames()

# split data
set.seed(123)
split <- initial_split(ames, strata = "Sale_Price")
ames_train <- training(split)
```

]

---
class: center, middle, inverse

.font300.white[Ordinary Least Squares]

---
# The Objective

<img src="05-regression_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

* Model form: `\(y_i = \beta_0 + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \cdots + \beta_{p}x_{ip} + \epsilon_i\)`

* Objective function: `\(\text{minimize} \bigg \{ SSE = \sum^n_{i=1} (y_i - \hat{y}_i)^2 \bigg \} \equiv \text{minimize MSE}\)`
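
---
# The Objective in code

A minimal sketch of the objective function on the previous slide, computed by hand for a single-predictor fit (the object names `fit`, `sse`, and `mse` are just for illustration; `ames_train` comes from the Prereqs slide, and the same model is fit formally on the next slide):

```r
# SSE is simply the sum of squared residuals that lm() minimizes
fit <- lm(Sale_Price ~ Gr_Liv_Area, data = ames_train)

sse <- sum(residuals(fit)^2)   # objective function value
mse <- mean(residuals(fit)^2)  # equivalent objective, scaled by 1/n
```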
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 2] .pull-left.font120[ - .bold.blue[`lm()`] performs OLS in base R - `glm()` also performs linear regression but extends to other generalized methods (i.e. logistic regression) - `summary(model)` provides many results (i.e. "Residual Standard Error" is the RMSE) - No method for resampling (i.e. cross validation) with `lm()` ] .pull-right[ ```r model1 <- lm(Sale_Price ~ Gr_Liv_Area, data = ames_train) summary(model1) ## ## Call: ## lm(formula = Sale_Price ~ Gr_Liv_Area, data = ames_train) ## ## Residuals: ## Min 1Q Median 3Q Max ## -482352 -30498 -1929 23455 325966 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 13908.322 3758.256 3.701 0.00022 *** ## Gr_Liv_Area 111.387 2.374 46.915 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 56760 on 2197 degrees of freedom ## Multiple R-squared: 0.5005, Adjusted R-squared: 0.5002 ## F-statistic: 2201 on 1 and 2197 DF, p-value: < 2.2e-16 ``` ] --- # Multiple linear regression .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 3] .pull-left[ ```r # OLS model with two predictors model2 <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames_train) # OLS model with specified interactions model3 <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built + Gr_Liv_Area : Year_Built, data = ames_train) # include all possible main effects model4 <- lm(Sale_Price ~ ., data = ames_train) ``` ] .pull-right[
] --- # Assessing model accuracy .pull-left[ We've fit four models to the Ames housing data: 1. a single predictor, 2. two predictors, 3. two predictors with interaction, 4. and all possible main effect predictors. <br> .center.bold.blue[Which model is "best"?] ] --- # Assessing model accuracy .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 4] .scrollable90[ .pull-left[ We've fit four models to the Ames housing data: 1. a single predictor, 2. two predictors, 3. two predictors with interaction, 4. and all possible main effect predictors. <br> .center.bold.blue[Which model is "best"?] ] .pull-right[ ```r # create a resampling method cv <- trainControl( method = "repeatedcv", number = 10, repeats = 5 ) # model 1 CV set.seed(123) (cv_model1 <- train( Sale_Price ~ Gr_Liv_Area, data = ames_train, * method = "lm", trControl = cv) ) ## Linear Regression ## ## 2199 samples ## 1 predictor ## ## No pre-processing ## Resampling: Cross-Validated (10 fold, repeated 5 times) ## Summary of sample sizes: 1979, 1979, 1979, 1980, 1979, 1979, ... ## Resampling results: ## ## RMSE Rsquared MAE ## 56638.21 0.5087348 38936.29 ## ## Tuning parameter 'intercept' was held constant at a value of TRUE ``` ] ] --- # Assessing model accuracy .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 5] .scrollable90[ .pull-left[ We've fit four models to the Ames housing data: 1. a single predictor, 2. two predictors, 3. two predictors with interaction, 4. and all possible main effect predictors. <br> .center.bold.blue[Model using most predictors is marginally superior] ] .pull-right[ ```r # model 2 CV set.seed(123) cv_model2 <- train( Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames_train, method = "lm", trControl = cv ) # model 3 CV set.seed(123) cv_model3 <- train( Sale_Price ~ Gr_Liv_Area + Year_Built + Gr_Liv_Area : Year_Built, data = ames_train, method = "lm", trControl = cv ) # model 4 CV set.seed(123) cv_model4 <- train( Sale_Price ~ ., data = ames_train, method = "lm", trControl = cv ) # Extract out of sample performance measures summary(resamples(list( model1 = cv_model1, model2 = cv_model2, model3 = cv_model3, model4 = cv_model4 ))) ## ## Call: ## summary.resamples(object = resamples(list(model1 = cv_model1, model2 ## = cv_model2, model3 = cv_model3, model4 = cv_model4))) ## ## Models: model1, model2, model3, model4 ## Number of resamples: 50 ## ## MAE ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## model1 33391.58 37115.39 39074.58 38936.29 40519.76 44903.97 0 ## model2 27294.80 30392.32 31415.36 31665.22 32910.40 36435.19 0 ## model3 26063.69 28723.74 30447.37 30384.24 31527.17 35026.06 0 ## model4 13423.32 16336.78 17336.64 17783.08 18944.19 28176.20 0 ## ## RMSE ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## model1 45041.67 53682.66 56166.20 56638.21 60191.24 68067.25 0 ## model2 36153.70 43554.93 45907.15 46773.06 51105.87 59716.92 0 ## model3 34660.02 41331.79 44432.30 45996.25 50120.74 61033.82 0 ## model4 18834.46 28623.92 42438.30 44643.81 54707.54 97182.92 0 ## ## Rsquared ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## model1 0.3280132 0.4614648 0.5158123 0.5087348 0.5556577 0.6291068 0 ## model2 0.5165565 0.6362555 0.6707707 0.6649413 0.7030637 0.7615288 0 ## model3 0.4909203 0.6315862 0.7035659 0.6781410 0.7333001 0.7964460 0 ## model4 0.2978212 0.6137001 0.7517621 0.7303331 0.8854576 0.9443121 0 ``` ] ] --- # Model concerns .pull-left[ <br><br><br> .bold.center[With simplistic models comes many assumptions...often at the expense of model performance] ] .pull-right[ <br> <img src="https://media1.tenor.com/images/3c888132eb6fbedec9a131bc55a05315/tenor.gif?itemid=10744949" style="display: block; margin: auto;" /> ] --- # Model concerns .pull-left[ 1. .bold.red[Linear relationship] 2. Constant variance among residuals 3. No autocorrelation 4. More observations than predictors 5. No or little multicollinearity <br> .bold.center[<u>Sometimes</u> we can resolve this with transformations] ] .pull-right[ <img src="05-regression_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] --- # Model concerns .pull-left[ 1. Linear relationship 2. .bold.red[Constant variance among residuals] 3. No autocorrelation 4. More observations than predictors 5. No or little multicollinearity <br> .bold.center[<u>Sometimes</u> we can resolve this with transformations or adding more features] ] .pull-right[ <img src="05-regression_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] --- # Model concerns .pull-left[ 1. Linear relationship 2. Constant variance among residuals 3. .bold.red[No autocorrelation] 4. More observations than predictors 5. 
No or little multicollinearity <br> .bold.center[<u>Sometimes</u> we can resolve this by adding more features] ] .pull-right[ <img src="05-regression_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] --- # Model concerns .pull-left[ 1. Linear relationship 2. Constant variance among residuals 3. No autocorrelation 4. .bold.red[More observations than predictors] 5. No or little multicollinearity <br> .bold.center[<u>Sometimes</u> we can resolve this with feature reduction techniques] ] .pull-right[ <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> y </th> <th style="text-align:right;"> x1 </th> <th style="text-align:right;"> x2 </th> <th style="text-align:right;"> x3 </th> <th style="text-align:right;"> x4 </th> <th style="text-align:right;"> x5 </th> <th style="text-align:right;"> x6 </th> <th style="text-align:right;"> x7 </th> <th style="text-align:right;"> x8 </th> <th style="text-align:right;"> x9 </th> <th style="text-align:right;"> x10 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 199434 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 199698 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;"> 158515 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 251432 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 209020 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> .bold.center.red[Not invertible --> solutions are non-unique meaning there are many "right" solutions for our feature coefficients!] ] --- # Model concerns .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 6]

.pull-left[

1. Linear relationship

2. Constant variance among residuals

3. No autocorrelation

4. More observations than predictors

5. .bold.red[No or little multicollinearity]

<br>

.bold.center[<u>Sometimes</u> we can resolve this with feature reduction techniques]

]

.pull-right[

```r
m1 <- lm(Sale_Price ~ Gr_Liv_Area + TotRms_AbvGrd, data = ames_train)
m2 <- lm(Sale_Price ~ Gr_Liv_Area, data = ames_train)
m3 <- lm(Sale_Price ~ TotRms_AbvGrd, data = ames_train)

*coef(m1)
##   (Intercept)   Gr_Liv_Area TotRms_AbvGrd
##    44396.7120      139.6244   -11300.7303

*coef(m2)
## (Intercept) Gr_Liv_Area
##  13908.3217    111.3867

*coef(m3)
##   (Intercept) TotRms_AbvGrd
##      18614.36      25179.02
```

]

---
# Model concerns

.pull-left[

1. Linear relationship

2. Constant variance among residuals

3. No autocorrelation

4. More observations than predictors

5. No or little multicollinearity

]

.pull-right[

<img src="http://tinderdistrict.com/wp-content/uploads/2018/06/complicated.gif" style="display: block; margin: auto;" />

]

--

<br><br>

.bold.center[Many regression extensions have been developed to deal with these concerns.]

---
class: center, middle, inverse

.font300.white[Principal Component Regression]

---
# The idea

.pull-left[

PCR performs feature reduction to help minimize the impact of:

- multicollinearity (becomes a bigger concern the more predictors we have)
- when `\(p >> n\)`

Steps (sketched in base R two slides ahead):

1. Reduce *p* features to *c* PCs (not guided by the response)

2. Use PCs as predictors and perform regression as usual

]

.pull-right[

<img src="images/pcr-steps.png" width="86%" height="86%" style="display: block; margin: auto;" />

]

---
# R packages 📦

<br>

.font130[

- Any package that implements PCA can be applied prior to modeling,
- See [multivariate task view]( https://CRAN.R-project.org/view=Multivariate ) on CRAN for options; however,...
- .bold[caret] provides an integrated `method = "pcr"` that helps to automate the tuning process

]
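
---
# The idea: a quick sketch

A minimal base-R sketch of the two PCR steps listed earlier (illustration only; the zero-variance filter, the choice of 20 PCs, and the object names are assumptions, and this is not how __caret__ implements `method = "pcr"`):

```r
# 1. reduce p features to c PCs (unsupervised; the response is not used)
X   <- model.matrix(Sale_Price ~ ., data = ames_train)[, -1]  # dummy encode
X   <- X[, apply(X, 2, sd) > 0]                               # drop zero-variance columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# 2. use the PCs as predictors and regress as usual
pcs     <- data.frame(pca$x[, 1:20], Sale_Price = ames_train$Sale_Price)
pcr_fit <- lm(Sale_Price ~ ., data = pcs)
```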
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 7] .pull-left[ ```r # 1. hypergrid hyper_grid <- expand.grid(ncomp = seq(2, 40, by = 2)) # 2. PCR set.seed(123) cv_pcr <- train( Sale_Price ~ ., data = ames_train, trControl = cv, * method = "pcr", * preProcess = c("zv", "center", "scale"), * tuneGrid = hyper_grid, metric = "RMSE" ) # model with lowest RMSE cv_pcr$bestTune ## ncomp ## 20 40 cv_pcr$results %>% filter(ncomp == as.numeric(cv_pcr$bestTune)) ## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD ## 1 40 33719.52 0.8242447 21181.53 6806.823 0.05997334 1723.526 ``` ] .pull-right[ ```r # plot cross-validated RMSE plot(cv_pcr) ``` <img src="05-regression_files/figure-html/pcr-plot-revised-1.png" style="display: block; margin: auto;" /> .center.bold[Feature reduction with PCR improves prediction error by ~ $10K] ] --- # Tuning .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 8] .scrollable90[ .pull-left[ - The number of PCs is the only hyperparameter - rule of
<img src="https://emojis.slackmojis.com/emojis/images/1511903783/3230/wiggle_thumbs_up.gif?1511903783" style="height:1em; width:auto; "/>
- assess 2-*p* in evenly divided segments - start with a few and zoom in ] .pull-right[ ```r # 1. hypergrid p <- length(ames_train) - 1 *hyper_grid <- expand.grid(ncomp = seq(2, 80, length.out = 10)) # 2. PCR set.seed(123) cv_pcr <- train( Sale_Price ~ ., data = ames_train, trControl = cv, method = "pcr", preProcess = c("zv", "center", "scale"), tuneGrid = hyper_grid, metric = "RMSE" ) # RMSE cv_pcr$results %>% filter(ncomp == cv_pcr$bestTune$ncomp) ## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD ## 1 62.66667 32995.73 0.8312953 20620.62 6733.679 0.05834625 1764.946 # plot cross-validated RMSE plot(cv_pcr) ``` <img src="05-regression_files/figure-html/pcr-grid-2-1.png" style="display: block; margin: auto;" /> ] ] --- class: center, middle, inverse .font300.white[Partial Least Squares Regression] --- # The idea .pull-left[ - A problem with PCR is that the PCs are developed independent of the response. - PLS - has similar intentions as PCR - finds PCs that maximize correlation with the response - typically results in a stronger signal between PCs and response ] .pull-right[ <img src="images/pls-steps.png" width="94%" height="94%" style="display: block; margin: auto;" /> ] --- # The idea .pull-left[ - A problem with PCR is that the PCs are developed independent of the response. - PLS - has similar intentions as PCR - finds PCs that maximize correlation with the response - .bold.blue[typically results in a stronger signal between PCs and response] ] .pull-right[ <img src="05-regression_files/figure-html/pls-vs-pcr-relationship-1.png" style="display: block; margin: auto;" /> ] --- # R packages 📦 .pull-left[ ## [`pls`](https://cran.r-project.org/package=pls) * **p**artial **l**east **s**quares * Original and primary implementation of PLS * Provides both PLS & PCR capabilities ] .pull-right[ ## [Other pkgs](https://CRAN.R-project.org/view=Multivariate) * `ppls`: penalized partial least squares * `dr`: provides various dimension reduction regression options * `plsgenomics`: provides partial least squares analyses for genomics ] --- # Implementation .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 9]

.pull-left[

```r
# PLS
set.seed(123)
cv_pls <- train(
  Sale_Price ~ .,
  data = ames_train,
  trControl = cv,
* method = "pls",
  preProcess = c("zv", "center", "scale"),
  tuneGrid = hyper_grid,
  metric = "RMSE"
  )

# model with lowest RMSE
cv_pls$bestTune
##   ncomp
## 2     4

cv_pls$results %>%
  filter(ncomp == as.numeric(cv_pls$bestTune))
##   ncomp     RMSE  Rsquared      MAE   RMSESD RsquaredSD    MAESD
## 1     4 32415.83 0.8387649 18399.02 8059.457 0.06714616 1647.492
```

]

.pull-right[

```r
# plot cross-validated RMSE
plot(cv_pls)
```

<img src="05-regression_files/figure-html/pls-plot-1.png" style="display: block; margin: auto;" />

.center.bold[Using PLS improves prediction error by an additional $500]

]

---
# Tuning

- The number of PCs is the only hyperparameter

- Will almost always require fewer PCs than PCR

- rule of
<img src="https://emojis.slackmojis.com/emojis/images/1511903783/3230/wiggle_thumbs_up.gif?1511903783" style="height:1em; width:auto; "/>
- assess 2-*p* in evenly divided segments
- start with a few and zoom in

---
class: center, middle, inverse

.font300.white[Regularized Regression]

---
# The Idea

.font120[As *p* grows larger, there are three main issues we most commonly run into:]

1. Multicollinearity (we've already seen how PCR & PLS help to resolve this)

2. Insufficient solution ( `\(p >> n\)` )

3. Interpretability
   - Approach 1: model selection
       - computationally inefficient (Ames data: `\(2^{80}\)` models to evaluate)
       - simply assume a feature as in or out `\(\rightarrow\)` _hard thresholding_
   - Approach 2: regularize
       - retain all coefficients
       - slowly pushes a feature's effect towards zero `\(\rightarrow\)` _soft thresholding_

--

<br>

.center.bold.blue[Regularization helps with all three of these issues!]

---
# Regular regression

<br>

`\begin{equation} \text{minimize} \bigg \{ SSE = \sum^n_{i=1} (y_i - \hat{y}_i)^2 \bigg \} \end{equation}`

<img src="05-regression_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---
# Regular.red[ized] regression

<br>

`\begin{equation} \text{minimize} \big \{ SSE + P \big \} \end{equation}`

<br>

Modify the OLS objective function by adding a ___.red[P]enalty___ parameter

- Constrains magnitude of the coefficients
- Progressively shrinks coefficients to zero
- Reduces variability of coefficients (pulls correlated coefficients together)
- Can automate feature selection

<br>

.center.bold.blue[There are 3 variants of regularized regression]

---
# .red[Ridge] regression

.pull-left[

Objective function:

`\begin{equation} \text{minimize } \bigg \{ SSE + \lambda \sum^p_{j=1} \beta_j^2 \bigg \} \end{equation}`

* referred to as `\(L_2\)` penalty
* pulls correlated features towards each other
* pushes coefficients to .red[near zero]
* retains .red[all] features

]

.pull-right[

<img src="05-regression_files/figure-html/ridge-coef-example-1.png" style="display: block; margin: auto;" />

<img src="images/lambda.001.png" width="1753" style="display: block; margin: auto;" />

]

---
# .red[Lasso] regression

.pull-left[

Objective function:

`\begin{equation} \text{minimize } \bigg \{ SSE + \lambda \sum^p_{j=1} | \beta_j | \bigg \} \end{equation}`

* referred to as `\(L_1\)` penalty
* pulls correlated features towards each other
* pushes coefficients to .red[zero]
* performs .red[automated feature selection]

]

.pull-right[

<img src="05-regression_files/figure-html/lasso-coef-example-1.png" style="display: block; margin: auto;" />

<img src="images/lambda.001.png" width="1753" style="display: block; margin: auto;" />

]

---
# .red[Elastic net] regression

.pull-left[

Objective function:

`\begin{equation} \text{minimize } \bigg \{ SSE + \lambda_1 \sum^p_{j=1} \beta_j^2 + \lambda_2 \sum^p_{j=1} | \beta_j | \bigg \} \end{equation}`

* combines `\(L_1\)` & `\(L_2\)` penalties
* provides best of both worlds

]

.pull-right[

<img src="05-regression_files/figure-html/elastic-net-coef-example-1.png" style="display: block; margin: auto;" />

<img src="images/lambda.001.png" width="1753" style="display: block; margin: auto;" />

]

---
# Tuning .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 10] .pull-left[ * .bold[lambda] - controls the magnitude of the penalty parameter - rule of
<img src="https://emojis.slackmojis.com/emojis/images/1511903783/3230/wiggle_thumbs_up.gif?1511903783" style="height:1em; width:auto; "/>
: 0.1, 10, 100, 1000, 10000 * .bold[alpha] - controls the type of penalty (ridge, lasso, elastic net) - rule of
<img src="https://emojis.slackmojis.com/emojis/images/1511903783/3230/wiggle_thumbs_up.gif?1511903783" style="height:1em; width:auto; "/>
: 0, .25, .50, .75, 1

]

.pull-right[

<br>

.center[.bold[Tip]: find tuning parameters with:]

```r
caret::getModelInfo("glmnet")$glmnet$parameters
##   parameter   class                    label
## 1     alpha numeric        Mixing Percentage
## 2    lambda numeric Regularization Parameter
```

.center[Here, "glmnet" represents the __caret__ method we are going to use]

]

---
# R packages 📦

.pull-left[

## [`glmnet`](https://cran.r-project.org/package=glmnet)

* original implementation of regularized regression in R
* linear regression, logistic and multinomial regression models, Poisson regression and the Cox model
* extremely efficient procedures for fitting the entire lasso or elastic-net regularization path

]

.pull-right[

## [h2o](https://cran.r-project.org/package=h2o) 💧

* Java-based interface
* Automated feature pre-processing & validation procedures
* Supports the following distributions: “gaussian”, “binomial”, “multinomial”, “ordinal”, “poisson”, “gamma”, “tweedie”

]

.center.bold[Other options exist (see __Regularized and Shrinkage Methods__ section of [Machine Learning task view](https://CRAN.R-project.org/view=MachineLearning )) but these two are preferred (a minimal direct __glmnet__ call is sketched on the next slide).]
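
---
# A direct glmnet call

Before handing the tuning over to __caret__, here is a minimal sketch of calling __glmnet__ directly (illustration only; the object names and the choice of `alpha` are assumptions):

```r
library(glmnet)

# glmnet expects a numeric predictor matrix and a response vector
X <- model.matrix(Sale_Price ~ ., data = ames_train)[, -1]
y <- ames_train$Sale_Price

# alpha = 0 -> ridge, alpha = 1 -> lasso, in between -> elastic net;
# glmnet fits the full lambda path in a single call
ridge <- glmnet(X, y, alpha = 0)

# cv.glmnet() uses k-fold CV to pick lambda
cv_ridge <- cv.glmnet(X, y, alpha = 0)
cv_ridge$lambda.min
```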
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 11]

.scrollable90[

.pull-left[

```r
# tuning grid
hyper_grid <- expand.grid(
  alpha = seq(0, 1, by = .25),
  lambda = c(0.1, 10, 100, 1000, 10000)
)

# perform resampling
set.seed(123)
cv_glmnet <- train(
  Sale_Price ~ .,
  data = ames_train,
  trControl = cv,
* method = "glmnet",
  preProcess = c("zv", "center", "scale"),
  tuneGrid = hyper_grid,
  metric = "RMSE"
  )

# best model
cv_glmnet$results %>%
  filter(
    alpha == cv_glmnet$bestTune$alpha,
    lambda == cv_glmnet$bestTune$lambda
    )
##   alpha lambda    RMSE Rsquared      MAE   RMSESD RsquaredSD    MAESD
## 1     0  10000 31223.9 0.847993 16522.27 8883.647 0.07382721 1513.611
```

]

.pull-right[

```r
# plot results
plot(cv_glmnet)
```

<img src="05-regression_files/figure-html/cv-glmnet-plot-1.png" style="display: block; margin: auto;" />

.center.bold[Regularization gives us a slight improvement (~$1K)]

]

]

---
class: center, middle, inverse

.font300.white[Multivariate Adaptive Regression Splines]

---
# The idea

* So far, we have tried to improve our linear model with various feature reduction and regularization approaches

* However, we are still assuming linear relationships

* The actual relationship(s) may have non-linear patterns that we cannot capture

<img src="05-regression_files/figure-html/non-linearity-1.png" style="display: block; margin: auto;" />

---
# The idea

.font120[

* There are some traditional approaches we could take to capture non-linear relationships:
   - polynomial relationships
   - step function relationships

]

<img src="05-regression_files/figure-html/traditional-nonlinear-approaches-1.png" style="display: block; margin: auto;" />

<br>

.center.bold.blue[However, these require the user to explicitly identify & incorporate
<img src="https://emojis.slackmojis.com/emojis/images/1542340473/4983/yuck.gif?1542340473" style="height:2em; width:auto; "/>
] --- # The idea .pull-left[ * Multivariate adaptive regression splines (MARS) provide a convenient & automated approach to capture non-linearity * Easy transition from linear regression to non-linearity methods * Looks for .blue[knots] in predictors <br><br> `\begin{equation} \text{y} = \begin{cases} \beta_0 + \beta_1(1.183606 - \text{x}) & \text{x} < 1.183606, \\ \beta_0 + \beta_1(\text{x} - 1.183606) & \text{x} > 1.183606 \end{cases} \end{equation}` ] .pull-right[ <img src="05-regression_files/figure-html/one-knot-1.png" style="display: block; margin: auto;" /> ] --- # The idea .pull-left[ * Multivariate adaptive regression splines (MARS) provide a convenient & automated approach to capture non-linearity * Easy transition from linear regression to non-linearity methods * Looks for .blue[knots] in predictors <br><br> `\begin{equation} \text{y} = \begin{cases} \beta_0 + \beta_1(1.183606 - \text{x}) & \text{x} < 1.183606, \\ \beta_0 + \beta_1(\text{x} - 1.183606) & \text{x} > 1.183606 \quad \& \quad \text{x} < 4.898114, \\ \beta_0 + \beta_1(4.898114 - \text{x}) & \text{x} > 4.898114 \end{cases} \end{equation}` ] .pull-right[ <img src="05-regression_files/figure-html/two-knots-1.png" style="display: block; margin: auto;" /> ] --- # The idea .pull-left[ * Multivariate adaptive regression splines (MARS) provide a convenient & automated approach to capture non-linearity * Easy transition from linear regression to non-linearity methods * Looks for .blue[knots] in predictors ] .pull-right[ <img src="05-regression_files/figure-html/three-knots-1.png" style="display: block; margin: auto;" /> ] --- # The idea .pull-left[ * Multivariate adaptive regression splines (MARS) provide a convenient & automated approach to capture non-linearity * Easy transition from linear regression to non-linearity methods * Looks for .blue[knots] in predictors ] .pull-right[ <img src="05-regression_files/figure-html/four-knots-1.png" style="display: block; margin: auto;" /> ] --- # The idea .pull-left[ * Multivariate adaptive regression splines (MARS) provide a convenient & automated approach to capture non-linearity * Easy transition from linear regression to non-linearity methods * Looks for .blue[knots] in predictors ] .pull-right[ <img src="05-regression_files/figure-html/nine-knots-1.png" style="display: block; margin: auto;" /> ] --- # R packages 📦 .pull-left[ ## [`mda`](https://cran.r-project.org/package=mda) * **m**ixture **d**iscriminant **a**nalysis * Lightweight function `mars()` * Gives quite similar results to Friedman's original FORTRAN program * No formula method ] .pull-right[ ## [`earth`](http://www.milbo.users.sonic.net/earth/) 🌎 * **e**nhanced **a**daptive **r**egression **t**hrough **h**inges * Derived from `mda::mars()` * Support for GLMs (e.g., logistic regression) * More bells and whistles than `mda::mars()`; for example, - Variable importance scores - Support for `\(k\)`-fold cross-validation) ] --- # Tuning parameters .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 12] MARS models have two tuning parameters: .pull-left[ 1. .blue[_nprune_]: the maximum number of terms in the pruned model (including the intercept) 2. .blue[_degree_]: the maximum degree of interaction ] .pull-right[ ```r caret::getModelInfo("earth")$earth$parameters ## parameter class label ## 1 nprune numeric #Terms ## 2 degree numeric Product Degree ``` ] --- # Implementation .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 13]

.scrollable90[

.pull-left[

```r
# tuning grid
hyper_grid <- expand.grid(
  nprune = seq(2, 50, length.out = 10) %>% floor(),
  degree = 1:3
)

# perform resampling
set.seed(123)
cv_mars <- train(
  Sale_Price ~ .,
  data = ames_train,
  trControl = cv,
* method = "earth",
  tuneGrid = hyper_grid,
  metric = "RMSE"
  )

# best model
cv_mars$results %>%
  filter(
    nprune == cv_mars$bestTune$nprune,
    degree == cv_mars$bestTune$degree
    )
##   degree nprune     RMSE  Rsquared      MAE   RMSESD RsquaredSD    MAESD
## 1      1     34 26705.66 0.8894754 16874.05 5521.644  0.0375183 1475.742
```

]

.pull-right[

```r
# plot results
plot(cv_mars)
```

<img src="05-regression_files/figure-html/cv-mars-plot-1.png" style="display: block; margin: auto;" />

.center.bold[MARS' non-linearity gives us a big improvement (~$4.5K)!]

]

]
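
---
# Peeking at the hinge terms

__caret__ stores the final __earth__ fit, so the hinge terms it selected can be inspected directly. A quick sketch (the knots and features actually printed depend on the resampled fit above):

```r
# the tuned earth fit lives in cv_mars$finalModel; its coefficients are
# hinge functions such as h(Gr_Liv_Area - knot) and h(knot - Gr_Liv_Area)
summary(cv_mars$finalModel) %>% .$coefficients %>% head(10)
```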
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 14] .pull-left[ ```r results <- resamples(list( OLS = cv_model4, PCR = cv_pcr, PLS = cv_pls, EN = cv_glmnet, MARS = cv_mars )) summary(results)$statistics$RMSE ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## OLS 18834.46 28623.92 42438.30 44643.81 54707.54 97182.92 0 ## PCR 23982.16 27808.91 30312.96 32995.73 38845.12 51513.31 0 ## PLS 20677.75 24821.80 31198.85 32415.83 39860.46 49923.76 0 ## EN 18146.03 23290.31 28011.76 31223.90 38184.16 51833.34 0 ## MARS 20260.84 22463.84 24585.35 26705.66 30168.53 45845.05 0 ``` ] .pull-right[ ```r p1 <- bwplot(results, metric = "RMSE") p2 <- dotplot(results, metric = "RMSE") gridExtra::grid.arrange(p1, p2, nrow = 1) ``` <img src="05-regression_files/figure-html/compare-bwplot-1.png" style="display: block; margin: auto;" /> ] .center.bold[Student's *t*-test or a rank sum test could also be used.] --- class: center, middle, inverse .font300.white[Feature Interpretation] --- # Feature importance .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 15]

.pull-left[

.center.bold.font120[vip]

* .bold[v]ariable .bold[i]mportance .bold[p]lots illustrate the influence each predictor has on the response

* many packages have their own vip plots

* the __vip__ 📦 provides a common output

* different models measure "importance" differently

* we'll review this in more depth tomorrow

]

.pull-right[

```r
vip(cv_mars)
```

<img src="05-regression_files/figure-html/mars-vip-1.png" style="display: block; margin: auto;" />

]

---
# Feature importance .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 16] <img src="05-regression_files/figure-html/all-vip-1.png" style="display: block; margin: auto;" /> --- # Feature importance .scrollable90[ .pull-left.font130[ * Some models do not perform feature selection - OLS - PCR - PLS ] .pull-right[ <img src="05-regression_files/figure-html/no-feature-selection-1.png" style="display: block; margin: auto;" /> ] ] --- # Feature importance .scrollable90[ .pull-left.font120[ * Some models do not perform feature selection - OLS - PCR - PLS * Whereas some models do - Regularized regression - MARS ] .pull-right[ <img src="05-regression_files/figure-html/mars-feature-selection-1.png" style="display: block; margin: auto;" /> ] ] --- # Feature effects .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 17]

.pull-left[

* feature effects measure the relationship between a feature and the target variable

* the most common approach is a .bold[p]artial .bold[d]ependence .bold[p]lot

* computes the average response value when all observations use a particular value for a given feature

* we will review this more tomorrow

]

.pull-right[

```r
pdp::partial(cv_model2, pred.var = "Gr_Liv_Area", grid.resolution = 10) %>%
  autoplot()
```

<img src="05-regression_files/figure-html/ols-pdp-1.png" style="display: block; margin: auto;" />

]

---
# Feature effects .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 18] <br><br> <img src="05-regression_files/figure-html/all-pdps, -1.png" style="display: block; margin: auto;" /> --- # Feature effects .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 19] Assess the interaction of the top 2 predictors: ```r pdp::partial(cv_mars, pred.var = c("Gr_Liv_Area", "Year_Built"), grid.resolution = 10) %>% pdp::plotPartial(levelplot = FALSE, zlab = "yhat", drape = TRUE, colorkey = TRUE, screen = list(z = -20, x = -60)) ``` <img src="05-regression_files/figure-html/interaction-pdp-1.png" style="display: block; margin: auto;" /> --- class: center, middle, inverse .font300.white[Wrapping up] --- # Summary .pull-left[ * Ordinary least squares - simple but lots of assumptions - typically poor predictive accuracy * Principal Component Regression - minimizes multicollinearity - helps when `\(p >> n\)` * Partial Least Squares - same benefits as PCR but - creates stronger signal btwn PCs and target ] .pull-right[ * Regularized Regression - minimizes multicollinearity - helps when `\(p >> n\)` - can provide automated feature selection * Multivariate Adaptive Regression Splines - captures non-linear relationships - can automatically capture interactions - provides automated feature selection ] --- # Questions? <img src="https://66.media.tumblr.com/tumblr_lra006KFZc1qk976yo1_500.gif" width="80%" height="80%" style="display: block; margin: auto;" /> --- # Back home <br><br><br><br> [.center[
<i class="fas fa-home fa-10x faa-FALSE animated "></i>
]](https://github.com/uc-r/Advanced-R) .center[https://github.com/uc-r/Advanced-R]