Runs a linear regression model in one of two ways:

  1. Fits a basic lm() model and returns diagnostics and model fit.

  2. Uses the tidymodels approach: tunes hyperparameters with cross-validation and evaluates the model on the training and testing sets.

linearRegress(
  response = response,
  computeMarginalEffects = FALSE,
  data = df,
  train = train_df,
  test = test_df,
  tidyModelVersion = FALSE,
  recipe = rec,
  folds = folds,
  evalMetric = "rmse"
)

Arguments

response

Character. The name of the response variable for the analysis.

computeMarginalEffects

Logical. Should marginal effects be computed for the lm() model?
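
For reference, marginal effects on a plain lm() fit can be computed with the margins package; this is a sketch of the idea, not necessarily how linearRegress() computes them internally:

library(margins)
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(margins(fit))  # average marginal effect of each predictor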

data

Data frame. The entire data set; used when tidyModelVersion = FALSE.

train

Data frame/tibble. The training data set.

test

Data frame/tibble. The testing data set.

tidyModelVersion

Logical. Run the tidymodels version of linear regression? If TRUE, tunes hyperparameters and returns a tidymodels regression model. If FALSE, fits an lm() object and returns output based on that fit.

recipe

A recipes::recipe object.

folds

An rsample::vfold_cv object.
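
If you are not using this package's cvFolds() helper (shown in the examples below), an equivalent object can be created directly with rsample; the fold count here is illustrative:

library(rsample)
folds <- vfold_cv(train_df, v = 10)  # 10-fold cross-validation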

evalMetric

Character. The regression metric to evaluate the model's accuracy on (tidymodels version only). Defaults to "rmse". Choose from the following (a yardstick sketch follows the list):

  • rmse

  • mae

  • rsq

  • mase

  • ccc

  • icc

  • huber_loss
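
These metric names appear to map onto yardstick functions. Assuming a straight pass-through (an assumption; note that yardstick spells the correlation-based metric iic, not icc), a comparable metric set can be built by hand:

library(yardstick)
mets <- metric_set(rmse, mae, rsq, mase, ccc, iic, huber_loss)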

Value

A list. If tidyModelVersion = TRUE:

  • Training set predictions

  • Training set evaluation on RMSE and MAE

  • Testing set predictions

  • Testing set evaluation on RMSE and MAE

  • Tuned model object

If tidyModelVersion = FALSE (a rough sketch of these outputs follows the list):

  • lm() model object

Broom-tidied summary of the lm() model

  • diagnostic plots
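
For orientation, the lm()-mode outputs roughly correspond to these base R and broom calls (a sketch of the idea, not the package's exact internals):

fit <- lm(mpg ~ wt + hp, data = mtcars)
broom::tidy(fit)    # cleaned coefficient summary
broom::glance(fit)  # one-row model fit statistics
plot(fit)           # base R diagnostic plots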

Details

Note: the tidymodels version tunes the following hyperparameters (a sketch of the tuning setup follows the list):

  • penalty: The total amount of regularization in the model. Also known as lambda.

  • mixture: The proportion of L1 (lasso) regularization relative to L2 (ridge). A value of 1 amounts to lasso regression; a value of 0 amounts to ridge regression. Also known as alpha.
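
Tuning penalty and mixture is characteristic of a glmnet-backed parsnip model. Assuming that engine (an assumption; this documentation does not name it), the internal setup likely resembles the following sketch:

library(tidymodels)

# Mark both glmnet hyperparameters for tuning
mod <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# 'rec' must be an untrained recipe here; 'folds' is a vfold_cv object
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(mod)

tuned <- tune_grid(wf, resamples = folds, metrics = metric_set(rmse))
best <- select_best(tuned, metric = "rmse")
final_wf <- finalize_workflow(wf, best)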

Examples

library(easytidymodels)
library(dplyr)
library(recipes)
utils::data(penguins, package = "modeldata")

#Define your response variable and formula object here
resp <- "bill_length_mm"
formula <- stats::as.formula(paste(resp, "~ ."))

#Split data into training and testing sets
split <- trainTestSplit(penguins, responseVar = resp)

#Create recipe for feature engineering; steps will vary based on the data you're working with
rec <- recipe(formula, split$train) %>% prep()
train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)
folds <- cvFolds(train_df)

#Fit a linear regression model (commented out only due to long run time)
#linReg <- linearRegress(recipe = rec, response = resp, data = penguins, tidyModelVersion = TRUE,
#folds = folds, train = train_df, test = test_df, evalMetric = "rmse")

#Visualize training data and its predictions
#linReg$trainPred %>% select(.pred, !!resp)

#View how model metrics for RMSE, R-Squared, and MAE look for training data
#linReg$trainScore

#Visualize testing data and its predictions
#linReg$testPred %>% select(.pred, !!resp)

#View how model metrics for RMSE, R-Squared, and MAE look for testing data
#linReg$testScore

#See the final model chosen based on optimizing for your chosen evaluation metric
#linReg$final

#See how model fit looks based on another evaluation metric
#linReg$tune %>% tune::show_best(metric = "mae")
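
#View the full set of resampling metrics from tuning (assumes linReg$tune
#is a tune_results object, as the show_best() call above implies)
#linReg$tune %>% tune::collect_metrics()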