Fits a linear regression model in one of two ways:

  1. Fits a basic lm() model and returns diagnostics and model fit.

  2. Uses the tidymodels approach: tunes hyperparameters, then evaluates the model on the training and testing sets.

linearRegress(
  response = response,
  computeMarginalEffects = FALSE,
  data = df,
  train = train_df,
  test = test_df,
  tidyModelVersion = FALSE,
  recipe = rec,
  folds = folds,
  evalMetric = "rmse"
)



response: Character. The name of the response variable for the analysis.


computeMarginalEffects: Logical. Compute marginal effects for the lm() model?


data: The entire data frame. Used when tidyModelVersion = FALSE.


train: Data frame/tibble. The training data set.


test: Data frame/tibble. The testing data set.


tidyModelVersion: Logical. Run a tidymodels version of linear regression? If TRUE, tunes hyperparameters and returns a tidymodels regression model. If FALSE, fits an lm() object and returns output based on that fit.


recipe: A recipes::recipe object.


folds: An rsample::vfold_cv object.


evalMetric: Character. The regression metric used to evaluate the model (tidymodels only). Default is "rmse". Choose from the following:

  • rmse

  • mae

  • rsq

  • mase

  • ccc

  • icc

  • huber_loss
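
For orientation, three of the metrics listed above can be computed by hand in base R. This is a sketch of the standard definitions on made-up numbers, not of how linearRegress computes them internally; the vectors truth and pred are hypothetical.

```r
# Hypothetical observed values and model predictions.
truth <- c(3.0, 5.0, 2.5, 7.0)
pred  <- c(2.5, 5.0, 4.0, 8.0)

# Root mean squared error: penalizes large errors more heavily.
rmse <- sqrt(mean((truth - pred)^2))

# Mean absolute error: average size of the errors.
mae <- mean(abs(truth - pred))

# R-squared, here taken as the squared correlation of truth and prediction.
rsq <- stats::cor(truth, pred)^2

print(c(rmse = rmse, mae = mae, rsq = rsq))
```

Lower is better for rmse and mae; higher is better for rsq.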


A list. If tidyModelVersion = TRUE:

  • Training set predictions

  • Training set evaluation on RMSE and MAE

  • Testing set predictions

  • Testing set evaluation on RMSE and MAE

  • Tuned model object

If tidyModelVersion = FALSE:

  • lm() model object

  • tidied summary of the lm() model, produced with broom

  • diagnostic plots
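
The three lm() outputs listed above can be approximated outside the function as follows. This is a minimal sketch, not the function's actual implementation: it assumes the built-in mtcars data and an arbitrary formula, and it uses broom only if that package is installed.

```r
# Assumption: mtcars stands in for your own data; mpg is an arbitrary response.
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table (estimate, std. error, t value, p value) as a matrix.
coef_table <- stats::coef(summary(fit))

# If broom is installed, the same table as a tidy data frame.
if (requireNamespace("broom", quietly = TRUE)) {
  tidy_fit <- broom::tidy(fit)
}

# Two standard diagnostic plots: residuals vs fitted, and normal Q-Q.
op <- graphics::par(mfrow = c(1, 2))
plot(fit, which = 1:2)
graphics::par(op)
```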


Note: The tidymodels version tunes the following parameters:

  • penalty: The total amount of regularization in the model. Also known as lambda.

  • mixture: The proportion of LASSO (L1) regularization in the total penalty. If 1, amounts to LASSO regression. If 0, amounts to ridge regression. Values in between give an elastic net. Also known as alpha.
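
To make the roles of the two parameters concrete, the sketch below evaluates the elastic net penalty term for a fixed coefficient vector. It assumes the glmnet-style parameterization, penalty * (mixture * L1 + (1 - mixture) / 2 * L2); the engine used by linearRegress is not stated here, and elastic_net_penalty and beta are hypothetical names for illustration.

```r
# Elastic net penalty term under the glmnet-style parameterization (assumed).
elastic_net_penalty <- function(beta, penalty, mixture) {
  l1 <- sum(abs(beta))  # LASSO part: sum of absolute coefficients
  l2 <- sum(beta^2)     # ridge part: sum of squared coefficients
  penalty * (mixture * l1 + (1 - mixture) / 2 * l2)
}

# Hypothetical coefficient vector (intercept excluded).
beta <- c(2, -1, 0.5)

lasso_term <- elastic_net_penalty(beta, penalty = 0.1, mixture = 1)    # pure L1
ridge_term <- elastic_net_penalty(beta, penalty = 0.1, mixture = 0)    # pure L2
mixed_term <- elastic_net_penalty(beta, penalty = 0.1, mixture = 0.5)  # blend

print(c(lasso = lasso_term, ridge = ridge_term, mixed = mixed_term))
```

Raising penalty scales the whole term, while mixture only shifts weight between the L1 and L2 parts.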


library(recipes) # recipe(), prep(), bake()
library(dplyr)   # select() and the %>% pipe

utils::data(penguins, package = "modeldata")

# Define your response variable and formula object here
resp <- "bill_length_mm"
formula <- stats::as.formula(paste(resp, ".", sep = "~"))

#Split data into training and testing sets
split <- trainTestSplit(penguins, responseVar = resp)

#Create a recipe for feature engineering; the steps vary with the data you are working with
rec <- recipe(formula, split$train) %>% prep()
train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)
folds <- cvFolds(train_df)

#Fit a tuned linear regression model (commented out only due to long run time)
#tidyModelVersion = TRUE is needed for the trainPred, testPred, and tune outputs used below
#linReg <- linearRegress(recipe = rec, response = resp, data = penguins, tidyModelVersion = TRUE,
#folds = folds, train = train_df, test = test_df, evalMetric = "rmse")

#View training data predictions alongside the observed response
#linReg$trainPred %>% select(.pred, !!resp)

#View how model metrics for RMSE, R-Squared, and MAE look for training data

#View testing data predictions alongside the observed response
#linReg$testPred %>% select(.pred, !!resp)

#View how model metrics for RMSE, R-Squared, and MAE look for testing data

#See the final model chosen by tuning, based on optimizing for your chosen evaluation metric

#See how model fit looks based on another evaluation metric
#linReg$tune %>% tune::show_best(metric = "mae")