linearRegress.Rd
Fits a linear regression model in one of two ways:
Fits a basic lm() model and returns diagnostics and model fit.
Uses the tidymodels approach: tunes hyperparameters and evaluates the model on the training and testing sets.
linearRegress(
response = response,
computeMarginalEffects = FALSE,
data = df,
train = train_df,
test = test_df,
tidyModelVersion = FALSE,
recipe = rec,
folds = folds,
evalMetric = "rmse"
)
Character. The variable that is the response for analysis.
Logical. Compute marginal effects for lm model?
The entire data frame. Used for tidyModelVersion = FALSE.
Data frame/tibble. The training data set.
Data frame/tibble. The testing data set.
Logical. Run the tidymodels version of linear regression? If TRUE, tunes hyperparameters and returns a tidymodels regression model. If FALSE, fits an lm() object and returns output based on that fit.
A recipes::recipe object.
An rsample::vfold_cv object.
Character. The regression metric used to evaluate the model's performance (tidymodels only). Default is "rmse". Choose from the following:
rmse
mae
rsq
mase
ccc
icc
huber_loss
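As a rough illustration of what three of these metrics measure, the sketch below computes them in base R on made-up vectors. It assumes the names correspond to the yardstick metrics of the same name (where rsq is the squared correlation between truth and estimate); the vectors truth and estimate are hypothetical and not part of this package.

```r
# Base R sketch of three of the metrics listed above (not the package's own code).
truth    <- c(3.0, 5.0, 2.5, 7.0)   # hypothetical observed values
estimate <- c(2.5, 5.0, 4.0, 8.0)   # hypothetical predictions

err  <- truth - estimate
rmse <- sqrt(mean(err^2))           # root mean squared error
mae  <- mean(abs(err))              # mean absolute error
rsq  <- stats::cor(truth, estimate)^2  # squared Pearson correlation

metrics <- c(rmse = rmse, mae = mae, rsq = rsq)
print(round(metrics, 3))
```

Lower is better for rmse and mae, while higher is better for rsq, which matters when reading the tuning results.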
A list. If tidyModelVersion = TRUE:
Training set predictions
Training set evaluation on RMSE and MAE
Testing set predictions
Testing set evaluation on RMSE and MAE
Tuned model object
If tidyModelVersion = FALSE:
lm() model object
broom-tidied summary of the lm() model
diagnostic plots
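For orientation, the three items returned when tidyModelVersion = FALSE can be approximated with base R alone. This is a minimal sketch under assumptions, not the function's actual implementation: mtcars and the formula mpg ~ wt + hp are stand-ins chosen only because they ship with R.

```r
# Sketch of an lm() fit, its coefficient summary, and diagnostic plots.
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)   # mtcars is a stand-in dataset

# Coefficient table: estimate, standard error, t value, p value
coef_table <- stats::coef(summary(fit))
print(coef_table)

# Overall model fit
r_squared <- summary(fit)$r.squared

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
old_par <- graphics::par(mfrow = c(2, 2))
plot(fit)
graphics::par(old_par)
```

If the broom package is installed, broom::tidy(fit) returns a similar coefficient table as a tibble.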
Note: Tidymodels version tunes the following parameters:
penalty: The total amount of regularization in the model. Also known as lambda.
mixture: The proportion of L1 (LASSO) regularization in the model. If 1, the model is a pure LASSO regression; if 0, a pure ridge regression. Values in between blend the two. Also known as alpha.
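To make the roles of penalty and mixture concrete, the sketch below evaluates the elastic-net penalty term for a given coefficient vector. It assumes the glmnet parameterization, which this page does not confirm as the engine in use; the helper elastic_net_penalty and the coefficients in beta are hypothetical and exist only for illustration.

```r
# Elastic-net penalty in the glmnet parameterization (assumed, not confirmed here):
#   penalty * ((1 - mixture) / 2 * sum(beta^2) + mixture * sum(abs(beta)))
elastic_net_penalty <- function(beta, penalty, mixture) {
  stopifnot(penalty >= 0, mixture >= 0, mixture <= 1)
  ridge_part <- (1 - mixture) / 2 * sum(beta^2)  # L2 (ridge) contribution
  lasso_part <- mixture * sum(abs(beta))         # L1 (LASSO) contribution
  penalty * (ridge_part + lasso_part)
}

beta <- c(2, -1, 0.5)  # hypothetical coefficients

lasso_only <- elastic_net_penalty(beta, penalty = 0.1, mixture = 1)
ridge_only <- elastic_net_penalty(beta, penalty = 0.1, mixture = 0)
blended    <- elastic_net_penalty(beta, penalty = 0.1, mixture = 0.5)
print(c(lasso = lasso_only, ridge = ridge_only, blended = blended))
```

Raising penalty scales the whole term, while mixture only shifts weight between the absolute-value and squared components.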
library(easytidymodels)
library(dplyr)
library(recipes)
utils::data(penguins, package = "modeldata")
#Define your response variable and formula object here
resp <- "bill_length_mm"
formula <- stats::as.formula(paste(resp, ".", sep="~"))
#Split data into training and testing sets
split <- trainTestSplit(penguins, responseVar = resp)
#Create recipe for feature engineering for dataset, varies based on data working with
rec <- recipe(formula, split$train) %>% prep()
train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)
folds <- cvFolds(train_df)
#Fit a linear regression model (commented out only due to long run time)
#linReg <- linearRegress(recipe = rec, response = resp, data = penguins, tidyModelVersion = TRUE,
#folds = folds, train = train_df, test = test_df, evalMetric = "rmse")
#Visualize training data and its predictions
#linReg$trainPred %>% select(.pred, !!resp)
#View how model metrics for RMSE, R-Squared, and MAE look for training data
#linReg$trainScore
#Visualize testing data and its predictions
#linReg$testPred %>% select(.pred, !!resp)
#View how model metrics for RMSE, R-Squared, and MAE look for testing data
#linReg$testScore
#See the final model chosen by tuning, optimized for your chosen evaluation metric
#linReg$final
#See how model fit looks based on another evaluation metric
#linReg$tune %>% tune::show_best(metric = "mae")