The goal of easytidymodels is to make running analyses in R with the tidymodels framework both easier and more reproducible. It wraps the tidymodels packages so that, after your data pre-processing steps, each analysis runs in one line of code and automatically tunes every hyperparameter the underlying model offers.

If you are not familiar with tidymodels, I recommend learning the basics of the framework first; the official tidymodels documentation and its tutorials are good starting points.

For more details on how the functions in this package work, check out the reference page, the vignettes on this site, or call help() on the function of interest in R. Here I will just give a brief overview of the package's workflow.

Installation

You can install easytidymodels like this:

# install.packages("devtools")
devtools::install_github("amanda-park/easytidymodels")

Preparing Data for Analysis

There are three main functions to prepare your data for analysis (a usage sketch follows the list):

  • trainTestSplit lets you split data into training and testing sets, with the ability to stratify on a variable and split based on a point in time.
  • cvFolds splits your data into cross-validation folds to allow the model’s hyperparameters to be tuned.
  • createRecipe does some basic data preprocessing on your dataset. NOTE: I recommend calling recipe() and creating a recipe object specific to your dataset’s needs, as every dataset will require its own preprocessing prior to analysis.
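Here is a minimal sketch of how those steps might chain together. The argument names (data, responseVar) and the components of the splits object are assumptions for illustration, as is the example data frame df; check each function's help page for the actual signatures.

library(easytidymodels)

# Split into training and testing sets, stratifying on the response
# (argument names are assumed, not confirmed against the package)
splits <- trainTestSplit(data = df, responseVar = "outcome")

# Create cross-validation folds from the training data for tuning
folds <- cvFolds(splits$train)

# Basic preprocessing recipe; for dataset-specific steps, build your
# own recipe object with recipes::recipe() instead
rec <- createRecipe(splits$train, responseVar = "outcome")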

Classification Functions

The binary classification machine learning models available are as follows:

  • XGBoost (function xgBinaryClassif)
  • Logistic Regression (function logRegBinary)
  • K-Nearest Neighbors (function knnClassif)
  • Support Vector Machine (function svmClassif)

The multiclass classification models available are as follows:

  • XGBoost (function xgMultiClassif)
  • Multinomial Regression (function logRegMulti)
  • K-Nearest Neighbors (function knnClassif)
  • Support Vector Machine (function svmClassif)

Each of these models will tune the appropriate hyperparameters for that model. In addition, these models let you optimize the hyperparameters for a specific evaluation metric. The list of metrics is as follows (a usage sketch follows the list):

  • Balanced Accuracy (Average of Sensitivity and Specificity, call “bal_accuracy”)
  • Mean Log Loss (Call “mn_log_loss”)
  • ROC AUC (Area Under the Receiver Operating Curve, call “roc_auc”)
  • MCC (Matthews Correlation Coefficient, call “mcc”)
  • Kappa (Cohen’s kappa, accuracy adjusted for chance agreement, call “kap”)
  • Sensitivity (Call “sens”)
  • Specificity (Call “spec”)
  • Precision (Call “precision”)
  • Recall (Call “recall”)
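For example, fitting a binary XGBoost classifier tuned to maximize ROC AUC might look roughly like this. The argument names are assumptions for illustration; consult help(xgBinaryClassif) for the real signature.

# Tune a binary XGBoost classifier, optimizing on ROC AUC
# (argument names are assumed placeholders)
xgb_fit <- xgBinaryClassif(
  recipe = rec,
  folds = folds,
  train = splits$train,
  test = splits$test,
  evalMetric = "roc_auc"
)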

Save the model output to an object; the model will return a list with the following components, each accessible with $ (an access sketch follows the list):

  • Confusion matrix on training data
  • Accuracy evaluation on training data
  • Confusion matrix on testing data
  • Accuracy evaluation on testing data
  • Description of final model chosen
  • A tuned version of the model (in case you want to try model stacking, or to see the optimal model fit under a different evaluation metric)
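Accessing those components might look like this; the component names below are assumed placeholders, so inspect the list first to find the actual ones.

# See which components the model actually returned
names(xgb_fit)

# Pull out individual pieces with $ (names are assumed placeholders)
xgb_fit$testConfusionMatrix
xgb_fit$testEvaluation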

Regression Functions

The regression functions available are as follows:

  • Random Forest (function rfRegress)
  • XGBoost (function xgRegress)
  • Linear Regression (function linearRegress)
  • MARS (function marsRegress)
  • K-Nearest Neighbor Regression (function knnRegress)
  • Support Vector Machine Regression (function svmRegress)

These models also allow for optimizing hyperparameters for a specific evaluation metric. The list of metrics is as follows (a usage sketch follows the list):

  • RMSE (Root Mean Squared Error, call “rmse”)
  • MAE (Mean Absolute Error, call “mae”)
  • RSQ (R-Squared, call “rsq”)
  • MASE (Mean Absolute Scaled Error, call “mase”)
  • CCC (Concordance Correlation Coefficient, call “ccc”)
  • IIC (Index of Ideality of Correlation, call “iic”)
  • Huber Loss (call “huber_loss”)
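For example, a random forest regression tuned on RMSE might look roughly like this, mirroring the classification sketch above (argument names are again assumptions):

# Tune a random forest regression, optimizing on RMSE
# (argument names are assumed placeholders)
rf_fit <- rfRegress(
  recipe = rec,
  folds = folds,
  train = splits$train,
  test = splits$test,
  evalMetric = "rmse"
)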

Save the model output to an object; the model will return a list with the following components, each accessible with $ (an access sketch follows the list):

  • Predictions on training data
  • RMSE and MAE evaluation on training data
  • Predictions on testing data
  • RMSE and MAE evaluation on testing data
  • Description of final model chosen
  • A tuned version of the model (in case you want to try model stacking, or to see the optimal model fit under a different evaluation metric)
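As with the classification output, inspect the list and extract components with $; the names below are assumed placeholders.

# See which components the regression model returned
names(rf_fit)

# Compare predicted and observed values on the test set
# (component name is an assumed placeholder)
head(rf_fit$testPredictions)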