Creates training and testing data sets from a single data set. Also returns bootstrap resamples of the training set.

trainTestSplit(
  data = df,
  splitAmt = 0.8,
  timeDependent = FALSE,
  responseVar = "nameOfResponseVar",
  stratifyOnResponse = FALSE,
  numberOfBootstrapSamples = 25
)

Arguments

data

The data set of interest.

splitAmt

Numeric. The proportion of the data to place in the training set. Default is 0.8.

timeDependent

Logical. Is your data time-dependent? If so, set to TRUE. Default is FALSE.

responseVar

Character. Name of the response variable in the analysis.

stratifyOnResponse

Logical. Should the training and testing splits be stratified on the response? If so, set to TRUE. Default is FALSE.

numberOfBootstrapSamples

Numeric. How many bootstrap samples do you want? Default is 25.

Value

A list with four components: train is the training set, test is the testing set, boot is a set of bootstrap resamples drawn from the training set, and split is the rsample split object that records how the original data set was partitioned.
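
For orientation, the four components can be approximated directly with rsample. This is a minimal sketch, not the package's actual implementation: it assumes trainTestSplit() behaves like rsample::initial_split() followed by rsample::bootstraps() on the training set, and it uses the built-in mtcars data and the hypothetical names `spl` and `out` purely for illustration.

```r
library(rsample)

set.seed(123)  # make the random split reproducible

# Assumed equivalent of splitAmt = 0.8 with timeDependent = FALSE
spl <- initial_split(mtcars, prop = 0.8)

# Assemble a list shaped like the documented return value
out <- list(
  train = training(spl),                          # training rows
  test  = testing(spl),                           # held-out rows
  boot  = bootstraps(training(spl), times = 25),  # resamples of the training set
  split = spl                                     # the split object itself
)
```

For time-ordered data, rsample::initial_time_split() keeps the first rows for training instead of sampling at random. Whether timeDependent = TRUE relies on it is an assumption here.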

Examples

library(easytidymodels)
library(dplyr)
utils::data(penguins, package = "modeldata")
resp <- "sex"
split <- trainTestSplit(penguins, stratifyOnResponse = TRUE, responseVar = resp)
#Training data
split$train
#> # A tibble: 275 x 7
#>    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
#>    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
#>  1 Adelie  Torgersen           39.5          17.4               186        3800
#>  2 Adelie  Torgersen           40.3          18                 195        3250
#>  3 Adelie  Torgersen           NA            NA                  NA          NA
#>  4 Adelie  Torgersen           36.7          19.3               193        3450
#>  5 Adelie  Torgersen           38.9          17.8               181        3625
#>  6 Adelie  Torgersen           42            20.2               190        4250
#>  7 Adelie  Torgersen           41.1          17.6               182        3200
#>  8 Adelie  Torgersen           36.6          17.8               185        3700
#>  9 Adelie  Torgersen           38.7          19                 195        3450
#> 10 Adelie  Torgersen           34.4          18.4               184        3325
#> # ... with 265 more rows, and 1 more variable: sex <fct>

#Testing data
split$test
#> # A tibble: 69 x 7
#>    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
#>    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
#>  1 Adelie  Torgersen           39.1          18.7               181        3750
#>  2 Adelie  Torgersen           37.8          17.3               180        3700
#>  3 Adelie  Biscoe              38.8          17.2               180        3800
#>  4 Adelie  Biscoe              40.5          17.9               187        3200
#>  5 Adelie  Dream               39.5          17.8               188        3300
#>  6 Adelie  Dream               39.2          21.1               196        4150
#>  7 Adelie  Dream               38.8          20                 190        3950
#>  8 Adelie  Dream               36.5          18                 182        3150
#>  9 Adelie  Dream               44.1          19.7               196        4400
#> 10 Adelie  Dream               39.6          18.8               190        4600
#> # ... with 59 more rows, and 1 more variable: sex <fct>

#Bootstrapped data
split$boot
#> # Bootstrap sampling 
#> # A tibble: 25 x 2
#>    splits            id         
#>    <list>            <chr>      
#>  1 <split [275/100]> Bootstrap01
#>  2 <split [275/94]>  Bootstrap02
#>  3 <split [275/105]> Bootstrap03
#>  4 <split [275/98]>  Bootstrap04
#>  5 <split [275/108]> Bootstrap05
#>  6 <split [275/98]>  Bootstrap06
#>  7 <split [275/108]> Bootstrap07
#>  8 <split [275/100]> Bootstrap08
#>  9 <split [275/103]> Bootstrap09
#> 10 <split [275/94]>  Bootstrap10
#> # ... with 15 more rows

#Split object (helpful to call if you want to do model stacking)
split$split
#> <Analysis/Assess/Total>
#> <275/69/344>
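
Each row of a bootstrap tibble like the one printed above holds a split object rather than a data frame. The sketch below shows one way to pull the resampled rows and the out-of-bag rows from a single resample using rsample::analysis() and rsample::assessment(). It builds its own resamples from the built-in mtcars data so that it runs on its own; the names `boots`, `first`, `inBag`, and `outOfBag` are hypothetical.

```r
library(rsample)

set.seed(123)  # make the resampling reproducible

# Stand-in for split$boot: 25 bootstrap resamples of a data frame
boots <- bootstraps(mtcars, times = 25)

# Take the first resample
first <- boots$splits[[1]]

# Rows drawn with replacement (same size as the original data)
inBag <- analysis(first)

# Rows never drawn in this resample (the out-of-bag set)
outOfBag <- assessment(first)

nrow(inBag)
nrow(outOfBag)
```

The second number in each `<split [275/100]>` entry above is the size of that out-of-bag set, which is why it varies from one resample to the next.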