trainTestSplit.Rd
Split a data set into training and testing sets. Also returns bootstrap resamples of the training set.
trainTestSplit(
  data = df,
  splitAmt = 0.8,
  timeDependent = FALSE,
  responseVar = "nameOfResponseVar",
  stratifyOnResponse = FALSE,
  numberOfBootstrapSamples = 25
)
data: The data set of interest.
splitAmt: Numeric. The proportion of the data to place in the training set. Default is 0.8.
timeDependent: Logical. Is your data time-dependent? If so, set to TRUE. Default is FALSE.
responseVar: Character. Name of the response variable in the analysis.
stratifyOnResponse: Logical. Should the training and testing splits be stratified on the response? If so, set to TRUE. Default is FALSE.
numberOfBootstrapSamples: Numeric. The number of bootstrap samples to draw. Default is 25.
A list with four components: train is the training set, test is the testing set, boot is a set of bootstrap resamples of the training set, and split is the rsample split object that records how the original data set was partitioned.
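To make the four components more concrete, here is a rough sketch of how similar objects can be built directly with rsample. It assumes that trainTestSplit() behaves like a thin wrapper around rsample, which this page does not confirm; the object names (rs_split, train, test, boot, time_split) are illustrative only.

```r
library(rsample)

# Example data; stands in for the `data` argument
utils::data(penguins, package = "modeldata")

set.seed(123)  # the split is random, so fix a seed for reproducibility

# Roughly what splitAmt = 0.8 with stratifyOnResponse = TRUE asks for
rs_split <- initial_split(penguins, prop = 0.8, strata = sex)

train <- training(rs_split)  # analogous to the `train` component
test  <- testing(rs_split)   # analogous to the `test` component

# Bootstrap resamples of the training set
# (analogous to numberOfBootstrapSamples = 25)
boot <- bootstraps(train, times = 25)

# For time-dependent data, rsample provides initial_time_split(),
# which keeps the rows in their original order instead of shuffling
time_split <- initial_time_split(penguins, prop = 0.8)
```

Whether the package uses exactly these calls internally is an assumption; the sketch is only meant to show what each returned component represents.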
library(easytidymodels)
library(dplyr)
utils::data(penguins, package = "modeldata")
resp <- "sex"
split <- trainTestSplit(penguins, stratifyOnResponse = TRUE, responseVar = resp)
# Training data
split$train
#> # A tibble: 275 x 7
#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
#> <fct> <fct> <dbl> <dbl> <int> <int>
#> 1 Adelie Torgersen 39.5 17.4 186 3800
#> 2 Adelie Torgersen 40.3 18 195 3250
#> 3 Adelie Torgersen NA NA NA NA
#> 4 Adelie Torgersen 36.7 19.3 193 3450
#> 5 Adelie Torgersen 38.9 17.8 181 3625
#> 6 Adelie Torgersen 42 20.2 190 4250
#> 7 Adelie Torgersen 41.1 17.6 182 3200
#> 8 Adelie Torgersen 36.6 17.8 185 3700
#> 9 Adelie Torgersen 38.7 19 195 3450
#> 10 Adelie Torgersen 34.4 18.4 184 3325
#> # ... with 265 more rows, and 1 more variable: sex <fct>
# Testing data
split$test
#> # A tibble: 69 x 7
#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
#> <fct> <fct> <dbl> <dbl> <int> <int>
#> 1 Adelie Torgersen 39.1 18.7 181 3750
#> 2 Adelie Torgersen 37.8 17.3 180 3700
#> 3 Adelie Biscoe 38.8 17.2 180 3800
#> 4 Adelie Biscoe 40.5 17.9 187 3200
#> 5 Adelie Dream 39.5 17.8 188 3300
#> 6 Adelie Dream 39.2 21.1 196 4150
#> 7 Adelie Dream 38.8 20 190 3950
#> 8 Adelie Dream 36.5 18 182 3150
#> 9 Adelie Dream 44.1 19.7 196 4400
#> 10 Adelie Dream 39.6 18.8 190 4600
#> # ... with 59 more rows, and 1 more variable: sex <fct>
# Bootstrap resamples of the training data
split$boot
#> # Bootstrap sampling
#> # A tibble: 25 x 2
#> splits id
#> <list> <chr>
#> 1 <split [275/100]> Bootstrap01
#> 2 <split [275/94]> Bootstrap02
#> 3 <split [275/105]> Bootstrap03
#> 4 <split [275/98]> Bootstrap04
#> 5 <split [275/108]> Bootstrap05
#> 6 <split [275/98]> Bootstrap06
#> 7 <split [275/108]> Bootstrap07
#> 8 <split [275/100]> Bootstrap08
#> 9 <split [275/103]> Bootstrap09
#> 10 <split [275/94]> Bootstrap10
#> # ... with 15 more rows
# Split object (helpful if you want to do model stacking)
split$split
#> <Analysis/Assess/Total>
#> <275/69/344>
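In the bootstrap printout, an entry such as <split [275/100]> means that 275 rows were drawn with replacement for that resample and 100 rows were left out (the out-of-bag rows). As a minimal sketch of how to pull the data out of one resample, the code below builds a stand-in bootstrap object with rsample::bootstraps(); the same accessors should apply to the boot component, assuming it is a standard rsample bootstrap set as the printout suggests.

```r
library(rsample)

utils::data(penguins, package = "modeldata")

set.seed(123)

# Stand-in for the `boot` component of the returned list
boot <- bootstraps(penguins, times = 25)

# Each element of boot$splits is a single resample
first_resample <- boot$splits[[1]]

# Rows sampled with replacement; same number of rows as the input data
in_bag <- analysis(first_resample)

# Rows that were never drawn in this resample
out_of_bag <- assessment(first_resample)

nrow(in_bag)
nrow(out_of_bag)
```

The out-of-bag count varies from resample to resample, which is why the second number in each split differs in the printout.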