Fits a K-Nearest Neighbors Classification Model.

  response = response,
  recipe = rec,
  folds = folds,
  train = train_df,
  test = test_df,
  gridNumber = 15,
  evalMetric = "bal_accuracy"



response: Character. The name of the response variable for the analysis.


recipe: A recipe object.


folds: An rsample::vfold_cv object.


train: Data frame/tibble. The training data set.


test: Data frame/tibble. The testing data set.


gridNumber: Numeric. The size of the grid to tune on. Default is 15.


evalMetric: Character. The classification metric used to evaluate the model's performance. Default is "bal_accuracy". Available metrics:

  • bal_accuracy

  • mn_log_loss

  • roc_auc

  • mcc

  • kap

  • sens

  • spec

  • precision

  • recall
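
As a rough illustration of what the default metric measures, balanced accuracy is the mean of sensitivity and specificity. The sketch below computes it by hand in base R; the `truth` and `estimate` vectors are made-up labels for illustration only, not output of this function.

```r
# Hypothetical true and predicted class labels for a two-class problem.
truth    <- factor(c("a", "a", "a", "a", "b", "b"), levels = c("a", "b"))
estimate <- factor(c("a", "a", "a", "b", "b", "a"), levels = c("a", "b"))

# Confusion matrix: rows are predictions, columns are true classes.
cm <- table(Prediction = estimate, Truth = truth)

# Treat the first level ("a") as the event of interest.
sens <- cm["a", "a"] / sum(cm[, "a"])  # true positive rate: 3 / 4
spec <- cm["b", "b"] / sum(cm[, "b"])  # true negative rate: 1 / 2

# Balanced accuracy is the mean of sensitivity and specificity.
bal_acc <- (sens + spec) / 2           # 0.625
```

With imbalanced classes like these, balanced accuracy (0.625) is lower than plain accuracy (4 of 6 correct), because errors on the smaller class carry as much weight as errors on the larger one.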


A list with the following outputs:

  • Training confusion matrix

  • Training model metric score

  • Testing confusion matrix

  • Testing model metric score

  • Final model chosen

  • Tuned model


Note: tunes the following parameters:

  • neighbors: The number of neighbors considered at each prediction.

  • weight_func: The type of kernel function that weights the distances between samples.

  • dist_power: The parameter used when calculating the Minkowski distance. This corresponds to the Manhattan distance with dist_power = 1 and the Euclidean distance with dist_power = 2.
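
To make the role of dist_power concrete, here is a minimal base R sketch of the Minkowski distance. The helper name `minkowski` and the example points are assumptions made for illustration; the underlying KNN engine computes distances internally, and this is not its implementation.

```r
# Minkowski distance between two numeric vectors for a given power p.
# `minkowski` is a hypothetical helper written only for this example.
minkowski <- function(x, y, p) {
  sum(abs(x - y)^p)^(1 / p)
}

a <- c(0, 0)
b <- c(3, 4)

manhattan <- minkowski(a, b, p = 1)  # |3| + |4| = 7
euclidean <- minkowski(a, b, p = 2)  # sqrt(3^2 + 4^2) = 5
```

Values of dist_power between 1 and 2 interpolate between these two distances, which is why it can be tuned as a continuous parameter.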


library(recipes)  # recipe(), step_*(), prep(), bake(), and the %>% pipe

utils::data(penguins, package = "modeldata")

# Define the response variable and the model formula
resp <- "sex"
formula <- stats::as.formula(paste(resp, ".", sep = "~"))

# Split the data into training and testing sets, stratified on the response
split <- trainTestSplit(penguins, stratifyOnResponse = TRUE, responseVar = resp)

# Build the feature-engineering recipe; the steps depend on the data at hand
rec <- recipe(formula, data = split$train) %>%
  step_impute_knn(!!resp) %>%                     # impute missing response values
  step_dummy(all_nominal(), -all_outcomes()) %>%  # encode nominal predictors
  step_impute_median(all_predictors()) %>%        # fill remaining missing values
  step_normalize(all_predictors()) %>%            # center and scale
  step_nzv(all_predictors()) %>%                  # drop near-zero-variance columns
  step_corr(all_numeric(), -all_outcomes(), threshold = 0.8) %>%
  prep()

# Apply the prepared recipe to both data sets
train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)

# Create cross-validation folds from the processed training data
folds <- cvFolds(train_df)

