K-Nearest Neighbors Classification — knnClassif • easytidymodels

Fits a K-Nearest Neighbors Classification Model.

knnClassif(
  response = response,
  recipe = rec,
  folds = folds,
  train = train_df,
  test = test_df,
  gridNumber = 15,
  evalMetric = "bal_accuracy"
)

Arguments

response

Character. The name of the response variable.

recipe

A recipe object.

folds

An rsample::vfold_cv object.

train

Data frame/tibble. The training data set.

test

Data frame/tibble. The testing data set.

gridNumber

Numeric. The size of the grid to tune on. Default is 15.

evalMetric

Character. The classification metric used to evaluate the model. Default is "bal_accuracy". Available metrics:

bal_accuracy
mn_log_loss
roc_auc
mcc
kap
sens
spec
precision
recall
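
The metric names above appear to correspond to yardstick metric functions of the same names; that mapping is an assumption, since this page does not say how they are computed. As a minimal base R sketch on hypothetical toy data, the default bal_accuracy can be understood as the mean of sensitivity and specificity for a two-class problem:

```r
# Hypothetical two-class truth and predictions (not from the penguins data).
lvls  <- c("yes", "no")
truth <- factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"),
                levels = lvls)
pred  <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"),
                levels = lvls)

# Confusion matrix: rows are predictions, columns are the true classes.
cm <- table(predicted = pred, actual = truth)

tp <- as.numeric(cm["yes", "yes"])  # true positives
fn <- as.numeric(cm["no", "yes"])   # false negatives
tn <- as.numeric(cm["no", "no"])    # true negatives
fp <- as.numeric(cm["yes", "no"])   # false positives

sens         <- tp / (tp + fn)      # sensitivity, identical to recall
spec         <- tn / (tn + fp)      # specificity
precision    <- tp / (tp + fp)
bal_accuracy <- (sens + spec) / 2   # balanced accuracy

round(c(sens = sens, spec = spec, precision = precision,
        bal_accuracy = bal_accuracy), 3)
```

Because balanced accuracy averages the per-class rates, it is less affected by class imbalance than plain accuracy.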

Value

A list with the following outputs:

Training confusion matrix
Training model metric score
Testing confusion matrix
Testing model metric score
Final model chosen
Tuned model

Details

This function tunes the following parameters:

neighbors: The number of neighbors considered at each prediction.
weight_func: The type of kernel function that weights the distances between samples.
dist_power: The parameter used when calculating the Minkowski distance. This corresponds to the Manhattan distance with dist_power = 1 and the Euclidean distance with dist_power = 2.
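
To make the role of neighbors and dist_power concrete, here is a minimal base R sketch. It is not the package's implementation: the helper names `minkowski` and `knn_vote` are hypothetical, the data are made up, and neighbors vote with equal weight, whereas a weight_func kernel would down-weight more distant neighbors.

```r
# Minkowski distance between two numeric vectors; p plays the role of dist_power.
minkowski <- function(x, y, p) {
  sum(abs(x - y)^p)^(1 / p)
}

a <- c(0, 0)
b <- c(3, 4)

manhattan <- minkowski(a, b, p = 1)  # |3| + |4| = 7
euclidean <- minkowski(a, b, p = 2)  # sqrt(9 + 16) = 5

# Cross-check against stats::dist(), which implements the same two metrics.
pts <- rbind(a, b)
dist_manhattan <- as.numeric(stats::dist(pts, method = "manhattan"))
dist_euclidean <- as.numeric(stats::dist(pts, method = "euclidean"))

# Predict the class of `query` by majority vote among its k nearest training
# rows (k plays the role of neighbors). All neighbors get equal weight here.
knn_vote <- function(train_x, train_y, query, k, p = 2) {
  d <- apply(train_x, 1, minkowski, y = query, p = p)
  nearest <- order(d)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Hypothetical training data: two well-separated clusters.
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c("a", "a", "a", "b", "b", "b")

pred_near_a <- knn_vote(train_x, train_y, query = c(0.5, 0.5), k = 3)
pred_near_b <- knn_vote(train_x, train_y, query = c(5.5, 5.5), k = 3, p = 1)
```

Tuning searches over combinations of these settings, since the best number of neighbors and distance exponent depend on the data.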

Examples

library(easytidymodels)
library(dplyr)
library(recipes)
utils::data(penguins, package = "modeldata")
#Define your response variable and formula object here
resp <- "sex"
formula <- stats::as.formula(paste(resp, ".", sep="~"))
#Split data into training and testing sets
split <- trainTestSplit(penguins, stratifyOnResponse = TRUE,
responseVar = resp)
#Create a recipe for feature engineering; the steps needed vary with the data
rec <- recipe(formula, data = split$train) %>%
  step_impute_knn(!!resp) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_impute_median(all_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_nzv(all_predictors()) %>%
  step_corr(all_numeric(), -all_outcomes(), threshold = .8) %>%
  prep()
train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)
folds <- cvFolds(train_df)

#knn <- knnClassif(recipe = rec, response = resp, folds = folds,
#train = train_df, test = test_df)

#Confusion Matrix
#knn$trainConfMat

#Plot of confusion matrix
#knn$trainConfMatPlot

#Test Confusion Matrix
#knn$testConfMat

#Test Confusion Matrix Plot
#knn$testConfMatPlot