Hands-on: Accelerating k-NN with Rcpp
Overview
In this 30-minute session you will:
- Download and inspect the C++ implementation for k‑NN.
- Compile and load the Rcpp code.
- Run benchmarks comparing pure-R vs Rcpp predictions.
- Analyze how sample size and dimensionality affect performance.
Setup
Download the C++ and helper scripts into your working directory:
curl -O https://raw.githubusercontent.com/mmadoliat/WSoRT/refs/heads/main/src/knn_pred.cpp
curl -O https://raw.githubusercontent.com/mmadoliat/WSoRT/refs/heads/main/runthis.R

Open the files in your editor to review the code:
- knn_pred.cpp contains the knn_pred_cpp() function (Rcpp).
- runthis.R sources both the R and C++ implementations and runs microbenchmark().
1. Compile the C++ code
In an R console or RStudio, run:
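The command itself is not reproduced above; assuming knn_pred.cpp was downloaded into your working directory (adjust the path if you keep it under src/), the standard Rcpp call is:

```r
# Compile knn_pred.cpp and expose its exported functions in this R session
Rcpp::sourceCpp("knn_pred.cpp")
```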
If successful, you should see knn_pred_cpp available:
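A quick way to confirm this (a minimal check; the original console output is not shown here):

```r
exists("knn_pred_cpp")  # should return TRUE once sourceCpp() succeeds
```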
2. Inspect the runner script
Open runthis.R, which contains:
source("R/knn_s3_formula.R") # loads knn_s3 and predict()
Rcpp::sourceCpp("src/knn_pred.cpp") # loads Rcpp function
# Simulate data and benchmark
data <- simulate_knn_data(n = 1000, p = 5, m = 200, k = 10)
mb <- microbenchmark::microbenchmark(
  Rcpp = knn_pred_cpp(data$train_x, data$train_y, data$test_x, data$k),
  R    = knn_pred_R(data$train_x, data$train_y, data$test_x, data$k),
  times = 20
)
print(mb)

Try running this script:
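The invocation is not shown above; assuming you start R from the project root (so the relative paths inside runthis.R resolve), sourcing the script is enough:

```r
# Run the benchmark script end to end
source("runthis.R")
```

From a shell, Rscript runthis.R does the same thing.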
3. Vary parameters
Modify runthis.R or re-run interactively to examine different settings:
- Increase n (training size) from 1000 to 5000 or 10000.
- Increase p (dimensions) from 5 to 20 or 50.
- Observe how the Rcpp version scales relative to pure-R.
Pay particular attention to whether the Rcpp implementation remains much faster as the problem grows in size and dimension.
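A minimal sketch of such a sweep, assuming simulate_knn_data(), knn_pred_cpp(), and knn_pred_R() are already loaded as in runthis.R (the grid of values is just illustrative):

```r
# Time both implementations over a small grid of training sizes and dimensions
for (n in c(1000, 5000, 10000)) {
  for (p in c(5, 20, 50)) {
    dat <- simulate_knn_data(n = n, p = p, m = 200, k = 10)
    mb  <- microbenchmark::microbenchmark(
      Rcpp = knn_pred_cpp(dat$train_x, dat$train_y, dat$test_x, dat$k),
      R    = knn_pred_R(dat$train_x, dat$train_y, dat$test_x, dat$k),
      times = 5
    )
    cat(sprintf("n = %d, p = %d\n", n, p))
    print(summary(mb)[, c("expr", "median")])
  }
}
```

The largest settings can take a while in pure R, so reduce `times` (or the grid) if the loop runs too long.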
4. Discussion
- Where does Rcpp help most?
- Are there settings where pure R is sufficient?
- How might you further optimize (e.g., using STL partial_sort, as sketched below)?
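As a rough illustration of the partial_sort idea, here is a hypothetical variant compiled inline with Rcpp::cppFunction(). The function name, the squared-Euclidean distance, and the regression-style averaging of the k nearest responses are assumptions made for this sketch, not the workshop's actual knn_pred_cpp() implementation:

```r
Rcpp::cppFunction(plugins = "cpp11", includes = "#include <algorithm>", code = '
NumericVector knn_pred_psort(NumericMatrix train_x, NumericVector train_y,
                             NumericMatrix test_x, int k) {
  int n = train_x.nrow(), m = test_x.nrow(), p = train_x.ncol();
  NumericVector pred(m);
  std::vector<double> dist(n);
  std::vector<int> idx(n);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      double d = 0.0;
      for (int l = 0; l < p; ++l) {
        double diff = train_x(j, l) - test_x(i, l);
        d += diff * diff;
      }
      dist[j] = d;            // squared Euclidean distance to training point j
      idx[j] = j;
    }
    // Order only the k smallest distances: O(n log k) instead of a full sort
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return dist[a] < dist[b]; });
    double s = 0.0;
    for (int j = 0; j < k; ++j) s += train_y[idx[j]];
    pred[i] = s / k;          // average response of the k nearest neighbours
  }
  return pred;
}')
```

Compared with fully sorting all n distances per test point, partial_sort only establishes order among the k smallest, which is where most of the savings come from when k is much smaller than n.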
Next steps
- Try integrating this into your knn_s3 class and Shiny app.
- Explore parallel Rcpp implementations (OpenMP).
- Consider other statistical routines with nested loops.