K-Nearest Neighbors Classifier (2020)
This code was written to classify a subset of the 20 Newsgroups text dataset from scikit-learn. Posts drawn from the ‘talk.politics.guns’, ‘talk.politics.mideast’, and ‘talk.politics.misc’ newsgroups were converted to a bag-of-words representation and classified using a k-nearest neighbors (kNN) implementation.
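The bag-of-words step turns each post into word counts, discarding word order. The write-up does not describe the exact preprocessing, so the sketch below assumes simple lowercase tokenization on runs of letters and apostrophes. The function name `bag_of_words` is hypothetical and is not taken from build_kNN.py.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a post to word counts (assumed tokenizer: lowercase letter runs)."""
    # Hypothetical tokenizer; the original preprocessing is not described.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


post = "The senate debated gun laws; the debate on gun laws continues."
vec = bag_of_words(post)
print(vec["gun"], vec["the"], vec["laws"])  # 2 2 2
```

Word order is lost, but for separating gun-policy, Middle East, and miscellaneous political posts, topic words such as "gun" or "israel" can carry much of the signal.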
GitHub repository link to my code
About the code
The format for launching the script is:
build_kNN.py training_data test_data k_val similarity_func sys_output
where training_data is train.vectors.txt, test_data is test.vectors.txt, k_val is the number of nearest neighbors used to make a classification decision, similarity_func is 1 for Euclidean distance or 2 for cosine similarity, and sys_output is the output file for the classification results on the training and test data.
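The classification decision itself can be sketched as follows. This is a minimal illustration, not the code in build_kNN.py. It assumes sparse vectors stored as dicts mapping feature to value (the layout of train.vectors.txt is not described here), and it breaks voting ties in favor of the label that appears first among the nearest neighbors. The function names are hypothetical. The `similarity_func` codes mirror the script's argument. Euclidean distance ranks neighbors in ascending order (smaller is nearer), while cosine similarity ranks them in descending order (larger is nearer).

```python
import math
from collections import Counter

# Assumed representation: sparse vector as {feature: value}.
Vector = dict[str, float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance over the union of features (missing = 0)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in keys))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(x: Vector, train: list[tuple[str, Vector]], k: int,
                 similarity_func: int) -> str:
    """Majority vote over the k nearest training instances.

    similarity_func: 1 = Euclidean distance, 2 = cosine similarity.
    """
    if similarity_func == 1:
        # Smaller distance means nearer.
        ranked = sorted(train, key=lambda item: euclidean(x, item[1]))
    elif similarity_func == 2:
        # Larger similarity means nearer.
        ranked = sorted(train, key=lambda item: cosine(x, item[1]), reverse=True)
    else:
        raise ValueError("similarity_func must be 1 (Euclidean) or 2 (cosine)")
    votes = Counter(label for label, _ in ranked[:k])
    # most_common orders equal counts by first occurrence, so ties go to
    # the label of the nearer neighbor.
    return votes.most_common(1)[0][0]


# Toy training data for illustration only.
train = [
    ("talk.politics.guns", {"gun": 3, "law": 1}),
    ("talk.politics.guns", {"gun": 2, "rifle": 2}),
    ("talk.politics.mideast", {"israel": 2, "peace": 1}),
]
query = {"gun": 1, "law": 1}
print(knn_classify(query, train, k=1, similarity_func=1))  # talk.politics.guns
print(knn_classify(query, train, k=3, similarity_func=2))  # talk.politics.guns
```

Sorting the whole training set costs O(n log n) per query. That is acceptable for a small newsgroup subset, but a heap or `heapq.nsmallest` would avoid the full sort.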
Figure: kNN classifier test accuracy by k value and similarity function.
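A comparison like the one in the figure amounts to computing test accuracy for each combination of k and similarity function. The sketch below assumes a hypothetical `classify(k, similarity_func)` callable that returns predicted labels for the test set, for example by running build_kNN.py and reading sys_output. The toy `fake_classify` stand-in only illustrates the call shape and is not a real result.

```python
from typing import Callable, Iterable


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of instances whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def sweep(classify: Callable[[int, int], list[str]], gold: list[str],
          k_values: Iterable[int]) -> dict[tuple[int, int], float]:
    """Accuracy for every (k, similarity_func) pair.

    classify is a hypothetical callable returning test-set predictions.
    """
    k_values = list(k_values)
    results = {}
    for sim in (1, 2):  # 1 = Euclidean, 2 = cosine
        for k in k_values:
            results[(k, sim)] = accuracy(gold, classify(k, sim))
    return results


gold = ["guns", "mideast", "misc", "guns"]


def fake_classify(k: int, sim: int) -> list[str]:
    # Illustrative stand-in, not real predictions from build_kNN.py.
    return ["guns", "mideast", "guns", "guns"] if sim == 1 else list(gold)


print(sweep(fake_classify, gold, [1, 5]))
# {(1, 1): 0.75, (5, 1): 0.75, (1, 2): 1.0, (5, 2): 1.0}
```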