Decision Tree Classifier (2020)

This code was used to classify a subset of the 20 Newsgroups text dataset from scikit-learn. Posts were drawn from the ‘talk.politics.guns’, ‘talk.politics.mideast’, and ‘talk.politics.misc’ newsgroups, converted to a bag-of-words representation, and classified using a decision tree implementation.
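As a rough illustration of what a bag-of-words representation means, here is a minimal Python sketch that turns one post into word counts and discards word order. The tokenization rule (lowercase, keep runs of letters only) and the function name bag_of_words are assumptions for illustration. They are not the preprocessing actually used to produce the vectors files.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences in a post, ignoring word order."""
    # Assumed tokenization: lowercase, then keep runs of letters a-z.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


post = "Guns, guns and more guns: the debate continues."
counts = bag_of_words(post)
print(counts["guns"], counts["debate"], counts["israel"])  # 3 1 0
```

Because a Counter returns 0 for missing keys, words that never appear in a post behave like zero-count features.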

The script builds a decision tree from the training data, classifies both the training and test data, and reports the accuracy on each.
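The build, classify, and score steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the code from the repository: each document is a dict of word counts, each split tests whether a single word is present, splits are chosen by information gain (entropy reduction), growth stops at max_depth, at a pure node, or when the best gain falls below min_gain, and leaves predict the majority class. The function names and the tuple-based tree representation are hypothetical.

```python
import math
from collections import Counter

EPS = 1e-12  # ignore gains that are only floating-point noise


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(docs, labels, word):
    """Split (doc, label) pairs by whether the word occurs in the doc."""
    present = [(d, y) for d, y in zip(docs, labels) if d.get(word, 0) > 0]
    absent = [(d, y) for d, y in zip(docs, labels) if d.get(word, 0) == 0]
    return present, absent


def best_split(docs, labels):
    """Return (gain, word) for the word with the highest information gain."""
    n = len(labels)
    base = entropy(labels)
    best_gain, best_word = 0.0, None
    for word in sorted(set().union(*docs)):  # sorted: deterministic ties
        present, absent = partition(docs, labels, word)
        if not present or not absent:
            continue  # the word does not actually split this node
        remainder = sum(
            len(part) / n * entropy([y for _, y in part]) for part in (present, absent)
        )
        gain = base - remainder
        if gain > best_gain + EPS:
            best_gain, best_word = gain, word
    return best_gain, best_word


def build_tree(docs, labels, max_depth, min_gain, depth=0):
    """Grow a tree: a leaf is a label, a node is (word, present_subtree, absent_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    gain, word = best_split(docs, labels)
    if word is None or gain < min_gain:
        return majority
    children = []
    for part in partition(docs, labels, word):
        sub_docs = [d for d, _ in part]
        sub_labels = [y for _, y in part]
        children.append(build_tree(sub_docs, sub_labels, max_depth, min_gain, depth + 1))
    return (word, children[0], children[1])


def classify(tree, doc):
    """Follow the word tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        word, present, absent = tree
        tree = present if doc.get(word, 0) > 0 else absent
    return tree


def accuracy(tree, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(classify(tree, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Toy usage with made-up documents (not the real newsgroup data).
train_docs = [{"gun": 2, "law": 1}, {"israel": 1, "peace": 1}, {"tax": 1, "law": 1}, {"gun": 1}]
train_labels = ["guns", "mideast", "misc", "guns"]
tree = build_tree(train_docs, train_labels, max_depth=3, min_gain=0.0)
print(tree)  # ('gun', 'guns', ('israel', 'mideast', 'misc'))
print(accuracy(tree, train_docs, train_labels))  # 1.0
```

Testing only word presence keeps every split binary, which is a common simplification for sparse bag-of-words features; thresholds on word counts would be an alternative design.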

GitHub repository link to my code

About the code

The format for launching the script is:

build_dt.py training_data test_data max_depth min_gain model_file sys_output

where training_data is the training file (train.vectors.txt), test_data is the test file (test.vectors.txt), max_depth is the maximum depth of the tree, min_gain is the minimum information gain required to make a split, model_file is the filename for the output model, and sys_output is the file that receives the classification results for the training and test data.
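The six positional arguments could be handled roughly as follows. This sketch assumes each line of the vectors files has the form "label word:count word:count ..."; that format, the Args tuple, and the functions parse_args and read_vectors are hypothetical, since the post does not describe them.

```python
from collections import namedtuple

# Hypothetical container for the six positional arguments of build_dt.py.
Args = namedtuple("Args", "training_data test_data max_depth min_gain model_file sys_output")


def parse_args(argv):
    """Map argv (including the script name) onto named, typed fields."""
    if len(argv) != 7:
        raise SystemExit(
            "usage: build_dt.py training_data test_data max_depth min_gain model_file sys_output"
        )
    _, train, test, max_depth, min_gain, model_file, sys_output = argv
    return Args(train, test, int(max_depth), float(min_gain), model_file, sys_output)


def read_vectors(lines):
    """Parse lines of the assumed form 'label word:count word:count ...'."""
    docs, labels = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels.append(parts[0])
        doc = {}
        for pair in parts[1:]:
            word, count = pair.rsplit(":", 1)  # split on the last colon only
            doc[word] = int(count)
        docs.append(doc)
    return docs, labels


args = parse_args(["build_dt.py", "train.vectors.txt", "test.vectors.txt", "5", "0.1", "model.txt", "sys_output.txt"])
docs, labels = read_vectors(["talk.politics.guns gun:2 law:1", "", "talk.politics.misc tax:1"])
print(args.max_depth, args.min_gain, labels)
```

In a real run, parse_args would receive sys.argv, and read_vectors could be passed an open file object, since iterating a file yields its lines.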

Figure (results table): Decision tree results when min_gain=0.