Substance Use Stigma Detection System for Reddit Data (2022)
System that leverages contextual embeddings combined with affective, social, and behavioral features to classify instances of substance use stigma in Reddit posts.
English and Arabic Twitter sarcasm detection systems. Group project for University of Washington 573: NLP Systems and Applications.
A GRU-based character-level language model trained on a corpus of the fiction of H.P. Lovecraft.
A PyTorch implementation of the Deep Averaging Network introduced in Iyyer et al. (2015). Performs binary sentiment classification on the IMDB reviews dataset.
This code classifies a subset of the 20 newsgroups text dataset from scikit-learn. Posts drawn from the ‘talk.politics.guns’, ‘talk.politics.mideast’, and ‘talk.politics.misc’ groups are converted to a bag-of-words representation and classified using a ‘from scratch’ kNN implementation.
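As a rough illustration of the approach described above, here is a minimal kNN sketch over bag-of-words counts. It is not the project's code: the cosine similarity measure, the whitespace tokenizer, the function names, and the toy training posts are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label the query by majority vote among its k most similar posts."""
    query_vec = bag_of_words(query)
    scored = [
        (cosine_similarity(query_vec, bag_of_words(doc)), label)
        for doc, label in training
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy posts standing in for the newsgroup data.
training = [
    ("gun control law rifle", "guns"),
    ("rifle ownership permit gun", "guns"),
    ("israel peace talks mideast", "mideast"),
    ("mideast border conflict talks", "mideast"),
]
print(knn_classify("gun permit law", training, k=3))
```

A real run would load the three newsgroups through scikit-learn and use a larger vocabulary; the voting logic stays the same.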
The script builds a decision tree ‘from scratch’ using the training data (a subset of the 20 newsgroups text dataset from scikit-learn), classifies the training and test data, and calculates the accuracy.
This script reads an HMM file produced by the MALLET machine learning toolkit and uses an implementation of the Viterbi algorithm to find the most probable tag sequence for the text.
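The decoding step can be sketched as follows. This is a minimal Viterbi implementation in log space under stated assumptions: the model is passed in as plain dictionaries rather than parsed from a MALLET file, and the two-tag model and its probabilities are invented for the example.

```python
import math


def viterbi(observations, states, log_init, log_trans, log_emit):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: log probability of the best path ending in state s at time t
    best = [{
        s: log_init[s] + log_emit[s].get(observations[0], -math.inf)
        for s in states
    }]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s].get(observations[t], -math.inf)
            back[t][s] = prev
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a dict of probabilities to log probabilities."""
    return {key: math.log(p) for key, p in table.items()}


# Hypothetical two-tag model (N = noun, V = verb); numbers are illustrative.
states = ["N", "V"]
log_init = to_log({"N": 0.8, "V": 0.2})
log_trans = {"N": to_log({"N": 0.3, "V": 0.7}),
             "V": to_log({"N": 0.6, "V": 0.4})}
log_emit = {"N": to_log({"dogs": 0.6, "bark": 0.1, "cats": 0.3}),
            "V": to_log({"dogs": 0.1, "bark": 0.7, "cats": 0.2})}
print(viterbi(["dogs", "bark"], states, log_init, log_trans, log_emit))
```

Working in log space avoids numerical underflow on long sentences, since the products of many small probabilities become sums.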
A ‘from scratch’ naive Bayes classifier implementation that classifies fragments of text according to language category.
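For illustration, a multinomial naive Bayes classifier with add-one smoothing might look like the sketch below. The feature choice (whitespace tokens), the handling of unseen tokens (skipped), the function names, and the toy sentences are assumptions, not details taken from the project.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect per-label token counts, document counts, and the vocabulary."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        token_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return token_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior for the text."""
    token_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # tokens never seen in training are skipped
                # Add-one (Laplace) smoothed likelihood.
                score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy fragments; a real model would train on far more text.
model = train_naive_bayes([
    ("the cat and the dog", "english"),
    ("el gato y el perro", "spanish"),
])
print(classify("the dog", model))
```

Character n-grams are a common alternative feature for language identification; swapping them in only changes the tokenization step.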
A search trie implementation that locates DNA sequences in the human genome chromosome dataset produced by UCSC.
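A minimal sketch of trie-based sequence search is shown below. The class and function names and the short example string are hypothetical; the actual project searches the UCSC chromosome files, which this sketch does not read.

```python
class TrieNode:
    """One node of the trie; `pattern` is set where a target sequence ends."""

    def __init__(self):
        self.children = {}
        self.pattern = None


def build_trie(patterns):
    """Insert every target sequence into a trie and return its root."""
    root = TrieNode()
    for pattern in patterns:
        node = root
        for base in pattern:
            node = node.children.setdefault(base, TrieNode())
        node.pattern = pattern
    return root


def find_matches(text, root):
    """Return (offset, pattern) for every occurrence of a pattern in text."""
    matches = []
    for start in range(len(text)):
        node = root
        pos = start
        # Walk the trie for as long as the text keeps matching.
        while pos < len(text) and text[pos] in node.children:
            node = node.children[text[pos]]
            pos += 1
            if node.pattern is not None:
                matches.append((start, node.pattern))
    return matches


trie = build_trie(["GATT", "ACA", "TT"])
print(find_matches("GATTACAGATT", trie))
```

Storing all targets in one trie lets a single pass over the genome text check every target at each offset, rather than scanning once per target.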