NLP Portfolio

A portfolio of my implementations of statistical and neural NLP models, along with related algorithms.

Substance Use Stigma Detection System for Reddit Data (2022)

System that leverages contextual embeddings combined with affective, social, and behavioral features to classify instances of substance use stigma in Reddit posts.

English and Arabic Sarcasm Detection in Tweets (2022)

English and Arabic Twitter sarcasm detection systems. Group project for University of Washington 573: NLP Systems and Applications.

H.P. Lovecraft RNN Text Generator (2021)

A GRU-based character-level language model trained on a corpus of the fiction of H.P. Lovecraft.

Deep Averaging Network (2021)

A PyTorch implementation of the Deep Averaging Network introduced in Iyyer et al. (2015). It performs binary sentiment classification on the IMDB reviews dataset.
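The core idea of a DAN is to average a text's word embeddings and pass that single vector through a few feed-forward layers. The paper also drops whole words at random during training ("word dropout"). The sketch below is a framework-free illustration of that forward pass, not the PyTorch code in this project. The tanh nonlinearity, the sigmoid output unit, and the list-of-rows weight layout are assumptions chosen for readability.

```python
import math
import random


def dan_forward(tokens, embeddings, layers, out_w, out_b, word_dropout=0.0, rng=None):
    """Forward pass of a Deep Averaging Network for binary classification.

    tokens: list of words; embeddings: dict mapping word -> list[float].
    layers: list of (W, b) pairs, W given as a list of rows (hypothetical format).
    Returns P(positive) from a sigmoid output unit.
    """
    # Word dropout: randomly drop whole tokens before averaging (used in training).
    if word_dropout > 0:
        rng = rng or random.Random(0)
        kept = [t for t in tokens if rng.random() >= word_dropout]
        tokens = kept or tokens  # never drop every token
    # Average the embeddings of known words; word order is ignored.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    h = [sum(col) / len(vecs) for col in zip(*vecs)] if vecs else [0.0] * dim
    # Stack of feed-forward layers (tanh is an assumed nonlinearity).
    for W, b in layers:
        h = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    # Single output unit with a sigmoid for positive vs. negative sentiment.
    logit = sum(w * x for w, x in zip(out_w, h)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the input is an unordered average, reordering the tokens of a review leaves the prediction unchanged; this is what makes the model fast, and also why it cannot see word order.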

K-Nearest Neighbors Classifier (2020)

This code classifies a subset of the 20 Newsgroups text dataset from scikit-learn. Posts drawn from the ‘talk.politics.guns’, ‘talk.politics.mideast’, and ‘talk.politics.misc’ newsgroups were converted to a bag-of-words representation and classified using a ‘from scratch’ kNN implementation.
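A minimal sketch of a from-scratch kNN over bag-of-words vectors is shown below. The distance measure is not stated above, so cosine similarity, whitespace tokenization, and simple majority voting are assumptions here; the helper names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace-token counts (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, train, k=3):
    """train: list of (Counter, label) pairs; majority vote over the k most similar."""
    q = bag_of_words(query)
    neighbors = sorted(train, key=lambda ex: cosine(q, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

kNN has no training step beyond storing the vectors, so every prediction scans the whole training set; that cost is usually acceptable for a three-newsgroup subset.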

Decision Tree Classifier (2020)

The script builds a decision tree ‘from scratch’ using the training data (a subset of the 20 Newsgroups text dataset from scikit-learn), classifies the training and test data, and calculates the accuracy on each.
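The sketch below shows one common way to grow such a tree: ID3-style splits chosen by information gain over binary word-presence features. The split criterion, the feature type, and the `max_depth` stopping rule are assumptions, since the description above does not specify them.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(docs, labels, features):
    """Return the word whose presence/absence gives the highest information gain."""
    base = entropy(labels)
    best_feat, best_gain = None, 0.0
    for feat in features:
        with_f = [y for d, y in zip(docs, labels) if feat in d]
        without_f = [y for d, y in zip(docs, labels) if feat not in d]
        if not with_f or not without_f:
            continue  # this word does not split the data
        remainder = (len(with_f) * entropy(with_f)
                     + len(without_f) * entropy(without_f)) / len(labels)
        if base - remainder > best_gain:
            best_feat, best_gain = feat, base - remainder
    return best_feat


def build_tree(docs, labels, features, max_depth=5):
    """docs: list of word sets. Leaves are labels; nodes are (word, yes, no)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    feat = best_split(docs, labels, features)
    if feat is None:
        return majority
    yes = [(d, y) for d, y in zip(docs, labels) if feat in d]
    no = [(d, y) for d, y in zip(docs, labels) if feat not in d]
    rest = features - {feat}
    return (feat,
            build_tree([d for d, _ in yes], [y for _, y in yes], rest, max_depth - 1),
            build_tree([d for d, _ in no], [y for _, y in no], rest, max_depth - 1))


def classify(tree, doc):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        feat, yes, no = tree
        tree = yes if feat in doc else no
    return tree


def accuracy(tree, docs, labels):
    return sum(classify(tree, d) == y for d, y in zip(docs, labels)) / len(labels)
```

Comparing `accuracy` on the training data against the test data is a quick way to spot overfitting: a deep tree often scores near 1.0 on training documents while doing noticeably worse on unseen ones.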

Viterbi Implementation for HMM POS Tagging (2020)

This script reads an HMM file produced by the MALLET machine learning toolkit and uses an implementation of the Viterbi algorithm to find the most probable tag sequence for the text.
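The Viterbi step itself can be sketched as a log-space dynamic program with back-pointers. This sketch does not parse MALLET's file format; it assumes the start, transition, and emission probabilities have already been loaded into plain dicts (a hypothetical layout), and it uses a small probability floor for unseen entries as an assumed smoothing choice.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under an HMM.

    start_p[t], trans_p[prev][t] and emit_p[t][word] are probabilities held in
    plain dicts (a hypothetical layout, not MALLET's file format). Missing
    entries fall back to `floor` so unseen words do not zero out every path.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t]: highest log-probability of any tag path ending in t at position i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p.get(t, {}).get(words[0], 0.0))
             for t in tags}]
    back = [{}]  # back[i][t]: predecessor tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans_p.get(p, {}).get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + lp(trans_p.get(prev, {}).get(t, 0.0))
                          + lp(emit_p.get(t, {}).get(words[i], 0.0)))
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

Working in log space turns products of many small probabilities into sums, which avoids numeric underflow on long sentences.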

Naive Bayes Language Classifier (2020)

A ‘from scratch’ naive Bayes classifier implementation that classifies fragments of text by language.
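A minimal multinomial naive Bayes sketch for this task is shown below. The original features are not described, so whitespace word tokens and add-alpha (Laplace) smoothing are assumptions; character n-grams are another common choice for short fragments. The class name is hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLanguageClassifier:
    """Multinomial naive Bayes over word tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # smoothing constant
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()            # fragments seen per language
        self.vocab = set()

    def train(self, fragments):
        """fragments: iterable of (text, language) pairs."""
        for text, lang in fragments:
            tokens = text.lower().split()
            self.word_counts[lang].update(tokens)
            self.doc_counts[lang] += 1
            self.vocab.update(tokens)

    def log_posterior(self, text, lang):
        """log P(lang) + sum of log P(word | lang), up to a shared constant."""
        score = math.log(self.doc_counts[lang] / sum(self.doc_counts.values()))
        counts = self.word_counts[lang]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in text.lower().split():
            score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda lang: self.log_posterior(text, lang))
```

Smoothing matters here: without it, a single word never seen in one language's training data would drive that language's probability to zero.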

DNA Sequence Search Trie (2020)

A search trie implementation that locates DNA sequences in the human genome chromosome dataset produced by UCSC.
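One way to use a trie for this is to store the query sequences in the trie and then walk it from every position of the chromosome, reporting each pattern that ends along the way. The sketch below follows that approach as an assumption; the nested-dict layout and the `"$"` end marker are illustrative choices, and uppercasing handles lowercase bases in the input.

```python
def build_trie(patterns):
    """Nested-dict trie over query sequences; '$' stores the completed pattern."""
    root = {}
    for pat in patterns:
        node = root
        for base in pat.upper():
            node = node.setdefault(base, {})
        node["$"] = pat
    return root


def find_matches(genome, trie):
    """Yield (offset, pattern) for every occurrence of any pattern in genome."""
    genome = genome.upper()
    for start in range(len(genome)):
        node = trie
        i = start
        # Follow trie edges as long as the genome keeps matching
        while i < len(genome) and genome[i] in node:
            node = node[genome[i]]
            i += 1
            if "$" in node:
                yield start, node["$"]
```

Because all query sequences share one trie, each genome position is scanned once for every pattern at the same time, rather than once per pattern.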

David Roesler