Substance Use Stigma Detection System for Reddit Data (2022)
A system that I developed for an NIH-funded project at the University of Washington School of Medicine, Department of Biomedical Informatics.
Abstract
Stigma surrounding substance use can result in severe negative consequences for physical and mental health. To develop effective interventions, identifying situations in which stigma occurs and characterizing its impact are critical. As part of a project to identify facilitators of substance use stigma reduction and to inform the development of interventions for substance use disorder, this study leverages social media data to identify content with a high probability of containing stigma. We create an annotated corpus of 2,214 Reddit posts from subreddits relating to substance use. We train a set of binary classifiers, in which each classifier detects one of three stigma types: Internalized Stigma, Anticipated Stigma, and Enacted Stigma. By combining RoBERTa contextual embeddings and affective, social, and behavioral features, we produce systems that identify instances of substance use stigma for all three stigma types and outperform RoBERTa-only baselines by up to 6.45 macro F1.
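For readers unfamiliar with the metric, the following is a minimal sketch of how macro F1 is computed for one binary classifier: F1 is calculated separately for the positive and negative class, then averaged with equal weight, so the rarer class counts as much as the common one. The labels below are made-up toy values, not data from the study, and the function names are illustrative.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for a single class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # By convention here, a class with no true or predicted instances scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example (hypothetical labels): 1 = stigma present, 0 = absent.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
score = macro_f1(y_true, y_pred)
```

In this toy case the positive class has an F1 of 0.5 and the negative class 0.75, giving a macro F1 of 0.625.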
Architecture of the proposed hybrid model.
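The hybrid idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the abstract does not say how the RoBERTa embedding and the affective, social, and behavioral features are combined, so simple concatenation followed by one logistic head per stigma type is assumed here. All names, vector sizes, and weights are hypothetical placeholders; a real RoBERTa embedding has hundreds of dimensions and the weights would be learned from the annotated corpus.

```python
import math

STIGMA_TYPES = ("internalized", "anticipated", "enacted")


def fuse(embedding, handcrafted):
    """Concatenate a contextual embedding with handcrafted features (assumed fusion)."""
    return list(embedding) + list(handcrafted)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryHead:
    """One binary classifier over the fused feature vector (hypothetical)."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def predict_proba(self, features):
        if len(features) != len(self.weights):
            raise ValueError("feature vector and weight vector differ in length")
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return sigmoid(z)


def classify(embedding, handcrafted, heads, threshold=0.5):
    """Run every stigma-type head on the same fused vector."""
    fused = fuse(embedding, handcrafted)
    return {name: head.predict_proba(fused) >= threshold
            for name, head in heads.items()}


# Toy inputs: a 3-dimensional stand-in embedding and 2 handcrafted features.
embedding = [0.2, -0.1, 0.4]
handcrafted = [1.0, 0.0]
# Placeholder weights, one independent head per stigma type.
heads = {
    "internalized": BinaryHead([0.0] * 5, bias=1.0),
    "anticipated": BinaryHead([0.0] * 5, bias=-1.0),
    "enacted": BinaryHead([1.0, 0.0, 0.0, 0.0, 0.0]),
}
labels = classify(embedding, handcrafted, heads)
```

Because each head is an independent binary classifier, a single post can be flagged for more than one stigma type, which matches the one-classifier-per-type setup described in the abstract.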