NLP Toolbox

Since January 2019, I have worked with a professor from my undergraduate university on a suite of natural language processing (NLP) tools. We began with a custom Python implementation of Stanford CoreNLP and a handful of corpus analysis tools. Our original goals were to make NLP tools available to students and to conduct meaningful research, especially on narrative, location, subject-verb-object (SVO), and shape-of-story analysis. Since then, we have added over 150 tools. Our software builds on Stanford CoreNLP, YAGO, DBpedia, WordNet, Gensim, and other modern NLP packages. We have also built a GUI and developed tools that visually tag corpora and display results. The software interfaces with Excel, Gephi, Google Earth, ArcGIS, and other packages, and it can export results in several web-friendly and cross-platform formats. A handful of other students and I continue to develop the software and respond to student requests. I also used many of these tools in my undergraduate thesis research on mining reports of adverse drug side effects from online forums.

I am also writing a paper with the professor that uses this software to analyze book reviews in depth. Specifically, we are examining how authors of different races are written about in New York Times reviews. A 2011 article performed a similar analysis by hand. We are showing both that our tools can carry out that analysis automatically and that they can handle more text and support more in-depth analysis. Other researchers are also using our software to study race, mental health, and several other topics.

Through this project, I have gained hands-on experience with several modern NLP tools and strengthened my software development skills. I have also learned about collaboration, debugging, and ways of distributing code that make software accessible to users without a computer science background.