Research Summary

My research interests include topological data analysis with applications to dynamical systems and biological phenomena. I am also interested in the intersection of topological statistics and machine learning. In particular, I am interested in answering questions related to the robustness of topological summaries in noisy systems, the predictive power of topological statistics in modeling, and the effectiveness of topological quantities as input for machine learning tasks. At present, I work in a range of applications including zebrafish stripe development and evolutionary biology. Below I give a very brief description of my research projects. Feel free to contact me for more information.

My research is currently being funded by the National Science Foundation Graduate Research Fellowship Program. Any opinions, findings, and conclusions or recommendations expressed on this website are my own and do not necessarily reflect the views of the National Science Foundation.

Current Research Projects

A Topological Analysis of Model Sensitivity and Pattern Formation
My dissertation research is advised by Björn Sandstede and co-advised by Andrew Blumberg. We are interested in applying topological data analysis to study dynamical systems and spatio-temporal pattern formation. Currently I am working on a problem related to classifying model outputs from zebrafish stripe development models. We use topological summaries of the model outputs as input to a classification algorithm. This gives us a way to automatically classify model outputs under various parameter regimes and noise tests. We hope to use this work to better understand these models and the underlying biological mechanisms. Moreover, we use topological methods to study spiral wave patterns and reaction-diffusion models.
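As a rough illustration of how a topological summary can feed a classifier, the sketch below counts connected components (the zeroth Betti number) of a binary pattern and applies a simple threshold rule. It is a toy example under stated assumptions, not the pipeline used in this project: the grids, the function names `count_components` and `classify`, and the cutoff `max_stripe_components` are all hypothetical.

```python
from collections import deque


def count_components(grid):
    """Count 4-connected components of nonzero cells (Betti-0 of the pattern)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                # New component found: flood-fill everything reachable from it.
                components += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def classify(grid, max_stripe_components=4):
    """Toy rule (an assumption): few large components suggest stripes, many suggest spots."""
    return "stripes" if count_components(grid) <= max_stripe_components else "spots"


# Hypothetical model outputs: 1 marks a pigmented cell, 0 marks background.
stripes = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
spots = [
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
]

print(count_components(stripes), classify(stripes))
print(count_components(spots), classify(spots))
```

In practice one would likely use richer summaries (for example persistence across scales) and a trained classifier rather than a fixed cutoff; the point here is only that a topological count can serve as a feature.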

Hierarchical Clustering of Gene-Level Association Statistics
I work with Samuel Pattillo Smith, Björn Sandstede, and Sohini Ramachandran to develop new methods for addressing genomics problems from a machine learning perspective. We recently developed Ward clustering to identify Internal Node branch length outliers using Gene Scores (WINGS), a clustering algorithm for characterizing shared and divergent genetic architecture among multiple phenotypes. WINGS identifies clusters of phenotypes that share a core set of genes enriched for mutations.


The above figure shows WINGS applied to PEGASUS gene-level association statistics for 26 phenotypes from the UK Biobank. As shown above, WINGS identifies and ranks clusters of phenotypes with shared genetic architecture. The ranks are determined by the clusters' branch lengths in the corresponding cluster dendrogram (not shown). All clusters on or above the dashed red line are deemed significant. The significance threshold is determined via WINGS' multi-step branch length thresholding algorithm. Code for WINGS is freely available on GitHub. For more information, check out our paper.
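To make the idea of ranking clusters by dendrogram branch length concrete, here is a minimal sketch of naive Ward agglomerative clustering followed by a branch length computation. It is not the WINGS implementation and does not reproduce its multi-step thresholding; the toy 2D points stand in for gene score vectors, and the names `ward_linkage` and `branch_lengths` are hypothetical. The merge height follows the common convention sqrt(2·|A||B|/(|A|+|B|))·‖c_A − c_B‖, and a cluster's branch length is taken to be the height at which it joins its parent minus the height at which it was formed.

```python
import math


def ward_linkage(points):
    """Naive Ward clustering. Returns merges as (members_a, members_b, height)."""
    # Each active cluster id maps to (member indices, centroid).
    clusters = {i: ([i], [float(x) for x in p]) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                ma, ca = clusters[a]
                mb, cb = clusters[b]
                na, nb = len(ma), len(mb)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                height = math.sqrt(2.0 * na * nb / (na + nb) * d2)
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        na, nb = len(ma), len(mb)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        merges.append((sorted(ma), sorted(mb), height))
        next_id += 1
    return merges


def branch_lengths(merges):
    """Branch length of each non-root cluster: parent merge height minus own height."""
    result = []
    for k, (a, b, h) in enumerate(merges):
        members = sorted(a + b)
        for pa, pb, ph in merges[k + 1:]:
            if members in (pa, pb):
                result.append((members, ph - h))
                break
    return result


# Hypothetical "phenotypes" embedded in 2D: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
merges = ward_linkage(points)
ranked = sorted(branch_lengths(merges), key=lambda item: item[1], reverse=True)
for members, length in ranked:
    print(members, round(length, 3))
```

Clusters with long branches are well separated from whatever they eventually merge with, which is the intuition behind using branch length as a ranking statistic.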

Topological Estimation of Recombination Rates
I collaborate with Devon Humphreys and Michael Miyagi under the supervision of Andrew Blumberg on this project. Briefly, we extract topological summary statistics from genomic data and then use techniques from regression analysis to infer hotspots of recombination.

We have proposed TREE, a novel, efficient estimator for recombination rates based on topological summaries. Compared to previous TDA methods, TREE more closely approximates the results of commonly used model-based methods, and we provide theoretical justification for the choice of topological summaries.

See below for TREE's predictive performance on simulated data (left) and on an empirical dataset (Drosophila dataset with 22 samples) in comparison to LDhelmet's predictions (right).


For more information, check out our paper in GENETICS. The code for TREE is publicly available on GitHub.

Comparing Songs without Listening
In the Summer@ICERM 2017 program I began working with Dr. Katherine Kinnaird and undergraduates Erin Bugbee, Claire Savard, and Jonathan Weisskoff on the cover song task. The goal of this project is to develop a flexible and computationally efficient method for completing the cover song task. We use methods inspired by topological data analysis to correctly match songs that are remakes of the same original piece.

In this work we propose start-end (SE) diagrams and start(normalized)-length (SNL) diagrams, two novel structure-based representations for sequential music data. Inspired by TDA, these diagrams are equipped with efficiently computable and stable metrics, which are then used to address the cover song task. SE and SNL diagrams stem from Aligned Hierarchies (introduced by Katherine M. Kinnaird in this ISMIR paper), but they overcome many of the limitations of Aligned Hierarchies while addressing the cover song task with higher accuracy. See below for sample SE and SNL diagrams produced from the Aligned Hierarchies representation of a song.



This work is in collaboration with Katherine Kinnaird, Erin Bugbee, and Claire Savard. Congratulations to Claire and Erin for winning the MAA "Outstanding Poster Award" at the 2018 Joint Mathematics Meetings, where they presented our work! Our paper, "SE and SnL diagrams: Flexible data structures for MIR," was accepted for publication in the Proceedings of the 19th ISMIR conference.

Research Group

Professor Sandstede's research group participates in weekly meetings each semester where we discuss topics in dynamical systems and related fields. Below is an overview of our meetings along with some of the presentations I have given.

Semester | Main Theme | My Subtopic | Presentation
Fall 2019 | Probability and Statistics | TBA | TBA
Spring 2019 | Data-driven modelling and analysis | Discovering equations from data | Here.
Fall 2018 | Agent-Based Modeling | Calcium Dynamics | Group Work
Spring 2018 | Probability and Statistics | Classification Algorithms | Lecture and Python Demo
Fall 2017 | Dynamics and Statistics | Parallel Computing in Matlab | Interactive demos
Spring 2017 | Data Science | Methods of Machine learning | Here.
Fall 2016 | Vegetation Patterns | TDA and Diffusion Maps | Here.

QSIDE

I am a member of the Institute for the Quantitative Study of Inclusion, Diversity, and Equity (QSIDE). We use quantitative research to increase inclusion, diversity, and equity. Check out our website for more information.

Past Research Projects

A Topological Analysis of Targeted In-111 Uptake in SPECT Images of Murine Tumors
I worked under the supervision of David Damiano for my undergraduate thesis at the College of the Holy Cross. We developed a novel topological method for analyzing targeted In-111 uptake in SPECT images of mouse tumors. The motivation for this method is illustrated in the videos below, which show the superlevel sets of images of murine tumors as we descend through a filtration of tumor uptake values. The video on the left corresponds to the mouse tumor at hour 24 of the study, and the video on the right shows the same tumor at hour 72. Our method captures the topological information of these filtrations across a time series of images and surpasses standard techniques in its ability to capture tumor heterogeneity. You can find our paper here.
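The descent through uptake values described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the method from the paper: it tracks how many 4-connected components the set {pixels ≥ t} has as the threshold t decreases, using a union-find structure. The array `uptake`, the thresholds, and the function name `superlevel_component_counts` are hypothetical.

```python
def superlevel_component_counts(image, thresholds):
    """Map each threshold t to the number of 4-connected components of {pixels >= t}."""
    rows, cols = len(image), len(image[0])
    parent = {}  # union-find forest over pixels added so far

    def find(p):
        # Follow parent links to the root, shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Visit pixels from the highest value to the lowest.
    pixels = sorted(
        ((image[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    counts = {}
    components = 0
    i = 0
    for t in sorted(thresholds, reverse=True):
        while i < len(pixels) and pixels[i][0] >= t:
            _, r, c = pixels[i]
            parent[(r, c)] = (r, c)
            components += 1  # a new pixel starts as its own component
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                q = (r + dr, c + dc)
                if q in parent:
                    root_p, root_q = find((r, c)), find(q)
                    if root_p != root_q:
                        parent[root_p] = root_q
                        components -= 1  # two components merged
            i += 1
        counts[t] = components
    return counts


# Hypothetical uptake values with two separate high-uptake regions.
uptake = [
    [9, 8, 1, 1, 7],
    [8, 5, 1, 1, 6],
    [1, 5, 5, 5, 5],
    [1, 1, 1, 1, 1],
]
counts = superlevel_component_counts(uptake, [9, 7, 5, 1])
print(counts)
```

In this toy image the two bright regions appear as separate components at threshold 7 and merge once the threshold drops to 5; a heterogeneous tumor would show more such births and merges along the filtration than a homogeneous one.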



Assimilating Eulerian and Lagrangian data in traffic-flow models
During the summer of 2014 I participated in a research project on data assimilation for traffic flow through a Research Training Grant on “Integrating Dynamics and Stochastics.” I worked under the supervision of Björn Sandstede and collaborated with Courtney Cochrane, Joey DeGuire, Bridget Fan, Emma Holmes, Patrick Murphy, and Jenna Palmer. Our graduate mentors were Paul Carter, Laura Slivinsky, and Chao Xia. You can find our paper here.