A Study of Location-Based Mutation Patterns in the SARS-CoV-2 Genome

This past semester, I worked with a partner on a research project for a computational molecular biology and genomics course. We investigated the role that geographic location may play in the evolutionary landscape of the SARS-CoV-2 genome, especially in light of new waves of viral spread and the sequence data made available in recent months. At a high level, our work involved cleaning and processing thousands of SARS-CoV-2 sequences and writing Python scripts to eliminate corrupted sequences, identify segregating sites and single nucleotide polymorphisms, plot allele frequencies at sites of interest over time, and create other visualizations of potential location-mediated mutation patterns. From this study, we identified a triple mutant on the coronavirus nucleocapsid protein that may hold implications for understanding virus adaptation to human hosts.
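As a rough illustration of the cleaning and segregating-site steps described above, here is a minimal Python sketch. It is not the project's actual code: the function names, the ambiguity threshold, and the toy sequences are all hypothetical, and it assumes the sequences have already been aligned to a common length.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def is_clean(seq, max_ambiguous_fraction=0.01):
    """Return True if seq is non-empty and has few ambiguous bases.

    The 1% default threshold is an illustrative assumption.
    """
    if not seq:
        return False
    ambiguous = sum(1 for base in seq.upper() if base not in VALID_BASES)
    return ambiguous / len(seq) <= max_ambiguous_fraction


def segregating_sites(aligned_seqs):
    """Map each 1-based position with more than one allele to its allele counts.

    Ambiguous characters (for example N) are ignored when counting alleles.
    """
    sites = {}
    columns = zip(*(seq.upper() for seq in aligned_seqs))
    for index, column in enumerate(columns):
        alleles = Counter(base for base in column if base in VALID_BASES)
        if len(alleles) > 1:
            sites[index + 1] = alleles
    return sites


# Toy aligned sequences (hypothetical data, far shorter than a real genome).
toy = ["ACGTAC", "ACGTAC", "ATGTAC", "ACGNAC", "NNNNNN"]

# A loose threshold is used here only because the toy sequences are so short.
clean = [seq for seq in toy if is_clean(seq, max_ambiguous_fraction=0.2)]
sites = segregating_sites(clean)
```

In this toy example the all-N sequence is discarded, and only position 2 is reported as segregating, since the N at position 4 is not counted as an allele.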

  • The GitHub repository for this project can be found here.


Read the paper abstract below:

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has devastated populations globally and continues to pose extreme challenges not only to public health but also to the scientific methods used to understand and analyze epidemiology. Because the spread of SARS-CoV-2 is rapid and its dynamics are ever-changing, many recent research efforts in computational biology have focused on painting a more comprehensive picture of virus-host interactions and viral evolution. While existing work has, with varying degrees of success, identified several mutations in the coronavirus genome that may be linked to transmission susceptibility and other factors important for understanding infection, little work has successfully characterized mutation evolution in the context of small-scale geographic location. This study therefore builds on previous work by analyzing single nucleotide polymorphisms in location-tagged SARS-CoV-2 sequences through evaluations of allele frequency fluctuations over time and genotype-annotated phylogenetic trees. Through this analysis, a triple mutant from GGG to AAC at loci 28881, 28882, and 28883 was identified on the nucleocapsid protein of SARS-CoV-2. Further phylogeographic analysis suggests that this triplet of mutations may confer a selective advantage, paving the way for further analysis of viral adaptation to human hosts.
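To make the idea of tracking allele frequency over time concrete, the sketch below computes, per calendar month, the fraction of sequences carrying a given triplet at three consecutive positions. It is an illustrative assumption rather than the study's method: the function name and record format are hypothetical, the data are invented toy values, and it assumes each sequence is aligned so that string offsets match reference genome coordinates.

```python
from collections import defaultdict


def triplet_frequency_by_month(records, start=28881, variant="AAC"):
    """Return {"YYYY-MM": fraction of sequences carrying `variant`}.

    records: iterable of (collection_date, aligned_sequence) pairs, with
    dates formatted as "YYYY-MM-DD". `start` is the 1-based position of the
    first base of the triplet. Sequences whose triplet is missing or
    ambiguous are skipped rather than counted as non-variant.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for date, seq in records:
        month = date[:7]
        triplet = seq[start - 1:start + 2].upper()
        if len(triplet) < 3 or any(base not in "ACGT" for base in triplet):
            continue
        totals[month] += 1
        if triplet == variant:
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Toy records (hypothetical): short sequences, so the triplet starts at 3.
toy_records = [
    ("2020-02-10", "TTGGGTT"),
    ("2020-02-20", "TTAACTT"),
    ("2020-03-05", "TTAACTT"),
    ("2020-03-09", "TTAACTT"),
    ("2020-03-15", "TTNNNTT"),  # ambiguous triplet, skipped
]
freqs = triplet_frequency_by_month(toy_records, start=3)
```

With real data, the default `start=28881` would select positions 28881 to 28883, and running the same function separately on each location's sequences would give per-location frequency curves to plot.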
