Title: Evolutionary trends and positive selection sites in major SARS-CoV-2 variants
Abstract:
SARS-CoV-2 was first detected in Wuhan, China, in 2019. The virus spread rapidly across continents, causing a global pandemic, and has accumulated mutations in many regions of its genome, including the intensively studied spike protein. The mutation rate of the SARS-CoV-2 genome has been estimated at 1 × 10⁻³ substitutions per base per year (about 30 nucleotides per genome per year, given a genome of roughly 30,000 bases) under neutral genetic drift, or 1 × 10⁻⁵ to 1 × 10⁻⁴ substitutions per base per transmission event. This rapid evolution has resulted in the emergence of variants of interest (VOIs), variants under monitoring (VUMs), and variants of concern (VOCs).
This study was undertaken to determine whether positive selection sites and mutation rates change across SARS-CoV-2 variants. We also asked whether positive selection sites and mutation rates are uniform across the domains of the spike protein, a major determinant of virus transmission, pathogenesis, and immune response.
Fourteen SARS-CoV-2 variants were identified for analysis. To limit the number of sequences analyzed, we developed a sampling strategy: from the more than 16 million full-length genomes in the GISAID EpiCoV database, we selected 2 sequences per variant per week per state. Sequences for each variant were aligned using the MAFFT multiple sequence alignment software. Mutation rates for different regions were calculated using MEGA7 software.
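The sampling rule described above (2 sequences per variant per week per state) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the record fields (`id`, `variant`, `state`, `date`), the use of ISO calendar weeks, and the choice of random selection within each group are all hypothetical, since the abstract does not say how weeks were defined or how the 2 sequences were chosen.

```python
import random
from collections import defaultdict
from datetime import date


def downsample(records, per_group=2, seed=0):
    """Keep at most `per_group` records per (variant, ISO year, ISO week, state).

    Assumption: each record is a dict with hypothetical keys
    'variant', 'state', and 'date' (a datetime.date collection date).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    groups = defaultdict(list)
    for rec in records:
        iso = rec["date"].isocalendar()  # (ISO year, ISO week, ISO weekday)
        key = (rec["variant"], iso[0], iso[1], rec["state"])
        groups[key].append(rec)

    sampled = []
    for key in sorted(groups):  # sorted keys give a deterministic group order
        members = groups[key]
        k = min(per_group, len(members))
        sampled.extend(rng.sample(members, k))  # random choice is an assumption
    return sampled


# Toy metadata, invented purely for illustration.
records = [
    {"id": "seq1", "variant": "Alpha", "state": "Texas", "date": date(2021, 1, 4)},
    {"id": "seq2", "variant": "Alpha", "state": "Texas", "date": date(2021, 1, 5)},
    {"id": "seq3", "variant": "Alpha", "state": "Texas", "date": date(2021, 1, 6)},
    {"id": "seq4", "variant": "Alpha", "state": "Texas", "date": date(2021, 1, 12)},
    {"id": "seq5", "variant": "Delta", "state": "Ohio", "date": date(2021, 7, 1)},
]
subset = downsample(records)
```

In this toy input, the three Alpha sequences collected in Texas during the same ISO week are reduced to two, while groups with fewer than two sequences are kept whole.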
This study will help improve understanding of the frequency, distribution, and nature of mutations in SARS-CoV-2. Our findings could inform vaccine design and drug discovery and support predictions about the future evolution of novel variants. A detailed analysis of our findings will be presented.