Lindsey R. Pierce and Carol A. Stepien
Molecular Phylogenetics and Evolution (2012) volume 63, pages 327-341
Abstract: Viral hemorrhagic septicemia virus (VHSv) causes one of the most important finfish diseases, affecting over 70 marine and freshwater species. Phylogenetic relationships and evolutionary radiation of its four strains and many substrains are evaluated here in light of a quasispecies model, including the novel and especially virulent substrain (IVb) that first appeared in the North American Laurentian Great Lakes in 2003. We analyze all available RNA sequences from the glycoprotein (G), nucleoprotein (N), and non-virion (Nv) genes, with Maximum Likelihood and Bayesian approaches. Results show that the G-gene evolves at an estimated µ=2.58×10⁻⁴ nucleotide substitutions per site per year, the N-gene at µ=4.26×10⁻⁴, and Nv fastest at µ=1.25×10⁻³. Phylogenetic trees are largely congruent, distinguishing strains I-IV as reciprocally monophyletic with high support. VHSv appears to have originated from a marine ancestor in the North Atlantic Ocean, diverging ~267-697 years ago into two primary clades: strain IV in North America (the Northwestern Atlantic Ocean), and strains I, II, and III in the Northeastern Atlantic region (Europe). Strain IV is differentiated into three monophyletic substrains, with IVa infecting Northeastern Pacific salmonids and many marine fishes (with 44 unique G-gene haplotypes), IVb endemic to the freshwater Great Lakes (11 haplotypes), and a newly designated IVc in marine/estuarine North Atlantic waters (five haplotypes). Our results depict an evolutionary history of relatively rapid population diversifications in star-like patterns, following a quasispecies model. This study provides a baseline for future tracking of VHSv spread and interpreting its evolutionary diversification pathways.
Figure 1. Maps showing the distributions of VHSv strains (I–IV, colored), substrains (symbols, Ia–e and IVa–c), and sequenced isolates from: (A) Europe and Asia, and (B) North America. The first isolate and date for each strain are geographically referenced.
Figure 2. (A) Distribution and (B) G-gene haplotype network of VHSv-IVb variants in the Laurentian Great Lakes and inland waterbodies (designated by watershed) in TCS 1.21 (Clement et al., 2000). Raw data are from Thompson et al. (2011). Squares in haplotype network are sized according to the number of observed isolates and are colored according to the map. Lines denote a single mutational step between haplotypes; small, unlabeled circles represent hypothesized unsampled haplotypes. Parentheses below the isolate name contain the number of documented occurrences and outbreak year(s).
Figure 3. Number of transitions (open symbols) and transversions (closed symbols) versus pairwise genetic distances (calculated in MEGA) for common isolates sequenced for the three genes: (A) G transitions (R²=0.99, F=34,101, df=1, 76, p<0.001) and transversions (R²=0.99, F=5616, df=1, 76, p<0.001), (B) N transitions (R²=0.97, F=2190, df=1, 76, p<0.001) and transversions (R²=0.97, F=2817, df=1, 76, p<0.001), and (C) Nv transitions (R²=0.99, F=10,044, df=1, 76, p<0.001) and transversions (R²=0.97, F=2408, df=1, 76, p<0.001). The slopes of all lines significantly differ (F=8130, df=3, 152, p<2.2×10⁻¹⁶). Lack of overlap of transitions and transversions indicates low saturation.
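The quantities plotted in Figure 3 can be illustrated with a minimal Python sketch that counts transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) between two aligned sequences. This is not the MEGA procedure used in the study; the function name count_ts_tv and the example sequences are hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
# Hypothetical helper, not taken from MEGA or from the paper.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA uracil as thymine so RNA and cDNA sequences compare equally.
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a, seq_b):
        # Skip identical sites, gaps, and ambiguity codes.
        if a == b or a not in BASES or b not in BASES:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Made-up sequences: A/G and C/T are transitions, T/A is a transversion.
print(count_ts_tv("AACGT", "GATGA"))  # (2, 1)
```

Plotting these two counts against pairwise distance for every pair of isolates gives the kind of saturation plot the caption describes: if transversions begin to catch up with or exceed transitions at larger distances, multiple substitutions at the same site are likely masking the signal.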
Figure 4. Regression of mean numbers of non-synonymous (dN) versus synonymous substitutions (dS) per nucleotide site (calculated with the Jukes-Cantor (1969) method in SNAP) for the common isolates sequenced for the three genes. (Regression equations: G – R²=0.88, F=531, df=1, 76, p<0.001; N – R²=0.92, F=914, df=1, 76, p<0.001; Nv – R²=0.98, F=3646, df=1, 76, p<0.001). The ratio of dN/dS significantly varies among the three genes (F=2818, df=5, 228, p<2.2×10⁻¹⁶).
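The Jukes-Cantor (1969) correction named in the Figure 4 caption converts an observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). A minimal Python sketch follows, assuming p has already been computed separately for synonymous and non-synonymous sites. The function names and input values are hypothetical, and this is not SNAP's own code.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    # The correction is undefined once p reaches 3/4 (full saturation).
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds_ratio(p_nonsyn, p_syn):
    """Ratio of corrected non-synonymous to synonymous distances (hypothetical helper)."""
    d_s = jukes_cantor(p_syn)
    if d_s == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return jukes_cantor(p_nonsyn) / d_s


# Made-up proportions, for illustration only.
print(jukes_cantor(0.10))
print(dn_ds_ratio(0.02, 0.10))
```

The corrected distance is always at least as large as p, because it accounts for repeated substitutions at the same site. A dN/dS ratio below 1 is conventionally read as purifying selection.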
Figure 5. Maximum Likelihood phylogenetic consensus trees of VHSv sequences for: (A) G-gene, (B) N-gene – with inset showing substrain IV relationships from partial sequences, (C) Nv-gene, (D) all three genes combined, and (E) genes G and N combined. Trees are congruent with those from our 50% majority rule Bayesian analyses. Numbers in black triangles=number of unique sequences per subgroup. Values above nodes=% support from 1,000 bootstrap pseudo-replications/Bayesian posterior probability. Values in parentheses and italics=estimated divergence time (years). Snakehead rhabdovirus (AF147498) was used as the outgroup.