As of 14 February 2020, 64,473 such cases have been confirmed, with 1,384 deaths attributed to the virus. These official case numbers are likely an underestimate because of limited reporting of mild and asymptomatic cases. The virus is also clearly capable of efficient human-to-human transmission. Because of the possibility of spread to countries with weaker healthcare systems, the World Health Organization has declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC). There are currently neither vaccines nor specific treatments for this disease.
SARS-CoV-2 is the seventh member of the family Coronaviridae known to infect humans. Three of these viruses (SARS-CoV-1, MERS-CoV and SARS-CoV-2) can cause severe disease; the other four (HKU1, NL63, OC43 and 229E) are associated with mild respiratory symptoms. Herein, we review what can be deduced about the origin and early evolution of SARS-CoV-2 from a comparative analysis of the available genome sequence data. In particular, we offer a perspective on the notable features of the SARS-CoV-2 genome and discuss scenarios by which these features could have arisen. Importantly, this analysis provides evidence that SARS-CoV-2 is neither a laboratory construct nor a purposefully manipulated virus.
The genomic comparison of alpha- and betacoronaviruses (family Coronaviridae) described below identifies two notable features of the SARS-CoV-2 genome: (i) on the basis of structural modelling and early biochemical experiments, SARS-CoV-2 appears to be optimized for binding to the human ACE2 receptor; and (ii) the highly variable spike (S) protein of SARS-CoV-2 has a polybasic (furin) cleavage site at the S1/S2 boundary, created by the insertion of twelve nucleotides. Additionally, this insertion is predicted to have led to the acquisition of three O-linked glycans around the polybasic cleavage site.
Mutations in the receptor binding domain of SARS-CoV-2
The receptor binding domain (RBD) in the spike protein of SARS-CoV and SARS-related coronaviruses is the most variable part of the virus genome. Six residues in the RBD appear to be critical for binding to the human ACE2 receptor and for determining host range1. Using coordinates based on the SARS-CoV Urbani strain1, these are Y442, L472, N479, D480, T487 and Y491. The corresponding residues in SARS-CoV-2 are L455, F486, Q493, S494, N501 and Y505. Five of these six residues differ between SARS-CoV-2 and its most closely related known virus, RaTG13 (sampled from a Rhinolophus affinis bat), to which it is ~96% identical2 (Figure 1a). Based on modelling1 and biochemical experiments3,4, SARS-CoV-2 appears to have an RBD that may bind with high affinity to ACE2 from humans, non-human primates, ferrets, pigs and cats, as well as from other species with high receptor homology1. In contrast, SARS-CoV-2 may bind less efficiently to ACE2 in other species associated with SARS-like viruses, including rodents and civets1.
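The residue correspondence above can be checked mechanically. The following is a minimal Python sketch, assuming the two six-residue lists copied from the text are already in one-to-one correspondence (it performs no sequence or structural alignment); the helper names `split_residue` and `compare_key_residues` are hypothetical. It pairs each SARS-CoV Urbani residue with its SARS-CoV-2 counterpart, reports the numbering offset and counts amino acid differences. Note that it compares SARS-CoV with SARS-CoV-2, not with RaTG13, whose residues are shown only in Figure 1a.

```python
from __future__ import annotations

# Key RBD residues for human ACE2 binding, as listed in the text.
# Entries at the same index are assumed to correspond to each other.
SARS_COV_URBANI = ["Y442", "L472", "N479", "D480", "T487", "Y491"]
SARS_COV_2 = ["L455", "F486", "Q493", "S494", "N501", "Y505"]


def split_residue(label: str) -> tuple[str, int]:
    """Split a label such as 'Y442' into its one-letter amino acid and position."""
    return label[0], int(label[1:])


def compare_key_residues(ref: list[str], query: list[str]) -> list[dict]:
    """Pair corresponding residues; record the numbering offset and whether the amino acid differs."""
    rows = []
    for r, q in zip(ref, query):
        ref_aa, ref_pos = split_residue(r)
        q_aa, q_pos = split_residue(q)
        rows.append({
            "ref": r,
            "query": q,
            "offset": q_pos - ref_pos,
            "differs": ref_aa != q_aa,
        })
    return rows


if __name__ == "__main__":
    rows = compare_key_residues(SARS_COV_URBANI, SARS_COV_2)
    for row in rows:
        status = "differs" if row["differs"] else "same"
        print(f'{row["ref"]:>5} -> {row["query"]:<5} offset {row["offset"]:+d}  {status}')
    n_diff = sum(row["differs"] for row in rows)
    print(f"{n_diff} of {len(rows)} key residues differ")
```

Running it reports that five of the six key residues differ between the SARS-CoV Urbani strain and SARS-CoV-2, with only Y505 (Y491 in SARS-CoV) conserved; the numbering offset is +13 for the first pair and +14 for the remaining five.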
The phenylalanine (F) at residue 486 in the SARS-CoV-2 S protein corresponds to L472 in the SARS-CoV Urbani strain. Notably, in SARS-CoV cell culture experiments, L472 mutates to phenylalanine (L472F)5, which is predicted to be optimal for binding of the SARS-CoV RBD to the human ACE2 receptor6. However, a phenylalanine at this position is also present in several SARS-like CoVs from bats (Figure 1a). While these analyses suggest that SARS-CoV-2 may be capable of binding the human ACE2 receptor with high affinity, the interaction is not predicted to be optimal1. Additionally, several of the key residues in the RBD of SARS-CoV-2 differ from those previously described as optimal for human ACE2 receptor binding6. In contrast to these computational predictions, recent binding studies7 indicate that SARS-CoV-2 binds human ACE2 with high affinity. Thus, the SARS-CoV-2 spike appears to be the result of selection on human or human-like ACE2 that permitted an alternative optimal binding solution to arise. This is strong evidence that SARS-CoV-2 is not the product of genetic engineering.
Polybasic cleavage site and O-linked glycans
The second notable feature of SARS-CoV-2 is a predicted polybasic cleavage site (RRAR) in the spike protein at the junction of S1 and S2, the two subunits of the spike protein (Figure 1b)8,9. In addition to the two basic arginines and the alanine at the cleavage site, a leading proline is also inserted; thus, the fully inserted sequence is PRRA (Figure 1b). The strong turn created by the proline insertion is predicted to result in the addition of O-linked glycans to S673, T678 and S686, which flank the polybasic cleavage site. A polybasic cleavage site has not previously been observed in related lineage B betacoronaviruses, making it a unique feature of SARS-CoV-2 within this lineage. Some human betacoronaviruses, including HCoV-HKU1 (lineage A), do have polybasic cleavage sites, as well as predicted O-linked glycans near the S1/S2 cleavage site.
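The arithmetic behind the insertion, and why RRAR counts as a furin-type site, can be sketched in a few lines. This is a simplified Python illustration, not an analysis method from the text: it assumes the commonly cited minimal furin recognition motif R-X-X-R (real furin specificity also depends on flanking residues and structural context), and the helper names are hypothetical. It checks that twelve inserted nucleotides encode exactly four codons, matching the four residues of PRRA without shifting the reading frame, and that the motif appears in RRAR but not in PRRA alone. The trailing R appended to PRRA is an assumption standing in for the arginine that the text implies was already present downstream of the insertion, since RRAR contains only three of the inserted residues.

```python
import re

# Simplifying assumption (not from the text): the commonly cited minimal furin
# recognition motif R-X-X-R, where X is any amino acid.
MINIMAL_FURIN_MOTIF = re.compile(r"R..R")

INSERTED_NUCLEOTIDES = 12  # length of the insertion, as stated in the text
INSERTED_PEPTIDE = "PRRA"  # fully inserted amino acid sequence, from the text
CLEAVAGE_SITE = "RRAR"     # predicted polybasic cleavage site, from the text


def codons_in(n_nucleotides: int) -> int:
    """Return the number of whole codons; raise if the insertion would shift the reading frame."""
    if n_nucleotides % 3 != 0:
        raise ValueError("insertion length is not a multiple of 3 (frameshift)")
    return n_nucleotides // 3


def has_minimal_furin_motif(peptide: str) -> bool:
    """Return True if the peptide contains an R-X-X-R stretch anywhere."""
    return MINIMAL_FURIN_MOTIF.search(peptide) is not None


def count_basic(peptide: str) -> int:
    """Count arginine (R) and lysine (K) residues in a peptide."""
    return sum(aa in "RK" for aa in peptide)


if __name__ == "__main__":
    # 12 nt -> 4 codons, one per residue of PRRA, so the frame is preserved.
    print(codons_in(INSERTED_NUCLEOTIDES), len(INSERTED_PEPTIDE))
    # The inserted residues alone do not form the motif ...
    print(has_minimal_furin_motif(INSERTED_PEPTIDE))
    # ... but followed by the assumed pre-existing arginine they yield RRAR.
    with_downstream_r = INSERTED_PEPTIDE + "R"  # hypothetical ancestral residue
    print(CLEAVAGE_SITE in with_downstream_r, has_minimal_furin_motif(with_downstream_r))
    print(count_basic(CLEAVAGE_SITE))  # basic residues in RRAR
```

Running it prints `4 4`, `False`, `True True` and `3`: the insertion is in frame, and the polybasic motif arises only once the inserted residues sit next to the downstream arginine.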
While the functional consequence of the polybasic cleavage site in SARS-CoV-2 is unknown, experiments with SARS-CoV have shown that engineering such a site at the S1/S2 junction enhances cell–cell fusion but does not affect virus entry10. Polybasic cleavage sites allow effective cleavage by furin and other proteases, and can be acquired at the junction of the two subunits of the haemagglutinin (HA) protein of avian influenza viruses under conditions that select for rapid virus replication and transmission (e.g. highly dense chicken populations). HA serves a function in cell–cell fusion and viral entry similar to that of the coronavirus S protein. Acquisition of a polybasic cleavage site in HA, by either insertion or recombination, converts low-pathogenicity avian influenza viruses into highly pathogenic forms11-13. The acquisition of polybasic cleavage sites by influenza virus HA has also been observed after repeated forced passage in cell culture or through animals14,15. Similarly, an avirulent isolate of Newcastle disease virus became highly pathogenic during serial passage in chickens by incrementally acquiring a polybasic cleavage site at the junction of its fusion protein subunits16. The potential function of the three predicted O-linked glycans is less clear, but they could create a “mucin-like domain” that would shield potential epitopes or key residues on the SARS-CoV-2 spike protein. Biochemical or structural studies are required to determine whether the predicted O-linked glycan sites are actually used.
Theories of SARS-CoV-2 origins
It is improbable that SARS-CoV-2 emerged through laboratory manipulation of an existing SARS-related coronavirus. As noted above, the RBD of SARS-CoV-2 is optimized for human ACE2 receptor binding, with an efficient binding solution different from the one that would have been predicted. Further, if genetic manipulation had been performed, one would expect one of the several reverse genetic systems available for betacoronaviruses to have been used. However, the genetic data show that SARS-CoV-2 is not derived from any previously used virus backbone17. Instead, we propose two scenarios that can plausibly explain the origin of SARS-CoV-2: (i) natural selection in a non-human animal host prior to zoonotic transfer, and (ii) natural selection in humans following zoonotic transfer. We also discuss whether selection during passage in culture could have given rise to the same observed features.
Selection in an animal host. As many of the early cases of COVID-19 were linked to the Huanan seafood and wildlife market in Wuhan, it is possible that an animal source was present at this location. Given the similarity of SARS-CoV-2 to bat SARS-like CoVs, particularly RaTG13, it is plausible that bats serve as reservoir hosts for SARS-CoV-2. It is important, however, to note that previous outbreaks of betacoronaviruses in humans involved direct exposure to animals other than bats, including civets (SARS) and camels (MERS), which carry viruses genetically very similar to SARS-CoV-1 or MERS-CoV, respectively. By analogy, viruses closely related to SARS-CoV-2 may be circulating in one or more animal species. Initial analyses18,19 indicate that Malayan pangolins (Manis javanica) illegally imported into Guangdong province carry a CoV similar to SARS-CoV-2. Although the bat virus RaTG13 remains the closest relative to SARS-CoV-2 across the whole genome, the Malayan pangolin CoV is identical to SARS-CoV-2 at all six key RBD residues (Figure 1). However, no pangolin CoV has yet been identified that is sufficiently similar to SARS-CoV-2 across its entire genome to support direct human infection. In addition, the pangolin CoV does not carry a polybasic cleavage site insertion. For a precursor virus to acquire both the polybasic cleavage site and the spike mutations suitable for human ACE2 receptor binding, an animal host would likely need a high population density (to allow natural selection to proceed efficiently) and an ACE2 gene similar to the human orthologue. Further characterization of CoVs in pangolins and other animals that may harbour SARS-CoV-like viruses should be a public health priority.
Cryptic adaptation to humans. It is also possible that a progenitor of SARS-CoV-2 jumped from a non-human animal to humans, with the genomic features described above acquired through adaptation during subsequent human-to-human transmission. We surmise that once these adaptations were acquired (either together or in series), they would have enabled the outbreak to take off, producing a cluster of pneumonia cases large and unusual enough to trigger the surveillance system that ultimately detected it.
All SARS-CoV-2 genomes sequenced so far have the well-adapted RBD and the polybasic cleavage site, and are thus derived from a common ancestor that had these features. The presence in pangolin CoVs of an RBD very similar to that of SARS-CoV-2 means that this RBD was likely already present in the virus that jumped to humans, even though the exact non-human progenitor virus has not yet been identified. This would leave the polybasic cleavage site to have been inserted during human-to-human transmission. By analogy with the influenza A virus HA gene, a specific insertion or recombination event would then have been required to enable the emergence of SARS-CoV-2 as an epidemic pathogen.
Kristian G. Andersen
Department of Immunology and Microbiology,
The Scripps Research Institute,
La Jolla, CA USA.