Naming COVID-19


last update: 18 November 2020


First steps


When the virus first appeared there was some uncertainty about what to call it. Some called it the "Wuhan coronavirus" or "Chinese coronavirus", but on 12 January 2020 the World Health Organisation provisionally called it "2019-nCoV". This meant 2019 (year), n (novel), Co (corona) and V (virus).

On 7 February 2020 the Coronavirus Study Group of the International Committee on Taxonomy of Viruses (ICTV-CSG) named it "Severe Acute Respiratory Syndrome Coronavirus 2" or SARS-CoV-2. The '2' showed that it was closely related to the original SARS-CoV virus from 2002-03. However, some people incorrectly assumed that the '2' meant that the virus was a direct descendant.

Since 1973 the International Committee on Taxonomy of Viruses (ICTV) has been the global authority on the designation and naming of viruses, while the World Health Organisation is responsible for naming new human infectious diseases.

The name 'coronaviruses' was proposed in Nature (16 November 1968) by an informal group of eight virologists. The idea was that there were viruses that looked different from the usual viruses responsible for influenza. These viruses appeared to be "more or less rounded in profile" and they had a kind of halo recalling the solar corona. The virologists suggested that these viruses were members of a previously unrecognised group. The term 'coronaviruses' was officially approved in 1971.

So the word 'corona' has nothing to do with the Mexican brand of beer that originated in 1925.

The situation today


On 11 February 2020, the World Health Organisation officially renamed "2019-nCoV" as "COVID-19", with CO (corona), VI (virus), D (disease), and 19 (meaning 2019).

In humans, there are 7 coronaviruses (HCoVs) known to cause illness, ranging from the common cold to more severe respiratory diseases. Of those, the human coronaviruses HCoV-229E, HCoV-NL63, HCoV-OC43 and HCoV-HKU1 are routinely responsible for mild respiratory illnesses like the common cold, but they can cause severe infections in immunocompromised individuals. The other three coronaviruses have caused deadly outbreaks: SARS-CoV, MERS-CoV, and SARS-CoV-2.

Some history


Historically, the range of diseases and hosts were the two key characteristics used to define viruses, given that the virus itself is invisible to the naked eye.

Hosts, in the biological sense, are the organisms that "provide nourishment and shelter to guests", and this can range from animals hosting parasites to cells harbouring pathogens. In the broadest sense a pathogen is any organism that can produce disease, i.e. anything that negatively affects the structure or function of all or part of an organism (excluding some kind of external injury). Diseases are best recognised through their symptoms and medical signs.

The Wikipedia article on the history of virology mentions that in 1886 the tobacco mosaic virus was the first non-bacterial infectious agent discovered, but the word 'virus' was only suggested in 1898.

Comparison Virus-Bacteria

Single virus particles are only about 20 to 250 nanometers in diameter, unlike bacteria which tend to be about 100 times larger. Only the largest viruses can be seen under a light microscope at the highest resolution. The SARS-CoV-2 virus particle varies between 60 and 140 nanometers in diameter, and a properly worn N95 mask can filter approximately 99.8% of particles having an average diameter of 100 nanometers. For comparison, the 3-ply blue disposable surgical masks are expected to achieve at least 95%, whereas a silk or linen scarf can be expected to achieve 60% to 65%.

Death Toll


Viral phenotypic features include those that, like a disease, are predominantly shaped by virus-host interactions, e.g. distinctive features such as range of hosts, infectivity, virulence and mutation rates (but they could also be as simple as temperature sensitivity).

A virus' host range is the range of cell types and host species a virus is able to infect.

Another way of characterising a virus is by the architecture of its particles.

Originally it was suggested that a virus was a "contagious living fluid", and it was only in 1898 that it was suggested that a virus was in fact a tiny particle. Variola (smallpox) and vaccinia could just about be seen with a light microscope as early as 1887, but most viruses are far too small. It was only after the invention of the electron microscope, from 1938 through the early 1940s, that different virions could be identified as particles. In 1943 there was already a classification system based on the perceived structure of the virus, but the technique was complex and expensive, and only in the 1960s did it become possible to acquire information about the architecture of virus particles. Check out "A Short History of the Discovery of Viruses".

So, the host of a given virus may be uncertain, and the pathogenicity also remains unknown for a major (and fast-growing) proportion of viruses. The problem is that many coronaviruses have been discovered in metagenomics studies that apply sequencing technology to environmental samples. These studies have identified a close-to-exponentially growing number of diverse viruses that circulate in nature and have never been characterised at the phenotypic level.

Thus, the genome sequence is the only characteristic that is known for the vast majority of viruses, and so it has to be used in defining specific viruses. In this framework, a virus is defined by a genome sequence that is capable of autonomous replication inside cells and dissemination between cells or organisms. The virus may or may not be harmful to its natural host.

Experimental studies may be performed for a fraction of known viruses, while computational comparative genomics is used to classify (and deduce characteristics of) all viruses.

Accordingly, virus naming is not necessarily connected to disease but rather informed by other characteristics.

In view of the above advancements, the Coronavirus Study Group was confronted with the question of whether the virus name for the newly identified human virus should be linked to the (incompletely defined) disease that this virus causes, or rather be established independently from the virus phenotype. The group decided to follow a phylogeny-based line of reasoning to name this virus, whose classification can be seen below.

Virus Naming

It's easy enough to imagine that when an influenza virus is found it is identified and named according to well established and internationally approved methods, standards and procedures. However, with newly emerging viruses the situation might be quite different. The new virus genome sequence might be similar, but not identical, to an existing virus. So how much difference is needed to consider the new virus a member of a new, distinct group? If the candidate virus is sufficiently different from existing groups, it's considered distinct and novel.

The ICTV oversees the official classification of viruses, whereas the WHO names diseases. Check out this article for more information on the classification of 2019-nCoV (early name for COVID-19) and the naming of SARS-CoV-2.

Naming coronaviruses


The International Committee on Taxonomy of Viruses (ICTV) has adopted a naming convention based upon a slightly simplified version of the standard biological classification system. The general system groups organisms into a taxonomic rank creating a taxonomic hierarchy. The principal ranks in modern use are domain, kingdom, phylum, class, order, family, genus, and species.

The species of modern man (Homo sapiens) is part of the genus Homo, tribe Hominini, subfamily Homininae, family Hominidae, infraorder Simiiformes, suborder Haplorhini, order Primates, class Mammalia, phylum Chordata, and kingdom Animalia, all in the domain Eukaryota.

On the Taxonomy page of the ICTV they list (as of July 2019) a total of 6,590 species. They also introduce a higher taxonomic rank above kingdom, with four different realms. However I've read that some viruses are left as unassigned species, many are left unranked, and there are an increasing number that have not been identified and classified. According to the ICTV website it was in 1975 that the genus Coronavirus was upgraded to the family Coronaviridae (at the time it was called "avian infectious bronchitis virus"). In 1996 the family Coronaviridae was expanded to include two genera, Coronavirus and Torovirus. In 2018-19 things accelerated, with three rounds of changes resulting in the following accepted classification:-

Realm Riboviria - all viruses that use an RNA-dependent polymerase for replication
Kingdom Orthornavirae - viruses with RNA genomes that encode an RNA-dependent RNA polymerase (RdRp)
Phylum Pisuviricota - RNA viruses which include all positive-strand and double-stranded RNA viruses which infect eukaryotes but not prokaryotes (i.e. they infect organisms whose cells have a nucleus enclosed within a nuclear envelope, such as animals, plants and fungi, but do not infect bacteria or archaea)
Class Pisoniviricetes - positive-strand RNA viruses that infect eukaryotes
Order Nidovirales - enveloped positive-strand RNA viruses that infect vertebrates and invertebrates
Suborder Cornidovirineae - appears to group 56 reference strains (Cornidovirineae appears to collect together a discrete evolutionary lineage based upon sequencing or morphology, but no information was found in the literature)
Family Coronaviridae - enveloped positive-strand RNA viruses that infect amphibians, birds, and mammals
Subfamily Letovirinae - infects amphibians
Subfamily Orthocoronavirinae - infects birds and mammals (i.e. coronavirus).
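
The hierarchy listed above can be treated as a simple ordered data structure. The following is only a minimal sketch: the rank and taxon names are taken from the list, whereas the function names (taxon_at, shared_ranks) and the LETOVIRINAE example are illustrative assumptions and not part of any ICTV tool.

```python
# The lineage listed above, ordered from the broadest rank to the narrowest.
# Taxon names come from the list; the helper names are hypothetical.
LINEAGE = [
    ("Realm", "Riboviria"),
    ("Kingdom", "Orthornavirae"),
    ("Phylum", "Pisuviricota"),
    ("Class", "Pisoniviricetes"),
    ("Order", "Nidovirales"),
    ("Suborder", "Cornidovirineae"),
    ("Family", "Coronaviridae"),
    ("Subfamily", "Orthocoronavirinae"),
]

# The sister subfamily shares every rank down to the family, then diverges.
LETOVIRINAE = LINEAGE[:-1] + [("Subfamily", "Letovirinae")]


def taxon_at(rank, lineage=LINEAGE):
    """Return the taxon name at the given rank, or None if the rank is absent."""
    for lineage_rank, name in lineage:
        if lineage_rank.lower() == rank.lower():
            return name
    return None


def shared_ranks(lineage_a, lineage_b):
    """Return the leading (rank, name) pairs that two lineages have in common."""
    common = []
    for pair_a, pair_b in zip(lineage_a, lineage_b):
        if pair_a != pair_b:
            break
        common.append(pair_a)
    return common


if __name__ == "__main__":
    print(taxon_at("family"))
    print(shared_ranks(LINEAGE, LETOVIRINAE)[-1])
```

Comparing two lineages rank by rank in this way is one simple way to ask "how closely related are these two viruses according to the taxonomy?": the deeper the last shared rank, the closer the relationship.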

Whilst SARS-CoV-2 is known mainly as a clinical respiratory infection in humans, the Coronaviridae family also infects other vertebrates, causing diseases in animals such as turkeys (enteritis), mice (viral hepatitis), pigs, dogs, foals (gastroenteritis), cats (peritonitis), calves (neonatal diarrhoea), birds and rats (pneumonia).

Within the subfamily Orthocoronavirinae (all are enveloped positive-strand RNA viruses) there are four genera. The genera Alphacoronavirus and Betacoronavirus are viruses that infect mammals, including humans. The Gammacoronavirus infects birds, ducks, geese and the Beluga whale, whereas Deltacoronavirus infects mostly birds and some mammals. The genus Alphacoronavirus includes a number of bat viruses as well as the human coronavirus species HCoV-229E and HCoV-NL63. The genus Betacoronavirus includes a number of different animal viruses as well as the human coronavirus species HCoV-HKU1 and HCoV-OC43, and the deadly species SARS-CoV, MERS-CoV, and SARS-CoV-2.

For completeness we note that species HCoV-229E is the only virus in the subgenus Duvinacovirus, whereas the species HCoV-NL63 is one of two viruses in the subgenus Setracovirus (the second is a NL63-related bat strain). The species HCoV-HKU1 is one of 5 species in the subgenus Embecovirus, and in the same subgenus we also find HCoV-OC43, which is a member of the Betacoronavirus 1 species that infects both cattle and humans. The species MERS-CoV (Middle East Respiratory Syndrome) is one of 4 species in the subgenus Merbecovirus, whilst the two strains SARS-CoV and SARS-CoV-2 are alone in the subgenus Sarbecovirus.

The Classification of SARS-CoV-2


There was a recognition that, with the recent advances in comparative genomics and metagenomics, a close-to-exponentially growing number of diverse viruses had been uncovered. The problem was that the existing taxonomy framework was inadequate to depict the relationships in the 'virosphere'. The idea was that major virus clades could be identified by key evolutionary events. So-called Virus Hallmark Genes were expected to be conserved across evolutionary groups of viruses, and these genes were expected to be responsible for the key functions in virus replication and virion morphogenesis. The approach was successful in assigning a substantial majority of viruses to one of four evolutionarily independent virus realms.

However hallmark genes are not universally shared among all viruses, and it's now thought that viruses have several distinct points of origin, i.e. that they cannot be united under a single highest taxon rank on evolutionary grounds. The hierarchy shown above was proposed in 2019 and classified all RNA viruses in a realm called Riboviria. This opened up the possibility of a kingdom called Orthornavirae for all RdRp viruses, with five phyla strongly related to genome sequence data analysis.

So the International Committee on Taxonomy of Viruses (ICTV) has retained a hierarchical taxonomy inspired by a modern 'tree of life'. Some experts see that with the need to accommodate new discoveries the 'tree' has lost its coherence, and therefore its authority. Alternative classification systems have been proposed based upon structure, host species, or genome length. The Baltimore classification groups viruses into seven categories based on the type of nucleic acid in their genome and the biochemistry of their replication strategies. As the basis for a phylogeny, however, it conflicts with the observation that some viruses with similar functions and structural proteins have different types of genomes.

As far as I understand things one approach today is to create a kind of virus vector in a Euclidean "genome space" calculated from various properties of the genomes of a candidate virus. The distances between such vectors are used for classification, although it is not clear that the distance between two points in genome space is a meaningful metric of genome similarity.

Several researchers have gone so far as to propose alignment-free methods simply due to the computational complexity of genome sequence alignment and the rate at which new viruses are sequenced.
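
To make the idea of a "genome space" concrete, here is a minimal alignment-free sketch. It assumes one particular choice of genome properties, namely normalised k-mer frequencies (the relative counts of every short word of length k), which is only one of several possibilities and is not necessarily what any given research group uses. The function names and the toy sequences are invented for illustration.

```python
from itertools import product
from math import sqrt


def kmer_vector(sequence, k=2, alphabet="ACGT"):
    """Map a nucleotide sequence to a vector of normalised k-mer frequencies."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences, invented purely for illustration (not real viral genomes).
SEQ_A = "ATGCGATACGCTTGA"
SEQ_B = "ATGCGATACGCTAGA"  # differs from SEQ_A by a single base
SEQ_C = "GGGGCCCCGGGGCCC"  # very different composition

if __name__ == "__main__":
    vec_a, vec_b, vec_c = (kmer_vector(s) for s in (SEQ_A, SEQ_B, SEQ_C))
    print("A-B distance:", euclidean(vec_a, vec_b))
    print("A-C distance:", euclidean(vec_a, vec_c))
```

No alignment is needed here, which is why such methods scale well. The caveat in the text still applies: two genomes with similar word frequencies are close in this space whether or not they share evolutionary history.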

The viral genome carries fingerprints of the evolutionary history of the virus and should in principle provide the basis for calculating relationships between any set of viruses. It has been shown that the genome encodes history including recombination events and virus-species coevolution. So the use of full genomes in viral classification, now possible because of modern sequencing techniques, offers the promise of weighing all salient features of a given virus more objectively than the ICTV's consensus system. Indeed, we have already mentioned that expert consensus has highlighted the need for the ICTV classification to incorporate viruses known only from metagenomic information, which would require categories to be defined in terms of genetics, as opposed to phenomenology.

I'm going to close this webpage by looking at just one development concerning viral taxonomy. The approach taken is viral classification based upon sequence alignment, i.e. matching sequences of DNA, RNA, or proteins that are thought to have a functional, structural, or evolutionary relationship. From gene alignment a global tree can be built up based upon a distance metric on the space of viral genomes. From this, clusters are extracted and a taxonomy constructed. As you might expect the procedure was not that simple and it employed a number of machine learning algorithms, notably the so-called t-SNE (t-distributed stochastic neighbour embedding) used to visualise complex data in two or three dimensions. To cut a long story short we will just look at two results.
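
The general pattern of "alignment, then distances, then clusters" can be sketched in a few lines. This is a deliberately simplified stand-in and not the method used in the work described: it assumes the sequences are already aligned, it uses the plain fraction of differing positions as the distance, and it replaces the tree building and t-SNE steps with a basic single-linkage merge under a hypothetical threshold. All names and sequences are invented for illustration.

```python
def p_distance(a, b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 1.0
    return sum(x != y for x, y in compared) / len(compared)


def single_linkage_clusters(aligned, threshold):
    """Merge clusters while any cross-cluster pair is at or below threshold."""
    clusters = [{name} for name in aligned]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = any(
                    p_distance(aligned[a], aligned[b]) <= threshold
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if close:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Toy pre-aligned sequences, invented purely for illustration.
ALIGNED = {
    "virus_A": "ATGCGATACG",
    "virus_B": "ATGCGATACC",
    "virus_C": "ATGCGTTACC",
    "virus_D": "GGCATTCGAA",
}

if __name__ == "__main__":
    for cluster in single_linkage_clusters(ALIGNED, threshold=0.15):
        print(sorted(cluster))
```

The threshold plays the role of the question raised earlier on this page: how much difference is needed before a virus is placed in a new, distinct group. Changing it changes the taxonomy that comes out, which is why real studies put so much effort into justifying the metric and the cut-offs.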

Plot of Viruses with Baltimore Classification

Above we have over 5,800 viruses grouped into 78 clusters based upon similar genes. The colouring is designed to highlight the Baltimore classification, so we can see that yellow is for all positive-sense single-strand RNA viruses, which includes all Coronaviridae (including coronaviruses).

Below we have the same viruses grouped by host kingdom, so we now see all Animalia hosts in orange, i.e. all viruses that infect animals. This group of viruses is the focus of public health concerns. Established viruses from wildlife hosts can switch into humans and then be transmitted within the human population. The list is long and includes measles, smallpox, influenza, HIV-1, Dengue, SARS-CoV, Ebola, Hendra, and others. Fortunately most viral transfers to new hosts cause only single infections or limited outbreaks, and (despite the present-day situation) it is still rare for a virus to cause an epidemic in a new host. With the recent focus on COVID-19 we tend to ignore the fact that HIV jumped from primates to humans some 70 years ago, has infected tens of millions of people, and continues to infect around 2 million people annually. We forget that nearly 700,000 people died from AIDS-related illnesses in 2019.

Plot of Viruses by Host Kingdom

The authors of this work argued that the future is genetic classification using only quantitative features, and they suggest that it's a starting point for quantitative modelling in the broad areas of genetics and evolution.