Q&A on Viruses
last update: 10 April 2021
On this webpage we will look at the virus, but not specifically the SARS-CoV-2 virus nor the disease COVID-19 it creates in humans. I've tried to develop this webpage as a Q&A (question and answer) driven by the questions I had, and the answers I found in diverse publications.
What were my questions?
Firstly, are viruses important?
Secondly, what exactly is a virus?
Thirdly, are there a lot of viruses that attack humans?
Fourthly, are all viruses bad?
Fifth, are viruses alive?
Sixth, how do you kill a virus?
Seventh, where did viruses come from?
Lastly, viruses and vaccines, what next?
What's so important about viruses?
There are two aspects to the answer…
Firstly, viruses kill more people per year than cancer, so the need for more and better vaccines is a 'number one' priority. And now all the more so with COVID-19.
Secondly… do we really need a second reason? No, but viruses could potentially become a powerful therapeutic tool, and they are scientifically interesting and increasingly a topic of frontier research.
The World Health Organisation (WHO) has a "top 10 causes of death" where it includes infectious diseases as part of 'communicable' causes of death, as opposed to non-communicable causes and fatal injuries. The WHO highlights "lower respiratory infections" and "diarrhoeal diseases" as 'communicable'. This analysis dates from the pre-COVID 2019.
As of mid-March 2021 nearly 2.7 million people had died of COVID-19 worldwide.
But what happens when we look beyond COVID-19 and the "top 10"? Already in 1996 the WHO stated that "infectious diseases kill over 17 million people every year" (including 9 million children). Still in 1996, the report stated that at least 30 new diseases had emerged in the period 1975-1995, many having no treatment, cure or vaccine.
In addition the World Health Organisation also highlights the fact that vaccination prevents more than 20 life-threatening diseases, and currently prevents 2-3 million deaths every year.
Viral and bacterial infections are probably the world's second biggest killer after heart related diseases and illnesses.
In creating this webpage I came across a massive amount of new information, and occasionally a non-evident (for me) insight. One of those insights was the fact that viruses can evolve very quickly, acquire new genes, and alter their behaviour. So it seems reasonable to assume that over the past hundreds of thousands of years viruses have built and tested many different versions, and what we see today is a collection of the most successful ones. The key question is 'successful for what'? Almost certainly success for a virus won't mean just virulence or pathogenicity, or the maximum ability to cause disease, injury or death, but much more likely it simply means an ability to survive alongside their target populations over time. And this would be equally true for viruses of humans, animals, plants, fish, or even bacteria.
Wikipedia has a "list of vaccine topics", and many of those vaccines are for viral infections well known to us all, e.g. the traditional flu shot (first used in the 1930s), the MMR vaccine against measles, mumps and rubella (1971) or the MMRV vaccine which is an MMR vaccine plus the chickenpox vaccine (2005), and the polio vaccine (1950).
And there are also vaccines against bacterial diseases, e.g. the DPT/DTP vaccine against diphtheria, whooping cough, and tetanus (1949), the BCG vaccine against tuberculosis (1921), and the Pneumovax vaccine against pneumonia (1945).
The annual seasonal production of the flu shot is estimated at about 1.5 billion doses.
Since 1971 the MMR vaccine has been given to more than 500 million children worldwide.
The requirement for the MMRV vaccine exceeds 120 million doses annually for routine immunisation.
The production of the polio vaccine appears to be about 50 million doses annually.
DPT/DTP production annually exceeds 500 million doses.
BCG is given to about 100 million children annually.
There exists a long-term commitment to produce 200 million doses of Pneumovax annually.
And there are also vaccines that are in strong demand somewhere in the world, but are not routinely used in Europe:-
Requirements for the hepatitis B vaccine could be around 40 million annually.
There is a requirement for 100 million HPV vaccines annually.
The annual requirement for rabies vaccines is 90 million doses for humans and 180 million for dogs.
It is said that there are about 70 million doses of the smallpox vaccine in storage.
Production capacity for the typhoid vaccine exceeds 100 million doses annually.
Current demand for cholera vaccination appears to be about 20 million doses annually.
As of mid-March 2021 the work on COVID-19 vaccines has produced over 200 candidates, with over 50 vaccines in human clinical trials, 18 in efficacy testing, and there are at least 6 different vaccines in use in different parts of the world. More than 8 billion doses of vaccines have been pre-ordered.
In an article from 2020 the American Cancer Society listed several viruses linked with cancer in humans:-
At least a dozen human papillomaviruses (HPV) are known to cause cancer, and a few of these can cause cervical cancer, and there are vaccines against 90% of these infections.
The Epstein-Barr virus (EBV) is a type of herpes virus. Whilst not in itself a major risk, it increases the risk of getting nasopharyngeal cancer and certain types of lymphomas.
Hepatitis is caused by a variety of viruses, and in some cases can lead to liver cancer.
Human immunodeficiency virus (HIV) can cause acquired immune deficiency syndrome (AIDS), which weakens a person's immune system and can reduce resistance to other viruses that might lead to cancer.
Whilst I dislike anecdotal evidence being used as a statement of principle, I was nevertheless impressed by a report in 2017 on phage therapy. It's worth the read.
Virotherapy is about converting viruses into therapeutic agents to treat diseases (see this review). There are millions of viruses "in the wild" and only hundreds are identified each year. Most of the time these new viruses are linked to existing diseases and suffering, but during their characterisation, there is almost always the question about potential beneficial properties they may have (e.g. beneficial to their host and possibly to plants, animals or even man). Could natural and laboratory-modified viruses be used to target and kill cancer cells, to treat a variety of genetic disorders as gene and cell therapy tools, or to serve as vaccines or vaccine delivery agents?
Cancer is one of the main targets, and it all started from early observations of cancer regression in patients suffering from unrelated viral infections. In the past two decades, viruses from a variety of different families (e.g., Adenoviridae, Herpesviridae, Rhabdoviridae, Parvoviridae, Picornaviridae, Reoviridae, and Poxviridae) have been studied for their potential use as anticancer agents. Due to their tropism for tumours and their ability to replicate selectively in and eventually lyse cancer cells without harming noncancerous cells, they are referred to as oncolytic viruses. Trials are ongoing for the treatment of various cancer types, including hepatocellular carcinoma, glioblastoma multiforme, colorectal cancer, and cancers of the lung, breast, prostate, pancreas, bladder, and ovaries. In 2015, the first oncolytic virus therapy based on a herpesvirus was approved by the US Food and Drug Administration and European Medicines Agency for the treatment of melanoma lesions in the skin and lymph nodes.
In contrast to oncolytic virus therapies, where the treatment is based on virus replication and cell death, non-replicating viruses are being used as vectors for corrective gene delivery. The goal of virus-mediated gene therapy is the delivery and expression of therapeutic genes to desired target cells to restore the function of a defective gene. Viral gene therapy uses the natural capacity of virus particles to protect the encapsidated nucleic acid from degradation and to deliver the DNA to the nucleus. For the ideal gene therapy vector the viral wild-type genome is almost entirely substituted with a recombinant transgene expression cassette. This aspect is a major difference compared with the oncolytic viruses used in anticancer therapies, which encode many viral genes. In hundreds of ongoing clinical trials, the most commonly used vectors for gene therapy are adenoviruses, retroviruses/lentiviruses, and adeno-associated viruses (AAVs). Each system has its pros and cons that must be considered prior to use to ensure efficient gene delivery and expression for clinical success. Recent successes in various clinical trials have been achieved especially using lentiviral and AAV vectors. Notably, an AAV1 vector for the treatment of lipoprotein lipase deficiency was approved as the first viral gene therapy medical product in the Western world by the European Medicines Agency in 2012. Another example of a successfully completed AAV vector clinical trial involves an AAV8 vector expressing human factor IX for the treatment of hemophilia B. A single injection of these AAV particles resulted in a more than 90% reduction in the number of bleeding episodes in study participants over a period of more than three years, with no toxic side-effects. One downside is that AAV gene therapy is currently the most expensive therapeutic, at about $1 million per treatment.
We know that some of the viruses infecting humans are capable of causing severe and often lethal diseases, but other viruses can be manipulated to be beneficial to human health. These viruses offer the potential to cure cancer, correct genetic disorders, or fight pathogenic viral infections. In addition, viruses are used in many genetic studies to determine molecular mechanisms or are used as insecticides, and some have been reported to increase drought tolerance in plants. Virologists strive to balance the “bad” reputation of viruses by promoting the many “good” things that viruses can do.
In a review of the first congress on Viruses of Microbes, hosted by the Institut Pasteur in 2010, the 'current' renaissance in virology was mentioned. The key points were:-
The discovery of giant viruses in amoebae. The mimivirus, with its large genome, can exceed many bacteria in both particle size and genome size. The distinction between viruses and different types of cells became blurred and opened up (again) the discussion on the origins of viruses and the definition of organisms (i.e. are viruses living? - see a later Q&A on this webpage).
In support of the idea that viruses are alive, it was shown that some giant viruses actually host their own parasites (i.e. virophages) that reproduce within the intracellular virus factories (i.e. viroplasm) much like viruses reproduce within cells. Two examples of physically similar virophages were in fact genetically unrelated, suggesting that these 'super-parasites' evolved independently on multiple occasions.
The concept of the 'virocell' was proposed. On the one hand we have the virus particle or virion (the virus capsid or protein shell that encloses the virus's genetic material). And on the other hand we have the entity that is formed when an infecting virus subverts the functional systems (cellular processes) of an infected cell for viral reproduction. It is this 'virocell' that appears to possess all features of a living organism.
With the emergence of viral metagenomics and ecological genomics it became obvious that virus particles are the most abundant biological entities on Earth, and that viruses carry many more novel, uncharacterised genes than cellular life forms.
Furthermore, viruses were recognised as major agents of biological evolution through their key role in horizontal transfer of genes, as well as geochemical change (i.e. because cell lysis by viruses is central to the cycles of the major elements and sediment formation).
The final point was that work on the structural biology of the viral capsid showed a remarkable similarity among viruses infecting all three domains of cellular life, supporting the idea of evolutionary continuity across the viral world.
What exactly is a virus?
Let's kick-off with Wikipedia's definition of a virus. It tells us that a virus is a submicroscopic infectious agent that replicates only inside the living cells of an organism. Viruses infect all types of life forms, from animals and plants to microorganisms, including bacteria and archaea.
For information: The highest level of biological classification that divides cellular life into Archaea, Bacteria and Eukaryota domains was only introduced in 1990. Archaea are a primitive form of prokaryotic microorganism, and they were only identified as being separate from bacteria in 1977. The first observed archaea were extremophiles, living in extreme environments, such as hot springs and salt lakes with no other organisms (life forms). In fact most archaea live in low-oxygen environments, and cannot be cultured in laboratories (only about 30% of bacteria can be grown in a laboratory). Archaea are similar in size and shape to bacteria; however, like eukaryotes, they possess certain genes and several metabolic pathways, notably enzymes involved in transcription and translation (however their genes and metabolic pathways are designed for survival in their specific environment). Archaea are also a part of the microbiota of all life forms, and in the human microbiota they are found in the gut, mouth and on the skin (a large variety of bacteria is also found in the gut and on skin). Bacteria are microscopic primitive single-celled prokaryotic microorganisms with a simple structure, and are considered the smallest living entities. Bacteria live in both a symbiotic and parasitic relationship with plants and animals. Many bacteria possess photosynthetic pigments and can perform photosynthesis to prepare their own food, whereas some archaea are phototrophs, using sunlight as a source of energy. Transfer-messenger RNA is found in bacteria but not in archaea. Bacteria can be pathogenic, whereas archaea are not. Viruses attack both archaea and bacteria, and they share many defence systems against them. More viruses that attack archaea will certainly be isolated in the future. Eukaryota includes all animals, plants and fungi. Many of the technical terms will be explained later on this webpage.
Another way to look at defining a virus is to think of it as a very, very tiny biological capsule, which to proliferate by self-replication needs to commandeer the reproductive machinery and metabolism of a host cell. Some experts simply define the virus as a microscopic parasite that infects susceptible host cells in order to produce more viruses. Yet others prefer to underline the fact that viruses (infectious diseases) kill nearly twice as many people as cancer.
Technically all viruses are so-called obligate parasites, in that they lack metabolic machinery of their own to generate energy or to synthesise proteins, so they depend upon host cells to carry out these vital functions. Despite not containing its own metabolic machinery the assembly of a virus remains a complex process involving a large number of protein-protein, protein-nucleic acid, and protein-lipid interactions.
The above image is one view of the Zika capsule taken from the virus particle explorer (dated 13 June 2020). Zika is a virus that resulted in a recent epidemic in which more than 700,000 people were probably infected, although only 18 deaths were reported.
It all started in 1892 when Dmitri Ivanovsky (Russian, 1864-1920) described a non-bacterial pathogen infecting tobacco plants. In 1898 Martinus Beijerinck (Dutch, 1851-1931) followed up with the discovery of the tobacco mosaic virus. In 1935 Wendell M. Stanley (American, 1904-1971) created pure crystals of the tobacco mosaic virus and found that they could still infect plants; he concluded that viruses were not living organisms.
Above we can see on the right a negative contrast electron micrograph of the tobacco mosaic virus stained with uranyl acetate (the white bar represents 100 nanometres). On the left we have a model of the capsule (called a virion), which is about 20 nanometres in diameter and about 300 nanometres long. We can see that the virion consists of an outer coating of protein subunits (the capsid) shielding the inner coiled RNA.
There is a lot of scientific jargon in the world of viruses. So the capsid is this outer shell of protein composed of subunits called capsomers, and the nucleo-capsid is when the capsid-shell contains nucleic acid (the general name given to both DNA and RNA). The capsid has three functions. Firstly the capsid protects the nucleic acid from being digested by enzymes, i.e. proteins that act as biological catalysts increasing chemical reactions involved in transforming organic compounds. Secondly, the capsid contains special sites on its surface that allow the virion to attach to a host cell. Thirdly, the capsid provides the viral proteins that enable the virion to penetrate the host cell cytoplasmic membrane, and then into the host cytoplasm. And of course it must be stable enough to survive in extracellular environments while it waits to find a host. The overall aim is to place the viral RNA in a liquid suspension of protein molecules that will self-assemble a new capsid and become a new functional and infectious virus particle.
Nucleic acid encodes the genetic information for the synthesis of all the proteins. Double-stranded DNA (dsDNA) is responsible for this in most types of cells, however most viruses maintain all their genetic information with single-stranded RNA (ssRNA). There are two types of viral RNA. In most cases the viral RNA is termed a 'plus strand' because it acts directly as a messenger RNA (mRNA) for direct synthesis (translation) of viral proteins. A few viruses have 'negative strands' of RNA, and in these cases the virion needs an enzyme, called RNA-dependent RNA polymerase (RdRp transcriptase), which must first catalyse the production of a complementary messenger RNA (mRNA) from the virion genomic RNA before viral protein synthesis can occur.
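To make the 'plus strand' versus 'negative strand' idea a little more concrete, here is a minimal Python sketch of what "producing a complementary messenger RNA" means at the level of base pairing. It is only a toy illustration under stated assumptions: the sequence is invented for the example, the function name is my own, and it says nothing about how the polymerase enzyme itself works.

```python
# Toy illustration of base pairing only; not a model of the RdRp enzyme.
# RNA bases pair A-U and G-C.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna_5_to_3: str) -> str:
    """Return the complementary RNA strand, also written 5' to 3'.

    The two strands run in opposite directions, so the input is read
    backwards while each base is swapped for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(rna_5_to_3))


# Hypothetical negative-strand fragment, invented for this example.
negative_strand = "CAUUUCGGCAU"

# The complementary copy is the plus-sense strand that could act as mRNA.
plus_strand = complementary_strand(negative_strand)
print(plus_strand)  # AUGCCGAAAUG
```

Note that the copy begins with AUG, the usual start signal for translation, whereas the invented negative strand does not contain a readable message in that sense. Applying the function twice returns the original strand.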
Now more than 6,000 virus species have been described in detail, but we know that there are millions of different viruses in the environment. Viruses are found in almost every ecosystem on Earth and are the most numerous type of biological entity.
Because the virus needs the cellular machinery of the host organism to reproduce, it is often simply called a parasite, and there is an open debate as to whether viruses are even living organisms.
Wikipedia also has a webpage dedicated to the history of virology, and lists some of the great discoveries, e.g. poliovirus (1908), yellow fever virus (1927), mumps virus (1934), Dengue fever virus (1943), Varicella zoster virus (1952), measles virus (1954), rhinovirus (1956), Rubella virus (1962), Hepatitis B virus (1963), Norovirus (1972), Ebola virus (1976), and HIV (1983).
Above we have a transmission electron micrograph of the SARS-CoV-2 virus particle (i.e. virion). SARS-CoV-2 is a close relative of the original SARS-CoV virus from the outbreak in 2002-2003. However, despite them being closely related, there are around 6,000 genetic differences (i.e. a surprising 20% of the genome), and even more differences with other coronaviruses. So the question is which of these changes, or combination of changes, makes SARS-CoV-2 more deadly than other coronaviruses? The virus has 14 genes in its genome, coding for 27 different viral proteins. These proteins are biomolecules responsible for one or more biological processes in an organism, and are composed of one or more long chains of amino acid residues. So those 6,000 genetic differences result in 380 amino acid changes. It's the changes in those amino acids, and what those changes do to protein functions, that give each virus its unique character and make one more dangerous than another.
Above we have the SARS-CoV-2 virion and its proteins. I must admit it looks pretty nasty, and we can see one of the famous spikes attached to the human ACE2 receptor. If you are really interested, the identifiers are described here.
However, the SARS-CoV-2 is like other coronaviruses, a sphere with spikes radiating out of it. In electron microscope images, these spikes form a crown (i.e. the corona). In infection, the spikes attach to human cells and control the virus genes entering the cells. Different coronavirus spikes bind to different receptors on the surface of the host cell. SARS-CoV-2 and SARS-CoV, for example, bind to different receptors than the MERS virus, resulting in different pathologies. Every virus has its own form of these spikes, and the large variation in these spikes is a challenge to creating a SARS-CoV-2 vaccine. Unfortunately because virus surfaces vary so much, antigens change and a vaccine for one virus might not easily recognise another.
Let's start with the basics. Viruses can be extremely simple in design, consisting of nucleic acid surrounded by a protein coat known as a capsid. The capsid is composed of hundreds or even thousands of smaller identical protein components encoded by the viral genome. These identical proteins act as basic assembly subunits called capsomeres and are organised in a regular order to form the capsid, with or without the assistance of the viral genome. The capsid+genome combination is called a 'nucleo-capsid', i.e. a capsid enclosing nucleic acid.
Schematics are powerful tools, but we should not ignore what a virus really looks like. Above we have just one simple example, with (A) being a mature virus consisting of the 'spikes' (B, viral protein VP4 in red) emanating from the outer capsid layer composed of a sphere of 260 viral structural proteins (C, VP7 in yellow), under which you find another sphere with 260 different viral proteins (D, VP6 in blue), and below that the innermost capsid layer of 120 subunits (E, VP2 in green). VP1/VP3 are the transcription enzymes (F, gold) bedded in the VP2 layer (green).
Some of the smallest viruses have just a few proteins but others can have more than two thousand. Below we can see two different viruses that look quite similar. BTV is the bluetongue virus whereas Reo is the orthoreovirus. The first is an insect-borne, noncontagious virus that attacks ruminants, the second can cause respiratory problems in vertebrates.
The key point here is that the two different viruses appear to be very similar, and both are composed of 120 copies of a rather large protein. Initially the experts decided that there was a "clear structural relationship". What you can see is a protein from each, which might look very similar in the way they are folded and located in each capsid. But the reality is that there is virtually no sequence relationship between them and only 8.6% of the residues are structurally equivalent at the level of the amino acid sequence. This is an example of the challenge that molecular biology faced in the past, and the challenge today is knowing how to classify and characterise the almost infinite variety of viruses found in the wild.
Viruses can also possess additional components, with the most common being an additional membranous layer that surrounds the nucleo-capsid, called a viral envelope. The envelope is actually acquired from the nuclear envelope or plasma membrane of the infected host cell, and then modified with viral proteins called peplomers (i.e. these are the famous 'spikes' that can be found on either the capsid or viral envelope). Some viruses contain viral enzymes that are necessary for infection of a host cell and are coded for within the viral genome. In some viruses the gap between the lipid layer and the nucleo-capsid is filled with a "viral matrix" or cluster of proteins that are released to inhibit the response of the immune system and help the replication of the viral genome after invasion. A complete virus, with all the components needed for host cell infection, is referred to as a virion.
At its simplest, a virus may consist of only the viral genome and a protein shell (capsid). The shell is there to protect the viral genome from environmental and enzymatic damage, and to provide the means to find and infect a new host cell. So this protein shell should be big enough to accommodate the viral genome, and the best way to build it is to use multiple copies of the same protein.
Given that the size of a virus appeared proportional to the size of its genome, yet the genome contributed far less to the total mass of the virion than the capsid proteins, Watson and Crick proposed that the capsid must be assembled from multiple copies of proteins. Such an assembly with repeating subunits greatly reduces the amount of genetic information required. In some viruses a capsid built with a single gene product is enough, in other more complex viruses several different gene products are used. At the time this raised questions about how subunits interacted with each other, and what type of capsid architectures were possible. How did the capsid assemble, how was the viral genome encapsulated, how did the capsid interact with the host cell, how did the capsid disassemble to allow virus replication, and how was the entire process coordinated?
When the early techniques of X-ray scattering were applied to determine the molecular structures of viruses, the resulting diffraction patterns showed striking symmetry.
On the left we have an X-ray diffraction pattern of a poliovirus crystal taken in 1959, and on the right we have the optical diffraction pattern of 60 points on the surface of a sphere with icosahedral symmetry. The arrows show the high intensity along certain directions, related to 2-fold, 3-fold, and 5-fold axes of an icosahedron, and both show the same symmetry relations.
Looking at a simple spherical virus with a regular structure, made up of nucleic acid and one capsid protein, the question arose: how can identical copies of an asymmetric protein self-assemble into a regular shell with that icosahedral symmetry? The presumption was that the proteins self-assembled in a process similar to crystallisation and thus governed by the laws of statistical mechanics. The molecules have to be identical and in physically indistinguishable conditions, so the interaction of identical proteins leads to identical local environments. Arranging identical units in identical environments always produces a symmetrical structure, and there are only a limited number of geometrical symmetries. A spherical virus must have a cubic symmetry, because there should be no preferred direction in space. Only three types of cubic symmetry exist, tetrahedral, octahedral and icosahedral, and this last one allows the greatest number of asymmetric units to be used (i.e. 60 against 12 or 24).
The three types of cubic symmetry are those of the five convex regular polyhedrons known as the Platonic solids: tetrahedral symmetry belongs to the tetrahedron (triangular pyramid with 4 triangular faces, 6 straight edges and 4 vertices or vertex corners), octahedral symmetry to the cube (a form of hexahedron with 6 square faces, 12 straight edges and 8 vertices) and the octahedron composed of 8 equilateral triangles (8 faces, 12 straight edges and 6 vertices), and icosahedral symmetry to the dodecahedron (12 regular pentagonal faces, 30 edges and 20 vertices) and the "roundest" of the Platonic solids, the icosahedron (20 equilateral triangular faces, 30 edges and 12 vertices).
They all consist of identical faces in the shape of regular polygons, they all have identical vertices which are meeting points of the same number of faces, and they all have identical edges which are the lines joining adjacent vertices.
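The face, edge and vertex counts quoted above can be checked with a few lines of Python, and the same numbers give back the "60 against 12 or 24" mentioned earlier. This is only a sketch: the dictionary and function names are my own, and it relies on two standard geometric facts that are not stated on this webpage, namely Euler's formula (vertices minus edges plus faces equals 2) and the fact that the number of rotations mapping a regular polyhedron onto itself is twice its number of edges.

```python
# (faces, edges, vertices) for the five Platonic solids, as quoted in the text.
PLATONIC_SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (6, 12, 8),
    "octahedron": (8, 12, 6),
    "dodecahedron": (12, 30, 20),
    "icosahedron": (20, 30, 12),
}


def euler_characteristic(faces: int, edges: int, vertices: int) -> int:
    """Euler's formula: V - E + F equals 2 for any convex polyhedron."""
    return vertices - edges + faces


def rotation_count(edges: int) -> int:
    """Number of rotational symmetries of a regular polyhedron.

    A rotation is fixed by where it sends one edge and in which of the
    two possible directions, hence 2 * edges.
    """
    return 2 * edges


for name, (faces, edges, vertices) in PLATONIC_SOLIDS.items():
    print(
        f"{name:13s} V-E+F = {euler_characteristic(faces, edges, vertices)}, "
        f"rotations = {rotation_count(edges)}"
    )
```

The tetrahedron gives 12 rotations, the cube and octahedron share 24, and the dodecahedron and icosahedron share 60, which is why icosahedral symmetry allows the largest number of identical asymmetric units.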
It was again Watson and Crick who had proposed that spherical viruses exhibited cubic symmetry involving at least four three-fold rotational symmetry axes. They argued that only such symmetry in an isometric capsid allowed close packing of repeating subunits each having an identical environment around it (I think they called it "genetic economy", i.e. reducing the 'cost' of coding the capsid). The next problem was that many viruses contain more than 60 subunits in the shell, and the answer was to focus on retaining the physical essentials, with the introduction of quasi-equivalence.
The two most common forms of viral capsid are either helical (as with the tobacco mosaic virus) or a 20-sided polyhedron called an icosahedron. Helical symmetry leads to filamentous viruses and icosahedral symmetry leads to spherical (isometric) viruses. In the above photograph we can just about see that the adenovirus has an icosahedral capsid. In most icosahedral capsids each of the 20 triangular faces is made up of three subunits, making up 60 subunits in total. However a triangular face can be divided into more equal subunits, for example the capsid of the foot-and-mouth virus actually has 80 subunits, and each subunit is made up of four viral structural proteins (as seen below).
The icosahedron outer capsid (shell) is in fact the shape of more than half of all viruses. What this means is that if you unfold our capsid onto a flat surface you would see that it is composed of 20 identical equilateral triangular facets (like tiles) and 30 edges.
Of all the Platonic solids, the regular icosahedron has the highest volume per unit surface area, suggesting that these viral capsids have explicitly evolved to use the most economic geometry. By mapping the icosahedron to a flat surface the long-range orientational order in the system becomes more apparent.
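The claim about volume per unit surface area can be checked numerically. The sketch below uses the standard textbook formulas for the surface area and volume of each Platonic solid with an edge length of 1 (these formulas are not given on this webpage), and compares the solids with the isoperimetric quotient 36πV²/A³, a scale-free measure of "roundness" that equals 1 for a perfect sphere. The function and variable names are my own.

```python
import math


def platonic_area_volume() -> dict:
    """Surface area and volume of each Platonic solid with edge length 1."""
    s3, s5 = math.sqrt(3), math.sqrt(5)
    return {
        "tetrahedron": (s3, math.sqrt(2) / 12),
        "cube": (6.0, 1.0),
        "octahedron": (2 * s3, math.sqrt(2) / 3),
        "dodecahedron": (3 * math.sqrt(25 + 10 * s5), (15 + 7 * s5) / 4),
        "icosahedron": (5 * s3, 5 * (3 + s5) / 12),
    }


def isoperimetric_quotient(area: float, volume: float) -> float:
    """36*pi*V^2 / A^3: 1.0 for a sphere, smaller for less 'round' shapes."""
    return 36 * math.pi * volume**2 / area**3


quotients = {
    name: isoperimetric_quotient(area, volume)
    for name, (area, volume) in platonic_area_volume().items()
}

# Print from least round to most round.
for name, q in sorted(quotients.items(), key=lambda item: item[1]):
    print(f"{name:13s} {q:.3f}")
```

The quotient rises from about 0.30 for the tetrahedron to about 0.83 for the icosahedron, so for a given amount of shell the icosahedron encloses the most volume of the five.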
What happens if a larger capsid is needed? You could make the subunits bigger, but then you would need a longer viral genome to produce the larger proteins, which would need a larger capsid, and so on. So what happens is that more copies of the subunit are used. The next step was to map the unfolded 20 triangular faces of an icosahedron onto a lattice consisting of regular two-dimensional hexagons, with the exception that each vertex from the triangulation should be a pentagon. The concept of a T-number was introduced to represent the increasing size and complexity of the capsid, and it counts the number of symmetrically distinct but quasi-equivalent triangular facets in the triangulation per face of the icosahedron. Excluding T=1, which can only be represented by an icosahedron or a dodecahedron, the other T-numbers can represent a number of different geometries (see below).
The icosahedral symmetry can still be maintained by having pentamers (a complex of five subunits) at the vertices, and hexamers (a complex of six subunits) distributed evenly at other locations. This geometry will not produce new subunits that are all identical, so they are usually called "quasi-equivalent". This quasi-equivalence produces two possible mirror-image lattices which are either left- or right-handed, 'laevo' or 'dextro', respectively. Prolate (spheroid) capsids can be built with differing arrangements of pentamers and hexamers, but then they are not strictly icosahedral. The argument made was that folding of these structures onto the icosahedron surface retained the bonding pattern of the lattice, and that there were no 'seams' formed that would suggest a weakness in the structure.
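The bookkeeping behind the T-number can be sketched in a few lines of Python. The formula T = h² + hk + k² (with h and k non-negative whole numbers describing steps on the hexagonal lattice) comes from the standard quasi-equivalence theory of Caspar and Klug and is not stated on this webpage, so treat it as an assumption of this sketch; the function names are my own. A capsid with triangulation number T then has 60T subunits, always arranged as 12 pentamers plus 10(T-1) hexamers.

```python
def triangulation_number(h: int, k: int) -> int:
    """T = h^2 + h*k + k^2 for lattice steps h and k (not both zero)."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def capsid_counts(t: int) -> dict:
    """Subunit bookkeeping for an icosahedral capsid with T-number t."""
    return {
        "subunits": 60 * t,         # 60 asymmetric units, T subunits in each
        "pentamers": 12,            # one at each vertex of the icosahedron
        "hexamers": 10 * (t - 1),   # spread evenly over the rest of the shell
    }


def allowed_t_numbers(max_step: int) -> list:
    """All distinct T-numbers reachable with h and k up to max_step."""
    values = {
        triangulation_number(h, k)
        for h in range(max_step + 1)
        for k in range(max_step + 1)
        if (h, k) != (0, 0)
    }
    return sorted(values)


print(allowed_t_numbers(2))      # [1, 3, 4, 7, 12]
for t in (1, 3, 4, 7):
    print(t, capsid_counts(t))
```

As a check, 12 pentamers of five subunits plus 10(T-1) hexamers of six subunits always add up to 60T, and T=1 recovers the simple 60-subunit shell with no hexamers at all. Only certain values of T are possible (1, 3, 4, 7 and so on), which is why capsids come in a limited set of sizes.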
Inevitably cases emerged that were not readily explained, and a more general viral tiling theory was developed. The initial idea had been to cover the surface of the icosahedron with equilateral triangles; now the idea was to allow rhomb and kite tiles as well. Originally protein subunits corresponded to the corners of the facets of equilateral triangles, but now the idea of tiles makes it possible to still maintain symmetry whilst describing more spherical structures in a more natural way. It makes things more complicated, but fortunately only a few capsid protein patterns or 'folds' are utilised by many known viruses. The above discussion is essentially one based upon geometrical principles, and does not explain the origin of the symmetry found, nor the physical reasoning behind the self-assembly found. One thing is certain: symmetry might sound like a simple principle, but explaining the symmetrical nature of viruses is far from simple.
For information: The natural design of a virus reflects two deep forms of elegance. The first we have already mentioned, and it's the idea of "genetic economy". The second is "structural symmetry" as seen in crystallographic studies in the 1940s and 1950s. As mentioned, Watson and Crick hypothesised that the crystallographic symmetry of viruses was due to the repeated use of a few protein subunits. This structural symmetry of protein units meant that if you knew the position of one of the proteins then symmetry implied that you knew the position of all the others. Later it was found that genetically unrelated viruses shared very similar virion architectures, even down to the persistence of specific protein folds (the so-called icosahedral virus with subunits that assemble into a shape with icosahedral symmetry). This coarse-grained, but very effective, analysis finally broke down when faced with high-resolution crystallography. The new approach corresponded to tiling anomalous virus structures with 'darts' and 'kites'. This form of tiling was found to be the simplest way to tile a plane that possessed five-fold rotational symmetry, and now remarkably the same result was found for tiling a sphere with icosahedral symmetry. The next step was to understand the pathways to self-assembly, an even more complex subject, and for more information the reader is invited to check out "Building Polyhedra by Self-Assembly: Theory and Experiment" by Ryan Kaplan, et al.
This topic is quite complicated and we need some way of visualising what we are describing. Below we have a schematic of a typical virus. In (A) we have the schematic of a T=3 capsid, with the quasi-equivalent subunits coloured red, green and blue. 'T' is just the so-called triangulation number which describes how each of the triangular faces is subdivided into smaller tiles. In (B) we have the schematic cross-section showing an encapsulated RNA genome. In (C) we have the ribbon diagram of the X-ray crystallographic structure, and the subunits are again in red, green and blue. The icosahedral asymmetric subunit is shown as a black triangle with the position of the twofold, threefold and fivefold symmetry axes indicated. In (D) we can see the crystallographic asymmetric unit provided by X-ray analysis.
In this model, a special nucleation event initiates the assembly process, and in subsequent assembly steps, each protein that binds to the growing structure undergoes a conformational change that creates a new binding site for the next protein to assemble. This was an elegant but oversimplified description, the problem being that the analysis techniques often imposed icosahedral symmetry and were thus unable to highlight subtle structural differences. In this model there are 178 copies of the coat protein, and a single copy of a maturation protein which is there to carry out lysis, i.e. escape from the host ('maturation' is the final step in cellular differentiation). It is also thought that the resulting weak point in the capsid caused by this maturation protein could provide the exit route for the viral RNA upon infection.
The challenge of any virus is to assemble a stable structure that must partially disassemble upon host infection, release its genome, and initiate a new cycle of virus replication. This simple model has a single-stranded genome so it has to assemble its capsid proteins around a condensed viral genome. There are a lot of constraints. Firstly, how do the viral capsid proteins know to interact with their own genome among a multitude of nucleic acids present within the host? Then the genome can't afford to have knots or tangles that would stop it leaving the capsid freely. And amazingly the quality of the RNA reconstruction is impressive, and includes secondary and tertiary structural elements suggesting that genomes adopt essentially the same structure in each individual particle. This is just one example, but it highlights how experts try to unfold and understand the very complex processing taking place inside a virus.
Keeping on the topic of symmetry for a moment longer, we have seen that it's a key principle in building viral structures. However, a highly symmetrical structure is, almost by definition, very static, whereas biology demands the possibility of movement. For this reason symmetry mismatches are built into the overall structural features of the virus. One way could be to simply have an extra feature on the surface of the capsid, or the feature could appear here and there on the surface whilst retaining some local symmetry. Lastly, the feature itself might retain some flexibility. Helical viruses co-assemble the capsid and genome at the same time, whereas icosahedral viruses first build an empty procapsid which is then subsequently packaged, using specialised viral structural proteins. In some types of bacteriophages the packaging takes place at one of the locations of the five-fold symmetry of the capsid, and a tail is assembled there after packaging (this order of viruses is called Caudovirales). In our example the tail consists of 5 proteins, one involved in the packaging, the tailspike protein needed for receptor binding and destruction (see this example), and three proteins that prevent the contents of the capsid from leaking out again.
If we turn to the contents of the capsid we see the DNA or RNA packed inside, retaining some symmetry in the outer packing layers, but not in the inner layers. The symmetry mismatch is unique for the genome within an otherwise symmetrical virus. It's even more complex if the genome is segmented as each segment is a structural entity. In one example the genome segments inside the capsid were cone-like spools meeting in the centre of the virus.
Before moving on into a more detailed description of the virus, it is perhaps useful to understand the basics of virus reproduction (called viral replication). There are six basic stages:-
Firstly, the virus needs to attach itself to a host. There are specific viral proteins on the surface (capsid or an outer phospholipid envelope) of the virus that interact with specific receptors on the host's cellular surface, and this determines the range of hosts that any specific virus can attack. Meeting the right host cell can be quite a random event, but the virus-host match is very specific, and viruses only infect particular types of cells or particular hosts.
Now the virus needs to fuse the viral and cellular membranes so that the viral capsid can enter inside the host. Viruses that don't have complex envelopes will look to inject their nucleic acid into the host cell, leaving the empty capsid on the outside.
After the viral genome has been uncoated, transcription or translation is started. There are a variety of ways this can happen, but they all result in a de novo synthesis of viral structural proteins and genome.
With the new set of viral proteins and the genome, the viral proteins need to package the newly replicated viral genome creating a new virion that can be released from the host. Simple viruses can have a capsid composed of only 3 viral proteins, whereas other capsids can contain over 60 different viral proteins. These larger capsids can employ multiple assembly lines and use scaffold proteins to build the final structure.
"Viral shedding" is through lysis, which results in the death of the infected host (cytolytic viruses) and the release of all the newly formed virions into the environment. The other way is through budding, which involves acquiring a viral phospholipid envelope, and leaving the host without killing it (cytopathic viruses). The virus can also lay dormant within the cell leaving the host to continue normally (lysogenic cycle).
Simplified diagrams of a virus are useful, but they are not very helpful if we want to better understand what a capsid and a virion really are.
As a more complete example we are going to look at the description of a virus from the Cystoviridae family (so a cystovirus). To be honest this is quite a unique group of viruses, but they have proven to be very useful in the study of many facets of molecular virology. The virus shown above was discovered in 1973, and the Cystoviridae are the only known bacteriophage family to have a lipid envelope around their protein capsid (the virus obtains the lipids from the host cytoplasmic membrane).
The virus is also unique in that it is again the only known bacteriophage to possess a genome consisting of three strands (segments called short, medium and long) of double-stranded RNA (dsRNA) used for coding 13 viral proteins (I've read that there is also a non-structural protein called P14). Each genome segment has distinct noncoding regions at the termini used as signals for packaging and viral replication. Genes are clustered into functional groups comprising those encoding the polymerase complex, envelope-associated proteins and the nucleo-capsid. For example, the 'long' segment codes for five proteins including P1, the major coat protein of the polymerase complex. A polymerase is an enzyme that synthesises long chains of polymers or nucleic acids. DNA polymerase and RNA polymerase are used to assemble DNA and RNA molecules.
Non-infected cells do not normally possess long double-stranded RNA (dsRNA), so its presence provokes a strong immune response. This means that the dsRNA must be protected at all times within the closed capsid, and never enter the host cytoplasm. For this reason the dsRNA virus capsid must be a complete molecular machine capable of encapsidating the plus polarity single-stranded RNA (ssRNA) genomic precursors, synthesising the minus strands inside the particle (replication), making plus-strands from the dsRNA genomes (transcription), and finally extruding the newly made plus-strand transcripts to the exterior. There are a number of different ways to assemble the capsid, but cystoviruses are the only dsRNA viruses known to package their genome into a preformed procapsid, i.e. a preformed shell. Presumably because of the host cell immune system, the genomic RNA is always encapsulated in the procapsid in a single-stranded form. Once inside the procapsid the polymerase replicates the ssRNA to form the dsRNA.
Like SARS-CoV-2, this virus also has spikes, here made of the viral protein P3, protruding from the membrane and anchored by the viral protein P6.
Cystoviridae is a family of virulent bacteriophages that attack Pseudomonas, a wide variety of bacteria that can attack hospitalised patients (nosocomial infections), but that are best known as plant pathogens (Pseudomonas syringae) of kiwifruit, beets, wheat, barley, peas, etc.
Above we have a typical bacterium, which is massive compared to a typical virus. The smallest bacteria are about 0.4 micrometres in diameter and 1 micrometre long, compared to viruses that range from 20 to 250 nanometres in diameter (so a virus is 10 to 100 times smaller than an average bacterium). The DNA of an average bacterium contains 1.8 million base pairs and about 1,700 genes, with very little repetition. An average virus can contain anything between 2,000 and 2.5 million base pairs, however even the largest viruses may only have 200 genes.
For information: We start with human DNA distributed over 23 pairs of chromosomes, and each cell has two copies. This is collectively called the human genome, with each containing around 30,000 genes, although a large part of the human genome does not encode proteins (i.e. introns are non-coding regions of the DNA, or the RNA transcript encoded from it, and they are eliminated by splicing before translation).
Replication occurs in the cell nucleus and involves unwinding the strands of DNA to make a copy consisting of two daughter strands each containing half of the original DNA. There is even a proofreading and repair process to correct inevitable replication errors, but mutations can always sneak through. Transcription is the process where DNA is copied to mRNA, which carries the information needed for protein synthesis. This also involves partially unwinding the DNA strands, however only one strand is transcribed (this is the so-called antisense strand). RNA polymerase enzymes join ribonucleotides together to synthesise a precursor messenger RNA (pre-mRNA). When this process is completed the DNA molecule re-winds to re-form the double helix. Then the pre-mRNA is edited to create a mature mRNA in a process called RNA splicing, i.e. removing all introns or non-coding regions of the RNA transcript. The mature mRNA is transported out of the cell nucleus into the cell cytoplasm, and to the ribosome (i.e. the cell's protein synthesis factory). There it joins with transfer RNA (tRNA) in a complex routine to synthesise proteins (the process is called translation).
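To make transcription and splicing a little more concrete, here is a toy Python sketch. It assumes the antisense (template) strand is written in the 3' to 5' direction, so that reading it left to right and pairing each base (with U replacing T) gives the pre-mRNA in the usual 5' to 3' direction. The sequences and the intron position are invented for illustration, and real splicing is of course guided by signals in the RNA, not by a list of positions handed over in advance.

```python
# Base pairing used when RNA polymerase reads the DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the pre-mRNA (5' to 3') for a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) positions with end excluded."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


if __name__ == "__main__":
    # Hypothetical template strand containing one six-base intron.
    template = "TACCATTTCAAACCGATT"
    pre_mrna = transcribe(template)
    print(pre_mrna)                     # pre-mRNA, intron still present
    print(splice(pre_mrna, [(3, 9)]))   # mature mRNA after splicing
```

With this made-up template the pre-mRNA is AUGGUAAAGUUUGGCUAA, and removing the intron at positions 3 to 8 leaves the mature message AUGUUUGGCUAA.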
In our virus the strands of double-stranded RNA (dsRNA) are found within a spherical virion having three structural layers, a double-shelled virally encoded protein core, which is in turn surrounded by an outer lipid bilayer envelope (total outside diameter is about 85 nanometres).
Above we see the location of the viral proteins on the left, in the middle a 3-D reconstruction of the nucleo-capsid surface, and on the right an electron micrograph of more than 20 virus particles attached to a pilus receptor of the host (the black bar represents 200 nanometers). I understand that virions (virus particles) can also attach directly to the outside lipopolysaccharide layer of the host bacterium.
The innermost polymerase complex core consists of four virally encoded proteins. Firstly the structural framework of the polymerase complex core (or procapsid) is formed by 120 copies of the major capsid structural protein P1, and is arranged in the form of an icosahedral lattice. Then there are about 10 copies of the RNA-dependent RNA polymerase (RdRp is the protein P2), which are essential in all RNA containing viruses because they are responsible for the replication of the genome and for the synthesis of viral mRNAs by transcription. There are also 72 copies of the hexameric packaging motor NTPase P4, and the fourth protein is the assembly cofactor P7.
The envelope encloses the nucleo-capsid, consisting of two concentric viral protein layers, i.e. the nucleo-capsid surface shell and the polymerase complex core. The nucleo-capsid surface shell (so the middle layer) consists of 200 copies of viral protein P8 which forms what is called a laevo-icosahedron, i.e. a 20 face icosahedral lattice with a left twist. This layer also includes the lysis enzyme P5. We can see that the surface lattice is penetrated at intervals by a turret-like P4 viral protein protruding from the inner shell.
The outermost layer is the lipid-protein envelope consisting of phospholipids synthesised from the host cell membrane components. It is composed of five membrane proteins, P3, P6, P9, P10 and P13. Four virally encoded membrane proteins ("fusogenic protein" P6, "major envelope protein" P9, "putative holin protein" P10, and a "minor membrane protein" P13) are integral to the layer, whereas the host attachment spikes (formed by P3) are anchored to the envelope via the fusogenic protein P6 ('fusogenic' just means facilitating fusion of cells).
For information: We have already noted that this type of virus is about 85 nanometres in diameter, however the nucleo-capsid is less than 60 nanometres in diameter. The virion is about 40 nanometres in diameter, and the outer protein shell is about 5 nanometres thick. The most prominent features are the 12 outwardly protruding 'spikes' (protein P3) on the virus's uneven surface. The three segments of double-stranded RNA (dsRNA) are tightly compacted into a space of only about 25 nanometres in diameter. The inner capsid is 'porous' meaning the viral transcripts can leave but not the dsRNA. The total genome size of our particular virus is 13,300 base pairs, with the segment sizes ranging from 2,900 to 6,400 base pairs.
It's fine to mention nanometres, but just how small is that? Well, the first thing to remember is that you cannot see viruses using an optical microscope. A human hair has a diameter of 80,000 to 100,000 nanometres, so our virion is much, much smaller than a human hair. A human red blood cell is about 6,000 to 8,000 nanometres across, so still far bigger than our virion, but that DNA double helix we mentioned is only about 2-3 nanometres in diameter. You could imagine lining up gold atoms across our virion: since each gold atom is about ⅓ of a nanometre in diameter, it would take only around 250 of them to span it.
No matter how we describe a virus it is still almost impossible to imagine its small size, particularly as compared to the damage it is able to do to all types of life. The numbers are overwhelming, particularly for bacteriophages, where a millilitre of seawater can contain up to 10 million phages, and a gram of soil can have 100 times more. And even more amazingly the turnover time for the entire phage population in the world's oceans and seas is only a few days, and during those few days they infect and kill perhaps as much as 30% of all sea-borne bacteria.
A virus weighs around 10 attograms, and yes, we can weigh viruses down to about 1 attogram. This is about the same weight as a cluster of gold atoms. The physical limit today is about a zeptogram, but the aim is to get down to yoctograms, or individual hydrogen atoms. This level of precision would allow proteins to be classified by their weights. About 70% of the weight of a virus is protein, whereas the viral genome constitutes only about 10% of the weight.
The process of viral replication in the host bacterial cell is, at times, similar to a military invasion. This virus exploits a particular feature of many bacteria, i.e. they have what are called pili, or hair-like appendages on their surface. These pili are antigenic, and for our specific target bacteria the pili are of a specific type called 'type IV pili', which means that they can generate motile forces, i.e. they attach to something and then contract, working like a grappling hook. So the pili are bacterial adhesins, play an essential part in pathogenesis (infection), and are required for microbial colonisation of a new host. These pili are usually about 0.3 to 1.0 micrometres long and about 7 nanometres in diameter, and are distributed all over the bacterial cell surface. The schematic below, from this reference, outlines the lifecycle of our virus.
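The comparisons above are easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: I take the lower figure quoted for the hair and the red blood cell, the 85 nanometre outside diameter quoted earlier for our virion, and the middle of the 2-3 nanometre range for the DNA helix (that midpoint is my assumption).

```python
# Approximate diameters in nanometres, taken from the figures quoted in the text.
SIZES_NM = {
    "human hair": 80_000,      # lower end of the 80,000-100,000 range
    "red blood cell": 6_000,   # lower end of the 6,000-8,000 range
    "virion": 85,              # total outside diameter of our cystovirus
    "DNA double helix": 2.5,   # assumed midpoint of the 2-3 range
    "gold atom": 1 / 3,
}


def times_larger(bigger, smaller):
    """How many times the diameter of one object exceeds that of another."""
    return SIZES_NM[bigger] / SIZES_NM[smaller]


if __name__ == "__main__":
    for name in ("human hair", "red blood cell"):
        print(f"{name}: about {times_larger(name, 'virion'):.0f} virions across")
    print(f"virion: about {times_larger('virion', 'gold atom'):.0f} gold atoms across")
```

So roughly 940 virions would fit side by side across the thinnest human hair, about 70 across a red blood cell, and it takes about 255 gold atoms in a row to span one virion.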
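The prefixes are easier to grasp with a small conversion sketch. An attogram is 10⁻¹⁸ grams, a zeptogram 10⁻²¹ grams and a yoctogram 10⁻²⁴ grams, and a single hydrogen atom weighs about 1.67 yoctograms. The Python below applies these to the 10 attogram virus and the 70% and 10% shares mentioned above; the function names are mine and purely illustrative.

```python
# Power of ten (in grams) for each unit prefix.
EXPONENT = {"attogram": -18, "zeptogram": -21, "yoctogram": -24}

HYDROGEN_ATOM_YOCTOGRAMS = 1.67  # approximate mass of one hydrogen atom


def convert(value, from_unit, to_unit):
    """Convert a mass between attograms, zeptograms and yoctograms."""
    return value * 10 ** (EXPONENT[from_unit] - EXPONENT[to_unit])


def share(total_mass, percent):
    """Mass accounted for by a given percentage of the total."""
    return total_mass * percent / 100


if __name__ == "__main__":
    virus_ag = 10
    print(convert(virus_ag, "attogram", "zeptogram"), "zeptograms")
    print(convert(virus_ag, "attogram", "yoctogram"), "yoctograms")
    print(share(virus_ag, 70), "attograms of protein")
    print(share(virus_ag, 10), "attograms of genome")
    atoms = convert(virus_ag, "attogram", "yoctogram") / HYDROGEN_ATOM_YOCTOGRAMS
    print(f"equivalent to about {atoms:,.0f} hydrogen atoms")
```

In other words a 10 attogram virus weighs 10,000 zeptograms, or about as much as six million hydrogen atoms, with roughly 7 attograms of protein and 1 attogram of genome.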
In (A) the virally encoded protein P3 (the famous 'spike') mediates binding of the virus to a pilus, which retracts pulling the virus particle (virion) into contact with the bacterial outer membrane (ready for endocytosis or entry into the bacterial cell).
In (B) the fusogenic protein P6 can work to fuse the viral envelope with the bacterial outer membrane and release the 'nucleo-capsid' into the periplasmic space, i.e. a gel-like matrix that sits between the bacterial outer membrane and the inner bacterial cytoplasmic membrane.
In (C) the thin inner wall is now exposed and can be digested by a peptidase, the nucleo-capsid associated muramidase (lysis enzyme P5), which is designed to break down the peptidoglycan layer of bacterial cell walls.
In (D) the bacterial cytoplasm is now exposed, and protein P6 causes a fusion of the viral envelope with the bacterial outer membrane releasing the inner nucleo-capsid into the host cytoplasm (it is this inner capsid, which if you remember is also called the 'polymerase complex'). The entry process is a kind of endocytic process, a cellular process in which chemical substances are brought into a cell, i.e. substances are surrounded by an area of the cell membrane, which then buds off inside the cell.
In (F) the entire virion has entered the host bacterial cell and immediately starts the transcription of the three precursor genomic segments (i.e. the short, medium and long positive-sense RNA segments), which are then released through passive channels in P4. Remember, our capsid is fully equipped to do this.
In (H) the proteins P1, P2, P4 and P7 form the empty procapsid, ready to become a polymerase complex. What actually happens is that 6 P4 monomers (just molecules that react together to form larger polymer chains) react together to form a hexamer (a particular type of oligomer), mediated by nucleoside triphosphate (NTP) and magnesium cations. NTPs are vital biomolecules: they are the building blocks of nucleic acids and are incorporated by many polymerases. I presume they are available inside the bacterium, as are magnesium cations, which are an important nutrient for bacteria. The P4 hexamers nucleate the assembly with the major capsid protein P1 (i.e. the first step in self-assembly). After this more P4 hexamers and the minor capsid protein P7 are recruited to stabilise the nucleation complex, and the result is an empty procapsid.
In (J), once the short, medium and long segments have been packaged, the single-stranded RNA (ssRNA) genomes are replicated and late phase transcription begins. From the late transcribed genome (short and medium segments), the surface shell and envelope proteins are translated, starting with the P8 shell.
In (K) the P8 shell is formed around the expanded genome containing the polymerase complex. RNA-dependent RNA polymerase (RdRp is the protein P2) replicates the ssRNA to double-stranded RNA (dsRNA) and the capsid undergoes conformational changes and expands (and the P8 shell is completed). At this point the double-layer procapsid is now called a nucleo-capsid.
In (L) the lipid-protein envelope is formed around the nucleo-capsid. The exact process is not clearly understood, but it appears to depend directly upon the major envelope protein P9 and the non-structural protein P12.
The example of the Trojan Horse comes to mind, because the genomic double-stranded RNA (dsRNA) remains enclosed by the inner capsid, 'hiding' from the host antiviral proteins. It does this because if dsRNA is found in a cell, it usually means that it has been infected. There are specific cellular mechanisms that can change the behaviour of the cell in case of infection, e.g. it can activate what is called RNA silencing, or even trigger its own death. So to avoid detection, the genome of a dsRNA virus is always shielded inside its protein shell. Also viral replication and transcription processes take place within the shell, and therefore this innermost protein shell must also include the necessary enzymes for these activities. Another reason why the viruses must have their own dsRNA-dependent polymerases is that the cellular polymerases cannot transcribe dsRNA. Because of the multitude of functionalities it incorporates, the inner core of the virus is often termed the 'polymerase complex' or transcription complex. And, if you remember, it was the bacterium that invited this attack by pulling the virus particle towards it using its pili.
So we see that once inside the host, the virus will try to exploit the features of the bacterium to create copies of itself, whilst still protecting its dsRNA from the host's antiviral proteins in the cytoplasm. Now inside the P1 shell full-length viral genome segments are transcribed by about 10 copies of the RNA-dependent RNA polymerase (RdRp P2) and released into the host cell cytoplasm. Early in the infection cycle approximately equal amounts of mRNA molecules are transcribed from the short, medium and long segments, whereas later the short and medium segments dominate. The specific form of mRNA is called polycistronic, which just means that it is a form of mRNA that can encode two or more viral proteins, i.e. the messenger can be later cleaved into individual messages, each of which is translated into a single viral protein. This phase of transcription to mRNA is required for the synthesis of additional copies of the viral proteins for subsequent dsRNA replication. Once the viral mRNA is exported to the host cytoplasm it must be translated. This means that the viral mRNA is decoded to produce specific amino acid chains, or polypeptides (i.e. chains of about 50 amino acids), and the viral mRNA does this by hijacking the host cellular protein synthesis machinery. As these new viral proteins are synthesised some of them self-assemble into a so-called 'pro-core' or 'procapsid' particle, which I think is just another term for an empty polymerase complex. This is not an unusual process since many viruses rely on self-assembly of their capsids to protect and transport their genomic material. In addition the particle or capsid recognises the specific ssRNA and draws it inside. The 'routine' is that capsomeres (i.e. basic assembly subunits of proteins) are continually produced in the host cytoplasm while capsids are being built, packaged and leaving the host cell for a new round of infection.
So in a new capsid the polymerase complex starts to package the three positive-strand transcripts, and synthesise negative-strands. As RNA replication and packaging advance, the polymerase complex will inevitably expand (I think this becomes the nucleo-capsid). And the nucleo-capsid surface shell starts to be assembled around the polymerase complex, acquiring the protein P5 and the envelope. Spikes are assembled on the envelope surface, and finally mature versions are released through virus-induced host cell lysis.
Keeping it as simple as possible, if the virus wants to reproduce (duplicate itself) it will need to bring together inside the host bacterial cell an mRNA template, ribosomes (only found in the host), tRNA (that links mRNA with amino acid sequences of proteins) and various enzymatic factors. Firstly, the virus must produce its mature mRNA (i.e. by transcription and any post-transcriptional modifications needed). Then during translation the virus's mRNA template is read by the host bacterium's ribosomes to determine the sequence of amino acids, which are then joined to form a polypeptide chain. The job of the tRNA is to deliver the correct amino acids to the ribosomes. This polypeptide chain must 'fold', and it starts with a basic form of protein structure called the primary structure (i.e. a polypeptide chain or linear sequence of amino acids). Then this chain folds to form a series of small secondary structures. These secondary structures then fold again to form a protein tertiary structure, i.e. the protein's final 3D structure which exposes its active sites so that it's ready to interact with other proteins. It is these active sites that directly catalyse chemical reactions (essentially the rest of the protein's amino acids are there to maintain the tertiary structure and keep the active sites exposed). Each active site is optimised (high specificity) to bind a particular substrate and catalyse a particular reaction (see protein dynamics).
Again, let's try to keep things as simple as possible (despite the jargon). In (1) we see the three dsRNA genomic segments and the newly transcribed short, medium and long RNA segments being extruded from the 'procapsid'. In (2) the cellular protein synthesis machinery translates the +ssRNA, producing the viral proteins P1, P2 (the RdRp), P4 and P7. The newly produced proteins co-assemble (3) into empty polymerase complexes, the 'procapsids'. Each assembled 'procapsid' packages one copy of each of the short, medium and long segments. The RNAs are drawn inside the core using the hexameric packaging motor P4, which results in an expansion of the 'procapsid', and this is where it gets renamed and becomes a 'nucleo-capsid'. Upon packaging (5), multiple RNAs are replicated inside the 'nucleo-capsid' by P2, forming double-stranded RNA. Not shown, but additional proteins are translated from the short and medium segments and they go to make up additional layers on the 'nucleo-capsid' before the mature particle is released by lysis of the host cell.
So to summarise, once inside the host cell cytoplasm, polycistronic mRNA is transcribed by the viral RNA-dependent RNA polymerase (RdRp P2) inside the core particle. The mRNA is then released into the host cell cytoplasm through passive channels in protein P4, which ensures that dsRNA is never exposed to the cytoplasm itself. Full-length transcripts from each of the dsRNA segments are synthesised. These transcripts are then used as templates for translation by the host ribosomes into proteins which assemble into new viral particles (virions).
Above we have a description of the different intermediary stages in the cystovirus assembly pathway. It's nice to write "which assemble into new viral particles", but what does that really mean? Experts think that the assembly of virus capsids is essentially a thermodynamic process. The starting point is the hydrophobic nature of the viral coat proteins, which means that they naturally aggregate to exclude water molecules. The hydrophobic effect is essential to life, and is also the same principle that separates a mixture of oil and water. The net attractive interaction is very weak, so it actually proceeds through a cascade of lower-order reactions, in particular nucleation-and-growth. This involves the construction of repeating units of the hexamer protein P4 (see oligomers) joining together with three-dimensional molecules of protein P1 (monomers). P1, being the major coat protein of the polymerase complex, exposes on its surface inside the capsid specific RNA binding sites in a segment-specific manner. This assures precise packaging of all three single-stranded genomic precursors. My (very limited) understanding is that many viruses package their genome into the capsid using packaging motors powered by the hydrolysis of ATP (adenosine triphosphate) with MgADP (adenosine diphosphate) being the substrate for ATP synthesis. In simple terms ATP supplies the energy needed to do work in a biological system. So P4 is a hexameric ATPase that serves as the RNA packaging motor in double-stranded RNA bacteriophages from the Cystoviridae family. What this means is that the so-called "packaging NTPase P4" acts as a molecular motor converting chemical energy into mechanical energy. We saw that P4 has a dual role: it is an active motor during packaging, and a passive pore used for the exit of nascent transcripts transcribed by the polymerase P2.
P1 and P4 then interact with protein P2 and protein P7 to form a polymerase complex which is the viral procapsid which finds its way into the host cytoplasm. Remember that for our dsRNA virus the capsid houses a multitude of functionalities and is often termed the 'polymerase complex'. The procapsid consists of 120 copies of P1, 12 of P2, 12 hexamers of P4 (so the 72 copies mentioned earlier), and 60 of P7, and is designed to recognise the three viral single-stranded genomic precursors (i.e. the short, medium and long RNA segments) by their unique signals situated at their ends. Once inside the procapsid, P4 can start to package single copies of each.
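The reading of the mRNA template, three letters at a time, can be sketched in a few lines of Python. This is only a toy: the codon table below holds just a handful of entries from the standard genetic code (the full table has 64), the example message is invented, and nothing here models the ribosome, the tRNA or the folding that follows.

```python
# A small excerpt of the standard genetic code (codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe", "UUC": "Phe",
    "GGC": "Gly", "GGU": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "GAA": "Glu",
    "UGG": "Trp",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Read codons from the first AUG until a stop codon; return the chain."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start signal, so nothing is made
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from this excerpt are marked as unknown ("Xaa").
        chain.append(CODON_TABLE.get(codon, "Xaa"))
    return chain


if __name__ == "__main__":
    message = "GGAUGUUUGGCAAAUAAGG"  # hypothetical mRNA fragment
    print("-".join(translate(message)))
```

For the made-up message this gives the primary structure Met-Phe-Gly-Lys, i.e. a (very short) linear chain of amino acids that would then go on to fold.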
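The copy numbers quoted for the procapsid can be tied back to the geometry of the icosahedron with a little arithmetic. The sketch below reads the 12 copies of P4 as 12 hexamers, one sitting at each of the 12 fivefold vertices, which is my assumption but would reconcile this figure with the 72 copies of P4 mentioned earlier. It also uses the fact that icosahedral symmetry repeats a basic (asymmetric) unit 60 times.

```python
# Basic geometry of an icosahedron.
ICOSAHEDRON = {"vertices": 12, "faces": 20, "edges": 30}
ASYMMETRIC_UNITS = 60  # repeats of the basic unit under icosahedral symmetry

# Procapsid composition as quoted in the text (P4 counted as hexamers).
PROCAPSID = {"P1": 120, "P2": 12, "P4 hexamers": 12, "P7": 60}


def p4_monomers(hexamers):
    """Total number of P4 protein copies for a given number of hexamers."""
    return hexamers * 6


def copies_per_asymmetric_unit(copies):
    """Average number of copies of a protein in each asymmetric unit."""
    return copies / ASYMMETRIC_UNITS


if __name__ == "__main__":
    # Euler's formula for a closed polyhedron: V - E + F = 2.
    print(ICOSAHEDRON["vertices"] - ICOSAHEDRON["edges"] + ICOSAHEDRON["faces"])
    print(p4_monomers(PROCAPSID["P4 hexamers"]), "copies of P4")
    print(PROCAPSID["P4 hexamers"] / ICOSAHEDRON["vertices"], "hexamer per vertex")
    print(copies_per_asymmetric_unit(PROCAPSID["P1"]), "copies of P1 per unit")
    print(copies_per_asymmetric_unit(PROCAPSID["P7"]), "copy of P7 per unit")
```

On these figures each asymmetric unit holds two copies of P1 and one of P7, and the 12 hexamers account for all 72 copies of P4.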
Above we have a 3-D reconstruction where we can see in (a) the protein P8 forms the outer shell of the nucleo-capsid when it contains nucleic acid (a true 'chain mail' protecting its contents with different trimers coloured yellow, green, gold and brown). We can see the twofold, threefold and fivefold symmetry axes, with NTPase P4 sitting on the fivefold symmetry axes. As a reminder, the name nucleo-capsid is when the shell contains nucleic acid, so in (b) we can see the structural framework of the polymerase complex core (or procapsid) formed by 120 copies of the major structural protein P1 (red and blue), as well as deep in the interior the double-stranded RNA (dsRNA).
Before closing this description it is worth noting that many different virus envelopes might look quite similar in their overall size and appearance, but the internal construction can be quite different. One expert noted that in reality the structure of viruses is a lot more 'sloppy' than might be suggested on this webpage: some are conical shaped, others rod shaped, and others just look like a mess.
As an aside there are several recent publications showing that polymerase complex and nucleo-capsid can be assembled 'in vitro' from purified proteins and RNA components, and that a fully functioning virus can be created and rendered infectious.
Are there other types of viruses?
This is really a kind of sub-question to the question on the nature of a virus. To answer that question we described a particular type of virus, and of course it left open the question concerning other types of viruses. And the answer to this new question is 'Yes', there are many other types of virus.
Cellular life is defined by a Three-Domain System which includes Archaea, Bacteria and Eukaryote. The first two are all prokaryotic micro-organisms, which means that they are cellular organisms (so life forms) that lack a membrane-enclosed nucleus. All cells possess a cytoplasm (a gel-like substance composed of water and dissolved chemicals needed for growth). The cytoplasm is contained within a cell membrane, which also encloses at least one chromosome (a DNA molecule carrying the genome or genetic blueprint of the cell) and ribosomes (organelles used for the synthesis of proteins). Below we have a typical prokaryotic cell containing a cell (plasma) membrane, chromosomal DNA concentrated in a nucleoid, ribosomes, and a cell wall. Some prokaryotic cells may also possess flagella (used for locomotion), pili (are antigenic), fimbriae (adhesins), and capsules (outer envelope). For the unique characteristics of prokaryotic cells, check out this learning object.
Below we have a typical eukaryotic cell, the third domain of life, and which includes all animals, plants and fungi. As you can see there are major differences between these cell types, most notable is that eukaryotic cells have a nucleus surrounded by a complex nuclear membrane that contains multiple, rod-shaped chromosomes. For the unique characteristics of eukaryotic cells, check out this learning object.
So we have three domains of life and all of them possess genomes consisting of double-stranded DNA (dsDNA). They also all employ a standard system of DNA replication (the most essential part of biological inheritance) and gene expression (the synthesis of gene products, which are usually proteins but can also be RNA).
For information: So all living organisms consist of elementary units called cells, which are membrane-enclosed compartments that contain genomic DNA (chromosomes), molecular machinery for genome replication and expression, a translation system that makes proteins, metabolic and active transport systems that supply monomers for these biological processes, and various regulatory systems. All cells reproduce by cell division, an elaborate process that ensures faithful chromosome segregation, i.e. transfer of copies of the replicated genome into daughter cells. The best characterised cells are the relatively large ones of animals, plants, fungi, and diverse unicellular organisms known as protists, such as amoebae or paramecia. These cells possess an internal cytoskeleton and a complex system of intracellular membrane partitions, including the nucleus, a compartment that encloses the chromosomes. These organisms are known as eukaryotes because they possess a true nucleus. In contrast, the much smaller cells of bacteria have no nucleus and are named prokaryotes.
In the 20th century, using electron microscopy, tiny particles were discovered that were much, much smaller than cells. These are the viruses. Viruses, often termed selfish genetic elements, typically encode some proteins essential for viral replication, but they never contain the full complement of genes for the proteins and RNA required for translation, membrane function, or metabolism. Therefore, viruses exploit cells to produce their components.
Classifying organisms (known as taxonomy) is one of the oldest occupations of biologists and they did a good job for animals and plants, but they were not quite so successful for simpler multicellular life-forms, such as fungi and algae. And, taxonomy was nearly helpless when it came to unicellular organisms, particularly bacteria. How could they include these new tiny organisms?
In 1977 Carl Woese (American, 1928-2012) and his co-workers compared the nucleotide sequences of a molecule that is conserved in all cellular life forms, the small subunit of ribosomal RNA known as 16S rRNA. By comparing the nucleotide sequences of the 16S rRNA, they were able to derive a global phylogeny of cellular organisms for the first time. This phylogeny overturned the eukaryote-prokaryote dichotomy by showing that the 16S rRNA tree neatly divided into three major branches, which became known as the Three-Domain System with Archaea, Bacteria and Eukaryote.
This breakthrough was momentous for at least three reasons. First, they traced the evolution of cellular life directly by comparing molecules that actually undergo evolutionary changes. Second, the detection of the 16S rRNA sequence conservation in all forms of cellular life provided the strongest possible support for Darwin's hypothesis of the common ancestry of life on Earth. These results provided strong evidence that the last universal common ancestor (LUCA) of all cellular life really existed, although we still know little about what this ancestor was like and how it lived. Finally, the three-domain structure showed that evolutionary history is decoupled from biological organisation. Indeed, archaea and bacteria appear very similar biologically (tiny cells without much internal structure) and different from eukaryotes. However, today it is thought that LUCA is placed between bacteria on one side and archaea together with eukaryotes on the other side, implying that archaea and eukaryotes share a common ancestor to the exclusion of bacteria. This finding emphasises that similarity of biological organisation and common ancestry are two very different things.
In contrast to the fact that archaea, bacteria and eukaryotes (the three domains of life) all possess genomes consisting of double-stranded DNA (dsDNA), viruses are not limited to that one genomic structure. There are of course dsDNA viruses, but there are also viruses with single-stranded DNA (ssDNA), double-stranded RNA (dsRNA), and single-stranded RNA (ssRNA). In this last category, the ssRNA can be either positive-sense (+ssRNA, meaning it can be read directly as a message, like mRNA) or negative-sense (-ssRNA, indicating that it is complementary to mRNA). Some viruses even start with one form of nucleic acid in the nucleo-capsid and then convert it to a different form during viral replication.
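To make the idea of 'sense' a little more concrete, here is a minimal Python sketch. The function name and the short sequence are invented purely for illustration (they do not come from any real virus); the only thing being shown is that a negative-sense strand is the complement of the message, so copying it once more gives back something that reads like mRNA.

```python
# Toy illustration only: the sequence below is hypothetical, not a real viral genome.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna: str) -> str:
    """Return the complementary RNA strand, written in the usual 5'->3' direction."""
    # Reverse the strand, then swap each base for its pairing partner.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


# A made-up positive-sense message (reads like mRNA).
positive_sense = "AUGGCUUAA"

# The negative-sense strand is complementary to the message...
negative_sense = complementary_strand(positive_sense)

# ...so copying the negative-sense strand regenerates the message.
regenerated_message = complementary_strand(negative_sense)

print(negative_sense)
print(regenerated_message == positive_sense)
```

The second copying step is, in cartoon form, what a polymerase has to do before a negative-sense genome can be used to make proteins.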
Virus taxonomy is focussed on naming conventions for viruses, however there is an alternative called the Baltimore classification, named after David Baltimore (American, born 1938). This approach classifies viruses based on the way they produce messenger RNA (mRNA) and thus replicate in a host cell. Above we can see seven groups, which also include groups VI and VII which are retro-transcription viruses. These last two groups are for viruses that once inside the host's cytoplasm produce DNA from their RNA genomes (using a reverse transcriptase), which is then incorporated into the host's genome by an integrase enzyme, at which point the host treats the viral DNA as part of its own genome and goes ahead to produce new copies of the virus.
Below we are going to describe these different types of virus very quickly, so you can also jump over this section if you wish (the basic reference is Viruses).
Since viruses lack ribosomes (and thus rRNA), they cannot be classified within the Three-Domain Classification scheme for cellular organisms. Ribosomes are the macromolecular machines in living cells that perform the biological protein synthesis which is needed to survive. The central theme of the Baltimore classification is "all viruses must synthesise positive-strand mRNAs from their genomes, in order to produce proteins and replicate themselves". Seven families (or classes) of viruses are recognised, depending on the way the viral genome produces its mRNA. Viruses can replicate DNA and/or RNA, synthesise RNA from DNA or vice versa, but lack a complete system to make proteins. So they have to rely on host cell ribosomes. On the other hand, host cells can synthesise proteins only from mRNA strands.
Class I is for DNA viruses with a double-stranded DNA (dsDNA) genome, so they have a genome exactly the same as the host cell that they are infecting. For this reason, many host enzymes can be utilised for viral replication and/or viral protein production. The flow of information follows a conventional pathway: dsDNA → mRNA → protein, with a DNA-dependent RNA-polymerase producing the mRNA and the host ribosome producing the viral protein. The flow of information DNA → RNA → protein is a central dogma of molecular biology.
In the case of viruses no viral genome encodes a complete system for translating proteins, so they must first synthesise viral mRNA which will then produce the viral proteins. As such the virus is completely dependent upon the translational machinery of the host cell. So how does it hijack the host's machinery?
The virus often employs strategies for control of gene expression, to ensure that particular viral products are made at specific times in the virus replication. For example, one of the early viral proteins modifies the host RNA polymerase so that it will no longer recognise the host's own commands. A further modification (catalysed by middle-stage viral proteins) further modifies the host RNA polymerase so that it will recognise viral genes coding for late-stage proteins. This ensures an orderly production of viral proteins.
There are several animal viruses with dsDNA genomes, such as the pox viruses and the adenoviruses. The herpesviruses have several notable features, such as the link with cancer and the ability of the viruses to remain in a latent form within their host.
You will be relieved to note that I do not intend to detail all the different classes of virus.
Class III are double-stranded RNA (dsRNA) viruses that infect bacteria, fungi, plants, and animals, such as the rotavirus that causes diarrheal illness in humans.
However, host cells do not utilise dsRNA in any of their processes and have systems in place to destroy any dsRNA found in the cell. Thus the viral genome, in its dsRNA form, must be hidden or protected from the host enzymes. Host cells also lack RNA-dependent RNA polymerase (RdRp), necessary for replication of the viral genome, so the virus must provide this enzyme itself. The viral RdRp acts as both a transcriptase to transcribe mRNA, as well as a replicase to replicate the RNA genome.
Class V are negative-sense single-stranded RNA (-ssRNA), and include influenza, rabies, and Ebola.
Since the genome of -ssRNA viruses cannot be used directly as mRNA, the virus must carry an RNA-dependent RNA polymerase within its capsid. Upon entrance into the host cell, the plus-strand RNAs generated by the polymerase are used as mRNA for viral protein production. When new viral genomes are needed, the plus-strand RNAs are used as templates to make -ssRNA.
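As a compact summary of the seven Baltimore classes, here is a small Python lookup table. It is only a sketch: the class and function names are my own, and the one-line descriptions for the classes not detailed above (II and IV) follow the usual textbook statement of the scheme rather than anything on this webpage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaltimoreClass:
    genome: str          # form of nucleic acid packaged in the virus particle
    route_to_mrna: str   # simplified description of how mRNA is obtained


# Simplified summary of the Baltimore scheme (illustrative wording).
BALTIMORE = {
    "I": BaltimoreClass("dsDNA", "transcribed directly to mRNA"),
    "II": BaltimoreClass("ssDNA", "converted to dsDNA, then transcribed"),
    "III": BaltimoreClass("dsRNA", "transcribed by a viral RNA-dependent RNA polymerase"),
    "IV": BaltimoreClass("+ssRNA", "genome can serve directly as mRNA"),
    "V": BaltimoreClass("-ssRNA", "copied to plus-strand mRNA by a viral RNA-dependent RNA polymerase"),
    "VI": BaltimoreClass("ssRNA-RT", "reverse transcribed to DNA, then transcribed"),
    "VII": BaltimoreClass("dsDNA-RT", "transcribed to mRNA; genome replicated via an RNA intermediate"),
}


def carries_rdrp_in_particle(baltimore_class: str) -> bool:
    """True for the classes described above as having to bring their own RdRp."""
    # Host cells lack RdRp, so dsRNA and -ssRNA viruses must supply it themselves.
    return BALTIMORE[baltimore_class].genome in {"dsRNA", "-ssRNA"}


for name, info in BALTIMORE.items():
    print(f"Class {name}: {info.genome} -> {info.route_to_mrna}")
```

For example, `carries_rdrp_in_particle("V")` is true for influenza-like viruses, whereas a Class I dsDNA virus can lean on the host's own polymerases.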
And finally a quick mention of prions, which are infectious agents that completely lack nucleic acid of any kind, being made entirely of protein. They are in fact misfolded proteins which are able to transmit their misfolded shape onto normal variants of the same protein. Prion diseases may be genetic, infectious or sporadic disorders, and are associated with several fatal and transmissible neurodegenerative diseases in both humans and a variety of animals, including bovine spongiform encephalopathy (BSE or “mad cow disease”), Creutzfeldt-Jakob disease in humans, and scrapie in sheep. Prions have been termed "infectious proteins" that only attack the nervous system. With prions there is a distinction to make between contagion and infection, with contagious diseases spread by contact, and infectious diseases spread by infectious agents. So if something is contagious it is always infectious, but something infectious is not always contagious (i.e. prions are not contagious). Prions can bind to metal and plastic surfaces without losing infectivity, and are extraordinarily resistant to conventional sterilisation procedures. The typical route is by eating something contaminated, e.g. the BSE outbreak arose from BSE-prion contaminated foodstuff.
For information: At the beginning of this last section we mentioned that both prokaryotic and eukaryotic cells possessed genomes consisting of double-stranded DNA (dsDNA). We also mentioned that the genome (the DNA molecule) was divided into one or more chromosomes (46 for humans). In eukaryotic cells the chromosomes are concentrated and enclosed in a compartment called a nucleus. Below we have a typical graphic outlining the basic packaging arrangement.
One problem with this type of graphic is that it hides more than it shows. The nucleus is 10 micrometres in diameter, so very much smaller than a cell (average diameter of a human cell is about 100 micrometres). The distance between a base pair is about 0.34 nanometres, and the total length of the DNA in a human cell is 2.04 metres. So that's about 6.4 billion base pairs per human cell, distributed over 46 chromosomes of differing sizes. DNA content is not directly correlated to the size or complexity of the organism. For example, the frog has almost the same number of base pairs as a human, and a 'simple' amoeba has nearly 100 times more base pairs. A word of warning, the figures quoted above might not be the 'latest' set of accepted figures, for example you can find alternative sources that tell us that the cell nucleus is 2-4 micrometres in diameter. Also some texts can confuse the reader because they write about the average haploid human cell (i.e. for 23 chromosomes) having a total length of DNA of "over 1 metre" and about 3 billion base pairs of DNA.
So the task is to pack that 2.04 metres inside a sphere 10 micrometres in diameter, and of course remembering to leave enough access for all the enzymes and regulatory proteins so they can organise replication and transcription. Not surprisingly everyone thought that the DNA must be folded and coiled up, but the question was how?
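To get a feel for the scale of the packing problem, the arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted above (0.34 nanometres per base pair, 2.04 metres of DNA, a 10-micrometre nucleus); the function names are my own. It also shows why the quoted figures should be treated as approximate: at 0.34 nanometres per base pair, 2.04 metres corresponds to about 6.0 billion base pairs, whereas 6.4 billion base pairs would stretch to about 2.18 metres.

```python
BASE_PAIR_SPACING_NM = 0.34  # distance between adjacent base pairs, as quoted above


def dna_length_m(base_pairs: float) -> float:
    """Stretched-out length in metres of a given number of base pairs."""
    return base_pairs * BASE_PAIR_SPACING_NM * 1e-9


def base_pairs_for_length(length_m: float) -> float:
    """Number of base pairs that would give a stretched-out length in metres."""
    return length_m / (BASE_PAIR_SPACING_NM * 1e-9)


def length_to_diameter_ratio(length_m: float, diameter_um: float) -> float:
    """How many times longer the DNA is than the nucleus is wide."""
    return length_m / (diameter_um * 1e-6)


print(f"2.04 m of DNA   = {base_pairs_for_length(2.04) / 1e9:.2f} billion base pairs")
print(f"6.4 billion bp  = {dna_length_m(6.4e9):.2f} m of DNA")
print(f"DNA length / nucleus diameter = {length_to_diameter_ratio(2.04, 10):,.0f}")
```

In other words the DNA is roughly 200,000 times longer than the nucleus is wide, which is why several levels of folding are needed.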
The basic process is as follows:-
A cell nucleus has a set of 46 long chains of double-stranded DNA (dsDNA) consisting of all those nucleotide base pairs. Each long chain is typically home to tens of millions to hundreds of millions of base pairs along with numerous repetitive sections. Human cells, other than human sex cells, are diploid and are in fact home to two copies of each of 23 different chromosomes (one maternal and one paternal, known as homologous chromosomes). Human sex cells (egg and sperm cells) contain a single set of chromosomes and are known as haploid. In addition each linear chromosome has two telomeres, one at each end, and a centromere that is used to link pairs of sister chromatids.
There is often some confusion about the terms chromatid and chromosome. A chromosome is a long DNA molecule tied together with proteins, and usually found in a compact dense complex called chromatin. Mitosis is a part of the cell cycle in which replicated chromosomes are separated into two new cells, and the cycle is divided into a number of phases. The first step is DNA replication and when complete all the chromosomes have been replicated, i.e. each chromosome now consists of two sister chromatids. After a checkpoint to ensure that there is no damage, the cell enters the mitotic phase. So chromatid refers to one of the two identical threadlike strands of a replicated DNA molecule. The identical copies of the chromosome are called sister chromatids and are joined at the centromere. It is this butterfly form that is often presented as "chromosomes" simply because they are what you can see under a microscope. Once the sister chromatids separate, each becomes known as a daughter chromosome.
In the above graphic we have a pair of homologous chromosomes, because one set is maternal and the other paternal. They will pair up inside the cell during fertilisation. We can see that these 'homologs' have the same genes (heritable units) in the same loci (i.e. positions). An allele is one of two or more versions of the same gene, e.g. the gene that controls the blood group has six possible alleles. For a given gene locus if the two chromosomes contain the same allele they are homozygous. If an individual carries two copies of the allele then it can be a "dominant allele" which codes for a dominant trait, or it can be a "recessive allele" which codes for a recessive trait. In the Wikipedia description of dominance there is a section that addresses common misconceptions that is worth a read since it highlights the fact that dominance is not an inherent feature of an allele or its phenotype (observable trait). A diploid organism (i.e. with pairs of chromosomes) with two different alleles at the same gene locus is called heterozygote for that particular allele (and they also can be dominant or recessive).
In eukaryotic cells the genes are interspersed throughout a chromosome and a typical chromosome contains between a few hundred and several thousand different genes. A gene consists of two parts, exons that are used to produce proteins, whilst introns are the parts not used. The process of gene expression involves an intermediary step where an exact pre-mRNA copy of the gene sequence is created and then the introns are removed to create a functional messenger RNA (mRNA).
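The step from pre-mRNA to mRNA can be pictured with a toy Python example. The sequence, the exon positions and the function name are all invented for illustration, and real splicing is carried out by a large molecular machine rather than by simple string slicing; the sketch only shows the bookkeeping of keeping the exons and dropping the intron.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments (start, end) of a pre-mRNA, discarding the rest."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exon 1 + intron + exon 2 (not a real gene).
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "GAUUAA"
pre_mrna = exon_1 + intron + exon_2

# Exon positions as (start, end) indices into the pre-mRNA.
exon_positions = [(0, 6), (18, 24)]

mature_mrna = splice(pre_mrna, exon_positions)
print(pre_mrna)
print(mature_mrna)
```

Here the 24-letter copy of the gene becomes a 12-letter message once the intron is removed.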
I've not seen any real reasoning about why we have 46 long strands, nor why they are different lengths, nor why certain genes are on certain chromosomes, etc.
In any case the first level of packing is the binding of the chromosomal DNA to histones. In the above graphic the 'histone' looks like a little drum, but in reality histones are sub-grouped into core histones and a linker histone (H1). Four pairs of different core histones (H2A, H2B, H3 and H4) are assembled into an octamer (each core histone is a basic protein with a structure of about 102-135 amino acids). In fact the four 'pairs' don't act as pairs, instead the eight histones link together to form a core particle around which 146 base pairs of DNA are wrapped. Below we look down through the superhelical axis, and we can see the DNA and core histones H2A (yellow), H2B (red), H3 (blue) and H4 (green).
Often this wrapping is presented as a spool of thread around a reel, but the reality is that the DNA segment makes 1.67 left-hand superhelical turns around the octamer. The segment of DNA wound around the eight core histones is called a nucleosome, which is the fundamental subunit of chromatin. A nucleosome is around 11 nanometres in diameter and about 5.5 nanometres long, and a human cell nucleus contains about 30 million nucleosomes. Each nucleosome is connected to the next by a section of linker DNA consisting of between 38 and 53 base pairs (technically a nucleosome also includes one section of linker DNA). Once inside the nucleus the bonds between the DNA segment and the octamer will be broken and reformed during the processes of replication and transcription, and as you can imagine with 30 million nucleosomes there is a constant remodelling of the chromatin structure as bonds are broken and formed. It is in fact the histones that determine which DNA segments are transcribed, since transcription can only take place if the histone allows access to the segment (in response to certain signals). One description highlighted the fact that the DNA segment is negatively charged, and the histones are positively charged, so this helps keep the two tightly packed together, but the electrostatic interaction can be neutralised during the processes of replication and transcription. Another way to look at it is that the histone octamer is stable only in the presence of DNA.
It's interesting to know that the nucleosome is universal to all eukaryotes and has remained conserved through evolution, e.g. the difference in H4 between a cow and pea plant is only two amino acids in more than 100.
The linker histone (H1) sits on the entry and exit sites of the DNA to keep the DNA correctly wrapped around the core histones (it also wraps another 20 base pairs with it, so making two full turns around the octamer). A term often used to describe the histones and the DNA segments is an 11-nanometre fibre like "beads on a string", with the DNA as the string holding together the core histone "beads" (as seen below).
Strictly speaking H1 sits on top of the nucleosome "bead", and a nucleosome plus the H1 histone is called a chromatosome. The building of a nucleosome provides a packing ratio of about five to ten (usually said to be an average of seven), and is just the first step in a long process leading to a 50,000-fold increased compactness of DNA as compared to unpacked DNA.
Given that each chromosome contains over 100 million base pairs, wrapping 166 of them around an octamer means that there are several hundred thousand nucleosomes still to pack and compress.
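These numbers can be checked with a short Python sketch. It uses the figures quoted above (146 base pairs in the core particle, 38 to 53 base pairs of linker DNA, 0.34 nanometres per base pair, an 11-nanometre bead, and a chromosome of 100 million base pairs). The function names are my own, and the assumption that each bead-plus-linker repeat takes up about 11 nanometres along the fibre is a simplification made only for this illustration.

```python
BP_SPACING_NM = 0.34          # length of DNA per base pair
CORE_BP = 146                 # base pairs wrapped around the histone octamer
LINKER_BP_RANGE = (38, 53)    # base pairs of linker DNA between nucleosomes
NUCLEOSOME_DIAMETER_NM = 11   # assumed length each repeat occupies along the fibre


def nucleosomes_per_chromosome(chromosome_bp: int, repeat_bp: int) -> int:
    """Whole number of nucleosome repeats that fit along a chromosome."""
    return chromosome_bp // repeat_bp


def packing_ratio(repeat_bp: int, fibre_length_nm: float = NUCLEOSOME_DIAMETER_NM) -> float:
    """Stretched-out DNA length divided by the length it occupies once wrapped."""
    return repeat_bp * BP_SPACING_NM / fibre_length_nm


chromosome_bp = 100_000_000
for linker_bp in LINKER_BP_RANGE:
    repeat_bp = CORE_BP + linker_bp
    count = nucleosomes_per_chromosome(chromosome_bp, repeat_bp)
    ratio = packing_ratio(repeat_bp)
    print(f"repeat of {repeat_bp} bp: {count:,} nucleosomes, packing ratio {ratio:.1f}")
```

With these assumptions the sketch gives roughly half a million nucleosomes for a 100-million-base-pair chromosome and a packing ratio of about six, which sits inside the "five to ten" range mentioned above.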
The next step in compacting the DNA is to move from a 10-nanometre fibre to a 30-nanometre fibre, with a packing ratio of about 50. This involves finding a way to fold up and stack the "beads on a string". As seen below, one idea was the so-called solenoid model (a) which involved forming a compact left-handed helix, with each turn of the helix involving about 6 nucleosomes. However, the suggestion now is that with the help of the H1 histone the folding and stacking yields more of a zig-zag (b). The difference appears to depend on the exact experimental conditions and there is still the question about how well they mimic 'in vivo' conditions. One suggestion is that the solenoid model might still be valid if H1 is not there or with longer linker DNA sections.
It's worth mentioning that the processes of replication and transcription require that the two strands of DNA come apart temporarily, thus allowing polymerase access to the DNA template. But being tightly wound into 30-nanometre fibres makes it difficult for the enzymes to open and copy the DNA. So there is a process where the histones temporarily move to expose underlying DNA sequences. Once completed the chromatin is returned to its compact state.
The next step is to take the 30-nanometre fibre and form DNA supercoils with a diameter of about 700 nanometres. For this our chromatosomes will need some help in the form of scaffold/matrix attachment regions. As far as I can see the next step is really in two phases of coiling or compacting, i.e. the creation of a 300-nanometre wide looped fibre which is then coiled again to produce the 700 nanometre wide chromatid (which I think some experts called a super solenoid). So the 30-nanometre fibre is looped (radial loop domain) by being anchored to discrete sites on an 'extended' scaffold. The loops vary in length between 15 and 30 micrometres. Then in a second part the scaffolding is condensed into a supercoil.
What we have seen is that the nucleic acid has gone through four levels of structural arrangement, primary (linear sequence of nucleotide), secondary (local folds), tertiary (3D folds of subunits) and quaternary. Each step resulted in a higher-order of chromatin fibres. The DNA in a chromosome can be found in one of two states, euchromatin or heterochromatin.
When the DNA is condensed or highly packed into chromatin in the quaternary structure it is called heterochromatin, and in this state gene transcription is inhibited. However every so often the DNA can be exposed for transcription. Euchromatin is a de-condensed state reminiscent of those unfolded "beads on a string", i.e. the wrapping is loose so that the raw DNA can be accessed and participate in the active transcription of DNA to messenger RNA (mRNA) products. Below we can clearly see the condensed and de-condensed states in the chromosome.
We can also see a nucleus with the nuclear envelope and nuclear pores (small arrows), and we can see that the heterochromatin tends to collect in clumps and often near the nucleus inner wall and the nucleolus. Originally it was thought that heterochromatin was not transcribed, however today we know that heterochromatin is transcribed but is continuously turned over via RNA-induced transcriptional silencing.
Finally we now have the genome (the DNA molecules) packed inside the nucleus in the form of chromatin fibres. We don't intend to go into the cell division cycle, but the question remains, what does it all look like packed into that small nucleus? Below we have a very recent reconstruction of the chromosomes inside a cell nucleus, which is probably significantly different from the images you might have seen in books and the press.
In fact what we see above is called the "chromosome territorial model". Despite the apparent disorder, during the interphase (below on the left) chromosomes occupy defined regions of the nucleus. In the later part of interphase the new DNA molecules formed are not distinct but intertwined. Below on the right we have the prophase which is marked by the initiation of condensation of chromosomal material. It becomes untangled and the chromosomes appear composed of two chromatids attached together at the centromere.
A chromosome territory is typically about 1-2 micrometres in diameter and actually consists of subdomains, whilst at the same time the territories mingle with their neighbours. However the territories appear to be transmitted from parent to daughter cells. Nor are the territories randomly distributed, because some chromosomes often sit next to other specific chromosomes. It would appear that the arrangement patterns of chromosomes are specific to both cell type and tissue type. Changing the position of a chromosome appears to alter gene expression, and may even be linked to disease.
Are there a lot of viruses that attack humans?
On this webpage we already mentioned that Wikipedia's webpage on history of virology lists some viruses that attack humans, namely poliovirus (first identified in 1908), yellow fever virus (1927), mumps virus (1934), Dengue fever virus (1943), Varicella zoster virus (1952), measles virus (1954), rhinovirus (1956), Rubella virus (1962), Hepatitis B virus (1963), Norovirus (1972), Ebola virus (1976), and HIV (1983). To this list we might add Chikungunya (1952), coronavirus (late 1920's), gastroenteritis (1825), herpesvirus (1888), rotaviruses (1973), shingles (1831), SARS (2003), influenza (5th century BC), MERS (2012), rabies (ca. 1950 BC), Zika (1947), and of course SARS-CoV-2 (2019). And given that we haven't mentioned the numerous lesser known viruses that attack humans (as well as other animals), we might well think that there are a lot of viruses that attack humans. This is not to detract from the human misery and death caused by viruses, but we would be very wrong to assume that we are their preferred target.
For information: The study of ancient DNA has established that many diseases have been co-evolving with humans for many thousands of years. For example, the human papillomavirus co-evolved with ancestral Africans from at least ca. 500,000 years ago, and malaria-causing parasites are thought to have been at the origin of certain human immune-regulating genes at least 70,000 years ago. The strong selection of pathogens in their interaction with the human immune system has been used to help track the migration out of Africa of behaviourally modern Homo sapiens more than 100,000 years ago. The "Pleistocene disease baseline" is defined as when most modern diseases derived from pre-65,000-year-old sub-Saharan Africa. Current evidence suggests that the viruses for Hepatitis B, measles, HIV, human papillomaviruses, Dengue fever, and smallpox almost certainly have an African origin (as do cholera, sleeping sickness, malarias, Leishmaniasis, and the plague). The potential impact of disease on prehistoric humans is illustrated by the fact that about 60% of contemporary hunter-gatherers succumb to disease before the age of 15 years old. In addition the idea that palaeopathological examples of cancer only date to the past 500 years (i.e. cancer is a modern lifestyle disease) is based upon the rarity of earlier evidence. But some evidence does exist showing that Australopithecus and early Homo did suffer from neoplastic tumours, which could date back to more than 1 million years ago (in most cases of early death there are few or no skeletal lesions). The reality is that oncogenesis has existed for at least 2 million years, and as a pathology it is found in almost all multicellular organisms that appeared during the transition to metazoan life some 1 billion years ago. Two 30,000-year-old viruses were recently discovered in the Siberian permafrost and have been reactivated, just showing how easy it might be for old pathogens to reappear with global warming.
This article goes into the details of ancient oncogenesis and human evolution. What is clear is that disease epidemics are not new and they have and will continue to devastate human populations (just think of plague, smallpox, influenza, Zika, Ebola and now SARS).
Firstly it's important to note that animal viruses are recognised by the disease they cause, plant viruses by the disease and plant species that serve as host, and microbial viruses by the organism they infect.
We now know that viruses, particularly bacteriophages, are the most common and abundant biological entities on Earth. They also appear to account for the bulk of the genetic diversity on Earth. Viruses are major ecological and even geological agents that in large part shape such processes as energy conversion in the biosphere and sediment formation in water bodies by killing off populations of abundant, ecologically important organisms such as cyanobacteria or eukaryotic algae.
Today, approximately 75% of viruses and 50% of bacteria known to cause disease in humans are zoonotic and can be transmitted between animals and people. A wide variety of viruses, mirroring their human analogs, are ubiquitous among animals in nature. A first estimate put the number of still unknown mammal viruses at more than 300,000, but a separate estimate placed the number that might be harmful to humans at 10,000 to 12,000. The idea is that a 10-year program could be developed to detect and identify all viruses harmful to humans. It is often considered that all animal species can become infected by at least one virus (that includes avian, bovine, bat, canine, caprine, cervine, equine, feline, rodent, ovine, porcine, etc. and primates). Some viruses can affect all mammals, e.g. rabies, whereas other viruses infect single animal species, e.g. retroviruses infect cats but not ruminants or horses. Some viruses are deadly to animals and are difficult to control, e.g. foot-and-mouth disease, African swine fever or avian influenza. As an example it was estimated that between 1997 and 2011 more than 15 million animals were slaughtered because of foot-and-mouth disease, and the total long-term cost of this disease was estimated at over $8 billion. Some viruses affect pets, e.g. distemper or equine viral arteritis. Yet other viruses affect animals in the wild, e.g. myxomatosis in rabbits and ranaviruses in amphibians. The US has estimated that losses are the equivalent of $17.5 billion annually due to livestock diseases (this includes virus, bacteria, fungal diseases, and nutritional defects).
An additional indirect cost is that associated with visits to emergency and outpatient services, hospital admissions, etc. of people affected by viruses or bacteria in food products. In Europe in 2008 this was estimated at around €250 million per year, but the US estimated the cost in 2011 at over $2 billion when including all societal costs. And it was mentioned that non-typhoidal Salmonella infections killed about 150,000 people annually in the period 2007-2012 (mostly associated with contaminated eggs or chicken). We can compare this with the 150,000 deaths annually from rabies, HAT (sleeping sickness), and Leishmaniasis. And we should not forget that the Chinese estimated that rabies only in their pets cost them about $0.5 billion annually.
In another report dating from 2016 it was estimated that globally in excess of $4 billion was spent annually on veterinary services and animal health. Also in the period 2000 to 2015 it was estimated that major animal disease outbreaks had cost $12.1 billion globally (mostly TSE).
The economic impact of SARS was estimated at about $16 billion, and who knows what the full economic effect of SARS-CoV-2 will be. In an early warning project (2009-2019) more than 240 novel viruses were found in areas where animals and people live in close contact and depend upon the same natural resources. These included new coronaviruses, though not SARS-CoV-2 specifically, and the warning was clear. A different report focussed only on the total impact of a new global influenza pandemic of the type seen in 1918, and estimated the cost at $500 billion (including mortality, income loss, reduction in size of labour force, and social measures to break transmission). Another 2019 report estimated that a global pandemic could cost anything up to $3 trillion depending upon severity and the cost and time taken to bring it under control. What was certain was that the poor and vulnerable always suffer most, and the rich least.
Arboviruses are unique in infecting both vertebrates and blood-sucking arthropods (e.g. mosquitoes, sandflies, ticks, etc.). The arthropod acquires the virus from an infected vertebrate, the virus becomes established in the arthropod's salivary glands, and it is subsequently transmitted to other vertebrates by bite. There are at least 500 of these animal viruses classified as arboviruses. In addition there are viruses that don't infect arthropods but simply use them as a transport vector. And there are several hundred plant viruses that also depend upon arthropods as a delivery vector. It would appear that as humans domesticated lands in previously undisturbed ecosystems they became new targets for existing animal viruses, the most notable being yellow fever, equine encephalomyelitis, dengue fever and other related hemorrhagic fever viruses.
Phytoviruses (plant viruses) are highly prevalent in plants worldwide, including vegetables and fruits. Humans, and more generally animals, are exposed daily to these viruses, several of which are extremely stable. It is currently accepted that a strict separation exists between plant and vertebrate viruses regarding their host range and pathogenicity, and plant viruses are believed to infect only plants. Accordingly, plant viruses are not considered to present potential pathogenicity to humans and other vertebrates. Notwithstanding this, there are many examples where phytoviruses circulate and propagate in insect vectors. In addition, there is a close relation between some plant and animal viruses, with almost identical gene repertoires. Moreover, plant viruses can be detected in samples from humans and non-human mammals, and there is evidence of immune responses to plant viruses in invertebrates, non-human vertebrates and humans, and of the entry of plant viruses or their genomes into non-human mammal cells and bodies after experimental exposure.
Certainly in the last 20 years there has been an increase in the incidence of food-borne diseases worldwide, with viruses now recognised as a major cause of these illnesses. The viruses implicated in food-borne disease are the enteric viruses, which are found in the human gut, excreted in human faeces, and transmitted by the fecal-oral route. Many different viruses are found in the gut, but not all are recognised as food-borne pathogens. The enteric virus pathogens found in human faeces include noroviruses, enteroviruses, adenoviruses, hepatitis A virus (HAV), hepatitis E virus (HEV), rotaviruses, and astroviruses, most of which have been associated with food-borne disease outbreaks. Noroviruses are the major group identified in food-borne outbreaks of gastroenteritis, but other human-derived and possibly animal-derived viruses can also be transmitted via food.
Four of the enteric viruses, noroviruses, hepatitis A virus (HAV), rotaviruses, and astroviruses, are reported to comprise 80% of all food-borne illnesses in the United States, with noroviruses by far the greatest contributor at an estimated 23 million cases per year. Enteric viruses are generally resistant to environmental stressors, including heat and acid, and most resist freezing and drying and are stable in the presence of many types of solvents. The resistance of enteric viruses to environmental stressors allows them to resist both the acidic environment of the mammalian gut and also the proteolytic and alkaline activity of the duodenum so that they are able to pass through these regions and colonise the lower digestive tract. These properties also allow survival of enteric viruses in acidic, marinated, and pickled foods, frozen foods, and lightly cooked foods such as shellfish. Most enteric viruses are believed to have a low infectious dose of 10–100 particles or possibly even less. Hence, although they do not multiply in food, enough infectious virions may survive in food, be consumed, and cause disease.
In an analysis from 2011 there were at least 1,300 distinct virus species that infected plants. Viruses that attack plants are similar to those that infect other organisms, however they have relatively small genomes and have, in general, a simpler particle structure. No virus has evolved a mechanism to directly penetrate the plant cell wall and enter a plant cell. So one technique is to infect the pollen or ovule so that the plant begins life infected. Or they can enter through wounds to the plant cell wall and cell membrane. Most plant viruses use insects to move from plant to plant. They don't infect the insects but can travel on or in the insect. The plant virus then takes advantage of the host's transport system, i.e. the plasmodesmata which connect individual plant cells, and the phloem vessels. This enables the virus to travel to distant sites in the plant. So the routine is to enter a plant cell, replicate, move to neighbouring cells, move into the phloem, then roots, and then to cells in the apical dome (leaf tips) of the plant. This process can be quite fast, taking only 4-5 days from an initial infection to an infection of the main plant stem. The result is malformations, stunted growth, and even necrosis.
According to one survey the most important individual viruses in terms of economic impact were those that attacked (in order) tobacco, tomatoes, cucumbers, potatoes, cauliflowers, and plums. The author of the survey thought that the collective impact of several different viruses on rice, wheat, maize and vegetable crops was likely to have the greatest economic impact. It should be noted that plants are also the target of a vast number of bacterial pathogens (some affect a broad range of plants and others have very specific targets, e.g. rice). Generally, viruses, bacteria and fungi that infect plants do not cause infection in humans.
In one review mention was made of the emergence of geminiviruses from the 1980s (and in particular the begomoviruses). Already in 2003 the crop and economic losses due to geminiviruses were estimated at between $1.3 billion and $2.3 billion for cassava in Africa, $5 billion for cotton in Pakistan, $300 million for grain in India, and $140 million for tomatoes in Florida. Plant viruses are just one type of crop pest, and crop pests as a whole are estimated to affect between 20% and 40% of global crop production. Plant diseases are estimated to cost the global economy around $220 billion, and invasive insects another $70 billion annually. It was logical to focus first on viruses causing diseases in crops, but more recently experts have found an abundance of viruses in wild plants, most of them asymptomatic. An infection rate of 60% has been mentioned, and as many as 90% of the nucleic acid sequences are novel (not in public databases). The insect vector remains the most common means for viruses to move between plants, and the plant-insect-virus relationship is certainly one of the most ancient, and therefore one of the most intimate and complex. In fact viruses have been credited with conferring drought tolerance on a number of plants and making others less attractive to fungal attacks.
In 2011 the world consumed more than 150 million tons of seafood, for a value exceeding $215 billion, but already wild fisheries were being replaced by aquaculture, and one additional major cost is marine infectious diseases, including the impact of marine viruses. These diseases both reduce survival and affect seafood quality. There are well over 70 different marine infectious diseases that affect everything from oysters to Atlantic salmon. It's difficult to know the situation with wild fisheries, but in aquaculture the control of marine infectious diseases is a major economic barrier. According to one expert 25% of marine infectious diseases are due to viruses, 34% to bacteria, 19% to protists, and 18% to metazoans (animals). Viruses are known to infect abalone, oysters, shrimps, lobsters, herring, red sea bream, pilchards, and salmon, affecting both growth and mortality. At least according to one report the principal viruses are aquabirnaviruses (e.g. infectious pancreatic necrosis), hematopoietic necrosis virus and more recently betanodaviruses (e.g. nodavirosis). These viruses can spread rapidly, have a high mortality rate, and can quickly devastate a fish farm. Other viruses affect the weight, appearance, etc. of fish, and directly affect their value. From the 1980s effective anti-viral vaccines appeared. Marine viruses are not transmissible to domestic animals or people. With global warming and increased ocean acidification most experts predict that marine infectious diseases will continue to increase.
Are all viruses bad for us?
Given the present circumstance with the virus SARS-CoV-2 we can be forgiven for thinking that all viruses are pathogens, i.e. nasty things that produce disease. However many viruses in bacteria, insects, plants, fungi and animals appear to be beneficial to their hosts. It's not certain why things have evolved in this way, but the mechanisms of these viruses are now better understood. Some viruses are vital to the survival of the host, some help the host fight for survival, and some are so well integrated with the host that it is now difficult to know where one starts and the other stops.
In the answer to the question "What exactly is a virus?", you might have noticed that there was not a quick, plain vanilla answer. Instead we quickly got bogged down in describing what one particular type of virus did. One suggestion from 2011 is that viruses are "intracellular parasites with nucleic acids that are capable of directing their own replication and are not cells".
There is a concept of symbiosis which is about two dissimilar entities living in an 'intimate' association, where survival might or might not be dependent on one or both partners. Viruses are so-called obligate symbionts, in that they cannot replicate outside of their hosts. The reality is that there are many situations where both the virus and the host benefit from their relationship, and in some cases the virus has become part of the host in a process known as symbiogenesis, leading to a new species. As an example, there are types of parasitoid wasps that lay their eggs in living insect larvae. Normally the immune system of the larva would prevent the eggs from developing, but these wasps carry a polydnavirus, and the genes delivered by its virions (which are injected along with the eggs) suppress this response.
A bacteriophage is a virus that infects and replicates within bacteria, and 'phages' are in fact the most populous organism on Earth. There is something called phage therapy where bacteriophages are used to treat pathogenic bacterial infections, however on the down side some other bacteriophages actually shelter the bacteria from a drug meant to eradicate a disease. Many bacteria carry a viral genome integrated into their own genome (i.e. their genetic material). These so-called lysogenic viruses remain dormant and the bacterium continues to live and reproduce normally (transmitting the dormant virus to daughter cells). The presence of a lysogenic virus makes the bacteria immune to the infectious, or lytic, form of the virus. In a population of bacteria, some of these bacterial cells will be triggered to convert from the lysogenic cycle to the lytic cycle. The result in those cells is that the virus starts to multiply rapidly, producing thousands of progeny and killing the host in the process. Once released into the extracellular environment the virus starts to kill competing bacteria that are not lysogenic for that virus. So the original host cell is sacrificed for the benefit of the remaining lysogenic population of bacteria. You can see that this technique would be particularly useful when one bacterium invades new territory occupied by another. In addition, there are bacteriophages which produce a toxin to which the host is insensitive, but which kills other types of bacteria trying to invade its space.
A nice example was when striped tulips became an obsession with the Dutch. In the 1600s one bulb could be worth as much as an entire ship and its contents. The problem was that no one knew if the striping would occur in progeny bulbs, so investing in these bulbs became a form of gambling. The reality is that the striping is due to the so-called "tulip breaking virus" which "breaks" the lock on a single colour, and causes flame-like colour striping. The virus was only discovered in 1928, and later it was discovered that it was transferred in a non-persistent manner by some species of aphids (types of greenfly or blackfly). Today the "broken" effect results from breeding and not viral infection.
The 'helpful virus' is not limited to insects, bacteria or tulips. In patients infected with HIV-1 there is the suggestion that progress to full-blown AIDS is much slower if patients are already infected with the hepatitis G virus (now known as GB virus C), a non-pathogenic hepatitis virus that is common in humans. There are also oncolytic viruses that preferentially infect and kill cancer cells, and the first one was approved in 2015.
The evolution of placentals
As a sub-question on how bad or good viruses really are, there is one story that captures the true essence of the beneficial role viruses can play in the life of humans. But first, some basic biology. Humans are vertebrates, as are sharks, birds, and all other mammals, including many species that have since disappeared.
In 2010 this little furry-tailed mammal (see above), about the same size as a modern squirrel, was postulated to have existed about 65 million years ago. Since then there has been some controversy about this 'discovery'. But the idea was that species such as rodents and primates did not share the Earth with non-avian dinosaurs, but arose from a common ancestor, a small, insect-eating, scampering animal, shortly after the dinosaurs' demise. This little mammal was postulated to be the hypothetical ancestor of an enormously diverse collection of more than 5,100 living species.
Phylogenetics is part of biological systematics, i.e. the study of the diversification of living forms, both past and present, and the correlations and dependencies among living things through time. Phylogenetics is specifically about the evolutionary history and relationships among or within groups of organisms, e.g. species, etc. The relationships are hypothesised from inferences made on heritable traits such as DNA sequences and morphology. And as you might guess computational phylogenetics is the development and application of algorithms for phylogenetic analysis. The aim of course is to build a phylogenetic tree representing how genes, species and other taxa share a common ancestry and how they evolved over time.
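To give a feel for what "inferring relationships from DNA sequences" means in practice, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the four sequences and the names species_a to species_d are invented toy data, and counting mismatches between already-aligned sequences (the Hamming distance) is just one very simple distance measure, not the method used in any study mentioned on this page. Finding the closest pair is the kind of first step a distance-based tree-building method would take.

```python
from itertools import combinations

# Toy, already-aligned DNA sequences (invented for illustration, not real data).
SEQUENCES = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTTC",
    "species_c": "ACGAACTTTC",
    "species_d": "TCGAACTTTG",
}


def hamming_distance(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def distance_matrix(sequences):
    """Return the pairwise distances, keyed by (name1, name2) in sorted order."""
    return {
        (x, y): hamming_distance(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


def closest_pair(sequences):
    """Return the two names with the smallest distance between them."""
    matrix = distance_matrix(sequences)
    return min(matrix, key=matrix.get)


if __name__ == "__main__":
    for pair, dist in sorted(distance_matrix(SEQUENCES).items()):
        print(pair, dist)
    print("closest pair:", closest_pair(SEQUENCES))
```

In this toy data the two sequences with the fewest differences would be grouped together first, and a real tree-building program would then repeat the comparison on the remaining groups, using far longer sequences and more realistic models of how DNA changes over time.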
The "65 million years" is important because 66 million years ago there was the so-called Cretaceous-Paleogene extinction event, the mass extinction of ¾ of all plant and animal species on Earth (including dinosaurs). So the suggestion is that this small mammal was responsible for an 'explosion' of animals that emerged and filled the niches left vacant after the catastrophe caused by the impact of a massive comet or asteroid. The key point about this hypothetical ancestor is that our furry-tailed friend produced the largest living branch of the mammalian family tree, the so-called placentalia, i.e. mammals that keep foetuses with placentas as opposed to marsupials such as kangaroos, which raise offspring in pouches, or monotremes such as platypuses, which develop foetuses in eggs. There are alternatives to the 'explosion' model which argue that placentals appeared way before ('long-fuse' model) or just before ('short-fuse' model) the extinction of dinosaurs. There appear to be two different ways of viewing things. Palaeontologists infer the rate of change of species that separated based on their morphology (through fossils, etc.) whilst molecular biologists look at the rate of change of species based upon the genetic pools of populations that have produced today's living species. One group argues that DNA evidence and fossil records suggest that the split between marsupials and placentalia occurred around 160 million years ago. On the other hand molecular data suggest that most of the modern groups of animals actually began to slowly diversify about 100 million years ago. But as far as I could tell both groups agree that within a few million years after the disappearance of dinosaurs both the fossil record and DNA evidence show a significant increase in mammalian diversity. Is the molecular clock used to date molecular data right, or is it that they haven't yet found the right fossils?
For information: Just reading through Wikipedia, viviparity is the development of the embryo inside the body of the parent and where the young are born live, whereas oviparity is for females that lay developing eggs which hatch externally from the mother. Vivipary occurs when seeds or embryos begin to develop before detaching from the parent. These are modes of reproduction, but you often find reference to viviparous animals or mammals, meaning "gives birth to developed live young". The mammalian mode of reproduction and the placenta present many biological challenges to theories of evolution. One problem is the biological and immunological dilemma associated with the existence of a 'foreign body' expressing paternal genes. The first cell type to differentiate in a mammalian embryo is the trophectoderm which generates the placenta. It is the placenta that mediates the blood exchange between the mother and her 'non-self' embryo and contributes to the very complex biological changes needed for a live birth. Increasingly there is evidence that a particular class of viruses has been intimately and deeply involved in the evolution of the placenta in all mammalian lineages, i.e. providing genes that were useful for various functional and structural requirements of the placenta.
For our particular question concerning good and bad viruses, we want to focus on placental biology and the role of so-called endogenous retroviruses. The ancestral form of the placenta emerged roughly 130 million years ago, and had a massive impact in that it enabled live birth in mammals. However, the placenta exhibits an enormous variation across species. Morphology, tissue organisation, mechanisms of implantation, and even the cellular building blocks have evolved very differently across different species. Today there is an increasing body of evidence that the evolution of the placenta was assisted by ancient retroviruses.
A retrovirus is a type of virus that inserts a DNA copy of its RNA genome into the DNA of a host cell, thus changing the genome of that cell. The retrovirus produces double-stranded DNA from its RNA with the help of reverse transcriptase, a specific enzyme it carries inside its capsid. The viral dsDNA is then integrated into the host cell's chromosomes as a provirus. When this happens in a germline cell the provirus can become an endogenous retrovirus, i.e. a vertically inherited proviral sequence and a subclass of transposable element, or transposon (a DNA sequence that can change its position within a genome). So the idea of the retrovirus is that the host will treat the viral DNA as part of its own genome, and it will transcribe and translate those viral genes along with its own. This means that it will 'naturally' produce the proteins required to assemble new copies of the virus. Retroviruses are ubiquitous and are found in all vertebrates, and many cause serious infectious diseases in humans (e.g. HIV and HTLV-1; hepatitis B virus also relies on reverse transcription but is classed as a hepadnavirus, not a retrovirus). However, the 'RNA world' is a suggested stage in the evolutionary history of life in which self-replicating RNA molecules proliferated and led to longer proteins and then the era of DNA