# 04 - 126 Microbial Genomics and Infectious Disease


whereupon they can infect other V. cholerae. Consequently, the bacte­
riophages regulate the abundance of pathogens, with ramifications for 
the epidemiology of the disease. Whether bacteriophages contribute to 
cyclical control of other pathogens is currently unknown.
■ SUMMARY
Bacterial pathogens display myriad mechanisms of colonization, adhe­
sion, invasion, dissemination, and manipulation of host pathways. 
Infectious diseases result when pathogens successfully establish them­
selves within the host. Symptoms are usually the result of ensuing fights 
between the pathogen and the immune system. The incredible diversity 
of virulence determinants highlights the success of the host in combat­
ing infection. Further elucidating how bacteria cause infection will 
better our understanding of both human and microbial biology and 
will provide new opportunities for successful therapeutic intervention 
against infectious and inflammatory diseases.
Microbial Genomics and Infectious Disease
Roby P. Bhattacharyya, Yonatan H. Grad, Deborah T. Hung

Just as microscopy opened the worlds of microbiology by providing a 
tool with which to visualize microorganisms, technological advances in 
genomics provide microbiologists with powerful methods to character­
ize the genetic map that underlies all microbes with unprecedented 
resolution, thereby illuminating their complex and dynamic interac­
tions with each other, the environment, and human health. The field 
of infectious disease genomics encompasses a vast frontier of active 
research that is transforming public health and the clinical practice 
of infectious diseases. While genetics has long played a key role in 
elucidating the process of infection and impacting clinical infectious 
diseases, the ability to extend our thinking and approaches beyond 
the study of single genes to an examination of the sequence, structure, 
and function of entire genomes allows us to identify new possibilities 
for research and opportunities to change clinical practice and disease 
surveillance. From the development of diagnostics with unprecedented 
sensitivity, specificity, and speed to the design of novel public health 
surveillance tools and interventions, technical and statistical genomic 
innovations are reshaping our understanding of the influence of the 
microbial world on human health and providing us with new tools 
to diagnose, track, and combat infection. In this chapter, we explore 
the application of genomics methods to microbial pathogens and the 
infections they cause. We discuss innovations that are driving the 
development of diagnostic approaches as well as the discovery of new 
pathogens, providing insight into novel therapeutic approaches and 
paradigms, and advancing methods in infectious disease epidemiol­
ogy and the study of pathogen evolution that can inform infection 
control measures, public health responses to outbreaks, and vaccine 
development. We draw on examples in current practice and from the 
recent scientific literature as signposts that point toward ways in which 
pathogen genomics may influence infectious diseases in the short and 
long terms, and we highlight applications to SARS-CoV-2 and the 
COVID-19 pandemic. Table 126-1 provides definitions for a selection 
of important terms used in genomics.

MICROBIAL DIAGNOSTICS
The basic goals of a clinical microbiology laboratory are to establish the 
presence of a pathogen in a clinical sample, to identify the pathogen, 
and, when possible, to provide other information that can help guide 
clinical management and affect prognosis, such as antibiotic suscep­
tibility profiles or the presence of virulence factors. To date, clinical 
microbiology laboratories have largely approached these goals phe­
notypically by growth-based assays and biochemical testing. Bacteria, 
for instance, were historically grouped algorithmically into species by 
their characteristic microscopic appearance, nutrient requirements for 
growth, and ability to catalyze certain reactions. Antibiotic susceptibil­
ity is still determined in most cases by assessing bacterial growth in the 
presence of antibiotic.
With the sequencing revolution paving the way to easy access to 
complete pathogen genomes, we can more systematically define the 
genetic basis for these observable phenotypes. Compared with tradi­
tional growth-based methods for bacterial diagnostics that dominate 
the clinical microbiology laboratory, nucleic acid–based diagnostics 
that build on this genomic information promise improved speed, sen­
sitivity, specificity, and breadth of information. Bridging clinical and 
research laboratories, adaptations of genomic technologies have begun 
to deliver on this promise (Table 126-2).
■ HISTORICAL LIMITATIONS AND PROGRESS THROUGH GENETIC APPROACHES
The molecular diagnostics revolution in the clinical microbiology labo­
ratory is well under way, born of necessity in the effort to identify and 
characterize microbes that are refractory to traditional culture meth­
ods. Historically, diagnosis of many so-called unculturable pathogens 
has relied largely on serology and antigen detection. However, these 
methods provide only limited clinical information because of their 
suboptimal sensitivity and specificity, and, for serology, the long delays 
that diminish their utility for real-time patient management and the 
inability to characterize pathogens beyond identifying past exposure. 
Newer tests to detect pathogens based on nucleic acid content have 
already offered improvements in the select cases in which they have 
been applied.
TABLE 126-1  Glossary of Selected Terms in Genomics

| Term | Definition |
|---|---|
| Contig | A DNA sequence representing a continuous fragment of a genome, assembled from overlapping sequences; relevant for de novo assembly of sequence data that do not align to previously sequenced genomes |
| Genome | The entire set of heritable genetic material within an organism |
| Horizontal gene transfer | The transfer of genes between organisms through mechanisms other than by clonal descent, such as through transformation, conjugation, or transduction |
| Genomic epidemiology | An approach to inferring and reconstructing transmission events, population dynamics, and epidemiological patterns using comparative analysis of microbial genome sequences |
| Metagenomics | Analysis of genetic material from multiple species directly from primary samples without requiring prior culture steps |
| Microarray | A collection of DNA oligonucleotides (“oligos”) spatially arranged on a solid surface and used to detect or quantify sequences in a sample of interest that are complementary (and therefore bind) to one or more of the arrayed oligos |
| Microbial genome-wide association study (GWAS) | An analytic framework to test statistical associations between microbial genotypes and phenotypes of interest, such as antibiotic resistance and virulence |
| Mobile genetic elements | DNA elements that can move within a genome and can be transferred between genomes through horizontal gene transfer (e.g., plasmids, bacteriophages, and transposons) |
| Multilocus sequence typing | A method for typing organisms based on DNA sequence fragments from a prespecified set of genes |
| Next-generation sequencing | High-throughput sequencing using a parallelized sequencing process that produces millions of sequences concurrently, far beyond the capacity of prior dye-terminator methods |
| Nucleic acid amplification test (NAAT) | A biochemical assay that evaluates for the presence of a particular string of nucleic acids through amplification by one of several methods, including polymerase and ligase chain reactions |
| Polymerase chain reaction (PCR) | A type of NAAT used to amplify a specific region of DNA by means of specific oligonucleotide primers and a DNA polymerase |
| Single-nucleotide polymorphism (SNP) | Point mutations in DNA, the number of which in different microbial isolates is a measure of their genetic distance from one another |
| Transcriptome | The catalog of the full set of messenger RNA (mRNA) transcripts from a cell or organism, which are typically measured by microarray or by next-generation sequencing of complementary DNA (cDNA) via a process called RNA-Seq |
| Whole genome sequencing (WGS) | A process that determines the full DNA sequence of an organism’s genome; has been greatly facilitated by next-generation sequencing technology |

Unlike direct pathogen detection, serologic diagnosis (measurement 
of the host’s response to pathogen exposure) can typically be made 
only in retrospect, requiring both acute- and convalescent-phase 
serum samples. For chronic infections, distinguishing active from 
latent infection or identifying repeat exposure from serology alone can 
be difficult or impossible, depending on the syndrome. In addition, 
serologic diagnosis is variably sensitive, depending on the organism 
and the patient’s immune status. For instance, tuberculosis is notoriously 
difficult to identify by serologic methods; tuberculin skin testing 
using purified protein derivative (PPD) is especially insensitive in 
active disease and possibly cross-reactive with vaccines or other 
mycobacteria. Even the newer interferon γ release assays (IGRAs), which 
measure cytokine release from T lymphocytes in response to Mycobac­
terium tuberculosis–specific antigens in vitro, have limited sensitivity 
in immunodeficient hosts. Neither PPD testing nor IGRAs can distin­
guish latent from active infection. Serologic Lyme disease diagnostics 
suffer similar limitations: in patients from endemic regions, the pres­
ence of IgG antibodies to Borrelia burgdorferi may reflect prior expo­
sure rather than active disease, while IgM antibodies are imperfectly 
sensitive and specific (50% and 80%, respectively, in early disease). The 
complicated nature of these tests, particularly in view of the nonspe­
cific symptoms that may accompany Lyme disease, has had substantial 
implications for public perception of Lyme disease and antibiotic 
misuse in endemic areas. Similarly, syphilis, a chronic infection caused by 
Treponema pallidum, is notoriously difficult to stage by serology alone, 
requiring multiple different nontreponemal and treponemal tests (e.g., 
rapid plasma reagin and fluorescent treponemal antibody, respectively) 
in conjunction with clinical suspicion. Complementing serology, antigen 
detection can improve sensitivity and specificity in select cases 
but has been validated only for a limited set of infections. Typically, 
structural elements of pathogens are detected, including components 
of viral envelopes (e.g., hepatitis B surface antigen, HIV p24 antigen, 
SARS-CoV-2 spike protein), cell surface markers in certain bacteria 
(e.g., Streptococcus pneumoniae, Legionella pneumophila serotype 1) 
or fungi (e.g., Cryptococcus, Histoplasma), and less specific fungal 
cell-wall components such as galactomannan and β-glucan (e.g., Aspergillus 
and other fungi).
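To make the consequence of imperfect sensitivity and specificity concrete, the following minimal sketch applies Bayes' rule to the early-disease IgM figures quoted above (50% sensitivity, 80% specificity). The 10% pretest probability and the function name `predictive_values` are illustrative assumptions, not values taken from this chapter.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative test)
    return ppv, npv


# Sensitivity and specificity are the early-disease IgM values from the text;
# the 10% pretest probability is a hypothetical input chosen for illustration.
ppv, npv = predictive_values(sensitivity=0.50, specificity=0.80, prevalence=0.10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.22, NPV = 0.94
```

Under these assumed inputs, most positive results would be false positives, which is one way to see why such tests are hard to interpret when the accompanying symptoms are nonspecific.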
Given the impracticality of culture and the lack of sensitivity or 
sufficient clinical information afforded by serologic and antigenic 
methods, the push toward nucleic acid–based diagnostics originated 
in pursuit of viruses and fastidious bacteria, becoming part of the 
standard of care for select organisms in U.S. hospitals. Such tests, 
including polymerase chain reaction (PCR) and other nucleic acid 
amplification tests (NAATs), are now widely used for many viral infec­
tions, both chronic (e.g., HIV infection, hepatitis C) and acute (e.g., 
influenza, SARS-CoV-2, respiratory syncytial virus). NAATs provide 
essential information about both the initial diagnosis and the response 
to therapy and, in some cases, genotypically predict drug resistance. 
Indeed, progression from antigen detection to PCR transformed our 
understanding of the natural course of HIV infection, with profound 
implications for treatment (Fig. 126-1A). In the early years of the AIDS 
pandemic, p24 antigenemia was detected in acute HIV infection but 
then disappeared for years before emerging again with progression to 
AIDS (Fig. 126-1B). Without a marker demonstrating viremia, the role 
of treatment during HIV infection prior to the development of clinical 
AIDS was uncertain, and assessing treatment efficacy was challeng­
ing. With the emergence of PCR as a progressively more sensitive test 
(now able to detect as few as 20 copies of virus per milliliter of blood), 
viremia was recognized as a near-universal feature of HIV infection. 
Given the challenges of phenotypic assays, genotypic antiviral resis­
tance testing was also adopted early for HIV and is now the standard 
of care before the initiation of therapy in developed countries. These 
developments have been transformative in guiding therapy in early 
disease and, together with the development of less toxic therapies, have 
helped to shape policy that is moving toward ever-earlier introduction 
of antiretroviral therapy in HIV infection. Reverse transcriptase PCR 
(RT-PCR) assays are the core method for detecting SARS-CoV-2 virus 
in the acute phase, forming a critical component of the clinical and 
public health response to COVID-19, just as they were on a smaller 
scale for the related coronaviruses SARS-CoV and Middle East respi­
ratory syndrome (MERS)-CoV. Tests for SARS-CoV-2 represent the 
largest implementation of a molecular infectious disease assay to date 
and play a critical role in both clinical diagnostics and public health 
measures to contain the COVID-19 pandemic.
As they are for viral testing, nucleic acid–based tests have become 
the diagnostic tests of choice for fastidious bacteria, including the 
common sexually transmitted bacterial pathogens Neisseria gonor­
rhoeae and Chlamydia trachomatis as well as the tick-borne Ehrlichia 
chaffeensis and Anaplasma phagocytophilum. More recently, nucleic 
acid amplification–based detection has offered improved sensitivity for 
diagnosis of the important nosocomial pathogen Clostridioides difficile, 
and NAATs have provided clinically relevant information on the pres­
ence of cytotoxins A and B as well as molecular markers of hyperviru­
lence, such as the North American pulsotype 1 (NAP1) strain that is 
enriched in severe illness. The importance of genomics in selecting loci 
for diagnostic assays and in monitoring test sensitivity was highlighted 
by the emergence in Sweden of a newly recognized variant of C. tracho­
matis with a deletion that includes the gene targeted by a set of com­
mercial NAATs. By evading detection through this deletion (and thus 
avoiding treatment), this strain came to be highly prevalent in some 
areas of Sweden. While nucleic acid–based tests remain the diagnostic 
approach of choice for fastidious bacteria, this example of “diagnostic 
escape” serves as a reminder of the need for careful development and 
ongoing monitoring of molecular diagnostics.
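The kind of ongoing in silico monitoring that this example motivates can be sketched as follows. This is a toy illustration only: it assumes exact-match primer binding sites supplied on the same strand as each genome, and the sequences, isolate names, and function names are hypothetical and do not represent any commercial assay.

```python
def amplicon_detectable(genome, fwd_site, rev_site, max_len=300):
    """True if both binding sites occur, in order, within max_len bases.

    Simplification (assumption): sites are matched exactly and both are
    written on the same strand as `genome`.
    """
    start = genome.find(fwd_site)
    while start != -1:
        end = genome.find(rev_site, start + len(fwd_site))
        if end != -1 and end + len(rev_site) - start <= max_len:
            return True
        start = genome.find(fwd_site, start + 1)
    return False


def escape_report(genomes, fwd_site, rev_site):
    """Return (fraction of isolates missed, list of missed isolate names)."""
    missed = [name for name, seq in genomes.items()
              if not amplicon_detectable(seq, fwd_site, rev_site)]
    return len(missed) / len(genomes), missed


# Hypothetical binding sites and surveillance genomes.
FWD, REV = "ACGTTGCA", "GGCATTCC"
wild_type = "TTTT" + FWD + "AAAACCCCGGGG" + REV + "TTTT"
deletion_variant = "TTTT" + "ACG" + "CC" + "TTTT"  # deletion spans the target

genomes = {
    "isolate_1": wild_type,
    "isolate_2": wild_type,
    "isolate_3": deletion_variant,
}
fraction_missed, missed = escape_report(genomes, FWD, REV)
print(fraction_missed, missed)
```

Run periodically against newly sequenced isolates, a check of this kind would flag a rising fraction of genomes that lack the assay target before the variant becomes widespread.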
TABLE 126-2  Selected Clinical Applications of Infectious Disease Genomics

| Application | Technology | Notes/Examples |
|---|---|---|
| Organism Identification | | |
| Viral detection | PCR, RT-PCR | Identification of HIV, HBV, HCV, respiratory viruses including SARS-CoV-2 and influenza, and others for diagnosis and response to therapy |
| TB detection | PCR | Amplification of the rpoB gene for species-specific identification of Mycobacterium tuberculosis |
| Pathogen detection | PCR, RT-PCR, NAAT | Multiplexed identification of dozens of viruses, bacteria, yeasts, and parasites from a variety of clinical specimens |
| Bacterial detection | 16S ribosomal gene sequencing | Targeted amplification and sequencing of regions of the 16S rRNA gene for identification of suspected bacterial infections undiagnosed by conventional methods |
| Pathogen detection | Cell-free DNA sequencing | Unbiased amplification and sequencing of cell-free nucleic acid from blood, with analytical comparison of resulting nonhuman sequences to genomes of known pathogens and contaminants, in order to identify circulating pathogen DNA; anecdotal clinical use to establish etiology of systemic or focal infection, though clinical utility and optimal use cases still evolving |
| Pathogen Discovery | | |
| Bacterial pathogens | Sequencing, metagenomic assembly | Unbiased “shotgun” sequencing of isolated nucleic acid from patient samples to identify associated pathogens; proofs-of-concept: new Bradyrhizobium species associated with cord colitis; Escherichia coli O104:H4 from 2011 diarrheal outbreak in Germany; Leptospira species from one patient’s cerebrospinal fluid; research use only at this time |
| Viral pathogens | Microarray, sequencing | Hybridization of clinical samples to microarrays from phylogenetically diverse known viruses identified the first SARS coronavirus and others. Direct sequencing has identified SARS-CoV-2, West Nile virus, and MERS-CoV, among others. Use is primarily in research. |
| Antibiotic Resistance | | |
| MRSA detection | PCR | Detection of the mecA gene, the genotypic cause of methicillin resistance in Staphylococcus aureus |
| VRE detection | PCR | Detection of the vanA or vanB gene, the main genotypic causes of vancomycin resistance in Enterococcus |
| MDR-TB detection | PCR, NAAT | Detection of polymorphisms in the rpoB gene from M. tuberculosis, which account for 95% of rifampin resistance. Other probes available for inhA and katG genes can detect up to 85% of isoniazid resistance. |
| Carbapenemase detection | PCR | Detection of genes encoding one of several types of enzymes (KPC, NDM, OXA-48, IMP, VIM) that hydrolyze carbapenems, accounting for much but not all carbapenem resistance in Enterobacteriaceae |
| HIV resistance detection | Targeted sequencing | Targeted sequencing of specific genes with known resistance-conferring mutations; now the standard of care prior to initial therapy in the United States and Europe |
| Epidemiology | | |
| Outbreak and epidemic tracking | Sequencing | Application to tracking outbreaks and epidemics on local and international scales, including spread of carbapenemase-producing Klebsiella, S. aureus, M. tuberculosis, E. coli, Vibrio cholerae, Ebola virus, Zika virus, and influenza virus |
| Evolution and spread of pathogens | Sequencing | Sequencing collections of pathogens from individual patients or environmental reservoirs such as wastewater to shed light on pathogen dissemination, virulence factors, and antibiotic resistance determinants; innumerable examples, including V. cholerae, influenza virus, Ebola virus, Zika virus, and SARS-CoV-2 |

Abbreviations: HBV, hepatitis B virus; HCV, hepatitis C virus; MDR, multidrug-resistant; MERS, Middle East respiratory syndrome; MRSA, methicillin-resistant S. aureus; NAAT, nucleic acid amplification test; PCR, polymerase chain reaction; RT, reverse transcriptase; SARS, severe acute respiratory syndrome; TB, tuberculosis; VRE, vancomycin-resistant enterococci.

In contrast, for typical bacterial pathogens for which culture methods 
are well established, growth-based assays still dominate in the 
clinical laboratory. Informed by decades of clinical microbiology, these 
tests have served clinicians well, yet the limitations of growth-based 
tests—in particular, the delays associated with waiting for growth—
have left opportunities for improvements. Driven by this need, mass 
spectrometry–based assays that offer highly accurate organism iden­
tification within a few hours of a positive blood culture are widely 
adopted in clinical microbiology laboratories, largely supplanting 
biochemical tests for organism identification in well-resourced areas. 
Looking ahead, molecular diagnostics, greatly informed by the vast 
quantity of microbial genome sequences generated in recent years, 
offers a way forward. First, sequencing studies can readily identify key 
genes (or noncoding nucleic acids) that can be developed into targets 
for clinical assays using PCR or hybridization assay platforms. Second, 
sequencing itself is becoming cheap and rapid enough to be performed 
on clinical specimens in certain cases, with consequent unbiased detec­
tion or characterization of pathogens.
One of the biggest drivers for the implementation of novel molecu­
lar technologies in the diagnosis of infectious diseases is the desire 
for more rapid—or even real-time—pathogen identification, ideally 
with antibiotic susceptibility information on those microbes for which 
resistance to the current anti-infective armamentarium is of concern. 
Such real-time tests have the potential to transform infectious disease 
management, impacting antibiotic stewardship in the outpatient setting, 
mortality risk in the critically ill (i.e., patients in whom early 
administration of effective antibiotics is the most significant factor in 
decreasing mortality risk), hospital admission, and length of hospital 
stay; the extent of this impact will depend on the economic forces 
that will help define the breadth of their deployment. On the public 
health level, such tests will likely play a role in improving antibiotic 
stewardship, thereby influencing the rise of antibiotic resistance and 
enabling surveillance of outbreaks by local, national, and international 
networks. In the United States and the United Kingdom, for example, 
public health agencies use genome sequencing to track food-borne 
pathogens and identify outbreaks and have rapidly expanded the rou­
tine use of genomics in identifying and characterizing other pathogens, 
from mycobacteria (both M. tuberculosis and nontuberculous myco­
bacteria) to N. gonorrhoeae. Further, international efforts to track the 
spread of viral diseases, particularly the vast efforts to sequence 
SARS-CoV-2 and monitor the emergence and spread of its variants, as well 
as recent work on mpox, Ebola, and Zika outbreaks and ongoing work 
on seasonal influenza, offer opportunities for improving interven­
tions, surveillance, and prevention efforts, ranging from more accurate 
selection of the strains to include in vaccine development to improved 
design of trials to evaluate novel vaccines and therapies.
FIGURE 126-1  A. Timeline of select milestones in HIV management. Genomic advances are shown in bold type. The approvals and recommendations indicated apply to the United States. ARV, antiretroviral; AZT, zidovudine; NRTI, nucleoside reverse transcriptase (RT) inhibitor; NNRTI, nonnucleoside RT inhibitor; PI, protease inhibitor. B. Viral dynamics in the natural history of HIV infection. Three diagnostic markers are shown: HIV antibody (Ab), p24 antigen (p24), and viral load (VL). Dashed gray line represents limit of detection. (Adapted from data in HH Fiebig et al: Dynamics of HIV viremia and antibody seroconversion in plasma donors: Implications for diagnosis and staging of primary HIV infection. AIDS 17:1871, 2003.)
[Figure not reproduced. Panel A labels, general milestones: HIV genome sequenced; First AIDS case series published; First clinical cases worldwide; HIV first isolated; AZT (NRTI) approved; Saquinavir (PI) approved. Panel A labels, diagnostic milestones: Phenotypic resistance testing available; HIV antibody test approved; p24 antigen test approved. Panel B: relative level plotted against time (weeks, then years) across acute HIV, chronic HIV, and AIDS.]

Technological innovations are lowering several critical barriers to 
the widespread adoption of genomics and other molecular methods. 
Specifically, for NAAT, the need for rapid thermal cycling and 
cold-chain storage for reagents has significantly impeded implementation 
in resource-limited settings. Recent efforts aim to overcome these 
challenges by developing isothermal amplification protocols and lyophilized 
reagents that do not require refrigeration or sophisticated instrumenta­
tion. For clinical sequencing, (1) the cost and speed of sequencing and 
analysis methods continue to fall precipitously; (2) automation and 
miniaturization of the preparation of a sample for sequencing prom­
ise to reduce cost and minimize the expertise needed; and (3) direct 
sequencing technologies that eliminate the complex molecular biology 
required to prepare clinical samples for sequencing are improving in 
accuracy and robustness. Further barriers exist, including the need 
for careful processing of clinical samples to minimize contamination, 
and standardized pipelines to process data and present clinicians with 
easily interpretable and readily actionable results. However, as these 
advances give rise to rapid, accurate diagnostic tests, the ultimate goal 
is to inform a clinician in real time whether antibiotics are indicated 
and, if so, which will be effective. Real-time diagnostics will allow more 
efficient deployment of our precious antibiotic arsenal, thus improving 
both societal and patient-specific outcomes in much the same way that 
a rapid, sensitive troponin assay has transformed bedside management 
of chest pain.

[Figure 126-1, additional labels. Panel A: Nevirapine (NNRTI) approved; First once-a-day combination ARV approved; Frontier: clinical impact of rare sequence variants; HIV genotype recommended before ARV start; HIV viral load test approved; HIV genotypic resistance testing approved. Panel B: curves for VL, Ab, and p24 plotted against time.]
■ ORGANISM IDENTIFICATION
To adapt nucleic acid detection to diagnostic tests and thus to identify 
pathogens on a wide scale, sequences must be found that are conserved 
enough within a species to identify the diversity of strains that may 
be encountered in various clinical settings, but divergent enough to 
distinguish one species from another. Until recently, this problem has 
been solved for bacteria by targeting the element of a bacterial genome 
that is most highly conserved within a species, the 16S ribosomal 
RNA (rRNA) subunit. Among many examples, this method has now 
been used to confirm Mycobacterium chimaera infections in several 
patients after cardiothoracic surgery, leading ultimately to recognition 
of a widespread outbreak. At present, 16S PCR amplification from tis­
sue specimens can be performed by specialty laboratories, though its 
sensitivity and clinical utility to date have remained somewhat limited, 
in part because of the scarcity and relative fragility of pathogen nucleic 
acid in the sampled tissue, which necessitates reliable, sensitive nucleic 
acid amplification. As such barriers are reduced through technological 
advances and as the causes of culture-negative infection are clarified 
(perhaps in part through sequencing efforts), these tests may become 
both more accessible and more helpful.
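The selection criterion described above (conserved within a species, divergent between species) can be illustrated with a simple k-mer comparison. The sketch below is a toy under stated assumptions: it uses exact k-mer matching on short made-up sequences, ignores reverse complements, and is not a description of how 16S-based assays are actually designed. The function names are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_targets(target_genomes, other_genomes, k):
    """k-mers present in every target-species genome and in no other genome."""
    conserved = set.intersection(*(kmers(g, k) for g in target_genomes))
    off_target = set().union(*(kmers(g, k) for g in other_genomes))
    return sorted(conserved - off_target)


# Made-up sequences: two strains of the species of interest, one other species.
target_strains = ["AAACCCGGGTTT", "TTACCCGGGTAA"]
other_species = ["AAACCCGAGTTT"]

print(candidate_targets(target_strains, other_species, k=6))
# ['ACCCGG', 'CCCGGG', 'CCGGGT']
```

In this toy case the strains differ at their ends but share a central region, and a single substitution in the other species is enough to exclude every k-mer that overlaps it.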

With the wealth of sequencing data now available, other regions 
beyond 16S rRNA can be targeted for bacterial species identification. 
These other genomic loci can provide additional information about a 
clinical isolate that is relevant to patient management. For instance, 
detection of the presence, or potentially even the expression, of toxin 
genes such as C. difficile toxins A and B or Shiga toxin can provide 
clinicians with additional information that will help distinguish com­
mensals or colonizing bacteria from pathogens and thus aid in prog­
nostication and management as well as in diagnosis.
Beyond bacteria, one commonly used approach to PCR-based 
pathogen detection is so-called “syndromic panels” of multiplexed PCR 
to identify common causes of clinical infection syndromes, including 
upper respiratory infection, gastroenteritis, and meningoencephalitis. 
The most frequently deployed syndromic panel is the respiratory viral 
panel, which typically includes primer sets targeting a combination 
of influenza, parainfluenza, respiratory syncytial virus, adenovirus, 
rhinovirus, enterovirus, metapneumovirus, and common-cold corona­
viruses, sometimes in conjunction with unculturable bacteria such as 
Mycoplasma, Chlamydophila, and Bordetella species. With the advent of 
the COVID-19 pandemic, SARS-CoV-2 was quickly added to this mul­
tiplex PCR panel within months of its development as a stand-alone 
PCR test. The goal of such panels is to capture common infectious 
causes of these syndromes in a single, standardized diagnostic test, 
ideally streamlining the diagnostic evaluation. The ready identification 
of a plausible etiologic agent may offer diagnostic clarity if judiciously 
used and carefully considered in the clinical context of each patient. 
The widespread adoption of SARS-CoV-2 PCR testing in the earli­
est days of the COVID-19 pandemic underscored the crucial role of 
precise etiologic pathogen diagnosis in patient care, triage, infection 
control, and epidemiology.
One challenge with PCR-based assays is the relative complexity of 
the molecular biology and consequent need for advanced technol­
ogy for implementation, including instruments and reagents. Several 
recent approaches have advanced the molecular biology of nucleic 
acid detection with the aim of increasing deployability of NAATs for 
use in resource-limited or even field settings. These methods couple 
nucleic acid detection to an enzymatic readout, enabling catalytic 
signal amplification. Several such approaches build on the intrinsic 
sensitivity, specificity, and amplification of the CRISPR (clustered, 
regularly interspaced short palindromic repeats) effectors Cas12a and 
Cas13a as nucleic acid sensors. Distinct from the famous gene editing 
CRISPR effector Cas9, these robust and versatile enzymes recognize 
short nucleic acid targets with high specificity and transduce this bind­
ing event into “collateral cleavage” of nearby nucleic acids that can be 
engineered to create a signal using fluorescent reporter constructs. 
Crucially, this biotechnology can be made to work in conjunction with 
isothermal enzymatic preamplification steps to achieve remarkable 
sensitivity, all robustly enough to withstand lyophilization that enables 
long-term storage before being reconstituted in the field. Such assays 
are still in the early stages of development, but they have shown prom­
ise and could play a critical role in global diagnostics and surveillance.
While amplification tests such as PCR and other NAATs exemplify 
one approach to nucleic acid detection, other approaches exist, includ­
ing detection by hybridization. Although not currently used in the 
clinical realm, techniques for multiplexed detection and identifica­
tion of pathogens by hybridization to microarrays or in solution are 
being developed for other purposes. Of note, these different detection 
techniques require different degrees of conservation. Highly sensi­
tive amplification methods require a high degree of sequence identity 
between PCR primer pairs and their short, specific target sequences; 
even a single base-pair mismatch (particularly near the 3′ end of 
the primer) may interfere with detection. In contrast, hybridization-based
tests are more tolerant of mismatch and thus can be used to
detect important regions that may be less precisely conserved within 
a species, thus potentially allowing detection of clinical isolates from a 
given species with greater diversity between isolates. Such assays take 
advantage of the predictable binding interactions of nucleic acids and 
do not require enzymology, broadening the range of conditions under 
which such assays are feasible, including directly on primary clinical 
specimens. The applicability of hybridization-based methods toward 
either DNA or RNA opens the possibility of expression profiling, which 
can uncover phenotypic information from nucleic acid content.
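The different conservation requirements of amplification and hybridization can be illustrated with a toy model. The sketch below is not a validated assay design rule: the sequences are synthetic, and both thresholds (no mismatch tolerated within the last 5 bases of a primer; up to 15% mismatches tolerated across a hybridization probe) are assumptions chosen only to show the contrast.

```python
def mismatch_positions(oligo: str, site: str) -> list[int]:
    """Return 0-based positions where the oligo and its binding site differ.

    For simplicity both strings are written 5'->3' on the same strand, so a
    perfect match means identical strings (no reverse-complement step).
    """
    if len(oligo) != len(site):
        raise ValueError("oligo and site must have the same length")
    return [i for i, (a, b) in enumerate(zip(oligo, site)) if a != b]


def primer_extends(primer: str, site: str, three_prime_window: int = 5) -> bool:
    """Toy rule: extension fails if any mismatch lies in the 3'-terminal window."""
    first_critical = len(primer) - three_prime_window
    return all(pos < first_critical for pos in mismatch_positions(primer, site))


def probe_hybridizes(probe: str, site: str, max_mismatch_fraction: float = 0.15) -> bool:
    """Toy rule: hybridization succeeds if the overall mismatch fraction is low."""
    return len(mismatch_positions(probe, site)) / len(probe) <= max_mismatch_fraction


reference_site = "ACGTACGTACGTACGTACGT"  # synthetic 20-mer
variant_site = "ACGTACGTACGTACGTACCT"    # one substitution near the 3' end

oligo = reference_site  # primer or probe designed against the reference
print(primer_extends(oligo, variant_site))    # False
print(probe_hybridizes(oligo, variant_site))  # True
```

In this simplified picture, a single substitution near the 3′ end defeats the primer, whereas the same oligonucleotide used as a hybridization probe still binds the variant.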

Both PCR and hybridization methods target specific, known organ­
isms. At the other extreme, as sequencing costs decline, metagenomic 
sequencing from patient samples is increasingly feasible. This shot­
gun sequencing approach is unbiased—i.e., can detect any microbial 
sequence, however divergent or unexpected. In one example, a clinical 
sample of cerebrospinal fluid from an immunocompromised patient 
with signs and symptoms of chronic meningitis was found through 
metagenomic sequencing and analysis to contain small amounts of 
Leptospira DNA. In light of this information, retrospective PCR testing 
confirmed the diagnosis of neuroleptospirosis, which had been missed 
prior to the sequencing result. The patient was treated with penicillin 
G and clinically recovered. Increasingly, efforts are under way to bring 
whole genome sequencing to other clinical samples, including sputum 
and blood, to more readily identify pathogens. One such assay certified 
for clinical use in the United States—a shotgun metagenomic sequenc­
ing approach applied to cell-free DNA circulating in the bloodstream 
that aims to identify pathogens both in blood and other body sites—is 
gaining adoption in cases where traditional culture-based methods 
fail to identify a pathogen in patients with symptoms consistent with 
infection. This new approach brings its own set of challenges, however, 
including the need to recognize pathogenic sequences against a back­
ground of expected host and commensal sequences and to distinguish 
true pathogens from either colonizers or laboratory contaminants. The 
burgeoning field of microbiome research is driving technology devel­
opment for sequencing and analyzing complex microbial communities. 
Lessons from this field will inform diagnostic efforts.
■
■PATHOGEN DISCOVERY
In addition to clinical diagnostic applications, novel genomic technolo­
gies, including whole genome sequencing, are being applied to clinical 
research specimens with a goal of identifying new pathogens in a vari­
ety of circumstances. The tremendous sensitivity and unbiased nature 
of sequencing is also ideal in searching clinical samples for unknown 
or unsuspected pathogens.
Causal inference in infectious diseases has progressed since the time 
of Koch, whose historical postulates provided a rigorous framework for 
attributing a disease to a microorganism. To modernize Koch’s postu­
lates, an organism, whether it can be cultured or not, should induce 
disease upon introduction into a healthy host if it is to be implicated 
as a causative pathogen. Current sequencing technologies are ideal for 
advancing this modern version of Koch’s postulates because they can 
identify candidate causal pathogens with unprecedented sensitivity 
and in an unbiased way, unencumbered by limitations such as cultur­
ability. Yet, as direct sequencing on primary patient samples greatly 
expands our ability to recognize associations between microbes and 
disease states, critical thinking and experimentation will remain vital 
in establishing causality.
Virus discovery in particular has been greatly facilitated by new 
nucleic acid technology. These frontiers were first notably explored 
with high-density microarrays containing spatially arrayed sequences 
from a phylogenetically diverse collection of viruses. Despite bias 
toward those with homology to known viruses, novel viruses in clini­
cal samples were successfully identified on the basis of their ability to 
hybridize to these prespecified sequences. This methodology famously 
contributed to identification of the coronavirus causing severe acute 
respiratory syndrome (SARS). Once discovered, the SARS coronavirus 
was rapidly sequenced: the full genome was assembled in April 2003, 
<6 months after recognition of the first case.
With the advent of next-generation sequencing, unbiased pathogen 
discovery is now addressed through a process known as metagenomic 
assembly (Fig. 126-2), largely supplanting other methods. Sequences 
of random nucleotide fragments can be generated from clinical specimens 
with no a priori knowledge of pathogen identity through a process 
called shotgun sequencing. This collection of sequences can then be 
computationally aligned to host (i.e., human) sequences, with aligned 
sequences removed and remaining sequences compared with other 
known genomes to detect the presence of known microorganisms. 
Sequence fragments that remain unaligned suggest the presence of an 
additional organism that cannot be matched to a known, characterized 
genome; these reads can be assembled into contiguous nucleic acid 
stretches that can be compared with known sequences to construct the 
genome of a potentially novel organism. Assembled genomes (or parts 
of genomes) can then be compared with known genomes to infer the 
phylogeny of new organisms and identify related classes or traits. Thus, 
not only can this process identify unanticipated pathogens, but it can 
even identify undiscovered organisms.

FIGURE 126-2  Workflow of metagenomic assembly for pathogen discovery. DNA is isolated from a specimen of interest (e.g., tissue, body fluid) containing a mixture of host 
DNA and nucleic acids from coexisting microbes, either commensal or pathogenic. All DNA (and RNA, if a reverse transcription step is added) is then sequenced, yielding 
a mixture of DNA sequence fragments (“reads”) from the organisms present. These reads are aligned (“mapped”) to existing reference genomes for the host or any known 
microbes; reads that do not align to any known sequence remain unmapped. The unmapped reads are computationally assembled de novo into the largest contiguous stretches 
of DNA possible (“contigs”), representing fragments of previously unsequenced genomes. These genome fragments (contigs) are then mapped onto a phylogenetic tree 
based on their sequence. Some may represent known but as-yet-unsequenced organisms, while others will represent novel species. (Figure prepared with valuable input 
from Dr. Ami S. Bhatt, personal communication.)
[Figure labels: clinical specimen; DNA extraction; host +/– microbial DNA; high-throughput sequencing; aligned reads; de novo assembly of unmapped reads; genome fragments (“contigs”); phylogenetic comparison to known genomes; “novel” microbe.]
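The read-sorting step of this workflow can be sketched in a few lines. The example below is a deliberately simplified stand-in: it uses exact shared k-mers in place of a real alignment algorithm, all sequences are synthetic, and the function names and parameters (`triage_reads`, `k`, `min_shared`) are hypothetical. Reads that match neither the host nor a known microbe are left in an "unmapped" bin, which is the input to de novo assembly.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def triage_reads(reads: list[str], references: dict[str, str],
                 k: int = 8, min_shared: int = 1) -> dict[str, list[str]]:
    """Bin each read under the reference sharing the most k-mers, else 'unmapped'."""
    index = {name: kmers(seq, k) for name, seq in references.items()}
    bins: dict[str, list[str]] = {name: [] for name in references}
    bins["unmapped"] = []
    for read in reads:
        read_kmers = kmers(read, k)
        shared = {name: len(read_kmers & ref) for name, ref in index.items()}
        best = max(shared, key=shared.get, default=None)
        if best is not None and shared[best] >= min_shared:
            bins[best].append(read)
        else:
            bins["unmapped"].append(read)  # candidate for de novo assembly
    return bins


# Synthetic reference sequences and reads, for illustration only.
references = {
    "host": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "known_microbe": "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC",
}
reads = ["GCCATTGTAATGGG", "GCTAGCTCAGTCCT", "CCCCGGGGAAAATTTTCC"]

bins = triage_reads(reads, references)
print(bins["unmapped"])  # ['CCCCGGGGAAAATTTTCC']
```

Production pipelines replace the k-mer lookup with dedicated aligners and classifiers and add quality filtering, but the logic of subtracting host and known-microbe reads before assembling the remainder is the same.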
The emergence of COVID-19 provides a dramatic example that 
illustrates advances in pathogen discovery technology in the interven­
ing 16 years since SARS-CoV was discovered: the causal coronavirus, 
SARS-CoV-2, was identified through metagenomic sequencing within 
about 1 month of the first known case and just weeks after the outbreak 
was first recognized. Sequencing and assembly were completed within 
5 days of the discovery of the new virus, and a NAAT was released 
1 day later. Given the ensuing ravages of COVID-19 and the cost of 
delays of even a few weeks in implementing this new diagnostic test in 
some locations, it is sobering to contemplate the added harm had this 
outbreak occurred even a decade earlier. This timeline illustrates the 
advancing power and speed of new diagnostic technologies but also 
underscores the pressing need for continued progress.
As metagenomic sequencing and assembly techniques become more 
robust, this technology holds great promise for identifying micro­
organisms that are associated with clinical conditions of unknown 
etiology. Conventional methods already have unexpectedly linked 
numerous conditions with specific agents of infection—e.g., cervi­
cal and oropharyngeal cancers with human papillomavirus (HPV), 
Kaposi’s sarcoma with human herpesvirus 8, and certain lymphomas 
and, more recently, multiple sclerosis with Epstein-Barr virus. Recently, 
Zika virus, first described in the 1940s, was found to be increasing 
in incidence as a cause of febrile syndromes, particularly in Central 
and South America. A concurrent increase in the incidence of micro­
cephaly was noted that temporally and geographically matched the 
Zika epidemics. Zika was suspected to be neurotropic because of a 
previously recognized association with Guillain-Barré syndrome, but 
the strongest link between Zika virus and microcephaly came when the 
virus itself was detected by both quantitative reverse transcription PCR 
(RT-qPCR) and whole genome sequencing in postmortem fetal brain 
tissue from microcephalic infants. An argument for causality was built 
on the foundation of epidemiologic evidence and direct viral detection, 
both of which were built on nucleic acid detection and genome 
sequencing. Sequencing techniques offer unprecedented sensitivity 
and specificity for identifying foreign nucleic acid sequences that may 
suggest other such pathogen-associated conditions—from malignan­
cies to inflammatory conditions to unexplained fevers or other clini­
cal syndromes—associated with organisms from viruses to bacteria to 
parasites. Caution is needed, though: in the absence of the ability to ful­
fill Koch’s postulates, sequence-based identification of a microbe from 
patient specimens is not, on its own, sufficient to ascribe pathogenicity. 
The increasing sensitivity of these methods warrants greater rigor and 
care in defining what is “noise” and what represents a pathogen.
As sequencing-based discovery expands, microbes may be found to 
be associated with conditions not classically thought of as infectious, 
such as the link between maternal Zika virus infection and fetal micro­
cephaly. Studies of bowel flora in laboratory animals and even humans 
already suggest correlations between microbe composition and various 
aspects of metabolic and cardiovascular health. Improved methods 
for pathogen detection will continue to uncover unexpected correla­
tions between microbes and disease states, but the mere presence of 
a microbe does not establish causality. Fortunately, once the relatively 
laborious and computationally intensive metagenomic sequencing and 
assembly efforts have identified a pathogen, further detection can more 
easily be undertaken with targeted methods such as PCR or hybridiza­
tion, which may be more scalable and amenable to in situ confirma­
tion. This capacity should facilitate the additional careful investigation 
that will be required to progress beyond correlation and to draw causal 
inference.
■
■ANTIBIOTIC RESISTANCE
At present, antibiotic resistance in bacteria and fungi is conventionally 
determined by isolating a single colony from a cultured clinical speci­
men and testing its growth in the presence of drug. The requirement 
for multiple growth steps in these conventional assays has several con­
sequences. First, only culturable pathogens can be readily processed. 
Second, this process requires considerable infrastructure to support 
the sterile environment needed for culture-based testing of diverse 
organisms. Finally, and perhaps most significantly, even the fastest-growing
organisms require 1–2 days of processing for identification 
and 2–3 days for determination of susceptibilities. Some slow-growing 
organisms take even longer; for instance, weeks must pass before 
drug-resistant M. tuberculosis can be identified by growth phenotype. 
Given the clinical imperative in serious illness to begin effective therapy 
early, this inherent delay in susceptibility determination has obvious 
implications for empirical antibiotic use: broad-spectrum antibiotics 
often must be chosen up front in situations where it is later shown that 
preferred narrower-spectrum drugs would have been effective or even 
that no antibiotics were appropriate (i.e., in viral infections). Even with 
this strategy, as resistant organisms become more common, the empirical 
choice can be incorrect, often with devastating consequences. Real-time 
identification of the infecting organism and information on its 
susceptibility profile would guide initial therapy and support judicious 
antibiotic use, ideally improving patient outcomes while aiding in the 
ever-escalating fight against antibiotic resistance by reserving the use of 
broad-spectrum agents for cases in which they are truly needed.
Molecular diagnostics and sequencing offer a way to accelerate 
detection of a pathogen’s antibiotic susceptibility profile. If a genotype 
that confers resistance can be identified, this genotype can be targeted 
for molecular detection. In infectious disease, this approach has most 
comprehensively come to fruition for HIV (Fig. 126-1A). (In a concep­
tually parallel application of genomic analysis, molecular detection of 
certain resistance determinants in cancers informs selection of targeted 
chemotherapy.) Extensive sequencing of HIV strains and correla­
tions drawn between viral genotypes and phenotypic resistance have 
delineated the majority of mutations in key HIV genes, such as reverse 
transcriptase, protease, and integrase, that confer resistance to the antiret­
roviral agents that target these proteins. For instance, the single amino 
acid substitution K103N in the HIV reverse transcriptase gene predicts 
resistance to the first-line nonnucleoside reverse transcriptase inhibi­
tor efavirenz, and its detection informs a clinician to choose a different 
agent. The effects of these common mutations on HIV susceptibility to 
various drugs—as well as on viral fitness—are curated in publicly avail­
able databases. Thus, genotypes are now routinely used to predict drug 
resistance in HIV, as phenotypic resistance assays are far more cumber­
some than targeted sequencing. Though it was not implemented at the 
level of individual patients, sequencing-based detection of circulating 
SARS-CoV-2 variants predicted susceptibility to monoclonal antibod­
ies due to changes in the spike protein and, thus, informed decisions 
made by the U.S. Food and Drug Administration (FDA) on granting 
and rescinding approval for such COVID-19 therapies as the pandemic 
progressed. As new therapies are introduced, this targeted sequenc­
ing–based approach to drug resistance will likely prove important in 
other viral infections.
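Genotype-based prediction of this kind amounts to looking up observed substitutions in a curated table. The following minimal sketch assumes a hypothetical table format and function name; the table holds only the single K103N example given above, whereas curated public databases list many more mutations along with their drug-specific effects.

```python
import re

# Illustrative table holding only the example named in the text.
RESISTANCE_TABLE = {
    ("reverse transcriptase", "K103N"): {"efavirenz"},
}

# Mutation codes read as: wild-type residue, position, variant residue.
MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def predicted_resistance(gene: str, observed: dict[int, str]) -> set[str]:
    """Return drugs predicted to fail, given observed amino acids by position."""
    drugs: set[str] = set()
    for (table_gene, code), affected in RESISTANCE_TABLE.items():
        if table_gene != gene:
            continue
        match = MUTATION_PATTERN.fullmatch(code)
        if match is None:
            raise ValueError(f"unrecognized mutation code: {code}")
        position, variant = int(match.group(2)), match.group(3)
        if observed.get(position) == variant:
            drugs |= affected
    return drugs


print(predicted_resistance("reverse transcriptase", {103: "N"}))  # {'efavirenz'}
print(predicted_resistance("reverse transcriptase", {103: "K"}))  # set()
```

A lookup of this kind is only as good as the table behind it, which is why the extensive genotype-phenotype correlation work described above was a prerequisite for clinical use.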
The challenge of predicting drug susceptibility from genotype is 
more daunting for bacteria than for HIV, yet considerable progress 
has been made toward sequencing-based determination of bacterial 
antibiotic susceptibility. Bacteria have far more complex genomes 
than viruses, with thousands of genes on their chromosomes (many 
of which can functionally interact in ways that escape a priori predic­
tion) and the capacity to acquire many more through horizontal gene 
transfer of plasmids and mobile genetic elements within and between 
species. Thus, the task of comprehensively defining all possible genetic 
resistance mechanisms is orders of magnitude more complex in bac­
teria than in viruses, which typically have far more limited genomes. 
Despite these challenges, considerable progress has been made in 
recent years. In select cases where biological factors appear to have con­
strained the genotypic basis for resistance to a small, well-defined set of 
mutations, genotypic assays for antibiotic resistance are already being 
integrated into clinical practice. One important example is the detec­
tion of methicillin-resistant Staphylococcus aureus (MRSA). S. aureus is 
one of the most common and serious bacterial pathogens of humans, 
particularly in health care settings. Resistance to methicillin, which 
represents the β-lactams, the most effective class of antistaphylococcal 
antibiotics, is common, even in community-acquired strains. Vancomycin, 
the usual alternative, is effective against MRSA but is measurably 
inferior to β-lactams against methicillin-susceptible S. aureus (MSSA). 
Analysis 
of clinical MRSA isolates has demonstrated that the molecular basis for 
resistance to methicillin in essentially all cases stems from the expres­
sion of an alternative penicillin-binding protein (PBP2A) encoded by 
the gene mecA, which is found within a transferable genetic element 
called mec. This mobile cassette has spread rapidly through the S. aureus 
population via horizontal gene transfer and selection from widespread 
antibiotic use. Because methicillin resistance is essentially always due 
to the presence of the mecA gene, MRSA is particularly amenable to 
molecular detection. A PCR test for the mecA gene, which saves hours 
to days compared with standard culture-based methods, has been 
approved by the FDA to augment (but not replace) traditional culture-based
susceptibility testing. Similar to MRSA, vancomycin-resistant 
enterococci (VRE) harbor one of a limited number of van genes found 
to be responsible for resistance to this important antibiotic, which 
occurs through alteration of the mechanism for cell wall cross-linking 
that vancomycin inhibits. Detection of one of these genes by PCR 
indicates resistance. More recently, multiplexed PCR assays targeting 
carbapenemase genes (those encoding the KPC, NDM, OXA-48, IMP-1, and 
VIM carbapenemases), which are responsible for a significant fraction 
(though not all) of carbapenem resistance, can predict some resistance 
to this crucial antibiotic class; a negative result, however, is not 
comprehensive enough to confirm susceptibility. Despite 
these caveats, such assays to detect even a limited set of high-value 
resistance genes are gaining use in high-resourced settings. Finally, a 
PCR assay targeting the highly conserved RNA polymerase gene serves 
not only to detect M. tuberculosis directly in sputum samples but also 
to detect resistance to rifampin, since the determinants of resistance 
to this RNA polymerase inhibitor map almost exclusively to a short 
region of this gene. Since rifampin resistance is epidemiologically 
associated with, though not causal for, multidrug resistance, this assay 
identifies strains at high risk for multidrug resistance, enhancing its 
value. This test has transformed tuberculosis testing where available, 
improving sensitivity and providing a limited measure of susceptibility 
testing to guide therapy.

Although identification and rapid detection of monogenic resis­
tance determinants have improved, bacteria typically evolve multiple, 
diverse resistance mechanisms to most antibiotics; thus, outside of 
these edge cases, resistance prediction often requires probing for and 
integration of multiple genetic lesions, targets, or mechanisms. For 
instance, at least five distinct modes of resistance to fluoroquinolones 
are known: reduced import, increased efflux, target site mutation, 
drug modification, and shielding of the target sites by expression of 
another protein. These mechanisms are typically present in combi­
nation in clinically resistant isolates; thus, the problem of detecting 
genetic resistance is often a combinatorial one. In another clinically 
important example, while carbapenem resistance in Enterobacteriaceae 
is often explained by the presence of carbapenemases, resistance may 
also develop when other, less broad-spectrum β-lactamases are found 
in combination with porin mutations or upregulated efflux pumps. 
These more complex mechanisms prevent PCR-based carbapenemase 
detection assays from identifying other mechanisms of carbapenem 
resistance. Additionally, plasmids and transposable elements, which 
often are enriched for antibiotic resistance determinants, may be more 
technically and analytically challenging to sequence, although newer 
long-read sequencing technologies are beginning to address these 
challenges. To further complicate genetic prediction, changes in gene 
expression (which may be detectable through mutations in promoter 
regions or regulatory genes without coding mutations in known resis­
tance determinants) and even gene copy number (which may occur 
without changes in primary sequence) of resistance determinants play 
critical roles in some cases of genetic resistance. Thus, while predicting 
resistance when determinants are found is rapidly becoming feasible, 
the more clinically relevant task of predicting susceptibility when no 
known resistance determinants are found remains more difficult.
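The asymmetry between predicting resistance and predicting susceptibility can be expressed as a small decision rule. In the sketch below, the `Panel` structure, the `near_complete` flag, and the wording of the returned interpretations are all hypothetical; the determinant names are those mentioned in the text, used here as labels rather than as exact gene identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    drug: str
    determinants: frozenset
    near_complete: bool  # assumption: True if the panel explains essentially all resistance


MECA_PANEL = Panel("methicillin", frozenset({"mecA"}), near_complete=True)
CARBAPENEMASE_PANEL = Panel(
    "carbapenems",
    frozenset({"KPC", "NDM", "OXA-48", "IMP-1", "VIM"}),
    near_complete=False,
)


def interpret(panel: Panel, detected: set) -> str:
    """Interpret a molecular result; absence is informative only for near-complete panels."""
    hits = sorted(panel.determinants & detected)
    if hits:
        return f"resistance predicted ({', '.join(hits)})"
    if panel.near_complete:
        return "susceptibility likely; confirm by culture"
    return "indeterminate; phenotypic testing required"


print(interpret(MECA_PANEL, {"mecA"}))        # resistance predicted (mecA)
print(interpret(CARBAPENEMASE_PANEL, set()))  # indeterminate; phenotypic testing required
```

A positive result is actionable for either panel, but a negative carbapenemase result leaves porin mutations, efflux, and other β-lactamases unexamined, so it cannot be read as susceptibility.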
To build on early successes with the goal of advancing beyond 
binary detection of monogenic resistance determinants, the ultimate 
frontier for genetic prediction of bacterial antibiotic resistance lies 
in more comprehensive prediction of a resistance phenotype from 
sequence information—a task similar to HIV resistance prediction. Yet 
there is no comprehensive compendium of genetic elements conferring 
resistance and their pairwise and higher-order interactions with each 
other and with the genetic background of bacterial pathogens. Nonviral 
genomes are much larger than viral ones, and their abundance 
and diversity are such that thousands of genetic differences often exist 
between clinical isolates of the same species, of which perhaps only one 
or a few may contribute to resistance. In addition, new mechanisms may 
emerge in the face of antibiotic deployment or with the release of new 
drugs, and genetic prediction of resistance will inevitably lag behind 
the emergence of unforeseen mechanisms. While confident prediction 
of bacterial antibiotic resistance from sequencing determinants may 
therefore seem daunting, the vast expansion of microbial sequencing 
capacity, combined with analytic methods such as microbial genome-wide
association studies and machine learning algorithms, offers 
powerful analytical approaches to this “needle in a haystack” problem 
and has permitted remarkable advances in the predictive power of 
sequence determinants to date. Particularly in M. tuberculosis, where 
horizontal gene transfer is minimal and the pathogen is essentially 
restricted to human hosts to facilitate more representative sampling, 
a remarkable fraction of phenotypic resistance can be explained by 
known genetic determinants. Because of these biologic advantages, as 
well as the slow and laborious growth process that impedes traditional 
phenotypic assessment, whole genome sequencing has proven quite 
effective at predicting susceptibility profiles in this organism, to the 
point that the United Kingdom now routinely performs whole genome 
sequencing in parallel with phenotypic antibiotic susceptibility testing 
for M. tuberculosis in what some hope will be a precursor to fully whole 
genome sequencing–based antibiotic susceptibility testing. Even in 
more highly variable pathogens, with sequencing of sufficient numbers 
of susceptible and resistant pathogens, and more sophisticated analyti­
cal algorithms, sequence-based prediction methods are improving in 
predictive accuracy, at least within the geographic region from which 
the test samples have been sequenced.
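The core of a microbial genome-wide association study is a test, repeated for every variant, of whether carriage of that variant is enriched among resistant isolates. The sketch below uses a one-sided Fisher exact test on synthetic data; the function names and the isolate encoding are hypothetical. It is a minimal illustration only: real analyses must also correct for multiple testing and for population structure, since variants can track a lineage rather than cause resistance.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null for the 2x2 table
    [[a, b], [c, d]] = [[variant & resistant, variant & susceptible],
                        [no variant & resistant, no variant & susceptible]]."""
    carriers, resistant, total = a + b, a + c, a + b + c + d
    tail = sum(
        comb(resistant, x) * comb(total - resistant, carriers - x)
        for x in range(a, min(carriers, resistant) + 1)
    )
    return tail / comb(total, carriers)


def variant_associations(isolates):
    """isolates: list of (set_of_variants, is_resistant). Returns {variant: p_value}."""
    variants = set().union(*(v for v, _ in isolates))
    results = {}
    for variant in sorted(variants):
        a = sum(1 for v, r in isolates if variant in v and r)
        b = sum(1 for v, r in isolates if variant in v and not r)
        c = sum(1 for v, r in isolates if variant not in v and r)
        d = sum(1 for v, r in isolates if variant not in v and not r)
        results[variant] = fisher_one_sided(a, b, c, d)
    return results


# Synthetic isolates: "v1" is carried by every resistant isolate and no
# susceptible one; "v2" is spread evenly across both groups.
isolates = [
    ({"v1", "v2"}, True), ({"v1", "v2"}, True), ({"v1"}, True), ({"v1"}, True),
    ({"v2"}, False), ({"v2"}, False), (set(), False), (set(), False),
]
p_values = variant_associations(isolates)
print(p_values)  # v1 = 1/70 (about 0.014), v2 = 53/70 (about 0.757)
```

With only eight isolates even a perfect association yields a modest p-value, which illustrates why large numbers of sequenced susceptible and resistant isolates are needed before such predictions become reliable.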

It is important to note that genotype-based analytical methods 
largely identify correlates, not necessarily surrogates or determinants, 
of resistance. In HIV diagnostics, surrogates (i.e., causal determinants 
of resistance) were found to be more reliable predictors than mere 
correlates in expanding sequencing-based resistance prediction to the 
general population. Without a mechanistic understanding of genetic 
resistance, a correlative relationship may be lineage-specific and less 
generalizable. Especially with multiple possible mechanisms of resis­
tance to a given antibiotic and ongoing evolutionary pressure resulting 
in the development and acquisition of new modes of resistance, a geno­
typic approach to diagnosing antibiotic resistance is likely to remain 
challenging and to require ongoing vigilance in constantly correlating 
genotypic with more traditional phenotypic methods. An important 
corollary benefit of a genomic approach to resistance prediction, 
anchored in phenotypic validation, could be the systematic identifica­
tion of outliers with unexplained resistance. These strains can form 
the basis for understanding newly emerging resistance mechanisms, 
which can in turn inform new drug development endeavors. Under­
standing resistance mechanisms may also help direct infection control 
efforts. For instance, the first identification of the mcr-1 (mobilized 
colistin resistance) gene on a plasmid, together with other antibiotic 
resistance determinants, heightened concern about colistin-resistant 
Enterobacterales identified first in China and later elsewhere because 
it implied transferable multidrug resistance. Early recognition of these 
potentially dangerous strains elucidated the immediate need for strict 
containment protocols. Furthermore, metagenomic studies have high­
lighted the silent carriage of resistance genes in the microbiomes of 
returning travelers and their subsequent spread to household contacts.
In parallel with advancing sequencing technologies, progress in 
computational techniques, bioinformatics and statistics, and data stor­
age, as well as experimental confirmatory testing of hypotheses, will 
be needed to advance toward the ambitious goal of a comprehensive 
compendium of global antibiotic resistance determinants. Open shar­
ing and careful curation of new sequence information will be of para­
mount importance, as will iterative or even continuous comparison of 
predictions with ongoing phenotypic testing to assess performance and 
allow prediction algorithms to keep up with newly evolving or emerg­
ing resistance mechanisms.
We continuously observe the accumulation of new or unanticipated 
modes of resistance from ongoing evolutionary pressure caused by the 
widespread clinical use of antibiotics. Even with MRSA, perhaps the 
best-studied case of antibiotic resistance and a model of relative sim­
plicity with a single known monogenic resistance determinant (mecA), 
a genotype-based approach to resistance detection proved imperfect. 
One limitation surfaced when the initial commercial genotypic resistance 
assay deployed for the identification of MRSA was recalled. A clinical 
isolate of S. aureus that emerged in Belgium expressed a variant of the 
mec cassette not detected by the assay’s PCR primers. New primers 
were added to detect this new variant, and the assay was reapproved for 
use. This example illustrates the need for ongoing monitoring of any 
genotypic resistance assay. A second limitation is that a contradiction 
can occur between genotypic and phenotypic evidence for resistance. 
Up to 5% of MSSA strains have been reported to carry a copy of the 
mecA gene that is either nonfunctional or not expressed. Thus, the 
erroneous identification of these strains as MRSA by genotypic detec­
tion would lead to administration of the inferior antibiotic vancomycin 
rather than the preferred β-lactam therapy.
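Ongoing monitoring of a genotypic assay against phenotypic testing can be pictured as a running comparison of paired calls. In this minimal sketch the isolate identifiers, the input format, and the group names are hypothetical. Isolates in the "genotype_only" group correspond to the nonfunctional or unexpressed mecA scenario, and those in the "phenotype_only" group to a resistance determinant that the assay fails to detect.

```python
def compare_calls(calls):
    """calls: iterable of (isolate_id, genotype_resistant, phenotype_resistant).

    Returns isolate IDs grouped by agreement between the two methods.
    """
    groups = {"concordant": [], "genotype_only": [], "phenotype_only": []}
    for isolate_id, genotype_r, phenotype_r in calls:
        if genotype_r == phenotype_r:
            groups["concordant"].append(isolate_id)
        elif genotype_r:
            groups["genotype_only"].append(isolate_id)   # determinant present, isolate susceptible
        else:
            groups["phenotype_only"].append(isolate_id)  # resistant, no determinant detected
    return groups


# Synthetic paired results for four hypothetical isolates.
calls = [
    ("iso-1", True, True),
    ("iso-2", True, False),
    ("iso-3", False, True),
    ("iso-4", False, False),
]
groups = compare_calls(calls)
print(groups["genotype_only"])   # ['iso-2']
print(groups["phenotype_only"])  # ['iso-3']
```

Discordant isolates of either type are the ones worth investigating further, since they point either to a flaw in the assay or to a resistance mechanism not yet catalogued.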
These examples illustrate one of the prime challenges of mov­
ing beyond growth-based assays: genotype is merely a proxy for the 
resistance phenotype that directly informs patient care. Alternative 
approaches currently under development attempt to circumvent the 
limitations of genotypic resistance testing by returning to phenotypic 
assays, albeit more rapid ones. One such approach is informed by 
genomic methods: transcriptional profiles serve as a rapid phenotypic 
signature for antibiotic response. Conceptually, since dying cells are 
transcriptionally distinct from cells fated to survive, susceptible bac­
teria enact different transcriptional profiles after antibiotic exposure 
than resistant ones, independent of the mechanism of resistance. 
These differences can be measured and, since transcription is one of 
the most rapid responses to cell stress (minutes to hours), can be used 
to determine whether cells are resistant or susceptible much more rap­
idly than is possible if growth in the presence of antibiotics is awaited 
(days). Like DNA, RNA can be readily detected through predictable 
rules governing base pairing via either amplification- or hybridization-based
methods. Changes in a carefully selected set of transcripts form 
an expression signature that can represent the total cellular response 
to antibiotic without requiring full characterization of the entire 
transcriptome. Preliminary proof-of-concept studies suggest that this 
approach may identify antibiotic susceptibility based on transcriptional 
phenotype much more quickly than is possible with growth-based 
assays. Other rapid phenotype-based approaches to antibiotic suscepti­
bility testing, including automated microscopy, ultrafine measurements 
of mass fluctuations, and others, are under development as well; of 
these, automated microscopy has been approved for clinical use.
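The idea of an expression signature can be sketched as a score computed from a handful of transcripts measured with and without brief antibiotic exposure. Everything in the example below is an assumption made for illustration: the transcript names, the expected directions of change, the pseudocount, the threshold, and the counts themselves are invented, and no published signature is being reproduced.

```python
from math import log2


def response_score(untreated, treated, expected_direction, pseudocount=1.0):
    """Mean log2 fold change across signature transcripts, signed so that the
    response expected of susceptible cells gives a positive score."""
    signed = [
        direction * log2((treated[t] + pseudocount) / (untreated[t] + pseudocount))
        for t, direction in expected_direction.items()
    ]
    return sum(signed) / len(signed)


def call_susceptibility(score, threshold=1.0):
    """Hypothetical cutoff: a strong transcriptional response implies susceptibility."""
    return "susceptible" if score >= threshold else "resistant"


# +1: transcript expected to rise in susceptible cells; -1: expected to fall.
signature = {"transcript_up": +1, "transcript_down": -1}

untreated = {"transcript_up": 99, "transcript_down": 399}
treated_responding = {"transcript_up": 399, "transcript_down": 99}
treated_unchanged = {"transcript_up": 99, "transcript_down": 399}

print(call_susceptibility(response_score(untreated, treated_responding, signature)))  # susceptible
print(call_susceptibility(response_score(untreated, treated_unchanged, signature)))   # resistant
```

Because the score reflects the cellular response rather than any particular gene, a classifier of this kind is in principle independent of the mechanism of resistance, which is the advantage the text attributes to phenotypic approaches.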
Because of its sensitivity in detecting even very rare nucleic acid 
fragments, sequencing provides an unprecedented depth of study into 
complex populations of cells and tissues. The strength of this depth and 
sensitivity applies not only to the detection of rare, novel pathogens 
in a sea of host signal but also to the identification of heterogeneous 
pathogen subpopulations in a single host that may differ, for example, 
in drug resistance profiles or pathogenesis determinants. For instance, 
recent studies have highlighted the diversification of pathogens in 
chronic bacterial infections, such as Pseudomonas in the lungs of 
patients with cystic fibrosis or M. tuberculosis in disseminated infec­
tion, perhaps allowing for niche specialization within the host. Such 
diversification has long been recognized in chronic viral populations, 
as exemplified by HIV. Future studies will be needed to elucidate the 
clinical significance of these variable subpopulations, even as deep 
sequencing is now providing unprecedented levels of detail about 
majority and minority members of this population.
■
■HOST-BASED DIAGNOSTICS
While pathogen-based diagnostics continue to be the mainstay for con­
firming infection, serologic testing and nonspecific biomarkers—such 
as erythrocyte sedimentation rate, C-reactive protein level, and even 
total white blood cell and neutrophil counts—have long been the basis 
of a strategy for measuring host responses to aid in the diagnosis of 
infection. Even recently identified host biomarkers of bacterial infection, 
such as procalcitonin, have fallen short in their versatility, with 
positive and negative predictive values that are thus far adequate for 
only a few narrow applications but inadequate for generalized clinical 
use. Here, too, the application of genomics is now being explored to 
improve upon this approach, given the previously described limitations 
of serologic testing and the lack of specificity of protein biomarkers 
identified to date. Rather than using antibody responses as a retrospec­
tive biomarker for infection, recent efforts have focused on transcrip­
tomic analysis of the host response as a new direction with diagnostic 
implications for human disease.
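
The dependence of predictive values on the clinical setting can be made concrete with a short calculation. The sketch below applies Bayes' rule to a hypothetical biomarker; the sensitivity, specificity, and prevalence values are assumptions chosen only for illustration and are not drawn from any study of procalcitonin or any other marker.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test, using Bayes' rule.

    All three inputs are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics (assumed, for illustration only).
for prevalence in (0.05, 0.30, 0.70):
    ppv, npv = predictive_values(0.85, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With fixed sensitivity and specificity, the positive predictive value falls sharply when the condition is uncommon and the negative predictive value falls when it is common, which is one reason a marker can perform adequately in a narrow application yet poorly in generalized use.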
For instance, while pathogen-based diagnostic tests to distinguish 
active from latent tuberculosis infection have proven elusive, the tran­
scriptional profile of circulating white blood cells exhibits a differential 
pattern of expression of nearly 400 transcripts that distinguish active 
from latent tuberculosis; this expression pattern is driven in part by 
changes in interferon-inducible genes in the myeloid lineage. In a 
validation cohort, this transcriptional signature was able to distinguish 
patients with active versus latent disease, to distinguish tuberculosis 
infection from other pulmonary inflammatory states or infections, and 
to track responses to treatment in as little as 2 weeks, with normaliza­
tion of expression toward that of patients without active disease over 
6 months of effective therapy. Such a test could play an important role 
not only in the management of patients but also as a marker of efficacy 
in clinical trials of new therapeutic agents. More recently, a distilled 
three-transcript signature has shown promise for distinguishing active 
from latent tuberculosis, raising hopes of a deployable assay in the near 
term.
Similarly, considerable progress has been made toward identify­
ing host transcriptional signatures in circulating blood cells that 
distinguish viral from bacterial causes of upper respiratory infec­
tion, with better performance characteristics than current clinical 
parameters or available protein biomarkers. Additional host signa­
tures have been reported that distinguish among bacterial infection, 
viral infection, and inflammatory states; identify Lyme disease; 
identify influenza; and even distinguish between gram-positive and 
gram-negative bacterial infections. In some cases, results have been 
extended to different host populations—including adults and chil­
dren, and those with varying immune function—which obviously 
will be critical for generalizing such an approach. Thus, profiling 
of host transcriptional dynamics could augment the information 
obtained from studies of pathogens, both enhancing diagnosis and 
monitoring the progression of illness and the response to therapy. 
The frontier of genomic applications to understand host response 
to infection, with the potential of identifying biomarkers or even 
underlying disease biology, continues to rapidly advance, incor­
porating novel technological and computational approaches such 
as single-cell host transcriptional profiling of infected patients, to 
understand complex processes such as sepsis.
In this era of genome-wide association studies and attempts to move 
toward personalized medicine, genomic approaches are also being 
applied to the identification of host genetic loci and factors that con­
tribute to infection susceptibility. Such loci will have undergone strong 
selection among populations in which the disease is endemic. Through 
identification of the beneficial genetic alleles among individuals who 
survive in such settings, markers for susceptibility or resistance are 
being discovered; these markers can be translated to diagnostic tests 
to identify susceptible individuals to implement preventive or pro­
phylactic interventions. Further, such studies may offer mechanistic 
insight into the pathogenesis of infection and inform new methods 
of therapeutic intervention. Such beneficial genetic associations were 
recognized long before the advent of genomics, as in the protective 
effects of the negative Duffy blood group or heterozygous hemoglobin 
abnormalities against Plasmodium infection. Genomic approaches 
allow more systematic and widespread application of this principle 
to identify not only people with increased susceptibility to prevalent 
diseases (e.g., HIV infection, tuberculosis, and cholera) but also host 
factors that contribute to and thus might predict the severity of dis­
ease. In one recent example, polymorphisms in certain genetic loci, or 
specific circulating autoantibodies, were found to be associated with 
severe COVID-19.

THERAPEUTICS
Genomics has the potential to impact infectious disease therapeutics 
in two ways. By transforming the speed or type of diagnostic informa­
tion that can be attained, it can influence therapeutic decision-making. 
Alternatively, by opening new avenues to a better understanding of 
pathogenesis, providing new ways to disrupt infection, and delineating 
new approaches to antibiotic discovery or characterization of vaccine 
targets, it has the potential to facilitate the development of new thera­
peutic agents.

■
■GENOMIC DIAGNOSTICS INFORMING THERAPEUTICS
Efforts at antibiotic discovery are declining, with few new agents in 
the pipeline and even fewer new drugs (in particular, few agents with 
new mechanisms of action) entering the market. This phenomenon 
is due in part to the lack of economic incentives for the private sec­
tor; however, it is also attributable in part to the enormous challenges 
involved in the discovery and development of antibiotics. Most recent 
efforts have focused on broad-spectrum antibiotics; the development 
of a chemical entity that works across an extremely diverse set of 
organisms (i.e., species more divergent from each other than a human 
is from an amoeba) is far more challenging than the development of 
an agent that is designed to target a single bacterial species. Neverthe­
less, the concept of narrow-spectrum antibiotics has heretofore been 
rejected because of the lack of early diagnostic information that would 
guide the selection of such agents. Thus, rapid diagnostics providing 
antibiotic susceptibility information that can guide antibiotic selection 
in real time has the potential to alter and simplify antibiotic strategies 
by allowing a paradigm shift away from broad-spectrum drugs and 
toward narrow-spectrum agents. Such a paradigm shift clearly would 
have additional implications for antibiotic resistance, helping to limit 
selective pressure applied to pathogens and commensal bacteria dur­
ing therapy.
In yet another diagnostic paradigm with the potential to impact 
therapeutic interventions, genomics is opening new avenues to a better 
understanding not only of different host susceptibilities to infection 
but also of different host responses to therapy. For example, the role 
of glucocorticoids in tuberculous meningitis has long been debated. 
Recently, polymorphisms in the human genetic locus LTA4H, which 
encodes a leukotriene-modifying enzyme, were found to modulate the 
inflammatory response to tuberculosis. Patients with tuberculous men­
ingitis who were homozygous for the proinflammatory LTA4H allele 
were most helped by adjunctive glucocorticoid treatment, while those 
who were homozygous for the anti-inflammatory allele were nega­
tively affected by steroid treatment. Steroids have become part of the 
standard of care in tuberculous meningitis, but this study suggests that 
perhaps only a subset of patients benefit from this anti-inflammatory 
adjunct (while others may be harmed) and further suggests a genetic 
means of prospectively identifying this subset. Thus, genomic diagnos­
tic tests may eventually approach the goal of personalized medicine, 
informing diagnosis, prognosis, and treatment decisions by revealing 
the pathogenic potential of the microbe and by detecting individual­
ized host responses to both infection and therapy.
■
■GENOMICS IN DRUG AND VACCINE DEVELOPMENT
Genomic technologies are dramatically changing research on host–
pathogen interactions, with a goal of increasingly influencing the 
process of therapeutic discovery and development. Sequencing offers 
several possible avenues into antimicrobial therapeutic discovery. First, 
genome-scale molecular methods have paved the way for comprehen­
sive identification of all essential genes encoded by a pathogen, thereby 
systematically identifying critical vulnerabilities within a pathogen that 
could be targeted therapeutically. Second, genome-scale methodologies 
offer rapid ways to address the mechanism of action of newly identified 
hits from compound screens. Whole genome sequencing offers a rapid, 
unbiased way to detect mutations arising in resistant mutants dur­
ing selection. Similarly, transcriptional profiling can provide insights 
into mechanisms of action of new candidate drugs. For instance, the 
transcriptional signature of cell wall disruptors (e.g., β-lactams) is 
distinct from that of DNA-damaging agents (e.g., fluoroquinolones) 
or protein synthesis inhibitors (e.g., aminoglycosides). Either approach 
can thus suggest a mechanism of action or flag compounds for pri­
oritization because of a potentially novel activity. In an alternative 
strategy for determining mechanisms of action, genome-wide RNA 
interference or CRISPR screens can be used to identify genes required 
for antimicrobial efficacy. This approach provided new insights into 
the mechanism of action of drugs that have been in use for decades 
for human African trypanosomiasis. Third, sequencing can readily 
identify the most conserved regions of a pathogen’s genome and the 
corresponding gene products; this information is invaluable in narrowing 
antigen candidates in vaccine development. Candidate antigens, typically 
proteins predicted to be surface exposed, can be expressed recombinantly 
and tested for the ability to elicit a serologic response and protective 
immunity. This process, termed reverse 
vaccinology, has proved particularly useful for pathogens that are diffi­
cult to culture or poorly immunogenic. After decades of development, 
the utility of this approach became dramatically apparent with the 
rapid development of mRNA vaccines targeting conserved regions of 
the SARS-CoV-2 genome, fueling the most rapid development of a vac­
cine in history. Comparative genomics informed by prior coronavirus 
sequences enabled the first mRNA vaccine design to begin within days 
of the first SARS-CoV-2 sequence being made publicly available, and 
now annual updates are guided by ongoing sequencing of circulating 
variants, as discussed in detail below.
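
The idea of locating conserved regions can be illustrated with a minimal sketch. It assumes that sequences from several strains have already been aligned to equal length, scores each alignment column by Shannon entropy (0 bits means every strain carries the same residue), and reports windows whose mean entropy does not exceed a cutoff. The toy sequences, the window size, and the cutoff are hypothetical; actual antigen selection weighs many additional criteria, such as predicted surface exposure and expression during infection.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column (0 = fully conserved)."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_windows(aligned_seqs, window=5, max_mean_entropy=0.0):
    """Return start indices of windows whose mean column entropy is <= cutoff."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    entropies = [
        column_entropy([s[i] for s in aligned_seqs]) for i in range(length)
    ]
    hits = []
    for start in range(length - window + 1):
        mean_entropy = sum(entropies[start:start + window]) / window
        if mean_entropy <= max_mean_entropy:
            hits.append(start)
    return hits


# Hypothetical aligned protein fragments from three strains.
strains = ["MKTAYIAKQR", "MKTAYLAKQR", "MKTAYIAKHR"]
print(conserved_windows(strains, window=5))
```

In this toy alignment only the first five-residue window is identical across all three strains, so it is the only window reported at a cutoff of zero.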

Genomics has been employed in both developing vaccines and 
defining their impact on microbial epidemiology and ecology. Exam­
ples include recent studies of influenza, malaria, S. pneumoniae, and 
HPV following vaccine introduction. Extensive sequencing of influ­
enza viruses has been valuable in understanding the modest efficacy of 
seasonal influenza vaccination, and the combination of genomics and 
antigenic cartography is used to select strains to include in subsequent 
influenza vaccines. Beyond this, sequence conservation informs efforts 
to design more robust pan-influenza or pan-coronavirus vaccines. The 
RTS,S/AS01 malaria vaccine was analyzed by targeted sequencing of 
parasites from vaccinated and control populations during a phase 3 
trial conducted at 11 sites in Africa; these analyses revealed reduced 
vaccine efficacy against parasites with amino acid mutations in the 
circumsporozoite protein targeted by the vaccine. Similarly, studies 
of the more established pneumococcal vaccines (the 7- and 13-valent 
polysaccharide conjugate vaccines, PCV-7 and PCV-13) documented 
serotype replacement: strains targeted by the vaccine have dramatically 
decreased in prevalence following widespread vaccination campaigns, 
while nonvaccine serotypes have increased in prevalence. 
Given that specific serotypes of HPV (e.g., types 16 and 18) clearly 
are more strongly associated than others with carcinogenesis, HPV 
vaccines have capitalized on serotype replacement, targeting vaccine 
strains to specifically prevent infection with the more dangerous sero­
types. Such a strategy, informed by pathogen genomics, aims to protect 
individuals and ideally to decrease the circulating burden of more 
virulent strains within society.
Large-scale gene content analysis from sequencing or expression 
profiling enables new research directions that provide novel insights 
into the interplay of pathogen and host during infection or coloniza­
tion. One important goal of such research is to suggest new therapeutic 
approaches to disrupt this interaction in favor of the host. Indeed, one 
of the most immediate applications of next-generation sequencing 
technology has come from simply characterizing human pathogens 
and related commensal or environmental strains and then finding 
genomic correlates for pathogenicity. For instance, as Escherichia coli 
varies from a simple nonpathogenic, lab-adapted strain (K-12) to a 
Shiga toxin–producing enterohemorrhagic gastrointestinal pathogen 
(O157:H7), it displays up to a 25% difference in gene content, though 
it is classified as the same species. Similarly, some isolates of Entero­
coccus—a genus notorious for its increasing incidence of resistance to 
common antibiotics such as ampicillin, vancomycin, and aminoglyco­
sides—also contain recently acquired genetic material comprising up 
to 25% of the genome on mobile genetic elements. This fact suggests 
that horizontal gene transfer plays an important role in the organisms’ 
adaptation as nosocomial pathogens. On closer study, this genome 
expansion is associated with loss of CRISPR elements, which protect 
the bacterial genome from invasion by certain foreign genetic mate­
rial, and may thus facilitate the acquisition of antibiotic resistance–
conferring genetic elements. While loss of this regulation appears to 
impose a competitive disadvantage in antibiotic-free environments, 
these drug-resistant strains thrive in the presence of even some of the 
best antienterococcal therapies. In addition to insights gained from 
genome sequencing, transcriptomic and proteomic profiling of patho­
gens under various conditions that mimic colonization or infection, 
including existence as biofilms or in polymicrobial communities, intra­
cellular infection models, antibiotic exposure, and nutrient starvation, 
has begun to reveal novel biologic features that may be targeted by the 
next generation of therapies. At the cutting edge of the host–pathogen 
interface, single-cell transcriptomic methodologies are rapidly increas­
ing in feasibility and extent, revealing previously unknown heterogene­
ity in the potential outcomes of intracellular infection.
Thus, genomic studies are transforming our understanding of infec­
tion, offering evidence of virulence factors or toxins and providing 
insight into ongoing evolution of pathogenicity and drug resistance. 
One goal of such studies is to identify therapeutic agents that can 
disrupt the pathogenic process. There is currently much interest in the 
theoretical concept of antivirulence drugs that inhibit virulence factors 
rather than killing the pathogen outright as a means to intervene in 
infection. Further, with sequencing ever more accessible and efficient, 
ongoing large-scale studies have unprecedented statistical power to 
associate clinical outcomes with pathogen and host genotypes and thus 
to further reveal vulnerabilities in the infection process that can be 
targeted for disruption. Although this is just the beginning, such stud­
ies point to a tantalizing future in which the clinician is armed with 
genomic predictors of infection outcome and therapeutic response to 
guide clinical decision-making.
EPIDEMIOLOGY OF INFECTIOUS DISEASES
Epidemiologic studies of infectious diseases have several main goals: 
to identify and characterize outbreaks, to describe the pattern and 
dynamics of an infectious disease as it spreads through populations, 
and to identify interventions that can limit or reduce the burden of 
disease. One classic, paradigmatic example is John Snow’s elucidation 
of the origin of the 1854 London cholera outbreak. Snow used careful 
geographic mapping of cases to determine that the likely source of the 
outbreak was contaminated water from the Broad Street pump, and 
by removing the pump handle, he aborted the outbreak. Whereas that 
effort was undertaken without knowledge of the causative agent of 
cholera, advances in microbiology and genomics have expanded the 
purview of epidemiology to consider not just the disease but also the 
pathogen and its genetic variants, its virulence factors, and the complex 
relationships between microbial and host populations.
Through use of genomic tools such as high-throughput sequenc­
ing, the diversity of a microbial population can be rapidly described 
with unprecedented resolution, with discrimination between isolates 
that have single-nucleotide differences across the entire genome and 
advancement beyond prior approaches that relied on phenotypes 
(such as antibiotic susceptibility profiles) or genetic markers (such as 
multilocus sequence typing). The development of statistical methods 
grounded in molecular genetics and evolutionary theory has estab­
lished analytical approaches that translate descriptions of microbial 
population diversity and structure into descriptions of the origin and 
history of pathogen spread. By linking phylogenetic reconstruction 
with epidemiologic and demographic data, genomic epidemiology 
presents the opportunity to track transmission from person to person 
and across demographic and geographic boundaries, to infer transmis­
sion patterns of both pathogens and sequence elements that confer 
phenotypes of interest, and to estimate the transmission dynamics of 
outbreaks.
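
Discrimination between isolates at single-nucleotide resolution reduces, in its simplest form, to counting the positions at which two aligned genomes differ. The following sketch assumes that the genomes have already been aligned to a common reference and that ambiguous bases (N) and gaps should be skipped; the isolate names and the ten-base sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has an ambiguous base or a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(isolates):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Hypothetical aligned fragments; real genomes span millions of positions.
isolates = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGNAT",
}
for pair, distance in pairwise_snp_distances(isolates).items():
    print(pair, distance)
```

A matrix of such distances is the raw material for the phylogenetic reconstructions described above, although those methods model the evolutionary process rather than relying on raw counts alone.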
■
■TRANSMISSION NETWORKS
Whole genome sequencing of pathogen genomes can be used to infer 
transmission and identify point-source outbreaks. As reported in a 
seminal paper in 2010, a study of MRSA in a Thai hospital demonstrated 
the use of whole genome sequencing in reconstructing the transmission 
of a pathogen from patient to patient by integrating the analysis 
of accumulation of mutations over time with the dates and hospital 
locations of the infected individuals. Since then, multiple instances of 
the use of whole genome sequencing to define and motivate interven­
tions aimed at interrupting transmission chains have been reported. 
In another MRSA outbreak in a special-care baby unit in Cambridge, 
United Kingdom, whole genome sequencing extended the traditional 
infection control analysis, which relies on typing organisms by their 
antibiotic susceptibilities, to sequencing of isolates from clinical sam­
ples. This approach identified an otherwise unrecognized outbreak of 
a specific MRSA strain that was occurring against a background of the 
usual pattern of infection caused by a diverse circulating population of 
MRSA strains. The analysis showed evidence of transmission among 
mothers within the special-care baby unit and in the community and 
demonstrated the key role of MRSA carriage in a single health care 
provider in the persistence of the outbreak. In yet another example, in 
response to the observation of 18 cases of infection by carbapenemase-producing 
Klebsiella pneumoniae over 6 months at the National Institutes 
of Health Clinical Research Center, genome sequencing of the 
isolates was used to discriminate between the possibilities that these 
cases represented multiple, independent introductions into the health 
care system or a single introduction with subsequent transmission. On 
the basis of network and phylogenetic analysis of genomic and epide­
miologic data, the authors reconstructed the likely relationships among 
the isolates from patient to patient, demonstrating the nosocomial 
spread of a single resistant Klebsiella strain. Similar approaches have 
elucidated the extent to which presumed nosocomial C. difficile, VRE, 
and carbapenem-resistant Enterobacterales represent within-hospital 
transmission—of bacterial strains or of plasmids and mobile genetic 
elements—versus independent acquisitions of unrelated pathogens. 
With these demonstrations of the potential contribution of genomics 
to hospital infection-control efforts, an important avenue of research 
seeks to develop statistical methods with which to ascertain when such 
tools are useful and their cost-effectiveness when compared with that 
of current nongenomic approaches.
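
The logic of combining genomic and epidemiologic data to separate a single introduction from independent acquisitions can be sketched as follows. This is a deliberately simplified illustration, not the method used in the studies cited above: two patients are linked only if their isolates differ by no more than an assumed SNP cutoff and their stays overlap on the same ward, and linked patients are then grouped into putative clusters. The cutoff of 2 SNPs, the patient labels, and the ward data are all hypothetical; appropriate thresholds depend on the organism and its mutation rate.

```python
def overlaps(stay_a, stay_b):
    """True if two (ward, first_day, last_day) stays share a ward and a day."""
    ward_a, start_a, end_a = stay_a
    ward_b, start_b, end_b = stay_b
    return ward_a == ward_b and start_a <= end_b and start_b <= end_a


def putative_clusters(snp_distances, stays, max_snps=2):
    """Group patients linked by close genomes AND overlapping ward stays."""
    parent = {name: name for name in stays}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), distance in snp_distances.items():
        if distance <= max_snps and overlaps(stays[a], stays[b]):
            parent[find(a)] = find(b)

    clusters = {}
    for name in stays:
        clusters.setdefault(find(name), set()).add(name)
    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical data: pairwise SNP distances and ward stays (days are offsets).
distances = {
    ("P1", "P2"): 1, ("P1", "P3"): 2, ("P2", "P3"): 1,
    ("P1", "P4"): 40, ("P2", "P4"): 41, ("P3", "P4"): 39,
}
stays = {
    "P1": ("ICU", 0, 10),
    "P2": ("ICU", 8, 20),
    "P3": ("ICU", 18, 30),
    "P4": ("ICU", 5, 25),
}
print(putative_clusters(distances, stays))
```

In this toy example P1 and P3 never overlap on the ward but are joined through P2, whereas P4 shares the ward with all three yet carries a genetically distant isolate and is therefore treated as an independent acquisition.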
Genome sequencing of clinical specimens of viruses has been used 
to understand their patterns of spread and the clinical and epidemiologic 
implications of genetic variants. As RNA viruses use an error-prone 
RNA-dependent RNA polymerase, they accumulate mutations 
at a rapid rate, facilitating inferences about the dynamics and patterns 
of spread. These tools have been applied to the study of outbreaks of 
well-known viruses, such as recent outbreaks of yellow fever in South 
America and mumps in the United States, as well as recent zoonotic 
pathogens, such as the coronaviruses MERS-CoV and SARS-CoV. The 
sequencing of SARS-CoV-2 in the context of the pandemic has offered 
a powerful example of the contributions that genomic epidemiology 
can make to, and its increasingly central role in, tracking the spread of 
a pathogen both locally and globally and informing policy and public 
health decision-making. Moreover, tracking variants and combining 
the genome sequences with epidemiologic line-list data enable inves­
tigation of the extent to which new variants cause differing symptom 
profiles and levels of severe disease.
The uncovering of unexpected transmission events by genomic epi­
demiology studies is motivating investigations into pathogen ecology 
and modes of transmission. Whole genome sequencing established 
the clonality of several high-profile outbreaks, enabling the discovery 
of dangerous contaminants such as Burkholderia pseudomallei in aro­
matherapy spray, Pseudomonas in eyedrops, Exserohilum in injectable 
corticosteroids, Fusarium in epidural anesthetic preparations, E. coli 
O157:H7 in beef, and Mycobacterium chimaera in the temperature-control systems used during cardiac bypass. Each of these outbreaks 
caused considerable morbidity and in some cases mortality before 
being localized and prevented on the basis of investigations informed 
and accelerated by genomic epidemiology.
As more studies aim to carefully define the origins and spread of 
infectious agents using the high-resolution lens of whole genome 
sequencing, fundamental questions arise about the diversity of infecting 
and colonizing microbial populations. Traditional microbiologic 
methods include taking a single colony from a growth plate as representative 
of the population. However, the more diverse the colonizing 
or infecting pathogen population, the less representative these individ­
ual isolates are and the greater the possibility for introducing error into 
whole genome sequencing–based methods while reconstructing trans­
mission. Sequencing studies of multiple colonies of an S. aureus strain 
colonizing a single individual showed a “cloud” of diversity. What is 
the clinical significance of this diversity? What are the processes that 
generate and limit it? What amount of diversity is transmitted under 
different conditions and routes of transmission? How do the answers 
to these questions vary by infectious organism, type of infection, host, 
and response to treatment? More comprehensive descriptions of diver­
sity, population dynamics, transmission bottlenecks, and the forces that 
shape and influence the growth and spread of microbial populations 
will be a critically important focus of future investigations.

■
■ORIGINS AND DYNAMICS OF PATHOGEN SPREAD
In addition to reconstructing the transmission chains of local out­
breaks, genomics-based epidemiologic methods reveal broad-scale 
geographic and temporal spread of pathogens. Four examples include 
the origins of cholera in Haiti, the history of HIV-1 group M, the 
spread of Ebola in West Africa, and the timing and nature of spread of 
the zoonotic COVID-19 pandemic, which brought genomic tracking of 
pathogen variants to the forefront of public consciousness for a time. 
Cholera, a dehydrating diarrheal illness caused by infection with Vibrio 
cholerae, first spread worldwide from the Indian subcontinent in the 
1800s and has since caused seven pandemics; the seventh pandemic 
has been ongoing since the 1960s. An investigation into the geographic 
patterns of cholera spread in the seventh pandemic used genome 
sequences from a global collection of 154 V. cholerae strains repre­
senting isolates from 1957 to 2010. This investigation revealed that 
the seventh pandemic has comprised at least three overlapping waves 
spreading out from the Indian subcontinent (Fig. 126-3A). Further, 
analysis of the genome of an isolate of V. cholerae from the 2010 out­
break of cholera in Haiti showed it to be more closely related to isolates 
from South Asia than to isolates from neighboring Latin America, sup­
porting the hypothesis that the outbreak was derived from V. cholerae 
introduced into Haiti by human travel (likely from Nepal) rather than 
by environmental or more geographically proximal sources. A subse­
quent study that dated the time to the most recent common ancestor 
of a population of V. cholerae isolates from Haiti provided further sup­
port for a single point-source introduction from Nepal. Application of 
similar methods that integrate pathogen genome sequences, mutation 
rates, geographic locations, and phylogenetic inference to HIV-1 group 
M dated the origin of the virus to the 1920s and the city of Kinshasa 
(then called Leopoldville), the capital of the Democratic Republic of 
the Congo (then called the Belgian Congo). This work established an 
understanding of how a boom in industry and a city with extensive 
railroad connections provide a scaffolding along which a virus can 
rapidly spread geographically.
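
The principle behind dating a most recent common ancestor can be shown with a simplified calculation. The studies described above used Bayesian phylogenetic models; the sketch below substitutes a much cruder technique, root-to-tip regression under a strict molecular clock, in which the genetic distance of each isolate from the root of the tree is assumed to grow linearly with its sampling date. The slope of the fitted line estimates the substitution rate, and the date at which the line crosses zero distance estimates the time of the common ancestor. The input values are synthetic and were constructed to lie exactly on a line.

```python
def root_to_tip_regression(sampling_dates, root_to_tip):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, tmrca): the slope in substitutions per site per year and
    the date at which the fitted distance equals zero.
    """
    n = len(sampling_dates)
    mean_x = sum(sampling_dates) / n
    mean_y = sum(root_to_tip) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_dates)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(sampling_dates, root_to_tip)
    )
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    tmrca = -intercept / rate
    return rate, tmrca


# Synthetic example: decimal sampling years and distances (subs/site).
dates = [2005.0, 2008.0, 2010.0, 2012.0]
distances = [0.005, 0.008, 0.010, 0.012]
rate, tmrca = root_to_tip_regression(dates, distances)
print(f"rate = {rate:.4g} substitutions/site/year, tMRCA = {tmrca:.1f}")
```

Real data scatter around the line, and the confidence interval on the estimated date widens accordingly, which is why published analyses report date ranges rather than single years.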
Genome sequencing has proven invaluable in understanding the 
geographic, demographic, climatic, and administrative factors that 
drove, sustained, and limited the 2013–2016 Ebola outbreak that rav­
aged West Africa (Fig. 126-3B) as well as the factors and patterns of 
transmission of Zika virus in the Americas and most recently the tim­
ing and origins of SARS-CoV-2 transmission in human populations. 
With the rapid availability of the SARS-CoV-2 viral genome sequence, 
data from a set of cases from early in the pandemic enabled inference of 
the time to the most recent common ancestor, supporting the conclusion that SARS-CoV-2 entered circulation in human populations in Wuhan, China, 
sometime in late November to early December of 2019. Subsequently, 
large, coordinated sequencing networks have been able to recreate its 
patterns of global spread, discover new variants, and monitor as these 
variants disseminate. For the first time, scientists and even the public 
were able to watch viral evolution through a population in almost real 
time. First, a variant defined by a single amino acid substitution (D614G in the viral 
spike protein) displaced the original strain in early 2020, followed in 
subsequent years by other variants such as Alpha, Delta, Omicron, 
and then various Omicron sublineages becoming dominant either 
regionally or globally (Fig. 126-3C) as their mutation profile conferred 
advantages in increasingly immune populations.

FIGURE 126-3  A. Transmission events inferred from phylogenetic reconstruction of 154 Vibrio cholerae isolates from the seventh cholera pandemic. Date ranges represent estimated time to the most recent common ancestor for strains transmitted from source to destination locations, based on a Bayesian model of the phylogeny. (Reprinted by permission from the Nature Publishing Group, Nature 477:462. Evidence for several waves of global transmission in the seventh cholera pandemic, A Mutreja et al © 2011.) B. Inferred Ebola virus spread in West Africa (Liberia, red; Guinea, green; and Sierra Leone, blue) by phylogeographic methods using virus genome sequences, dates, and an evolutionary model. The lines reflect spread between population centroids of each administrative region, going from the thin end to the thick end and colored by a time scale. (Reprinted by permission from Nature Publishing Group, Nature 544:309. Virus genomes reveal factors that spread and sustained the Ebola epidemic, G Dudas et al © 2017.) C. SARS-CoV-2 variant proportions in the United States from the start of the COVID-19 pandemic through March 6, 2024. Each major variant family that rose to dominance in the United States is labeled. (Variant proportions are taken from https://covariants.org, accessed April 21, 2024, which derives underlying variant data from the Global Initiative on Sharing All Influenza Data [GISAID].)
[Panel C axes: variant fraction (0.00 to 1.00) versus date (October 2020 through October 2023); labeled variants: founder strain, Alpha, Delta, and Omicron (and subvariants).]
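
Variant proportions of the kind plotted in Fig. 126-3C are, at their core, counts of sequenced genomes assigned to each variant within a time period, divided by the total sequenced in that period. The sketch below shows that bookkeeping only; the records are hypothetical, and the surveillance pipelines that produce published estimates also correct for sampling bias and reporting delays, which this example ignores.

```python
from collections import Counter, defaultdict


def variant_fractions(records):
    """Convert (period, variant) records to {period: {variant: fraction}}."""
    counts = defaultdict(Counter)
    for period, variant in records:
        counts[period][variant] += 1
    return {
        period: {v: n / sum(c.values()) for v, n in c.items()}
        for period, c in sorted(counts.items())
    }


# Hypothetical sequenced genomes, labeled by collection month and variant.
records = [
    ("2021-04", "Alpha"), ("2021-04", "Alpha"), ("2021-04", "Alpha"),
    ("2021-04", "Founder"),
    ("2021-10", "Delta"), ("2021-10", "Delta"),
    ("2021-10", "Delta"), ("2021-10", "Delta"),
]
for period, fractions in variant_fractions(records).items():
    print(period, fractions)
```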
One compelling area of innovation in molecular epidemiology is 
the development of wastewater-based surveillance. Wastewater sur­
veillance has been used to monitor for the appearance of poliovirus, 
among others, but its widespread and widely publicized use for SARS-CoV-2 in “nowcasting” quantitative estimates of epidemic trends has 
galvanized interest in using wastewater to monitor disease trends and 
pathogens and their characteristics. The U.S. Centers for Disease Con­
trol and Prevention has sought to establish a national wastewater sur­
veillance system, building on the SARS-CoV-2 work. These tools can 
help provide a population-level view of the first appearance of a patho­
gen, variant, or antibiotic resistance determinant and can also help 
monitor the population burden and hence the epidemic curve. Work 
remains to develop robust, repeatable, and accurate methods across 
many pathogens, particularly those that can replicate or exchange DNA 
in wastewater environments. Nonetheless, this data source has great 
promise for providing early signals to decision-makers to help inform 
public health actions.
These efforts illustrate the remarkable promise of genome sequenc­
ing in improving outbreak response strategies by elucidating previously 
hidden origins and paths of disease spread and details of the forces 
that shape epidemics. The combination of in-the-field sequencing 
with portable sequencing platforms, rapid data sharing, and rapid 
open analysis through sites such as nextstrain.org offers a paradigm by 
which real-time genomic epidemiology, including wastewater-based 
epidemiology, may contribute to “weather maps,” enabling prediction 
of epidemic patterns in space and time and thus providing guidance for 
public health interventions to slow or control their spread.
Increasing numbers of investigations into the spread of many patho­
gens are contributing to a growing atlas of maps describing routes, 
patterns, and tempos of microbial diversification and dissemination, 
not just for agents of emerging infectious diseases but for common 
pathogens as well. Such studies will create a vast amount of data that 
can be used to investigate the diversity and microbiologic links within 
distinct niches and the patterns of spread from one niche to another. 
The increasingly broad adoption of genome sequencing by health 
care and public health institutions ensures that the available catalog of 
genome sequences and associated epidemiologic data will grow very 
rapidly. For example, updating from the pulsed-field gel electropho­
resis techniques that have been used to define strains of food-borne 
pathogens since the late 1980s, PulseNet—the U.S. Centers for Disease 
Control and Prevention network for monitoring these pathogens—has 
instituted routine genome sequencing. The COVID-19 pandemic fur­
ther underscores the importance of building a new global public health 
infrastructure in which sequencing plays a central role to facilitate early 
disease discovery, rapid and close tracking of spread, and development 
of diagnostics and targeted effective interventions. With higher-resolution 
description of microbial diversity and of the dynamics of 
that diversity over time and across epidemiologic and demographic 
boundaries and evolutionary niches, we will gain even greater insights 
into the relationships of transmission routes and patterns of historical 
spread.
■
■EPIDEMIC POTENTIAL
Defining pathogen transmissibility is a critical step in the development 
of public health surveillance and intervention strategies because this 
information can help to predict the epidemic potential of an outbreak. 
Transmissibility can be estimated by a variety of methods, including 
inference from the growth rate of an epidemic and the generation 
time of an infection (the mean interval between infection of an index 
case and infection of the people infected by that index case). Genome 
sequencing and analysis of a well-sampled population provide another 
method by which to derive similar fundamental epidemiologic parameters. 
One key measure of transmissibility is the basic reproduction 
number, defined as the average number of secondary infections generated by 
a single primary infectious case in a fully susceptible population. When the 
basic reproduction number is >1, an outbreak has epidemic potential; when it 
is <1, the outbreak is expected to die out. On the basis of sequences from influenza virus 
samples obtained from infected patients very early in the 2009 H1N1 
influenza pandemic, the basic reproduction number was estimated 
through a population genomic analysis at 1.2; this result lent 
greater confidence to estimates derived from traditional epidemiologic 
data, which ranged from 1.4 to 1.6. In addition, with the assumption 
of a molecular clock model, sequences of H1N1 samples together with 
information about when and where the samples were obtained have 
been used to estimate the date and location of the pandemic’s origin, 
providing insight into disease origins and dynamics. Integrating viral 
genomics with other types of data—such as the timing and nature of 
mitigation efforts and the impact of those efforts on mobility—will 
expand the toolkit with which to assess the impact of public health 
interventions on slowing and controlling disease spread. These tools 
may also be applied to institutional infection control: as return-to-work 
protocols are developed, sequencing offers one way to 
determine the extent to which infections arose from within-institution 
spread. Because the magnitude and intensity of the public health 
response are guided by the predicted size of an outbreak, the ability 
of genomic methods to cast light on a pathogen’s origin and epidemic 
potential adds an important dimension to the contributions of these 
methods to infectious disease epidemiology.
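To make the link between epidemic growth rate, generation time, and the reproduction number concrete, the following sketch applies two standard textbook approximations. Both are simplifying assumptions rather than the method used in the studies cited above: one assumes an exponentially distributed generation interval, the other assumes that every generation interval has exactly the mean length. The function names and the input values are illustrative only.

```python
import math


def r_exponential_interval(growth_rate, mean_generation_time):
    """R = 1 + r*T, assuming an exponentially distributed generation interval."""
    return 1.0 + growth_rate * mean_generation_time


def r_fixed_interval(growth_rate, mean_generation_time):
    """R = exp(r*T), assuming every generation interval equals the mean."""
    return math.exp(growth_rate * mean_generation_time)


def has_epidemic_potential(reproduction_number):
    """An outbreak can grow only when the reproduction number exceeds 1."""
    return reproduction_number > 1.0


# Illustrative inputs (assumed, not taken from any study):
# a growth rate of 0.1 per day and a mean generation time of 3 days.
r_per_day = 0.1
generation_days = 3.0

low = r_exponential_interval(r_per_day, generation_days)
high = r_fixed_interval(r_per_day, generation_days)
print(f"R between {low:.2f} and {high:.2f}; epidemic potential: {has_epidemic_potential(low)}")
```

With these inputs the two assumptions give roughly 1.30 and 1.35, which shows why estimates of the reproduction number depend on what is assumed about the generation-time distribution as well as on the observed growth rate.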
■
■PATHOGEN EVOLUTION
Beyond describing transmission and dynamics, pathogen genomics 
can provide insight into the evolution of pathogens and the interactions 
of selective pressures, the host, and pathogen populations, which can 
have implications for clinical decision-making and the development 
of vaccines and therapeutics. From a clinical perspective, this process 
is central to the acquisition of antibiotic resistance, the generation of 
increasing pathogenicity or new virulence traits, the evasion of host 
immunity and clearance (leading to chronic infection), and vaccine 
efficacy.

Microbial genomes evolve through a variety of mechanisms, includ­
ing mutation, duplication, insertion, deletion, recombination, and 
horizontal gene transfer. Segmented viruses (e.g., influenza virus) can 
reassort gene segments within multiply infected cells. The pandemic 
2009 H1N1 influenza A virus, for example, appears to have been 
generated through reassortment of several avian, swine, and human 
influenza strains. Such potential for the evolution of novel pandemic 
strains has precipitated concern about the possible evolution to trans­
missibility of virulent strains that have been associated with high 
mortality rates but have not yet exhibited efficient human infectivity. 
Experiments with H5N1 avian influenza, for example, have defined 
five mutations that render it transmissible, at least in ferrets—the 
animal model system for human influenza. Studies that examine the 
genomes of pathogens collected longitudinally from individual infec­
tions have similarly demonstrated their evolution as they adapt to host 
environments and new immune and therapeutic pressures.

The continuous antigenic evolution of seasonal influenza offers an 
example of how studies of pathogen evolution can impact surveillance 
and vaccine development. Frequent updates to the annual influenza 
vaccine are needed to ensure protection against the dominant strains. 
These updates are based on anticipating which viral populations from 
a substantial pool of locally and globally diverse circulating viruses will 
predominate in the upcoming season. Toward that end, sequencing-based 
studies of influenza virus dynamics have shed light on the global 
spread of influenza, providing concrete data on patterns of spread and 
helping to elucidate the origins, emergence, and circulation of novel 
strains. Through analysis of >1000 influenza A H3N2 virus isolates 
over the 2002–2007 influenza seasons, Southeast Asia was identified as 
the usual site from which diversity originates and spreads worldwide. 
Subsequent studies of global isolate collections have shed further light on 
the diversity of circulating virus, showing that some strains persist 
and circulate outside of Asia for multiple seasons. Similar studies with 
SARS-CoV-2, including genome sequences from orders of magnitude 
more clinical specimens than the aforementioned influenza study, have 
helped identify where new SARS-CoV-2 variants have emerged and 
informed vaccine composition.
Not only do genomic epidemiology studies have the potential to 
help guide vaccine selection and development, but they also help to 
track what happens to pathogens circulating in the population in 
response to vaccination. By describing pathogen evolution under the 
selective pressure of a vaccinated population, such studies can play a 
key role in surveillance and identification of virulence determinants 
and perhaps may even help to predict the future evolution of escape 
from vaccine protection. The seven-valent pneumococcal conjugate 
vaccine (PCV-7) targeted the seven serotypes of S. pneumoniae respon­
sible for the majority of invasive disease at the time of its introduction 
in 2000; since then, PCV-7 has dramatically reduced the incidence of 
pneumococcal disease and mortality. However, sequencing of >600 
Massachusetts pneumococcal isolates from 2001 to 2007 has shown 
that, in the pneumococcal population, previously rare nonvaccine 
serotypes are replacing vaccine serotypes and that some vaccine strains 
have persisted despite vaccination by recombining the vaccine-targeted 
capsule locus with a cassette of capsule genes from non-vaccine-targeted 
serotypes. Studying the virulence of these persistently circulating 
strains can help to rationally update vaccine composition.
The large collections of pathogen genome sequences are driving 
development of tools to decipher the genetic basis for antibiotic resis­
tance, virulence, and infection risk. Some pathogens have distinct types 
of clinical manifestations, the basis for which we are just beginning to 
unravel with the aid of genomics. For example, Listeria is a food-borne 
pathogen that can cause both central nervous system infections and 
maternal/neonatal infections. Although all Listeria isolates are treated 
the same from a public health perspective, variation in outcomes 
exists and appears to be linked to the strains’ genomic background. 
Molecular analysis of a national reference laboratory’s collections of 
well-characterized specimens, based on the fraction of immunocom­
petent people in which they caused disease, revealed that some clonal 
complexes of Listeria appear to be more virulent than others. Linking 
epidemiology and comparative genomics then enabled enumeration 
of putative virulence factors that contribute to the clinical phenotypes 
as well as identification and confirmation of a novel gene cluster that 
mediates central nervous system tropism. This approach illustrates 
progress toward a future in which we can link pathogen identification 
with risk, thereby informing resource use and allocation.

GLOBAL CONSIDERATIONS
While cutting-edge genomic technologies are largely implemented in 
the developed world, their application to infectious diseases perhaps 
offers the biggest potential impact in less developed regions where the 
burden of these infections is greatest. This globalization of genomic 
technology and its extensions has already begun in each of the areas 
of focus highlighted in this chapter; it has occurred both through the 
application of advanced technologies to samples collected in the devel­
oping world and through the adaptation and importation of technolo­
gies directly to the developing world for on-site implementation as they 
become more globally accessible.
Genomic characterization of the pathogens responsible for important 
global illnesses such as tuberculosis, malaria, trypanosomiasis, 
cholera, and COVID-19 has led to insights in diagnosis, treatment, 
and infection control. For instance, with the increasing burden of 
drug-resistant tuberculosis in the developing world, a molecular diag­
nostic test has been developed to detect rifampin-resistant tuberculo­
sis. The genetic basis for rifampin resistance has been well defined by 
targeted sequencing: characteristic mutations in the molecular target of 
rifampin, RNA polymerase, account for the vast majority of instances 
of rifampin resistance. Where resources permit, 
a rapid, automated PCR assay that can detect both M. tuberculosis and a 
rifampin-resistant allele of RNA polymerase directly in clinical samples 
has been deployed in parts of Africa and Asia, transforming the 
recognition and management of incident tuberculosis and multidrug 
resistance where they are most prevalent. Since rifampin resistance 
frequently accompanies resistance to other antibiotics, this test can 
suggest the presence of multidrug-resistant M. tuberculosis within 
hours instead of weeks, without the infrastructure required for culture.
High-resolution genomic tracking of the spread of epidemics—from 
cholera to Ebola to Zika to COVID-19—has yielded insights into 
which public health measures may prove most effective in controlling 
local epidemics. Many genomic tracking efforts have involved close 
collaborations with local scientists and public health officials, and 
considerable investment in sequencing infrastructure in sub-Saharan 
Africa has made on-location epidemic tracking in the event of another 
such outbreak feasible. Such investment can not only enable real-time 
outbreak recognition and tracking but also provide the infrastructure 
needed to capitalize on the many other benefits of high-throughput 
sequencing as they are developed. The early returns of such invest­
ments are exemplified by the rapid reporting of genome sequences 
for SARS-CoV-2, with substantial insights from sequencing efforts 
across the world; perhaps most notably, the Omicron variant was first 
identified in South Africa, enabling preparations for what became a 
global sweep in the subsequent weeks. Overall, sequencing efforts have 
become cheaper and have moved closer to point-of-care with each 
passing year. As these technologies synergize with efforts to globalize 
information-technology resources, global implementation of genomic 
methods promises to spread state-of-the-art methods for diagnosis, 
treatment, and epidemic tracking of infections to areas that need these 
capabilities the most.
GENOMICS AND THE COVID-19 PANDEMIC
The COVID-19 pandemic, which began in 2019 and spread worldwide 
in 2020, resulted in hundreds of millions of documented infections 
and millions of deaths and serves as a prime example of the pandemic 
potential of infectious pathogens. It also demonstrated the central 
role that genomic tools now play in response to infectious outbreaks, 
ranging from enabling diagnostics and vaccines to tracking evolution, 
virulence, and transmissibility of the pathogen. The discovery of 
SARS-CoV-2 and the sequencing of its genome were completed within weeks 
of the recognition of the clinical syndrome. The rapid public sharing of 
this genome sequence led directly to two key interventions: diagnostic 
assay development via RT-qPCR and vaccine design. Crucially, vaccine 
development was informed by homology of the SARS-CoV-2 sequence 
to SARS and MERS coronaviruses. The dominant antigen of those 
viruses, the surface protein Spike, was well characterized, enabling 
the design of the first SARS-CoV-2 vaccines to begin the day after 
the genome sequence was shared. The progress of the most rapidly 
developed and validated vaccine in human history was unquestionably 
accelerated by genomic technology. Whole genome sequencing has 
also played a large role in outbreak tracking and confirmation of case 
clusters in institutional settings such as hospitals or congregate living 
facilities, in helping to distinguish reinfections from recrudescence 
or prolonged viral shedding, in monitoring spread through societ­
ies, and in tracking pathogen evolution, including the emergence of 
new variants of concern with altered transmissibility, severity, and/
or partial evasion of the immune response generated to prior versions 
of the virus, vaccines, or monoclonal antibody therapeutics. Finally, 
cutting-edge genomic methods, including single-cell transcriptional 
profiling and genome-wide association studies, are contributing to our 
understanding of the wide variability in outcomes of SARS-CoV-2 
infection, ranging from asymptomatic carriage to death. Overall, just 
as the global response to the COVID-19 pandemic underscores the 
indispensable role that genomics methods have come to play in the 
clinical and public health management of infectious diseases, the dev­
astating impact of this pandemic reveals the urgent need for further 
development and implementation of tools for disease surveillance and 
response.
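The use of whole genome sequencing to confirm or refute suspected case clusters, mentioned above, often rests on comparing how many nucleotide positions differ between isolates. The sketch below is a minimal illustration under stated assumptions: the sequences are already aligned to the same length, ambiguous or missing positions are skipped, and the linkage threshold is an arbitrary placeholder, since appropriate thresholds depend on the pathogen and the sampling interval. The function names, sample labels, and toy sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions at which two aligned sequences differ.

    Positions where either sequence has a character listed in `ignore`
    (ambiguous base or alignment gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def linked_pairs(aligned, max_snps):
    """Return (name1, name2, distance) for pairs within the SNP threshold."""
    pairs = []
    for (name1, seq1), (name2, seq2) in combinations(sorted(aligned.items()), 2):
        distance = snp_distance(seq1, seq2)
        if distance <= max_snps:
            pairs.append((name1, name2, distance))
    return pairs


# Toy aligned sequences (invented for illustration; real genomes are
# thousands to millions of bases long).
toy_genomes = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTAT",
    "patient_C": "ACGAACTTGC",
}

# Threshold of 1 SNP is an assumed placeholder, not a recommended cutoff.
for name1, name2, distance in linked_pairs(toy_genomes, max_snps=1):
    print(f"{name1} and {name2} differ at {distance} position(s)")
```

A small distance is compatible with recent transmission but does not establish it; in practice such comparisons are interpreted alongside epidemiologic information such as timing and shared exposures.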
SUMMARY
By illuminating the genetic information that encodes the most fun­
damental processes of life, genomic technologies are transforming 
many aspects of medicine. In infectious diseases, methods such as 
next-generation sequencing and genome-scale expression analysis 
offer information of unprecedented depth about individual microbes 
as well as microbial communities. This information is expanding our 
understanding of the interactions of microorganisms with each other, 
their human hosts, and the environment. Despite technological and 
financial barriers that slow the widespread adoption of large-scale 
pathogen sequencing in clinical and public health settings, genomic 
methodologies have utterly transformed the research landscape in 
infectious disease and are beginning to make meaningful inroads 
into clinical settings. As even vaster amounts of data are generated, 
innovations in data storage, development of bioinformatics tools to 
manipulate the data, standardization of methods, and training of 
end-users in both the research and clinical realms will be required. 
The cost-effectiveness and applicability of whole genome sequenc­
ing, particularly in the clinic, remain to be studied, and studies of the 
impact of genome sequencing on patient outcomes will be needed to 
clarify the contexts in which these new methodologies can make the 
greatest contributions to patient well-being. The ongoing efforts to 
overcome limitations through collaboration, teaching, and reduction of 
financial obstacles should be applauded and expanded. With advances 
in genomic technologies and computational analysis, our ability to 
detect, characterize, treat, monitor, prevent, and control infections 
has advanced rapidly in recent years and will continue to do so, with 
the hope of heralding a new era where the clinician is better armed to 
combat infection and promote human health.
■
■FURTHER READING
Bullman S et al: Emerging concepts and technologies for the discov­
ery of microorganisms involved in human disease. Annu Rev Pathol 
12:217, 2017.
Burnham CD et al: Diagnosing antimicrobial resistance. Nat Rev 
Microbiol 15:697, 2017.
Levy JI et al: Wastewater surveillance for public health. Science 379:26, 
2023.
CRyPTIC Consortium et al: Prediction of susceptibility to first-line 
tuberculosis drugs by DNA sequencing. N Engl J Med 379:1403, 2018.
Dudas G et al: Virus genomes reveal factors that spread and sustained 
the Ebola epidemic. Nature 544:309, 2017.