

126 Microbial Genomics and Infectious Disease

whereupon they can infect other V. cholerae. Consequently, the bacteriophages regulate the abundance of pathogens, with ramifications for the epidemiology of the disease. Whether bacteriophages contribute to cyclical control of other pathogens is currently unknown.

■ ■SUMMARY
Bacterial pathogens display myriad mechanisms of colonization, adhesion, invasion, dissemination, and manipulation of host pathways. Infectious diseases result when pathogens successfully establish themselves within the host. Symptoms are usually the result of ensuing fights between the pathogen and the immune system. The incredible diversity of virulence determinants highlights the success of the host in combating infection. Further elucidating how bacteria cause infection will better our understanding of both human and microbial biology and will provide new opportunities for successful therapeutic intervention against infectious and inflammatory diseases.

Roby P. Bhattacharyya, Yonatan H. Grad, Deborah T. Hung

Microbial Genomics and Infectious Disease

Just as microscopy opened the worlds of microbiology by providing a tool with which to visualize microorganisms, technological advances in genomics provide microbiologists with powerful methods to characterize, with unprecedented resolution, the genetic map that underlies all microbes, thereby illuminating their complex and dynamic interactions with each other, the environment, and human health. The field of infectious disease genomics encompasses a vast frontier of active research that is transforming public health and the clinical practice of infectious diseases. While genetics has long played a key role in elucidating the process of infection and impacting clinical infectious diseases, the ability to extend our thinking and approaches beyond the study of single genes to an examination of the sequence, structure, and function of entire genomes allows us to identify new possibilities for research and opportunities to change clinical practice and disease surveillance. From the development of diagnostics with unprecedented sensitivity, specificity, and speed to the design of novel public health surveillance tools and interventions, technical and statistical genomic innovations are reshaping our understanding of the influence of the microbial world on human health and providing us with new tools to diagnose, track, and combat infection.

In this chapter, we explore the application of genomics methods to microbial pathogens and the infections they cause. We discuss innovations that are driving the development of diagnostic approaches as well as the discovery of new pathogens, providing insight into novel therapeutic approaches and paradigms, and advancing methods in infectious disease epidemiology and the study of pathogen evolution that can inform infection control measures, public health responses to outbreaks, and vaccine development. We draw on examples in current practice and from the recent scientific literature as signposts that point toward ways in which pathogen genomics may influence infectious diseases in the short and long terms, and we highlight applications to SARS-CoV-2 and the COVID-19 pandemic. Table 126-1 provides definitions for a selection of important terms used in genomics.

MICROBIAL DIAGNOSTICS
The basic goals of a clinical microbiology laboratory are to establish the presence of a pathogen in a clinical sample, to identify the pathogen, and, when possible, to provide other information that can help guide clinical management and affect prognosis, such as antibiotic susceptibility profiles or the presence of virulence factors. To date, clinical microbiology laboratories have largely approached these goals phenotypically by growth-based assays and biochemical testing. Bacteria, for instance, were historically grouped algorithmically into species by their characteristic microscopic appearance, nutrient requirements for growth, and ability to catalyze certain reactions. Antibiotic susceptibility is still determined in most cases by assessing bacterial growth in the presence of antibiotic.

With the sequencing revolution paving the way to easy access to complete pathogen genomes, we can more systematically define the genetic basis for these observable phenotypes. Compared with traditional growth-based methods for bacterial diagnostics that dominate the clinical microbiology laboratory, nucleic acid–based diagnostics that build on this genomic information promise improved speed, sensitivity, specificity, and breadth of information. Bridging clinical and research laboratories, adaptations of genomic technologies have begun to deliver on this promise (Table 126-2).

■ ■HISTORICAL LIMITATIONS AND PROGRESS THROUGH GENETIC APPROACHES
The molecular diagnostics revolution in the clinical microbiology laboratory is well under way, born of necessity in the effort to identify and characterize microbes that are refractory to traditional culture methods. Historically, diagnosis of many so-called unculturable pathogens has relied largely on serology and antigen detection.
However, these methods provide only limited clinical information because of their suboptimal sensitivity and specificity; serology, in addition, suffers from long delays that diminish its utility for real-time patient management and cannot characterize pathogens beyond identifying past exposure. Newer tests to detect pathogens based on nucleic acid content have already offered improvements in the select cases in which they have been applied.

Unlike direct pathogen detection, serologic diagnosis (measurement of the host’s response to pathogen exposure) can typically be made only in retrospect, requiring both acute- and convalescent-phase serum samples. For chronic infections, distinguishing active from latent infection or identifying repeat exposure from serology alone can be difficult or impossible, depending on the syndrome. In addition, serologic diagnosis is variably sensitive, depending on the organism and the patient’s immune status. For instance, tuberculosis is notoriously difficult to identify by serologic methods; tuberculin skin testing using purified protein derivative (PPD) is especially insensitive in active disease and possibly cross-reactive with vaccines or other mycobacteria. Even the newer interferon γ release assays (IGRAs), which measure cytokine release from T lymphocytes in response to Mycobacterium tuberculosis–specific antigens in vitro, have limited sensitivity in immunodeficient hosts. Neither PPD testing nor IGRAs can distinguish latent from active infection. Serologic Lyme disease diagnostics suffer similar limitations: in patients from endemic regions, the presence of IgG antibodies to Borrelia burgdorferi may reflect prior exposure rather than active disease, while IgM antibodies are imperfectly sensitive and specific (50% and 80%, respectively, in early disease). The complicated nature of these tests, particularly in view of the nonspecific symptoms that may accompany Lyme disease, has had substantial implications for public perception of Lyme disease and antibiotic misuse in endemic areas. Similarly, syphilis, a chronic infection caused by Treponema pallidum, is notoriously difficult to stage by serology alone, requiring multiple different nontreponemal and treponemal tests (e.g., rapid plasma reagin and fluorescent treponemal antibody, respectively) in conjunction with clinical suspicion.

TABLE 126-1  Glossary of Selected Terms in Genomics
TERM: DEFINITION
Contig: A DNA sequence representing a continuous fragment of a genome, assembled from overlapping sequences; relevant for de novo assembly of sequence data that do not align to previously sequenced genomes
Genome: The entire set of heritable genetic material within an organism
Horizontal gene transfer: The transfer of genes between organisms through mechanisms other than by clonal descent, such as through transformation, conjugation, or transduction
Genomic epidemiology: An approach to inferring and reconstructing transmission events, population dynamics, and epidemiological patterns using comparative analysis of microbial genome sequences
Metagenomics: Analysis of genetic material from multiple species directly from primary samples without requiring prior culture steps
Microarray: A collection of DNA oligonucleotides (“oligos”) spatially arranged on a solid surface and used to detect or quantify sequences in a sample of interest that are complementary (and therefore bind) to one or more of the arrayed oligos
Microbial genome-wide association study (GWAS): An analytic framework to test statistical associations between microbial genotypes and phenotypes of interest, such as antibiotic resistance and virulence
Mobile genetic elements: DNA elements that can move within a genome and can be transferred between genomes through horizontal gene transfer (e.g., plasmids, bacteriophages, and transposons)
Multilocus sequence typing: A method for typing organisms based on DNA sequence fragments from a prespecified set of genes
Next-generation sequencing: High-throughput sequencing using a parallelized sequencing process that produces millions of sequences concurrently, far beyond the capacity of prior dye-terminator methods
Nucleic acid amplification test (NAAT): A biochemical assay that evaluates for the presence of a particular string of nucleic acids through amplification by one of several methods, including polymerase and ligase chain reactions
Polymerase chain reaction (PCR): A type of NAAT used to amplify a specific region of DNA by means of specific oligonucleotide primers and a DNA polymerase
Single-nucleotide polymorphism (SNP): Point mutations in DNA, the number of which between different microbial isolates is a measure of their genetic distance from one another
Transcriptome: The catalog of the full set of messenger RNA (mRNA) transcripts from a cell or organism, which are typically measured by microarray or by next-generation sequencing of complementary DNA (cDNA) via a process called RNA-Seq
Whole genome sequencing (WGS): A process that determines the full DNA sequence of an organism’s genome; has been greatly facilitated by next-generation sequencing technology

Complementing serology, antigen detection can improve sensitivity and specificity in select cases but has been validated only for a limited set of infections. Typically, structural elements of pathogens are detected, including components of viral envelopes (e.g., hepatitis B surface antigen, HIV p24 antigen, SARS-CoV-2 spike protein), cell surface markers in certain bacteria (e.g., Streptococcus pneumoniae, Legionella pneumophila serotype 1) or fungi (e.g., Cryptococcus, Histoplasma), and less specific fungal cell-wall components such as galactomannan and β-glucan (e.g., Aspergillus and other fungi).

Given the impracticality of culture and the lack of sensitivity or sufficient clinical information afforded by serologic and antigenic methods, the push toward nucleic acid–based diagnostics originated in pursuit of viruses and fastidious bacteria, becoming part of the standard of care for select organisms in U.S. hospitals. Such tests, including polymerase chain reaction (PCR) and other nucleic acid amplification tests (NAATs), are now widely used for many viral infections, both chronic (e.g., HIV infection, hepatitis C) and acute (e.g., influenza, SARS-CoV-2, respiratory syncytial virus). NAATs provide essential information about both the initial diagnosis and the response to therapy and, in some cases, genotypically predict drug resistance. Indeed, progression from antigen detection to PCR transformed our understanding of the natural course of HIV infection, with profound implications for treatment (Fig. 126-1A). In the early years of the AIDS pandemic, p24 antigenemia was detected in acute HIV infection but then disappeared for years before emerging again with progression to AIDS (Fig. 126-1B). Without a marker demonstrating viremia, the role of treatment during HIV infection prior to the development of clinical AIDS was uncertain, and assessing treatment efficacy was challenging.
With the emergence of PCR as a progressively more sensitive test (now able to detect as few as 20 copies of virus per milliliter of blood), viremia was recognized as a near-universal feature of HIV infection. Given the challenges of phenotypic assays, genotypic antiviral resistance testing was also adopted early for HIV and is now the standard of care before the initiation of therapy in developed countries. These developments have been transformative in guiding therapy in early disease and, together with the development of less toxic therapies, have helped to shape policy that is moving toward ever-earlier introduction of antiretroviral therapy in HIV infection.

Reverse transcriptase PCR (RT-PCR) assays are the core method for detecting SARS-CoV-2 virus in the acute phase, forming a critical component of the clinical and public health response to COVID-19, just as they were on a smaller scale for the related coronaviruses SARS-CoV and Middle East respiratory syndrome (MERS)-CoV. Tests for SARS-CoV-2 represent the largest implementation of a molecular infectious disease assay to date and play a critical role in both clinical diagnostics and public health measures to contain the COVID-19 pandemic.

As they are for viral testing, nucleic acid–based tests have become the diagnostic tests of choice for fastidious bacteria, including the common sexually transmitted bacterial pathogens Neisseria gonorrhoeae and Chlamydia trachomatis as well as the tick-borne Ehrlichia chaffeensis and Anaplasma phagocytophilum. More recently, nucleic acid amplification–based detection has offered improved sensitivity for diagnosis of the important nosocomial pathogen Clostridioides difficile, and NAATs have provided clinically relevant information on the presence of cytotoxins A and B as well as molecular markers of hypervirulence, such as the North American pulsotype 1 (NAP1) strain that is enriched in severe illness.
The importance of genomics in selecting loci for diagnostic assays and in monitoring test sensitivity was highlighted by the emergence in Sweden of a newly recognized variant of C. trachomatis with a deletion that includes the gene targeted by a set of commercial NAATs. By evading detection through this deletion (and thus avoiding treatment), this strain came to be highly prevalent in some areas of Sweden. While nucleic acid–based tests remain the diagnostic approach of choice for fastidious bacteria, this example of “diagnostic escape” serves as a reminder of the need for careful development and ongoing monitoring of molecular diagnostics.

TABLE 126-2  Selected Clinical Applications of Infectious Disease Genomics
APPLICATION (TECHNOLOGY): NOTES/EXAMPLES

Organism Identification
Viral detection (PCR, RT-PCR): Identification of HIV, HBV, HCV, respiratory viruses including SARS-CoV-2 and influenza, and others for diagnosis and response to therapy
TB detection (PCR): Amplification of the rpoB gene for species-specific identification of Mycobacterium tuberculosis
Pathogen detection (PCR, RT-PCR, NAAT): Multiplexed identification of dozens of viruses, bacteria, yeasts, and parasites from a variety of clinical specimens
Bacterial detection (16S ribosomal gene sequencing): Targeted amplification and sequencing of regions of the 16S rRNA gene for identification of suspected bacterial infections undiagnosed by conventional methods
Pathogen detection (Cell-free DNA sequencing): Unbiased amplification and sequencing of cell-free nucleic acid from blood, with analytical comparison of resulting nonhuman sequences to genomes of known pathogens and contaminants, in order to identify circulating pathogen DNA; anecdotal clinical use to establish etiology of systemic or focal infection, though clinical utility and optimal use cases still evolving

Pathogen Discovery
Bacterial pathogens (Sequencing, metagenomic assembly): Unbiased “shotgun” sequencing of isolated nucleic acid from patient samples to identify associated pathogens; proofs-of-concept: new Bradyrhizobium species associated with cord colitis; Escherichia coli O104:H4 from 2011 diarrheal outbreak in Germany; Leptospira species from one patient’s cerebrospinal fluid; research use only at this time
Viral pathogens (Microarray, sequencing): Hybridization of clinical samples to microarrays from phylogenetically diverse known viruses identified the first SARS coronavirus and others. Direct sequencing has identified SARS-CoV-2, West Nile virus, and MERS-CoV, among others. Use is primarily in research.

Antibiotic Resistance
MRSA detection (PCR): Detection of the mecA gene, the genotypic cause of methicillin resistance in Staphylococcus aureus
VRE detection (PCR): Detection of the vanA or vanB gene, the main genotypic causes of vancomycin resistance in Enterococcus
MDR-TB detection (PCR, NAAT): Detection of polymorphisms in the rpoB gene from M. tuberculosis, which account for 95% of rifampin resistance. Other probes available for inhA and katG genes can detect up to 85% of isoniazid resistance.
Carbapenemase detection (PCR): Detection of genes encoding one of several types of enzymes (KPC, NDM, OXA-48, IMP, VIM) that hydrolyze carbapenems, accounting for much but not all carbapenem resistance in Enterobacteriaceae
HIV resistance detection (Targeted sequencing): Targeted sequencing of specific genes with known resistance-conferring mutations; now the standard of care prior to initial therapy in the United States and Europe

Epidemiology
Outbreak and epidemic tracking (Sequencing): Application to tracking outbreaks and epidemics on local and international scales, including spread of carbapenemase-producing Klebsiella, S. aureus, M. tuberculosis, E. coli, Vibrio cholerae, Ebola virus, Zika virus, and influenza virus
Evolution and spread of pathogens (Sequencing): Sequencing collections of pathogens from individual patients or environmental reservoirs such as wastewater to shed light on pathogen dissemination, virulence factors, and antibiotic resistance determinants; innumerable examples, including V. cholerae, influenza virus, Ebola virus, Zika virus, and SARS-CoV-2

Abbreviations: HBV, hepatitis B virus; HCV, hepatitis C virus; MDR, multidrug-resistant; MERS, Middle East respiratory syndrome; MRSA, methicillin-resistant S. aureus; NAAT, nucleic acid amplification test; PCR, polymerase chain reaction; RT, reverse transcriptase; SARS, severe acute respiratory syndrome; TB, tuberculosis; VRE, vancomycin-resistant enterococci.

In contrast, for typical bacterial pathogens for which culture methods are well established, growth-based assays still dominate in the clinical laboratory.
Informed by decades of clinical microbiology, these tests have served clinicians well, yet the limitations of growth-based tests, in particular the delays associated with waiting for growth, have left opportunities for improvements. Driven by this need, mass spectrometry–based assays that offer highly accurate organism identification within a few hours of a positive blood culture have been widely adopted in clinical microbiology laboratories, largely supplanting biochemical tests for organism identification in well-resourced areas.

Looking ahead, molecular diagnostics, greatly informed by the vast quantity of microbial genome sequences generated in recent years, offers a way forward. First, sequencing studies can readily identify key genes (or noncoding nucleic acids) that can be developed into targets for clinical assays using PCR or hybridization assay platforms. Second, sequencing itself is becoming cheap and rapid enough to be performed on clinical specimens in certain cases, with consequent unbiased detection or characterization of pathogens.

One of the biggest drivers for the implementation of novel molecular technologies in the diagnosis of infectious diseases is the desire for more rapid, or even real-time, pathogen identification, ideally with antibiotic susceptibility information on those microbes for which resistance to the current anti-infective armamentarium is of concern. Such real-time tests have the potential to transform infectious disease management, impacting antibiotic stewardship in the outpatient setting, mortality risk in the critically ill (i.e., patients in whom early administration of effective antibiotics is the most significant factor in decreasing mortality risk), hospital admission, and length of hospital stay; the extent of this impact will depend on the economic forces that will help define the breadth of their deployment. On the public health level, such tests will likely play a role in improving antibiotic stewardship, thereby influencing the rise of antibiotic resistance and enabling surveillance of outbreaks by local, national, and international networks. In the United States and the United Kingdom, for example, public health agencies use genome sequencing to track food-borne pathogens and identify outbreaks and have rapidly expanded the routine use of genomics in identifying and characterizing other pathogens, from mycobacteria (both M. tuberculosis and nontuberculous mycobacteria) to N. gonorrhoeae. Further, international efforts to track the spread of viral diseases, particularly the vast efforts to sequence SARS-CoV-2 and monitor the emergence and spread of its variants, as well as recent work on mpox, Ebola, and Zika outbreaks and ongoing work on seasonal influenza, offer opportunities for improving interventions, surveillance, and prevention efforts, ranging from more accurate selection of the strains to include in vaccine development to improved design of trials to evaluate novel vaccines and therapies.

FIGURE 126-1  A. Timeline of select milestones in HIV management, shown as general milestones and diagnostic milestones. Genomic advances are shown in bold type. The approvals and recommendations indicated apply to the United States. ARV, antiretroviral; AZT, zidovudine; NRTI, nucleoside reverse transcriptase (RT) inhibitor; NNRTI, nonnucleoside RT inhibitor; PI, protease inhibitor. B. Viral dynamics in the natural history of HIV infection (acute HIV, chronic HIV, AIDS), plotted as relative level against time in weeks and then years. Three diagnostic markers are shown: HIV antibody (Ab), p24 antigen (p24), and viral load (VL). Dashed gray line represents limit of detection. (Adapted from data in HH Fiebig et al: Dynamics of HIV viremia and antibody seroconversion in plasma donors: Implications for diagnosis and staging of primary HIV infection. AIDS 17:1871, 2003.)

Technological innovations are lowering several critical barriers to the widespread adoption of genomics and other molecular methods. Specifically, for NAAT, the need for rapid thermal cycling and cold-chain storage for reagents has significantly impeded implementation in resource-limited settings. Recent efforts aim to overcome these challenges by developing isothermal amplification protocols and lyophilized reagents that do not require refrigeration or sophisticated instrumentation. For clinical sequencing, (1) the cost of, and time required for, sequencing and analysis continue to fall precipitously; (2) automation and miniaturization of the preparation of a sample for sequencing promise to reduce cost and minimize the expertise needed; and (3) direct sequencing technologies that eliminate the complex molecular biology required to prepare clinical samples for sequencing are improving in accuracy and robustness. Further barriers exist, including the need for careful processing of clinical samples to minimize contamination and the need for standardized pipelines to process data and present clinicians with easily interpretable and readily actionable results.
However, as these advances give rise to rapid, accurate diagnostic tests, the ultimate goal is to inform a clinician in real time whether antibiotics are indicated and, if so, which will be effective. Real-time diagnostics will allow more efficient deployment of our precious antibiotic arsenal, thus improving both societal and patient-specific outcomes in much the same way that a rapid, sensitive troponin assay has transformed bedside management of chest pain.
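The genome-based outbreak surveillance described in this section rests on comparing isolates base by base; Table 126-1 defines the number of SNPs between isolates as a measure of their genetic distance. A minimal Python sketch of that comparison follows. It assumes the sequences have already been aligned to a common reference and are of equal length; the isolate names and ten-base sequences are invented toy data, not real genomes, and real pipelines add quality filtering and recombination handling that are omitted here.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions at which two sequences carry different bases.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_distances(isolates: dict) -> dict:
    """Return the SNP distance for every pair of named isolates."""
    return {
        (name_1, name_2): snp_distance(seq_1, seq_2)
        for (name_1, seq_1), (name_2, seq_2) in combinations(isolates.items(), 2)
    }


# Hypothetical toy alignment: three isolates, ten aligned positions each.
isolates = {
    "patient_1": "ACGTACGTAC",
    "patient_2": "ACGTACGTAT",
    "patient_3": "ACCTACGAAT",
}

for pair, distance in pairwise_distances(isolates).items():
    print(pair, distance)
```

In this toy data set, the isolates from patients 1 and 2 differ at a single position, whereas patient 3 differs from each of them at two or three positions. How many SNPs are compatible with recent transmission depends on the organism and its mutation rate, so any threshold applied to such distances would be an organism-specific assumption.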

■ ■ORGANISM IDENTIFICATION
To adapt nucleic acid detection to diagnostic tests and thus to identify pathogens on a wide scale, sequences must be found that are conserved enough within a species to identify the diversity of strains that may be encountered in various clinical settings, but divergent enough to distinguish one species from another. Until recently, this problem has been solved for bacteria by targeting the element of a bacterial genome that is most highly conserved within a species, the 16S ribosomal RNA (rRNA) subunit. Among many examples, this method has now been used to confirm Mycobacterium chimaera infections in several patients after cardiothoracic surgery, leading ultimately to recognition of a widespread outbreak. At present, 16S PCR amplification from tissue specimens can be performed by specialty laboratories, though its sensitivity and clinical utility to date have remained somewhat limited, in part because of the scarcity and relative fragility of pathogen nucleic acid in the sampled tissue, which necessitates reliable, sensitive nucleic acid amplification. As such barriers are reduced through technological advances and as the causes of culture-negative infection are clarified (perhaps in part through sequencing efforts), these tests may become both more accessible and more helpful.
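The selection criterion stated above (conserved within a species, divergent between species) can be made concrete with a small Python sketch. Everything specific here is an assumption for illustration: the function names, the 0.95 and 0.85 identity thresholds, and the 20-base toy sequences are hypothetical, and the sequences are taken to be pre-aligned and of equal length.

```python
from itertools import combinations


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def evaluate_target(target_strains: list, other_species: list) -> tuple:
    """Return (lowest within-species identity, highest cross-species identity).

    A good diagnostic locus has a high first value (all strains of the
    target species look alike) and a low second value (other species
    look different).
    """
    within = min(identity(a, b) for a, b in combinations(target_strains, 2))
    cross = max(identity(s, o) for s in target_strains for o in other_species)
    return within, cross


def is_usable_target(target_strains, other_species,
                     min_within=0.95, max_cross=0.85) -> bool:
    """Apply illustrative (assumed) thresholds to a candidate locus."""
    within, cross = evaluate_target(target_strains, other_species)
    return within >= min_within and cross <= max_cross


# Hypothetical candidate locus sequenced in two strains of the target
# species and in one strain of a related species.
target_strains = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"]
other_species = ["ACGTTTGTACCCACGTAGGA"]

print(evaluate_target(target_strains, other_species))
print(is_usable_target(target_strains, other_species))
```

The sketch needs at least two strains of the target species. In practice the comparison would be run over many candidate loci and many genomes, and the thresholds would be tuned to the detection chemistry rather than fixed in advance.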

With the wealth of sequencing data now available, other regions beyond 16S rRNA can be targeted for bacterial species identification. These other genomic loci can provide additional information about a clinical isolate that is relevant to patient management. For instance, detection of the presence, or potentially even the expression, of toxin genes such as C. difficile toxins A and B or Shiga toxin can provide clinicians with additional information that will help distinguish commensals or colonizing bacteria from pathogens and thus aid in prognostication and management as well as in diagnosis.

Beyond bacteria, one commonly used approach to PCR-based pathogen detection is the so-called “syndromic panel,” a multiplexed PCR assay that identifies common causes of a clinical infection syndrome, such as upper respiratory infection, gastroenteritis, or meningoencephalitis. The most frequently deployed syndromic panel is the respiratory viral panel, which typically includes primer sets targeting a combination of influenza, parainfluenza, respiratory syncytial virus, adenovirus, rhinovirus, enterovirus, metapneumovirus, and common-cold coronaviruses, sometimes in conjunction with unculturable bacteria such as Mycoplasma, Chlamydophila, and Bordetella species. With the advent of the COVID-19 pandemic, SARS-CoV-2 was quickly added to this multiplex PCR panel within months of its development as a stand-alone PCR test. The goal of such panels is to capture common infectious causes of these syndromes in a single, standardized diagnostic test, ideally streamlining the diagnostic evaluation. The ready identification of a plausible etiologic agent may offer diagnostic clarity if judiciously used and carefully considered in the clinical context of each patient. The widespread adoption of SARS-CoV-2 PCR testing in the earliest days of the COVID-19 pandemic underscored the crucial role of precise etiologic pathogen diagnosis in patient care, triage, infection control, and epidemiology.
One challenge with PCR-based assays is the relative complexity of the molecular biology and consequent need for advanced technology for implementation, including instruments and reagents. Several recent approaches have advanced the molecular biology of nucleic acid detection with the aim of increasing deployability of NAATs for use in resource-limited or even field settings. These methods couple nucleic acid detection to an enzymatic readout, enabling catalytic signal amplification. Several such approaches build on the intrinsic sensitivity, specificity, and amplification of the CRISPR (clustered, regularly interspaced short palindromic repeats) effectors Cas12a and Cas13a as nucleic acid sensors. Distinct from the famous gene editing CRISPR effector Cas9, these robust and versatile enzymes recognize short nucleic acid targets with high specificity and transduce this binding event into “collateral cleavage” of nearby nucleic acids that can be engineered to create a signal using fluorescent reporter constructs. Crucially, this biotechnology can be made to work in conjunction with isothermal enzymatic preamplification steps to achieve remarkable sensitivity, all robustly enough to withstand lyophilization that enables long-term storage before being reconstituted in the field. Such assays are still in the early stages of development, but they have shown promise and could play a critical role in global diagnostics and surveillance.

While amplification tests such as PCR and other NAATs exemplify one approach to nucleic acid detection, other approaches exist, including detection by hybridization. Although not currently used in the clinical realm, techniques for multiplexed detection and identification of pathogens by hybridization to microarrays or in solution are being developed for other purposes. Of note, these different detection techniques require different degrees of conservation.
Highly sensitive amplification methods require a high degree of sequence identity between PCR primer pairs and their short, specific target sequences; even a single base-pair mismatch (particularly near the 3′ end of the primer) may interfere with detection. In contrast, hybridization-based tests are more tolerant of mismatch and can therefore be used to detect important regions that may be less precisely conserved within a species, potentially allowing detection of clinical isolates from a given species with greater diversity between isolates. Such assays take advantage of the predictable binding interactions of nucleic acids and do not require enzymology, broadening the range of conditions under which such assays are feasible, including directly on primary clinical specimens. The applicability of hybridization-based methods toward either DNA or RNA opens the possibility of expression profiling, which can uncover phenotypic information from nucleic acid content.
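The contrast between mismatch-sensitive amplification and mismatch-tolerant hybridization can be illustrated with a toy model. The sketch below is only an illustration: the function names, the 3-base window at the 3′ end, and the 15% mismatch tolerance are assumptions chosen for the example, not measured assay parameters, and sequences are compared in the same orientation for simplicity rather than as true complements.

```python
def count_mismatches(probe: str, target: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(probe) != len(target):
        raise ValueError("sequences must be the same length")
    return sum(1 for a, b in zip(probe, target) if a != b)


def primer_would_extend(primer: str, site: str,
                        three_prime_window: int = 3,
                        max_total_mismatches: int = 1) -> bool:
    """Toy rule (assumed thresholds): a primer fails if any mismatch
    falls within its last few 3' bases or if it has too many overall."""
    total = count_mismatches(primer, site)
    three_prime_ok = primer[-three_prime_window:] == site[-three_prime_window:]
    return three_prime_ok and total <= max_total_mismatches


def probe_would_hybridize(probe: str, site: str,
                          max_mismatch_fraction: float = 0.15) -> bool:
    """Toy rule (assumed threshold): a hybridization probe binds as long
    as the overall fraction of mismatched bases stays low."""
    return count_mismatches(probe, site) / len(probe) <= max_mismatch_fraction


# Hypothetical 20-base target and a variant differing only at the 3' base.
reference = "ACGTTGCAAGCTTGACCTGA"
variant = "ACGTTGCAAGCTTGACCTGT"

print(primer_would_extend(reference, reference))   # perfect match
print(primer_would_extend(reference, variant))     # 3' mismatch
print(probe_would_hybridize(reference, variant))   # tolerated by hybridization
```

In this model a single substitution at the 3′ end is enough to block the primer while the probe still binds, which is the qualitative behavior the text describes.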

Both PCR and hybridization methods target specific, known organ­ isms. At the other extreme, as sequencing costs decline, metagenomic sequencing from patient samples is increasingly feasible. This shot­ gun sequencing approach is unbiased—i.e., can detect any microbial sequence, however divergent or unexpected. In one example, a clinical sample of cerebrospinal fluid from an immunocompromised patient with signs and symptoms of chronic meningitis was found through metagenomic sequencing and analysis to contain small amounts of Leptospira DNA. In light of this information, retrospective PCR testing confirmed the diagnosis of neuroleptospirosis, which had been missed prior to the sequencing result. The patient was treated with penicillin G and clinically recovered. Increasingly, efforts are under way to bring whole genome sequencing to other clinical samples, including sputum and blood, to more readily identify pathogens. One such assay certified for clinical use in the United States—a shotgun metagenomic sequenc­ ing approach applied to cell-free DNA circulating in the bloodstream that aims to identify pathogens both in blood and other body sites—is gaining adoption in cases where traditional culture-based methods fail to identify a pathogen in patients with symptoms consistent with infection. This new approach brings its own set of challenges, however, including the need to recognize pathogenic sequences against a back­ ground of expected host and commensal sequences and to distinguish true pathogens from either colonizers or laboratory contaminants. The burgeoning field of microbiome research is driving technology devel­ opment for sequencing and analyzing complex microbial communities. Lessons from this field will inform diagnostic efforts. 
■ ■PATHOGEN DISCOVERY In addition to clinical diagnostic applications, novel genomic technologies, including whole genome sequencing, are being applied to clinical research specimens with a goal of identifying new pathogens in a variety of circumstances. The tremendous sensitivity and unbiased nature of sequencing are also ideal in searching clinical samples for unknown or unsuspected pathogens.

Causal inference in infectious diseases has progressed since the time of Koch, whose historical postulates provided a rigorous framework for attributing a disease to a microorganism. To modernize Koch’s postulates, an organism, whether it can be cultured or not, should induce disease upon introduction into a healthy host if it is to be implicated as a causative pathogen. Current sequencing technologies are ideal for advancing this modern version of Koch’s postulates because they can identify candidate causal pathogens with unprecedented sensitivity and in an unbiased way, unencumbered by limitations such as culturability. Yet, as direct sequencing on primary patient samples greatly expands our ability to recognize associations between microbes and disease states, critical thinking and experimentation will remain vital in establishing causality. Virus discovery in particular has been greatly facilitated by new nucleic acid technology. These frontiers were first notably explored with high-density microarrays containing spatially arrayed sequences from a phylogenetically diverse collection of viruses. Despite bias toward those with homology to known viruses, novel viruses in clinical samples were successfully identified on the basis of their ability to hybridize to these prespecified sequences. This methodology famously contributed to identification of the coronavirus causing severe acute respiratory syndrome (SARS).
Once discovered, the SARS coronavirus was rapidly sequenced: the full genome was assembled in April 2003, <6 months after recognition of the first case. With the advent of next-generation sequencing, unbiased pathogen discovery is now addressed through a process known as metagenomic assembly (Fig. 126-2), largely supplanting other methods. Sequences of random nucleotide fragments can be generated from clinical speci­ mens with no a priori knowledge of pathogen identity through a pro­ cess called shotgun sequencing. This collection of sequences can then be computationally aligned to host (i.e., human) sequences, with aligned sequences removed and remaining sequences compared with other

known genomes to detect the presence of known microorganisms. Sequence fragments that remain unaligned suggest the presence of an additional organism that cannot be matched to a known, characterized genome; these reads can be assembled into contiguous nucleic acid stretches that can be compared with known sequences to construct the genome of a potentially novel organism. Assembled genomes (or parts of genomes) can then be compared with known genomes to infer the phylogeny of new organisms and identify related classes or traits. Thus, not only can this process identify unanticipated pathogens, but it can even identify undiscovered organisms.

FIGURE 126-2  Workflow of metagenomic assembly for pathogen discovery. DNA is isolated from a specimen of interest (e.g., tissue, body fluid) containing a mixture of host DNA and nucleic acids from coexisting microbes, either commensal or pathogenic. All DNA (and RNA, if a reverse transcription step is added) is then sequenced, yielding a mixture of DNA sequence fragments (“reads”) from the organisms present. Except for reads that do not align (“map”) to any known sequence, these reads are aligned to existing reference genomes for the host or any known microbes. The unmapped reads are computationally assembled de novo into the largest contiguous stretches of DNA possible (“contigs”), representing fragments of previously unsequenced genomes. These genome fragments (contigs) are then mapped onto a phylogenetic tree based on their sequence. Some may represent known but as-yet-unsequenced organisms, while others will represent novel species. (Figure prepared with valuable input from Dr. Ami S. Bhatt, personal communication.)
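The read-triage step of this workflow (assign reads to host or known microbes, and set aside whatever remains unmapped for de novo assembly) can be sketched in a few lines. This is a deliberately simplified illustration: real pipelines use dedicated aligners and assemblers, whereas the sketch below assumes exact shared k-mers as a stand-in for alignment, and the function names, the `k` value, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads, references, k=8, min_shared=1):
    """Assign each read to the reference sharing the most k-mers with it.

    Reads sharing fewer than `min_shared` k-mers with every reference are
    binned as "unmapped"; these are the candidates for de novo assembly.
    Exact k-mer sharing is an assumed simplification of read alignment.
    """
    index = {name: kmers(seq, k) for name, seq in references.items()}
    bins = defaultdict(list)
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_hits = "unmapped", 0
        for name, ref_kmers in index.items():
            hits = len(read_kmers & ref_kmers)
            if hits >= min_shared and hits > best_hits:
                best_name, best_hits = name, hits
        bins[best_name].append(read)
    return dict(bins)


# Toy reference sequences (made up for illustration only).
references = {
    "host": "ATGGCGTACGTTAGCCTAGGCTAACGT",
    "microbe_1": "TTGACCGGTATCCGGAATTCCGGTTAA",
}
reads = [
    references["host"][2:14],        # a read derived from the host
    references["microbe_1"][5:17],   # a read derived from a known microbe
    "GGGGGGGGCCCCCCCC",              # a read matching nothing known
]

for label, members in classify_reads(reads, references).items():
    print(label, members)
```

The "unmapped" bin is the input to the assembly and phylogenetic comparison steps, which are not shown here.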
The emergence of COVID-19 provides a dramatic example that illustrates advances in pathogen discovery technology in the intervening 16 years since SARS-CoV was discovered: the causal coronavirus, SARS-CoV-2, was identified through metagenomic sequencing within about 1 month of the first known case and just weeks after the outbreak was first recognized. Sequencing and assembly were completed within 5 days of the discovery of the new virus, and a NAAT was released 1 day later. Given the ensuing ravages of COVID-19 and the cost of delays of even a few weeks in implementing this new diagnostic test in some locations, it is sobering to contemplate the added harm had this outbreak occurred even a decade earlier. This timeline illustrates the advancing power and speed of new diagnostic technologies but also underscores the pressing need for continued progress. As metagenomic sequencing and assembly techniques become more robust, this technology holds great promise for identifying microorganisms that are associated with clinical conditions of unknown etiology. Conventional methods already have unexpectedly linked numerous conditions with specific agents of infection—e.g., cervical and oropharyngeal cancers with human papillomavirus (HPV), Kaposi’s sarcoma with human herpesvirus 8, and certain lymphomas and, more recently, multiple sclerosis with Epstein-Barr virus. Recently, Zika virus, first described in the 1940s, was found to be increasing in incidence as a cause of febrile syndromes, particularly in Central and South America. A concurrent increase in the incidence of microcephaly was noted that temporally and geographically matched the Zika epidemics.
Zika was suspected to be neurotropic because of a previously recognized association with Guillain-Barré syndrome, but the strongest link between Zika virus and microcephaly came when the virus itself was detected by both quantitative reverse transcription PCR (RT-qPCR) and whole genome sequencing in postmortem fetal brain tissue from microcephalic infants. An argument for causality was built

on the foundation of epidemiologic evidence and direct viral detection, both of which were built on nucleic acid detection and genome sequencing. Sequencing techniques offer unprecedented sensitivity and specificity for identifying foreign nucleic acid sequences that may suggest other such pathogen-associated conditions—from malignancies to inflammatory conditions to unexplained fevers or other clinical syndromes—associated with organisms from viruses to bacteria to parasites. Caution is needed, though: in the absence of the ability to fulfill Koch’s postulates, sequence-based identification of a microbe from patient specimens is not, on its own, sufficient to ascribe pathogenicity. The increasing sensitivity of these methods warrants greater rigor and care in defining what is “noise” and what represents a pathogen. As sequencing-based discovery expands, microbes may be found to be associated with conditions not classically thought of as infectious, such as the link between maternal Zika virus infection and fetal microcephaly. Studies of bowel flora in laboratory animals and even humans already suggest correlations between microbe composition and various aspects of metabolic and cardiovascular health. Improved methods for pathogen detection will continue to uncover unexpected correlations between microbes and disease states, but the mere presence of a microbe does not establish causality. Fortunately, once the relatively laborious and computationally intensive metagenomic sequencing and assembly efforts have identified a pathogen, further detection can more easily be undertaken with targeted methods such as PCR or hybridization, which may be more scalable and amenable to in situ confirmation. This capacity should facilitate the additional careful investigation that will be required to progress beyond correlation and to draw causal inference.
■ ■ANTIBIOTIC RESISTANCE At present, antibiotic resistance in bacteria and fungi is conventionally determined by isolating a single colony from a cultured clinical specimen and testing its growth in the presence of drug. The requirement for multiple growth steps in these conventional assays has several consequences. First, only culturable pathogens can be readily processed. Second, this process requires considerable infrastructure to support the sterile environment needed for culture-based testing of diverse organisms. Finally, and perhaps most significantly, even the fastest-growing organisms require 1–2 days of processing for identification and 2–3 days for determination of susceptibilities. Some slow-growing organisms take even longer; for instance, weeks must pass before drug-resistant M. tuberculosis can be identified by growth phenotype. Given the clinical imperative in serious illness to begin effective therapy early, this inherent delay in susceptibility determination has obvious implications for empirical antibiotic use: broad-spectrum antibiotics often must be chosen up front in situations where it is later shown that preferred narrower-spectrum drugs would have been effective or even that no antibiotics were appropriate (e.g., in viral infections). Even with this strategy, as resistant organisms become more common, the empirical choice can be incorrect, often with devastating consequences. Real-time identification of the infecting organism and information on its susceptibility profile would guide initial therapy and support judicious antibiotic use, ideally improving patient outcomes while aiding in the ever-escalating fight against antibiotic resistance by reserving the use of broad-spectrum agents for cases in which they are truly needed. Molecular diagnostics and sequencing offer a way to accelerate detection of a pathogen’s antibiotic susceptibility profile. If a genotype that confers resistance can be identified, this genotype can be targeted for molecular detection. In infectious disease, this approach has most comprehensively come to fruition for HIV (Fig. 126-1A). (In a conceptually parallel application of genomic analysis, molecular detection of certain resistance determinants in cancers informs selection of targeted chemotherapy.) Extensive sequencing of HIV strains and correlations drawn between viral genotypes and phenotypic resistance have delineated the majority of mutations in key HIV genes, such as reverse transcriptase, protease, and integrase, that confer resistance to the antiretroviral agents that target these proteins.
For instance, the single amino acid substitution K103N in the HIV reverse transcriptase gene predicts resistance to the first-line nonnucleoside reverse transcriptase inhibi­ tor efavirenz, and its detection informs a clinician to choose a different agent. The effects of these common mutations on HIV susceptibility to various drugs—as well as on viral fitness—are curated in publicly avail­ able databases. Thus, genotypes are now routinely used to predict drug resistance in HIV, as phenotypic resistance assays are far more cumber­ some than targeted sequencing. Though it was not implemented at the level of individual patients, sequencing-based detection of circulating SARS-CoV-2 variants predicted susceptibility to monoclonal antibod­ ies due to changes in the spike protein and, thus, informed decisions made by the U.S. Food and Drug Administration (FDA) on granting and rescinding approval for such COVID-19 therapies as the pandemic progressed. As new therapies are introduced, this targeted sequenc­ ing–based approach to drug resistance will likely prove important in other viral infections. The challenge of predicting drug susceptibility from genotype is more daunting for bacteria than for HIV, yet considerable progress has been made toward sequencing-based determination of bacterial antibiotic susceptibility. Bacteria have far more complex genomes than viruses, with thousands of genes on their chromosomes (many of which can functionally interact in ways that escape a priori predic­ tion) and the capacity to acquire many more through horizontal gene transfer of plasmids and mobile genetic elements within and between species. Thus, the task of comprehensively defining all possible genetic resistance mechanisms is orders of magnitude more complex in bac­ teria than in viruses, which typically have far more limited genomes. Despite these challenges, considerable progress has been made in recent years. 
In select cases where biological factors appear to have con­ strained the genotypic basis for resistance to a small, well-defined set of mutations, genotypic assays for antibiotic resistance are already being integrated into clinical practice. One important example is the detec­ tion of methicillin-resistant Staphylococcus aureus (MRSA). S. aureus is one of the most common and serious bacterial pathogens of humans, particularly in health care settings. Resistance to methicillin—the most effective class of antistaphylococcal antibiotics—is common, even in community-acquired strains. Vancomycin—an alternative drug to methicillin—is effective against MRSA but is measurably inferior to methicillin against methicillin-susceptible S. aureus (MSSA). Analysis of clinical MRSA isolates has demonstrated that the molecular basis for resistance to methicillin in essentially all cases stems from the expres­ sion of an alternative penicillin-binding protein (PBP2A) encoded by the gene mecA, which is found within a transferable genetic element

called mec. This mobile cassette has spread rapidly through the S. aureus population via horizontal gene transfer and selection from widespread antibiotic use. Because methicillin resistance is essentially always due to the presence of the mecA gene, MRSA is particularly amenable to molecular detection. A PCR test for the mecA gene, which saves hours to days compared with standard culture-based methods, has been approved by the FDA to augment (but not replace) traditional culture-based susceptibility testing. Similar to MRSA, vancomycin-resistant enterococci (VRE) harbor one of a limited number of van genes found to be responsible for resistance to this important antibiotic, which occurs through alteration of the mechanism for cell wall cross-linking that vancomycin inhibits. Detection of one of these genes by PCR indicates resistance. More recently, multiplexed PCR assays targeting carbapenemase genes (those encoding the KPC, NDM, OXA-48, IMP-1, and VIM carbapenemases), which are responsible for a significant fraction of carbapenem resistance (though not all instances), can predict some resistance to this crucial antibiotic class, though they are not comprehensive enough to confirm susceptibility if absent. Despite these caveats, such assays to detect even a limited set of high-value resistance genes are gaining use in high-resourced settings. Finally, a PCR assay targeting the highly conserved RNA polymerase gene serves not only to detect M. tuberculosis directly in sputum samples but also to detect resistance to rifampin, since the determinants of resistance to this RNA polymerase inhibitor map almost exclusively to a short region of this gene. Since rifampin resistance is epidemiologically associated with, though not causal for, multidrug resistance, this assay identifies strains at high risk for multidrug resistance, enhancing its value.
This test has transformed tuberculosis testing where available, improving sensitivity and providing a limited measure of susceptibility testing to guide therapy.
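The logic of these marker-based assays, including their asymmetry (a detected marker predicts resistance, but an absent marker does not confirm susceptibility), can be written as a small lookup. The sketch below is illustrative only: the marker table is limited to determinants named in the text, with vanA and vanB chosen as two representative van genes, and the function name and report wording are assumptions rather than any assay's actual output.

```python
# Markers named in the text, mapped to the drug (class) they predict
# resistance to. vanA/vanB stand in for "one of a limited number of van genes".
MARKER_TO_DRUG = {
    "mecA": "methicillin",
    "vanA": "vancomycin",
    "vanB": "vancomycin",
    "KPC": "carbapenems",
    "NDM": "carbapenems",
    "OXA-48": "carbapenems",
    "IMP-1": "carbapenems",
    "VIM": "carbapenems",
}


def interpret_markers(detected):
    """Turn a collection of detected markers into a per-drug report.

    A detected marker yields a resistance prediction. The absence of every
    known marker is reported as inconclusive, never as "susceptible",
    because other resistance mechanisms may exist.
    """
    hits = {}
    for marker in detected:
        drug = MARKER_TO_DRUG.get(marker)
        if drug is not None:
            hits.setdefault(drug, []).append(marker)

    report = {}
    for drug in sorted(set(MARKER_TO_DRUG.values())):
        if drug in hits:
            found = ", ".join(sorted(hits[drug]))
            report[drug] = f"resistance predicted ({found})"
        else:
            report[drug] = "no marker detected; susceptibility not confirmed"
    return report


for drug, call in interpret_markers({"mecA"}).items():
    print(f"{drug}: {call}")
```

Even a positive call remains a prediction that culture-based testing may later revise, which is why such assays augment rather than replace phenotypic methods.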

Although identification and rapid detection of monogenic resistance determinants have improved, bacteria typically evolve multiple, diverse resistance mechanisms to most antibiotics; thus, outside of these edge cases, resistance prediction often requires probing for and integration of multiple genetic lesions, targets, or mechanisms. For instance, at least five distinct modes of resistance to fluoroquinolones are known: reduced import, increased efflux, target site mutation, drug modification, and shielding of the target sites by expression of another protein. These mechanisms are typically present in combination in clinically resistant isolates; thus, the problem of detecting genetic resistance is often a combinatorial one. In another clinically important example, while carbapenem resistance in Enterobacteriaceae is often explained by the presence of carbapenemases, resistance may also develop when other, less broad-spectrum β-lactamases are found in combination with porin mutations or upregulated efflux pumps. These more complex mechanisms prevent PCR-based carbapenemase detection assays from identifying other mechanisms of carbapenem resistance. Additionally, plasmids and transposable elements, which often are enriched for antibiotic resistance determinants, may be more technically and analytically challenging to sequence, although newer long-read sequencing technologies are beginning to address these challenges. To further complicate genetic prediction, changes in gene expression (which may be detectable through mutations in promoter regions or regulatory genes without coding mutations in known resistance determinants) and even gene copy number (which may occur without changes in primary sequence) of resistance determinants play critical roles in some cases of genetic resistance.
Thus, while predicting resistance when determinants are found is rapidly becoming feasible, the more clinically relevant task of predicting susceptibility when no known resistance determinants are found remains more difficult.

To build on early successes with the goal of advancing beyond binary detection of monogenic resistance determinants, the ultimate frontier for genetic prediction of bacterial antibiotic resistance lies in more comprehensive prediction of a resistance phenotype from sequence information—a task similar to HIV resistance prediction. Yet there is no comprehensive compendium of genetic elements conferring resistance and their pairwise and higher-order interactions with each other and with the genetic background of bacterial pathogens. Nonviral genomes are much larger than viral ones, and their abundance

and diversity are such that thousands of genetic differences often exist between clinical isolates of the same species, of which perhaps only one or a few may contribute to resistance. In addition, new mechanisms may emerge in the face of antibiotic deployment or with the release of new drugs, and genetic prediction of resistance will inevitably lag behind the emergence of unforeseen mechanisms. While confident prediction of bacterial antibiotic resistance from sequencing determinants may therefore seem daunting, the vast expansion of microbial sequencing capacity, combined with analytic methods such as microbial genomewide association studies and machine learning algorithms, offers powerful analytical approaches to this “needle in a haystack” problem and has permitted remarkable advances in the predictive power of sequence determinants to date. Particularly in M. tuberculosis, where horizontal gene transfer is minimal and the pathogen is essentially restricted to human hosts to facilitate more representative sampling, a remarkable fraction of phenotypic resistance can be explained by known genetic determinants. Because of these biologic advantages, as well as the slow and laborious growth process that impedes traditional phenotypic assessment, whole genome sequencing has proven quite effective at predicting susceptibility profiles in this organism, to the point that the United Kingdom now routinely performs whole genome sequencing in parallel with phenotypic antibiotic susceptibility testing for M. tuberculosis in what some hope will be a precursor to fully whole genome sequencing–based antibiotic susceptibility testing. Even in more highly variable pathogens, with sequencing of sufficient numbers of susceptible and resistant pathogens, and more sophisticated analyti­ cal algorithms, sequence-based prediction methods are improving in predictive accuracy, at least within the geographic region from which the test samples have been sequenced.
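The core idea of a microbial genome-wide association study, testing each genetic variant for enrichment among resistant isolates, can be sketched with a one-sided Fisher's exact test. This is a minimal illustration under stated assumptions: the variant names and isolate data are invented, the helper names are hypothetical, and the sketch omits the corrections for population structure and multiple testing that a real analysis would need.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment.

    2x2 table: a = variant present and resistant, b = variant present and
    susceptible, c = variant absent and resistant, d = variant absent and
    susceptible. Returns P(at least `a` resistant carriers) under the
    hypergeometric null of no association.
    """
    carriers = a + b
    resistant = a + c
    n = a + b + c + d
    total = comb(n, resistant)
    p = 0.0
    for x in range(a, min(carriers, resistant) + 1):
        p += comb(carriers, x) * comb(n - carriers, resistant - x) / total
    return p


def rank_variants(isolates):
    """Rank variants by strength of association with resistance.

    `isolates` is a list of (set_of_variants, is_resistant) pairs.
    Returns (p_value, variant) tuples sorted from strongest to weakest.
    """
    variants = set().union(*(v for v, _ in isolates))
    results = []
    for variant in variants:
        a = sum(1 for v, r in isolates if variant in v and r)
        b = sum(1 for v, r in isolates if variant in v and not r)
        c = sum(1 for v, r in isolates if variant not in v and r)
        d = sum(1 for v, r in isolates if variant not in v and not r)
        results.append((fisher_one_sided(a, b, c, d), variant))
    return sorted(results)


# Invented toy data: one variant tracks resistance, one does not.
isolates = [
    ({"rpoB_mut", "lineage_snp"}, True),
    ({"rpoB_mut"}, True),
    ({"rpoB_mut"}, True),
    ({"lineage_snp"}, False),
    (set(), False),
    (set(), False),
]

for p_value, variant in rank_variants(isolates):
    print(f"{variant}: p = {p_value:.2f}")
```

A variant that ranks highly in such a scan is a statistical correlate of resistance; whether it is a causal determinant has to be established separately.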

It is important to note that genotype-based analytical methods largely identify correlates, not necessarily surrogates or determinants, of resistance. In HIV diagnostics, surrogates (i.e., causal determinants of resistance) were found to be more reliable predictors than mere correlates in expanding sequencing-based resistance prediction to the general population. Without a mechanistic understanding of genetic resistance, a correlative relationship may be lineage-specific and less generalizable. Especially with multiple possible mechanisms of resistance to a given antibiotic and ongoing evolutionary pressure resulting in the development and acquisition of new modes of resistance, a genotypic approach to diagnosing antibiotic resistance is likely to remain challenging and to require ongoing vigilance in constantly correlating genotypic with more traditional phenotypic methods. An important corollary benefit of a genomic approach to resistance prediction, anchored in phenotypic validation, could be the systematic identification of outliers with unexplained resistance. These strains can form the basis for understanding newly emerging resistance mechanisms, which can in turn inform new drug development endeavors. Understanding resistance mechanisms may also help direct infection control efforts. For instance, the first identification of the mcr-1 (mobilized colistin resistance) gene on a plasmid, together with other antibiotic resistance determinants, heightened concern about colistin-resistant Enterobacterales identified first in China and later elsewhere because it implied transferable multidrug resistance. Early recognition of these potentially dangerous strains made clear the immediate need for strict containment protocols. Furthermore, metagenomic studies have highlighted the silent carriage of resistance genes in the microbiomes of returning travelers and their subsequent spread to household contacts.
In parallel with advancing sequencing technologies, progress in computational techniques, bioinformatics and statistics, and data stor­ age, as well as experimental confirmatory testing of hypotheses, will be needed to advance toward the ambitious goal of a comprehensive compendium of global antibiotic resistance determinants. Open shar­ ing and careful curation of new sequence information will be of para­ mount importance, as will iterative or even continuous comparison of predictions with ongoing phenotypic testing to assess performance and allow prediction algorithms to keep up with newly evolving or emerg­ ing resistance mechanisms. We continuously observe the accumulation of new or unanticipated modes of resistance from ongoing evolutionary pressure caused by the

widespread clinical use of antibiotics. Even with MRSA, perhaps the best-studied case of antibiotic resistance and a model of relative sim­ plicity with a single known monogenic resistance determinant (mecA), a genotype-based approach to resistance detection proved imperfect. One limitation was a recall of the initial commercial genotypic resis­ tance assay that was deployed for the identification of MRSA. A clinical isolate of S. aureus that emerged in Belgium expressed a variant of the mec cassette not detected by the assay’s PCR primers. New primers were added to detect this new variant, and the assay was reapproved for use. This example illustrates the need for ongoing monitoring of any genotypic resistance assay. A second limitation is that a contradiction can occur between genotypic and phenotypic evidence for resistance. Up to 5% of MSSA strains have been reported to carry a copy of the mecA gene that is either nonfunctional or not expressed. Thus, the erroneous identification of these strains as MRSA by genotypic detec­ tion would lead to administration of the inferior antibiotic vancomycin rather than the preferred β-lactam therapy. These examples illustrate one of the prime challenges of mov­ ing beyond growth-based assays: genotype is merely a proxy for the resistance phenotype that directly informs patient care. Alternative approaches currently under development attempt to circumvent the limitations of genotypic resistance testing by returning to phenotypic assays, albeit more rapid ones. One such approach is informed by genomic methods: transcriptional profiles serve as a rapid phenotypic signature for antibiotic response. Conceptually, since dying cells are transcriptionally distinct from cells fated to survive, susceptible bac­ teria enact different transcriptional profiles after antibiotic exposure than resistant ones, independent of the mechanism of resistance. 
These differences can be measured and, since transcription is one of the most rapid responses to cell stress (minutes to hours), can be used to determine whether cells are resistant or susceptible much more rapidly than is possible if growth in the presence of antibiotics is awaited (days). Like DNA, RNA can be readily detected through predictable rules governing base pairing via either amplification or hybridization-based methods. Changes in a carefully selected set of transcripts form an expression signature that can represent the total cellular response to antibiotic without requiring full characterization of the entire transcriptome. Preliminary proof-of-concept studies suggest that this approach may identify antibiotic susceptibility based on transcriptional phenotype much more quickly than is possible with growth-based assays. Other rapid phenotype-based approaches to antibiotic susceptibility testing, including automated microscopy, ultrafine measurements of mass fluctuations, and others, are under development as well, with the former approved for clinical use.

Because of its sensitivity in detecting even very rare nucleic acid fragments, sequencing provides an unprecedented depth of study into complex populations of cells and tissues. The strength of this depth and sensitivity applies not only to the detection of rare, novel pathogens in a sea of host signal but also to the identification of heterogeneous pathogen subpopulations in a single host that may differ, for example, in drug resistance profiles or pathogenesis determinants. For instance, recent studies have highlighted the diversification of pathogens in chronic bacterial infections, such as Pseudomonas in the lungs of patients with cystic fibrosis or M. tuberculosis in disseminated infection, perhaps allowing for niche specialization within the host. Such diversification has long been recognized in chronic viral populations, as exemplified by HIV.
Future studies will be needed to elucidate the clinical significance of these variable subpopulations, even as deep sequencing is now providing unprecedented levels of detail about majority and minority members of this population. ■ ■HOST-BASED DIAGNOSTICS While pathogen-based diagnostics continue to be the mainstay for con­ firming infection, serologic testing and nonspecific biomarkers—such as erythrocyte sedimentation rate, C-reactive protein level, and even total white blood cell and neutrophil counts—have long been the basis of a strategy for measuring host responses to aid in the diagnosis of infection. Even recently identified host biomarkers of bacterial infec­ tion, such as procalcitonin, have fallen short in their versatility, with

positive and negative predictive values that are thus far adequate for only a few narrow applications but inadequate for generalized clinical use. Here, too, the application of genomics is now being explored to improve upon this approach, given the previously described limitations of serologic testing and the lack of specificity of protein biomarkers identified to date. Rather than using antibody responses as a retrospec­ tive biomarker for infection, recent efforts have focused on transcrip­ tomic analysis of the host response as a new direction with diagnostic implications for human disease. For instance, while pathogen-based diagnostic tests to distinguish active from latent tuberculosis infection have proven elusive, the tran­ scriptional profile of circulating white blood cells exhibits a differential pattern of expression of nearly 400 transcripts that distinguish active from latent tuberculosis; this expression pattern is driven in part by changes in interferon-inducible genes in the myeloid lineage. In a validation cohort, this transcriptional signature was able to distinguish patients with active versus latent disease, to distinguish tuberculosis infection from other pulmonary inflammatory states or infections, and to track responses to treatment in as little as 2 weeks, with normaliza­ tion of expression toward that of patients without active disease over 6 months of effective therapy. Such a test could play an important role not only in the management of patients but also as a marker of efficacy in clinical trials of new therapeutic agents. More recently, a distilled three-transcript signature has shown promise for distinguishing active from latent tuberculosis, raising hopes of a deployable assay in the near term. 
Similarly, considerable progress has been made toward identify­ ing host transcriptional signatures in circulating blood cells that distinguish viral from bacterial causes of upper respiratory infec­ tion, with better performance characteristics than current clinical parameters or available protein biomarkers. Additional host signa­ tures have been reported that distinguish among bacterial infection, viral infection, and inflammatory states; identify Lyme disease; identify influenza; and even distinguish between gram-positive and gram-negative bacterial infections. In some cases, results have been extended to different host populations—including adults and chil­ dren, and those with varying immune function—which obviously will be critical for generalizing such an approach. Thus, profiling of host transcriptional dynamics could augment the information obtained from studies of pathogens, both enhancing diagnosis and monitoring the progression of illness and the response to therapy. The frontier of genomic applications to understand host response to infection, with the potential of identifying biomarkers or even underlying disease biology, continues to rapidly advance, incor­ porating novel technological and computational approaches such as single-cell host transcriptional profiling of infected patients, to understand complex processes such as sepsis. In this era of genome-wide association studies and attempts to move toward personalized medicine, genomic approaches are also being applied to the identification of host genetic loci and factors that con­ tribute to infection susceptibility. Such loci will have undergone strong selection among populations in which the disease is endemic. 
Through identification of the beneficial genetic alleles among individuals who survive in such settings, markers for susceptibility or resistance are being discovered; these markers can be translated into diagnostic tests that identify susceptible individuals so that preventive or prophylactic interventions can be implemented. Further, such studies may offer mechanistic insight into the pathogenesis of infection and inform new methods of therapeutic intervention. Such beneficial genetic associations were recognized long before the advent of genomics, as in the protective effects of the negative Duffy blood group or heterozygous hemoglobin abnormalities against Plasmodium infection. Genomic approaches allow more systematic and widespread application of this principle to identify not only people with increased susceptibility to prevalent diseases (e.g., HIV infection, tuberculosis, and cholera) but also host factors that contribute to and thus might predict the severity of disease. In one recent example, polymorphisms in certain genetic loci, or specific circulating autoantibodies, were found to be associated with severe COVID-19.

THERAPEUTICS

Genomics has the potential to impact infectious disease therapeutics in two ways. By transforming the speed or type of diagnostic information that can be attained, it can influence therapeutic decision-making. Alternatively, by opening new avenues to a better understanding of pathogenesis, providing new ways to disrupt infection, and delineating new approaches to antibiotic discovery or characterization of vaccine targets, it has the potential to facilitate the development of new therapeutic agents.

■ ■GENOMIC DIAGNOSTICS INFORMING THERAPEUTICS

Efforts at antibiotic discovery are declining, with few new agents in the pipeline and even fewer new drugs (in particular, few agents with new mechanisms of action) entering the market. This phenomenon is due in part to the lack of economic incentives for the private sector; however, it is also attributable in part to the enormous challenges involved in the discovery and development of antibiotics. Most recent efforts have focused on broad-spectrum antibiotics; the development of a chemical entity that works across an extremely diverse set of organisms (i.e., species more divergent from each other than a human is from an amoeba) is far more challenging than the development of an agent that is designed to target a single bacterial species. Nevertheless, the concept of narrow-spectrum antibiotics has heretofore been rejected because of the lack of early diagnostic information that would guide the selection of such agents. Thus, rapid diagnostics that provide antibiotic susceptibility information to guide antibiotic selection in real time have the potential to alter and simplify antibiotic strategies by allowing a paradigm shift away from broad-spectrum drugs and toward narrow-spectrum agents. Such a paradigm shift clearly would have additional implications for antibiotic resistance, helping to limit selective pressure applied to pathogens and commensal bacteria during therapy.

In yet another diagnostic paradigm with the potential to impact therapeutic interventions, genomics is opening new avenues to a better understanding not only of different host susceptibilities to infection but also of different host responses to therapy. For example, the role of glucocorticoids in tuberculous meningitis has long been debated. Recently, polymorphisms in the human genetic locus LTA4H, which encodes a leukotriene-modifying enzyme, were found to modulate the inflammatory response to tuberculosis.
Patients with tuberculous meningitis who were homozygous for the proinflammatory LTA4H allele were most helped by adjunctive glucocorticoid treatment, while those who were homozygous for the anti-inflammatory allele were negatively affected by steroid treatment. Steroids have become part of the standard of care in tuberculous meningitis, but this study suggests that perhaps only a subset of patients benefit from this anti-inflammatory adjunct (while others may be harmed) and further suggests a genetic means of prospectively identifying this subset. Thus, genomic diagnostic tests may eventually approach the goal of personalized medicine, informing diagnosis, prognosis, and treatment decisions by revealing the pathogenic potential of the microbe and by detecting individualized host responses to both infection and therapy.

■ ■GENOMICS IN DRUG AND VACCINE DEVELOPMENT

Genomic technologies are dramatically changing research on host–pathogen interactions, with a goal of increasingly influencing the process of therapeutic discovery and development. Sequencing offers several possible avenues into antimicrobial therapeutic discovery. First, genome-scale molecular methods have paved the way for comprehensive identification of all essential genes encoded by a pathogen, thereby systematically identifying critical vulnerabilities within a pathogen that could be targeted therapeutically. Second, genome-scale methodologies offer rapid ways to address the mechanism of action of newly identified hits from compound screens. Whole genome sequencing offers a rapid, unbiased way to detect mutations arising in resistant mutants during selection. Similarly, transcriptional profiling can provide insights into mechanisms of action of new candidate drugs. For instance, the transcriptional signature of cell wall disruptors (e.g., β-lactams) is distinct from that of DNA-damaging agents (e.g., fluoroquinolones) or protein synthesis inhibitors (e.g., aminoglycosides). Either approach can thus suggest a mechanism of action or flag compounds for prioritization because of a potentially novel activity. In an alternative strategy for determining mechanisms of action, genome-wide RNA interference or CRISPR screens can be used to identify genes required for antimicrobial efficacy. This approach provided new insights into the mechanism of action of drugs that have been in use for decades for human African trypanosomiasis. Third, sequencing can readily identify the most conserved regions of a pathogen’s genome and the corresponding gene products; this information is invaluable in narrowing antigen candidates in vaccine development. These surface proteins can be expressed recombinantly and tested for the ability to elicit a serologic response and protective immunity. This process, termed reverse vaccinology, has proved particularly useful for pathogens that are difficult to culture or poorly immunogenic. After decades of development, the utility of this approach became dramatically apparent with the rapid development of mRNA vaccines targeting conserved regions of the SARS-CoV-2 genome, fueling the most rapid development of a vaccine in history. Comparative genomics informed by prior coronavirus sequences enabled the first mRNA vaccine design to begin within days of the first SARS-CoV-2 sequence being made publicly available, and now annual updates are guided by ongoing sequencing of circulating variants, as discussed in detail below.
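The idea of locating conserved regions can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length; the toy alignment, the function names `column_conservation` and `conserved_windows`, and the window and threshold values are all hypothetical choices for illustration, not part of any published reverse vaccinology pipeline. Gap characters, if present, are simply counted as another residue.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return the fraction of sequences
    that share the most common residue in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_windows(scores, window, threshold):
    """Return start indices of windows whose mean conservation
    is at least `threshold`."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append(start)
    return hits


# Hypothetical toy protein alignment (four strains, six positions).
toy_alignment = ["MKTAYL", "MKTGYL", "MKSAYL", "MKTAYI"]
scores = column_conservation(toy_alignment)
print(scores)
print(conserved_windows(scores, window=2, threshold=1.0))
```

In practice, candidate antigens would also be filtered by predicted surface exposure and other criteria; this sketch covers only the conservation step.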

Genomics has been employed in both developing vaccines and defining their impact on microbial epidemiology and ecology. Examples include recent studies of influenza, malaria, S. pneumoniae, and HPV following vaccine introduction. Extensive sequencing of influenza viruses has been valuable in understanding the modest efficacy of seasonal influenza vaccination, and the combination of genomics and antigenic cartography is used to select strains to include in subsequent influenza vaccines. Beyond this, sequence conservation informs efforts to design more robust pan-influenza or pan-coronavirus vaccines. The RTS,S/AS01 malaria vaccine was analyzed by targeted sequencing of parasites from vaccinated and control populations during a phase 3 trial conducted at 11 sites in Africa; these analyses revealed reduced vaccine efficacy against parasites with amino acid mutations in the circumsporozoite protein targeted by the vaccine. Similarly, studies of the more established pneumococcal vaccines (the 7- and 13-valent polysaccharide conjugate vaccines, PCV-7 and PCV-13) documented serotype replacement: strains targeted by the vaccine have dramatically decreased in prevalence following widespread vaccination campaigns. Given that specific serotypes of HPV (e.g., types 16 and 18) clearly are more strongly associated than others with carcinogenesis, HPV vaccines have capitalized on serotype replacement, targeting vaccine strains to specifically prevent infection with the more dangerous serotypes. Such a strategy, informed by pathogen genomics, aims to protect individuals and ideally to decrease the circulating burden of more virulent strains within society.

Large-scale gene content analysis from sequencing or expression profiling enables new research directions that provide novel insights into the interplay of pathogen and host during infection or colonization.
One important goal of such research is to suggest new therapeutic approaches to disrupt this interaction in favor of the host. Indeed, one of the most immediate applications of next-generation sequencing technology has come from simply characterizing human pathogens and related commensal or environmental strains and then finding genomic correlates for pathogenicity. For instance, as Escherichia coli varies from a simple nonpathogenic, lab-adapted strain (K-12) to a Shiga toxin–producing enterohemorrhagic gastrointestinal pathogen (O157:H7), it displays up to a 25% difference in gene content, though it is classified as the same species. Similarly, some isolates of Enterococcus, a genus notorious for its increasing incidence of resistance to common antibiotics such as ampicillin, vancomycin, and aminoglycosides, also contain recently acquired genetic material comprising up to 25% of the genome on mobile genetic elements. This fact suggests that horizontal gene transfer plays an important role in the organisms’ adaptation as nosocomial pathogens. On closer study, this genome expansion is associated with loss of CRISPR elements, which protect the bacterial genome from invasion by certain foreign genetic material, and may thus facilitate the acquisition of antibiotic resistance–conferring genetic elements. While loss of this regulation appears to impose a competitive disadvantage in antibiotic-free environments, these drug-resistant strains thrive in the presence of even some of the best antienterococcal therapies. In addition to insights gained from genome sequencing, transcriptomic and proteomic profiling of pathogens under various conditions that mimic colonization or infection, including existence as biofilms or in polymicrobial communities, intracellular infection models, antibiotic exposure, and nutrient starvation, has begun to reveal novel biologic features that may be targeted by the next generation of therapies. At the cutting edge of the host–pathogen interface, single-cell transcriptomic methodologies are rapidly increasing in feasibility and extent, revealing previously unknown heterogeneity in the potential outcomes of intracellular infection.

Thus, genomic studies are transforming our understanding of infection, offering evidence of virulence factors or toxins and providing insight into ongoing evolution of pathogenicity and drug resistance. One goal of such studies is to identify therapeutic agents that can disrupt the pathogenic process. There is currently much interest in the theoretical concept of antivirulence drugs that inhibit virulence factors rather than killing the pathogen outright as a means to intervene in infection. Further, with sequencing ever more accessible and efficient, ongoing large-scale studies have unprecedented statistical power to associate clinical outcomes with pathogen and host genotypes and thus to further reveal vulnerabilities in the infection process that can be targeted for disruption.
Although this is just the beginning, such studies point to a tantalizing future in which the clinician is armed with genomic predictors of infection outcome and therapeutic response to guide clinical decision-making.

EPIDEMIOLOGY OF INFECTIOUS DISEASES

Epidemiologic studies of infectious diseases have several main goals: to identify and characterize outbreaks, to describe the pattern and dynamics of an infectious disease as it spreads through populations, and to identify interventions that can limit or reduce the burden of disease. One classic, paradigmatic example is John Snow’s elucidation of the origin of the 1854 London cholera outbreak. Snow used careful geographic mapping of cases to determine that the likely source of the outbreak was contaminated water from the Broad Street pump, and by removing the pump handle, he aborted the outbreak. Whereas that effort was undertaken without knowledge of the causative agent of cholera, advances in microbiology and genomics have expanded the purview of epidemiology to consider not just the disease but also the pathogen and its genetic variants, its virulence factors, and the complex relationships between microbial and host populations.

Through use of genomic tools such as high-throughput sequencing, the diversity of a microbial population can be rapidly described with unprecedented resolution, with discrimination between isolates that have single-nucleotide differences across the entire genome and advancement beyond prior approaches that relied on phenotypes (such as antibiotic susceptibility profiles) or genetic markers (such as multilocus sequence typing). The development of statistical methods grounded in molecular genetics and evolutionary theory has established analytical approaches that translate descriptions of microbial population diversity and structure into descriptions of the origin and history of pathogen spread.
By linking phylogenetic reconstruction with epidemiologic and demographic data, genomic epidemiology presents the opportunity to track transmission from person to person and across demographic and geographic boundaries, to infer transmission patterns of both pathogens and sequence elements that confer phenotypes of interest, and to estimate the transmission dynamics of outbreaks.

■ ■TRANSMISSION NETWORKS

Whole genome sequencing of pathogen genomes can be used to infer transmission and identify point-source outbreaks. As reported in a seminal paper in 2010, a study of MRSA in a Thai hospital demonstrated the use of whole genome sequencing in reconstructing the transmission of a pathogen from patient to patient by integrating the analysis of accumulation of mutations over time with the dates and hospital locations of the infected individuals. Since then, multiple instances of the use of whole genome sequencing to define and motivate interventions aimed at interrupting transmission chains have been reported. In another MRSA outbreak in a special-care baby unit in Cambridge, United Kingdom, whole genome sequencing extended the traditional infection control analysis, which relies on typing organisms by their antibiotic susceptibilities, to sequencing of isolates from clinical samples. This approach identified an otherwise unrecognized outbreak of a specific MRSA strain that was occurring against a background of the usual pattern of infection caused by a diverse circulating population of MRSA strains. The analysis showed evidence of transmission among mothers within the special-care baby unit and in the community and demonstrated the key role of MRSA carriage in a single health care provider in the persistence of the outbreak. In yet another example, in response to the observation of 18 cases of infection by carbapenemase-producing Klebsiella pneumoniae over 6 months at the National Institutes of Health Clinical Research Center, genome sequencing of the isolates was used to discriminate between the possibilities that these cases represented multiple, independent introductions into the health care system or a single introduction with subsequent transmission. On the basis of network and phylogenetic analysis of genomic and epidemiologic data, the authors reconstructed the likely relationships among the isolates from patient to patient, demonstrating the nosocomial spread of a single resistant Klebsiella strain. Similar approaches have elucidated the extent to which presumed nosocomial C. difficile, VRE, and carbapenem-resistant Enterobacterales represent within-hospital transmission (of bacterial strains or of plasmids and mobile genetic elements) versus independent acquisitions of unrelated pathogens. With these demonstrations of the potential contribution of genomics to hospital infection-control efforts, an important avenue of research seeks to develop statistical methods with which to ascertain when such tools are useful and how cost-effective they are compared with current nongenomic approaches.

Genome sequencing of clinical specimens of viruses has been used to understand their patterns of spread and the clinical and epidemiologic implications of genetic variants. As RNA viruses use an error-prone RNA-dependent RNA polymerase, they accumulate mutations at a rapid rate, facilitating inferences about the dynamics and patterns of spread. These tools have been applied to the study of outbreaks of well-known viruses, such as recent outbreaks of yellow fever in South America and mumps in the United States, as well as recent zoonotic pathogens, such as the coronaviruses MERS-CoV and SARS-CoV. The sequencing of SARS-CoV-2 in the context of the pandemic has offered a powerful example of the contributions that genomic epidemiology can make to, and its increasingly central role in, tracking the spread of a pathogen both locally and globally and informing policy and public health decision-making. Moreover, tracking variants and combining the genome sequences with epidemiologic line-list data enable investigation of the extent to which new variants cause differing symptom profiles and levels of severe disease.

The uncovering of unexpected transmission events by genomic epidemiology studies is motivating investigations into pathogen ecology and modes of transmission.
Whole genome sequencing established the clonality of several high-profile outbreaks, enabling the discovery of dangerous contaminants such as Burkholderia pseudomallei in aromatherapy spray, Pseudomonas in eyedrops, Exserohilum in injectable corticosteroids, Fusarium in epidural anesthetic preparations, E. coli O157:H7 in beef, and Mycobacterium chimaera in the temperature-control systems used during cardiac bypass. Each of these outbreaks caused considerable morbidity and in some cases mortality before being localized and prevented on the basis of investigations informed and accelerated by genomic epidemiology.

As more studies aim to carefully define the origins and spread of infectious agents using the high-resolution lens of whole genome sequencing, fundamental questions arise about the diversity of infecting and colonizing microbial populations. Traditional microbiologic methods include taking a single colony from a growth plate as representative of the population. However, the more diverse the colonizing or infecting pathogen population, the less representative these individual isolates are and the greater the possibility for introducing error into whole genome sequencing–based methods while reconstructing transmission. Sequencing studies of multiple colonies of an S. aureus strain colonizing a single individual showed a “cloud” of diversity. What is the clinical significance of this diversity? What are the processes that generate and limit it? What amount of diversity is transmitted under different conditions and routes of transmission? How do the answers to these questions vary by infectious organism, type of infection, host, and response to treatment? More comprehensive descriptions of diversity, population dynamics, transmission bottlenecks, and the forces that shape and influence the growth and spread of microbial populations will be a critically important focus of future investigations.
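The concern that a single colony may misrepresent a diverse population can be made concrete with a small Python sketch of pairwise single-nucleotide differences. The sequences, isolate names, and function names here are hypothetical toy values, and the sketch assumes the genomes are already aligned to a common reference; positions holding anything other than A, C, G, or T are skipped rather than counted.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ,
    ignoring positions where either base is not A, C, G, or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def pairwise_distances(isolates):
    """Return {(name1, name2): SNP distance} for every pair of isolates."""
    return {
        (n1, n2): snp_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(isolates.items(), 2)
    }


# Hypothetical toy data: two colonies from patient A, one from patient B.
isolates = {
    "patientA_colony1": "ACGTACGTAC",
    "patientA_colony2": "ACGTACGTAT",
    "patientB_colony1": "ACGAACGTAT",
}
for pair, dist in pairwise_distances(isolates).items():
    print(pair, dist)
```

In this toy data, patient B's isolate is two differences from one of patient A's colonies but only one from the other, so the inferred relatedness depends on which colony happened to be picked.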

■ ■ORIGINS AND DYNAMICS OF PATHOGEN SPREAD

In addition to reconstructing the transmission chains of local outbreaks, genomics-based epidemiologic methods reveal broad-scale geographic and temporal spread of pathogens. Four examples include the origins of cholera in Haiti, the history of HIV-1 group M, the spread of Ebola in West Africa, and the timing and nature of spread of the zoonotic COVID-19 pandemic, which brought genomic tracking of pathogen variants to the forefront of public consciousness for a time.

Cholera, a dehydrating diarrheal illness caused by infection with Vibrio cholerae, first spread worldwide from the Indian subcontinent in the 1800s and has since caused seven pandemics; the seventh pandemic has been ongoing since the 1960s. An investigation into the geographic patterns of cholera spread in the seventh pandemic used genome sequences from a global collection of 154 V. cholerae strains representing isolates from 1957 to 2010. This investigation revealed that the seventh pandemic has comprised at least three overlapping waves spreading out from the Indian subcontinent (Fig. 126-3A). Further, analysis of the genome of an isolate of V. cholerae from the 2010 outbreak of cholera in Haiti showed it to be more closely related to isolates from South Asia than to isolates from neighboring Latin America, supporting the hypothesis that the outbreak was derived from V. cholerae introduced into Haiti by human travel (likely from Nepal) rather than by environmental or more geographically proximal sources. A subsequent study that dated the time to the most recent common ancestor of a population of V. cholerae isolates from Haiti provided further support for a single point-source introduction from Nepal.
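Dating a most recent common ancestor rests on the assumption that mutations accumulate at a roughly constant rate. The Python sketch below illustrates that logic with a simple root-to-tip linear regression, which is a deliberately simplified stand-in for the Bayesian phylogenetic models used in the studies described here. The sampling dates and divergence values are invented toy numbers, and the function name is hypothetical.

```python
def root_to_tip_regression(sample_dates, divergences):
    """Fit divergence = rate * (date - tmrca) by ordinary least squares.

    sample_dates: sampling times in decimal years.
    divergences: genetic distance of each sample from the tree root
                 (substitutions per site).
    Returns (rate, tmrca), where tmrca is the date at which the fitted
    line crosses zero divergence.
    """
    n = len(sample_dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    mean_x = sum(sample_dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in sample_dates)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(sample_dates, divergences)
    )
    if sxx == 0:
        raise ValueError("samples must span more than one date")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive clock signal in these data")
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate


# Invented toy data lying on a line with rate 0.001 per site per year.
dates = [2010.5, 2011.0, 2012.0, 2013.0]
divergence = [0.0005, 0.001, 0.002, 0.003]
rate, tmrca = root_to_tip_regression(dates, divergence)
print(f"rate = {rate:.4g} substitutions/site/year, tMRCA = {tmrca:.2f}")
```

Real analyses estimate uncertainty around both quantities and allow the rate to vary across lineages, which this sketch does not attempt.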
Application of similar methods that integrate pathogen genome sequences, mutation rates, geographic locations, and phylogenetic inference to HIV-1 group M dated the origin of the virus to the 1920s and the city of Kinshasa (then called Leopoldville), the capital of the Democratic Republic of the Congo (then called the Belgian Congo). This work established an understanding of how a boom in industry and a city with extensive railroad connections provide a scaffolding along which a virus can rapidly spread geographically.

Genome sequencing has proven invaluable in understanding the geographic, demographic, climatic, and administrative factors that drove, sustained, and limited the 2013–2016 Ebola outbreak that ravaged West Africa (Fig. 126-3B) as well as the factors and patterns of transmission of Zika virus in the Americas and most recently the timing and origins of SARS-CoV-2 transmission in human populations. With the rapid availability of the SARS-CoV-2 viral genome sequence, data from a set of cases from early in the pandemic enabled inference of the time to the most recent common ancestor, supporting that SARS-CoV-2 entered circulation in human populations in Wuhan, China, sometime in late November to early December of 2019. Subsequently, large, coordinated sequencing networks have been able to recreate its patterns of global spread, discover new variants, and monitor these variants as they disseminate. For the first time, scientists and even the public were able to watch viral evolution through a population in almost real time. First, a single-nucleotide polymorphism (D614G in the viral spike protein) displaced the original strain in early 2020, followed in subsequent years by other variants such as Alpha, Delta, Omicron, and then various Omicron sublineages becoming dominant either regionally or globally (Fig. 126-3C) as their mutation profile conferred advantages in increasingly immune populations.

FIGURE 126-3  A. Transmission events inferred from phylogenetic reconstruction of 154 Vibrio cholerae isolates from the seventh cholera pandemic. Date ranges represent estimated time to the most recent common ancestor for strains transmitted from source to destination locations, based on a Bayesian model of the phylogeny. (Reprinted by permission from the Nature Publishing Group, Nature 477:462. Evidence for several waves of global transmission in the seventh cholera pandemic, A Mutreja et al © 2011.) B. Inferred Ebola virus spread in West Africa (Liberia, red; Guinea, green; and Sierra Leone, blue) by phylogeographic methods using virus genome sequences, dates, and an evolutionary model. The lines reflect spread between population centroids of each administrative region, going from the thin end to the thick end and colored by a time scale. (Reprinted by permission from Nature Publishing Group, Nature 544:309. Virus genomes reveal factors that spread and sustained the Ebola epidemic, G Dudas et al © 2017.) C. SARS-CoV-2 variant proportions in the United States from the start of the COVID-19 pandemic through March 6, 2024. Each major variant family that rose to dominance in the United States is labeled. (Variant proportions are taken from https://covariants.org, accessed April 21, 2024, which derives underlying variant data from the Global Initiative on Sharing All Influenza Data [GISAID].) [Panel C plots variant fraction (0 to 1) over time; labeled variants: founder strain, Alpha, Delta, and Omicron (and subvariants).]

One compelling area of innovation in molecular epidemiology is the development of wastewater-based surveillance.
Wastewater surveillance has been used to monitor for the appearance of poliovirus, among others, but its widespread and widely publicized use for SARS-CoV-2 in “nowcasting” quantitative estimates of epidemic trends has galvanized interest in using wastewater to monitor disease trends and pathogens and their characteristics. The U.S. Centers for Disease Control and Prevention has sought to establish a national wastewater surveillance system, building on the SARS-CoV-2 work. These tools can help provide a population-level view of the first appearance of a pathogen, variant, or antibiotic resistance determinant and can also help monitor the population burden and hence the epidemic curve. Work remains to develop robust, repeatable, and accurate methods across many pathogens, particularly those that can replicate or exchange DNA in wastewater environments. Nonetheless, this data source has great promise for providing early signals to decision-makers to help inform public health actions.

These efforts illustrate the remarkable promise of genome sequencing in improving outbreak response strategies by elucidating previously hidden origins and paths of disease spread and details of the forces that shape epidemics. The combination of in-the-field sequencing with portable sequencing platforms, rapid data sharing, and rapid open analysis through sites such as nextstrain.org offers a paradigm by which real-time genomic epidemiology, including wastewater-based epidemiology, may contribute to “weather maps,” enabling prediction of epidemic patterns in space and time and thus providing guidance for public health interventions to slow or control their spread.

Increasing numbers of investigations into the spread of many pathogens are contributing to a growing atlas of maps describing routes, patterns, and tempos of microbial diversification and dissemination, not just for agents of emerging infectious diseases but for common pathogens as well. Such studies will create a vast amount of data that can be used to investigate the diversity and microbiologic links within distinct niches and the patterns of spread from one niche to another. The increasingly broad adoption of genome sequencing by health care and public health institutions ensures that the available catalog of genome sequences and associated epidemiologic data will grow very rapidly. For example, updating from the pulsed-field gel electrophoresis techniques that have been used to define strains of food-borne pathogens since the late 1980s, PulseNet, the U.S. Centers for Disease Control and Prevention network for monitoring these pathogens, has instituted routine genome sequencing. The COVID-19 pandemic further underscores the importance of building a new global public health infrastructure in which sequencing plays a central role to facilitate early disease discovery, rapid and close tracking of spread, and development of diagnostics and targeted effective interventions. With higher-resolution description of microbial diversity and of the dynamics of that diversity over time and across epidemiologic and demographic boundaries and evolutionary niches, we will gain even greater insights into the relationships of transmission routes and patterns of historical spread.

■ ■EPIDEMIC POTENTIAL

Defining pathogen transmissibility is a critical step in the development of public health surveillance and intervention strategies because this information can help to predict the epidemic potential of an outbreak. Transmissibility can be estimated by a variety of methods, including inference from the growth rate of an epidemic and the generation time of an infection (the mean interval between infection of an index case and infection of the people infected by that index case).
Genome sequencing and analysis of a well-sampled population provide another method by which to derive similar fundamental epidemiologic parameters. One key measure of transmissibility is the basic reproduction number, defined as the number of secondary infections generated from a single primary infectious case. When the basic reproduction number is >1, an outbreak has epidemic potential; when it is <1, the outbreak will become extinct. On the basis of sequences from influenza virus samples obtained from infected patients very early in the 2009 H1N1 influenza pandemic, the basic reproduction number was estimated through a population genomic analysis at 1.2; this result provided greater confidence to estimates derived by traditional epidemiologic data, which ranged from 1.4 to 1.6. In addition, with the assumption of a molecular clock model, sequences of H1N1 samples together with information about when and where the samples were obtained have been used to estimate the date and location of the pandemic’s origin, providing insight into disease origins and dynamics. Integrating viral genomics with other types of data (such as the timing and nature of mitigation efforts and the impact of those efforts on mobility) will expand the toolkit with which to assess the impact of public health interventions on slowing and controlling disease spread. These tools may also be applied to institutional infection control: with the development of return-to-work protocols, sequencing offers one option to help learn the extent to which infections arose from within-institution spread. Because the magnitude and intensity of the public health response are guided by the predicted size of an outbreak, the ability of genomic methods to cast light on a pathogen’s origin and epidemic potential adds an important dimension to the contributions of these methods to infectious disease epidemiology.
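The link between epidemic growth rate, generation time, and the reproduction number can be sketched in a few lines of Python. Two standard approximations are shown, each resting on an assumption about how generation intervals are distributed: exponentially distributed intervals give R = 1 + rT, and a fixed interval gives R = exp(rT), where r is the exponential growth rate and T the mean generation time. The growth rate and generation time in the example are invented for illustration and are not taken from any of the studies described above.

```python
import math


def r_from_growth_exponential_interval(growth_rate, generation_time):
    """Reproduction number assuming exponentially distributed
    generation intervals: R = 1 + r * T."""
    return 1 + growth_rate * generation_time


def r_from_growth_fixed_interval(growth_rate, generation_time):
    """Reproduction number assuming every generation interval
    equals the mean: R = exp(r * T)."""
    return math.exp(growth_rate * generation_time)


def has_epidemic_potential(reproduction_number):
    """An outbreak can grow only when the reproduction number exceeds 1."""
    return reproduction_number > 1


# Invented example: cases grow 10% per day, mean generation time 3 days.
r_per_day, t_days = 0.10, 3.0
low = r_from_growth_exponential_interval(r_per_day, t_days)
high = r_from_growth_fixed_interval(r_per_day, t_days)
print(f"R between {low:.2f} and {high:.2f}")
print("epidemic potential:", has_epidemic_potential(low))
```

The two formulas bracket a range rather than giving a single answer, which is one reason independent estimates from genomic data are useful as a cross-check.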
■ ■PATHOGEN EVOLUTION

Beyond describing transmission and dynamics, pathogen genomics can provide insight into the evolution of pathogens and the interactions of selective pressures, the host, and pathogen populations, which can have implications for clinical decision-making and the development of vaccines and therapeutics. From a clinical perspective, this process is central to the acquisition of antibiotic resistance, the generation of increasing pathogenicity or new virulence traits, the evasion of host immunity and clearance (leading to chronic infection), and vaccine efficacy.

Microbial genomes evolve through a variety of mechanisms, including mutation, duplication, insertion, deletion, recombination, and horizontal gene transfer. Segmented viruses (e.g., influenza virus) can reassort gene segments within multiply infected cells. The pandemic 2009 H1N1 influenza A virus, for example, appears to have been generated through reassortment of several avian, swine, and human influenza strains. Such potential for the evolution of novel pandemic strains has precipitated concern about the possible evolution to transmissibility of virulent strains that have been associated with high mortality rates but have not yet exhibited efficient human infectivity. Experiments with H5N1 avian influenza, for example, have defined five mutations that render it transmissible, at least in ferrets, the animal model system for human influenza. Studies that examine the genomes of pathogens collected longitudinally from individual infections have similarly demonstrated their evolution as they adapt to host environments and new immune and therapeutic pressures.

The continuous antigenic evolution of seasonal influenza offers an example of how studies of pathogen evolution can impact surveillance and vaccine development. Frequent updates to the annual influenza vaccine are needed to ensure protection against the dominant strains. These updates are based on anticipating which viral populations from a substantial pool of locally and globally diverse circulating viruses will predominate in the upcoming season. Toward that end, sequencing-based studies of influenza virus dynamics have shed light on the global spread of influenza, providing concrete data on patterns of spread and helping to elucidate the origins, emergence, and circulation of novel strains. Through analysis of >1000 influenza A H3N2 virus isolates over the 2002–2007 influenza seasons, Southeast Asia was identified as the usual site from which diversity originates and spreads worldwide. Subsequent studies of global isolate collections have shed further light on the diversity of circulating virus, showing that some strains persist and circulate outside of Asia for multiple seasons. Similar studies with SARS-CoV-2, including genome sequences from orders of magnitude more clinical specimens than the aforementioned influenza study, have helped identify where new SARS-CoV-2 variants have emerged and informed vaccine composition.

Not only do genomic epidemiology studies have the potential to help guide vaccine selection and development, but they also help to track what happens to pathogens circulating in the population in response to vaccination. By describing pathogen evolution under the selective pressure of a vaccinated population, such studies can play a key role in surveillance and identification of virulence determinants and perhaps may even help to predict the future evolution of escape from vaccine protection. The seven-valent pneumococcal conjugate vaccine (PCV-7) targeted the seven serotypes of S. pneumoniae responsible for the majority of invasive disease at the time of its introduction in 2000; since then, PCV-7 has dramatically reduced the incidence of pneumococcal disease and mortality. However, sequencing of >600 Massachusetts pneumococcal isolates from 2001 to 2007 has shown that, in the pneumococcal population, previously rare nonvaccine serotypes are replacing vaccine serotypes and that some vaccine strains have persisted despite vaccination by recombining the vaccine-targeted capsule locus with a cassette of capsule genes from non-vaccine-targeted serotypes. Studying the virulence of these persistently circulating strains can help to rationally update vaccine composition.

The large collections of pathogen genome sequences are driving development of tools to decipher the genetic basis for antibiotic resistance, virulence, and infection risk. Some pathogens have distinct types of clinical manifestations, the basis for which we are just beginning to unravel with the aid of genomics. For example, Listeria is a food-borne pathogen that can cause both central nervous system infections and maternal/neonatal infections. Although all Listeria isolates are treated the same from a public health perspective, variation in outcomes exists and appears to be linked to the strains' genomic background. Molecular analysis of a national reference laboratory's collections of well-characterized specimens, based on the fraction of immunocompetent people in whom they caused disease, revealed that some clonal complexes of Listeria appear to be more virulent than others. Linking epidemiology and comparative genomics then enabled enumeration of putative virulence factors that contribute to the clinical phenotypes as well as identification and confirmation of a novel gene cluster that mediates central nervous system tropism. This approach illustrates progress toward a future in which we can link pathogen identification with risk, thereby informing resource use and allocation.

GLOBAL CONSIDERATIONS
While cutting-edge genomic technologies are largely implemented in the developed world, their application to infectious diseases perhaps offers the biggest potential impact in less developed regions where the burden of these infections is greatest. This globalization of genomic technology and its extensions has already begun in each of the areas of focus highlighted in this chapter; it has occurred both through the application of advanced technologies to samples collected in the developing world and through the adaptation and importation of technologies directly to the developing world for on-site implementation as they become more globally accessible.

Genomic characterization of the pathogens responsible for important global illnesses such as tuberculosis, malaria, trypanosomiasis, cholera, and COVID-19 has led to insights in diagnosis, treatment, and infection control. For instance, with the increasing burden of drug-resistant tuberculosis in the developing world, a molecular diagnostic test has been developed to detect rifampin-resistant tuberculosis. The genetic basis for rifampin resistance has been well defined by targeted sequencing: characteristic mutations in the molecular target of rifampin, RNA polymerase, account for the vast majority of instances of rifampin resistance. A rapid, automated PCR assay that can detect both M. tuberculosis and a rifampin-resistant allele of RNA polymerase directly in clinical samples has been implemented in parts of Africa and Asia, at least in areas that can afford it, transforming the recognition and management of incident tuberculosis and multidrug resistance where they are most prevalent. Since rifampin resistance frequently accompanies resistance to other antibiotics, this test can suggest the presence of multidrug-resistant M. tuberculosis within hours instead of weeks, without the infrastructure required for culture.
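The idea that resistance can be inferred from characteristic mutations in a defined region of the drug target's gene can be sketched in a few lines. The example below is an in-silico analogy only, not a description of how the PCR assay works: it compares a pre-aligned sample sequence with a reference and reports differences that fall inside a hotspot window. The sequences, the window coordinates, and the function name `hotspot_differences` are all hypothetical and do not represent real M. tuberculosis RNA polymerase sequence.

```python
from typing import List, Tuple


def hotspot_differences(
    reference: str, sample: str, start: int, end: int
) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch
    in the half-open window [start, end) of two pre-aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not 0 <= start <= end <= len(reference):
        raise ValueError("window must lie within the sequence")
    return [
        (i, reference[i], sample[i])
        for i in range(start, end)
        if reference[i] != sample[i]
    ]


# Made-up 30-base reference and an assumed hotspot window (0-based).
REFERENCE = "ATGGCTACCGGTTCAGACCTGAGCCAATTC"
HOTSPOT = (12, 24)

# Hypothetical samples: one identical to the reference, one with a
# substitution inside the window, one with a substitution outside it.
wild_type_like = REFERENCE
inside_variant = REFERENCE[:18] + "T" + REFERENCE[19:]
outside_variant = REFERENCE[:3] + "A" + REFERENCE[4:]

for name, seq in [
    ("wild_type_like", wild_type_like),
    ("inside_variant", inside_variant),
    ("outside_variant", outside_variant),
]:
    hits = hotspot_differences(REFERENCE, seq, *HOTSPOT)
    status = "flag for possible resistance" if hits else "no hotspot change"
    print(name, status, hits)
```

In practice, not every difference within such a region confers resistance, so a flagged sample would still need interpretation against a curated catalog of known resistance mutations.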
High-resolution genomic tracking of the spread of epidemics, from cholera to Ebola to Zika to COVID-19, has yielded insights into which public health measures may prove most effective in controlling local epidemics. Many genomic tracking efforts have involved close collaborations with local scientists and public health officials, and considerable investment in sequencing infrastructure in sub-Saharan Africa has made on-location epidemic tracking in the event of another such outbreak feasible. Such investment can not only enable real-time outbreak recognition and tracking but also provide the infrastructure needed to capitalize on the many other benefits of high-throughput sequencing as they are developed. The early returns of such investments are exemplified by the rapid reporting of genome sequences for SARS-CoV-2, with substantial insights from sequencing efforts across the world; perhaps most notably, the Omicron variant was first identified in South Africa, enabling preparations for what became a global sweep in the subsequent weeks. Overall, sequencing efforts have become cheaper and have moved closer to point-of-care with each passing year. As these technologies synergize with efforts to globalize information-technology resources, global implementation of genomic methods promises to spread state-of-the-art methods for diagnosis, treatment, and epidemic tracking of infections to areas that need these capabilities the most.

GENOMICS AND THE COVID-19 PANDEMIC
The COVID-19 pandemic, which began in 2019 and spread worldwide in 2020, resulted in hundreds of millions of documented infections and millions of deaths and serves as a prime example of the pandemic potential of infectious pathogens. It also demonstrated the central role that genomic tools now play in response to infectious outbreaks, ranging from enabling diagnostics and vaccines to tracking evolution, virulence, and transmissibility of the pathogen.
The discovery of SARS-CoV-2 and the sequencing of its genome were completed within weeks of the recognition of the clinical syndrome. The rapid public sharing of this genome sequence led directly to two key interventions: diagnostic assay development via RT-qPCR and vaccine design. Crucially, vaccine development was informed by homology of the SARS-CoV-2 sequence to SARS and MERS coronaviruses. The dominant antigen of those viruses, the surface protein Spike, was well characterized, enabling the design of the first SARS-CoV-2 vaccines to begin the day after the genome sequence was shared. The progress of the most rapidly developed and validated vaccine in human history was unquestionably accelerated by genomic technology. Whole genome sequencing has also played a large role in outbreak tracking and confirmation of case clusters in institutional settings such as hospitals or congregate living facilities, in helping to distinguish reinfections from recrudescence or prolonged viral shedding, in monitoring spread through societies, and in tracking pathogen evolution, including the emergence of new variants of concern with altered transmissibility, severity, and/or partial evasion of the immune response generated to prior versions of the virus, vaccines, or monoclonal antibody therapeutics. Finally, cutting-edge genomic methods including single-cell transcriptional profiling and genome-wide association studies are contributing to our understanding of the wide variability in outcomes of SARS-CoV-2 infection, ranging from asymptomatic carriage to death. Overall, just as the global response to the COVID-19 pandemic underscores the indispensable role that genomics methods have come to play in the clinical and public health management of infectious diseases, the devastating impact of this pandemic reveals the urgent need for further development and implementation of tools for disease surveillance and response.

SUMMARY
By illuminating the genetic information that encodes the most fundamental processes of life, genomic technologies are transforming many aspects of medicine.
In infectious diseases, methods such as next-generation sequencing and genome-scale expression analysis offer information of unprecedented depth about individual microbes as well as microbial communities. This information is expanding our understanding of the interactions of microorganisms with each other, their human hosts, and the environment. Despite technological and financial barriers that slow the widespread adoption of large-scale pathogen sequencing in clinical and public health settings, genomic methodologies have utterly transformed the research landscape in infectious disease and are beginning to make meaningful inroads into clinical settings. As even vaster amounts of data are generated, innovations in data storage, development of bioinformatics tools to manipulate the data, standardization of methods, and training of end-users in both the research and clinical realms will be required. The cost-effectiveness and applicability of whole genome sequencing, particularly in the clinic, remain to be studied, and studies of the impact of genome sequencing on patient outcomes will be needed to clarify the contexts in which these new methodologies can make the greatest contributions to patient well-being. The ongoing efforts to overcome limitations through collaboration, teaching, and reduction of financial obstacles should be applauded and expanded. With advances in genomic technologies and computational analysis, our ability to detect, characterize, treat, monitor, prevent, and control infections has advanced rapidly in recent years and will continue to do so, with the hope of heralding a new era where the clinician is better armed to combat infection and promote human health.