479 Principles of Human Genetics

PART 16 Genes, the Environment, and Disease

J. Larry Jameson, Peter Kopp

IMPACT OF GENETICS AND GENOMICS ON MEDICAL PRACTICE

Over the past four decades, novel insights into human genetics and genomics have fundamentally changed the practice of medicine, ushering in a new era with a deeper understanding of the genetic basis of numerous health conditions, novel diagnostic technologies, improved disease prevention and management, personalized medicine, and targeted therapies. Human genetics refers to the study of individual genes, their role and function in disease, and their mode of inheritance. Genomics refers to the study of an organism’s entire genetic information, the genome, and the function and interaction of DNA within the genome, as well as with environmental or nongenetic factors, such as a person’s lifestyle. With the characterization of the human genome, genomics not only complements traditional genetics in our efforts to elucidate the etiology and pathogenesis of disease, but it also plays a prominent and expanding role in diagnostics, prevention, and therapy (Chap. 480). These transformative developments, originally emerging from the Human Genome Project, have been variably designated genomic medicine, personalized medicine, or precision medicine. Precision medicine aims at customizing medical decisions to an individual patient. For example, a patient’s genetic characteristics (genotype) can be used to optimize drug therapy and predict efficacy, adverse events, and drug dosing of selected medications (pharmacogenomics) (Chap. 72). The characterization of the mutational profile of a malignancy allows the identification of driver mutations or overexpressed signaling molecules, which then facilitates the selection of targeted therapies. Genome-wide polygenic risk scores (PRS) for common complex diseases are beginning to emerge and may impact disease prevention in the future.

Genetics has traditionally been viewed through the window of relatively rare single-gene diseases. These disorders account for ~10% of pediatric admissions and childhood mortality.
Historically, genetics has focused predominantly on chromosomal and metabolic disorders, reflecting the long-standing availability of techniques to diagnose these conditions. For example, conditions such as trisomy 21 (Down’s syndrome) or monosomy X (Turner’s syndrome) can be diagnosed using cytogenetics. Likewise, many metabolic disorders (e.g., phenylketonuria, familial hypercholesterolemia) are diagnosed using biochemical analyses. The advances in DNA and RNA diagnostics have extended the field of genetics to include virtually all medical specialties and have led to the elucidation of the pathogenesis of most monogenic disorders. In addition, virtually every medical condition has a genetic component. As is often evident from a patient’s family history, many common disorders such as hypertension, heart disease, asthma, diabetes mellitus, and mental illnesses are significantly influenced by the genetic background. These polygenic or multifactorial (complex) disorders involve the contributions of many different genes, as well as environmental factors that can modify disease risk. Genome-wide association studies (GWAS) have elucidated numerous disease-associated loci and are providing novel insights into the allelic architecture of complex traits. These studies have been facilitated by the availability of comprehensive catalogues of human single nucleotide polymorphism (SNP) haplotypes (HapMap, International Genome Sample Resource).

Next-generation DNA sequencing (NGS) technologies have evolved rapidly, and the cost of sequencing whole exomes (the exons within the genome; whole exome sequencing [WES]) or genomes (whole genome sequencing [WGS]) has plummeted. Comprehensive unbiased sequence analyses are now routinely used to characterize individuals with complex undiagnosed conditions or to determine the mutational profile of advanced malignancies in order to select optimal and targeted therapies. The assembly of diploid genomes, i.e., the characterization of the complete genetic information from both sets of chromosomes in an individual’s genome, will further enhance the resolution of genetic variation and should provide further insights into heritability and disease mechanisms.

Cancer has a genetic basis because it results from acquired somatic mutations in genes controlling growth, apoptosis, and cellular differentiation (Chap. 76). In addition, the development of many cancers is associated with a hereditary predisposition. Characterization of the genome (and epigenome) in various malignancies has led to fundamental new insights into cancer biology and reveals that the genomic profile of mutations is in many cases more important in determining the appropriate therapy than the organ in which the tumor originates. The Cancer Genome Atlas (TCGA) initiative of the National Cancer Institute and the National Human Genome Research Institute has already characterized the genomic landscape across >30 malignancies. TCGA consists of comprehensive analyses of genomic and proteomic alterations and has provided fundamental new insights into the molecular pathogenesis of cancer. These data, together with comprehensive catalogues of somatic mutations identified in human cancer, have direct clinical ramifications that impact cancer taxonomy, as well as the development and choice of targeted therapies.

Genetic and genomic approaches have proven invaluable for the detection of infectious pathogens and are used clinically to identify agents that are difficult to culture such as mycobacteria, viruses, and parasites, or to track infectious agents locally or globally.
In many cases, molecular genetics has improved the feasibility and accuracy of diagnostic testing and has opened new avenues for therapy, including gene and cellular therapies (Chap. 483). Molecular genetics has also provided the opportunity to characterize the microbiome, that is, the populations of bacteria, viruses, and parasites that coexist with humans and other animals, and their dynamics (Chap. 483). The microbiome has significant effects on normal physiology as well as various disease states, and the field is now focusing on defining the mechanisms underlying these interactions.

Molecular biology has significantly changed the treatment of human disease. Peptide hormones, growth factors, cytokines, and vaccines can be produced in large amounts using recombinant DNA and RNA technology (e.g., mRNA vaccines against SARS-CoV-2; small interfering RNA [siRNA] to treat hypercholesterolemia). Targeted modifications of recombinant peptides provide improved therapeutic tools, as illustrated by genetically modified insulin analogues with more favorable kinetics or glucagon-like peptide 1 (GLP-1) agonists for treatment of type 2 diabetes and for weight management.

The rate at which new genetic and genomic information is being generated presents many challenges for health care providers and systems. Although many functional aspects of the genome remain unknown, there are many clinical situations where genetic and genomic information optimizes patient care. Much genetic information resides in databases that provide easy access to the expanding information about the human genome, genetic disease, and genetic testing (Table 479-1). For example, several thousand monogenic disorders are summarized in a large, continuously evolving compendium, the Online Mendelian Inheritance in Man (OMIM) catalogue (Table 479-1).
The constant refinement of bioinformatics and big data analytics, together with the widespread adoption of electronic health records (EHRs), is simplifying the access, analysis, and integration of this daunting amount of new information. Importantly, genomic data can be integrated readily into EHRs and thus impact clinical practice.

TABLE 479-1  Selected Databases Relevant for Genomics and Genetic Disorders

SITE | URL | COMMENT
National Center for Biotechnology Information (NCBI) | http://www.ncbi.nlm.nih.gov/ | Broad access to biomedical and genomic information, literature (PubMed), sequence databases, software for analyses of nucleotides and proteins; extensive links to other databases, genome resources, and tutorials
National Human Genome Research Institute | http://www.genome.gov/ | An institute of the National Institutes of Health focused on genomic and genetic research; links providing information about the human genome sequence, genomes of other organisms, genomic research, and legislation
Catalog of Published Genome-Wide Association Studies | https://www.ebi.ac.uk/gwas/ | Published high-resolution genome-wide association studies (GWAS)
Ensembl Genome browser | http://www.ensembl.org | Maps and sequence information of eukaryotic genomes
Online Mendelian Inheritance in Man | http://www.ncbi.nlm.nih.gov/omim | Online compendium of Mendelian disorders and human genes causing genetic disorders
American College of Medical Genetics and Genomics | http://www.acmg.net/ | Extensive links to other databases relevant for the diagnosis, treatment, and prevention of genetic disease
American Society of Human Genetics | http://www.ashg.org | Information about advances in genetic research, professional and public education, and social and scientific policies
The Cancer Genome Atlas | https://cancergenome.nih.gov/ | Comprehensive, multidimensional characterization of the genomic and proteomic landscape of malignancies with high public health impact
COSMIC Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic | Comprehensive catalogue of somatic mutations in human cancer
Genetic Testing Registry | https://www.ncbi.nlm.nih.gov/gtr/ | International directory of genetic testing laboratories and prenatal diagnosis clinics; reviews and educational materials
Genomes Online Database (GOLD) | http://www.genomesonline.org/ | Information on published and unpublished genomes
HUGO Gene Nomenclature | http://www.genenames.org/ | Gene names and symbols
GENCODE | https://www.gencodegenes.org/ | High-quality reference gene annotation and experimental validation for human and mouse genomes
MITOMAP, a human mitochondrial genome database | http://www.mitomap.org/ | A compendium of polymorphisms and mutations of the human mitochondrial DNA
The International Genome Sample Resource (IGSR) | http://www.internationalgenome.org | Public catalogue of human variation and genotype data from numerous ethnic groups
Human Genome Variation Society | https://www.hgvs.org/ | Collection and documentation of genomic variations including population distribution and phenotypic associations
ENCODE | http://www.genome.gov/10005107 | Encyclopedia of DNA Elements; catalogue of all functional elements in the human genome
Dolan DNA Learning Center, Cold Spring Harbor Laboratories | http://www.dnalc.org/ | Educational material about selected genetic disorders, DNA, eugenics, and genetic origin
The Online Metabolic and Molecular Bases of Inherited Disease (OMMBID) | http://ommbid.mhmedical.com | Online version of the comprehensive text on the metabolic and molecular bases of inherited disease
Online Mendelian Inheritance in Animals (OMIA) | https://www.omia.org/home/ | Online compendium of Mendelian disorders in animals
The Jackson Laboratory | http://www.jax.org/ | Information about murine models and the mouse genome
Mouse genome informatics | http://www.informatics.jax.org | Mouse genome informatics, potential mouse models of human disease, information on phenotypic similarity between mouse models and human patients

Note: Databases are evolving constantly. Pertinent information may be found by using links listed in the few selected databases.

■ THE HUMAN GENOME

Structure of the Human Genome  The Human Genome Project was initiated in the mid-1980s as an ambitious effort to characterize the entire human genome and culminated in the completion of the DNA sequence for the last of the human chromosomes in 2006.
The scope of a WGS analysis can be illustrated by the following analogy. Human DNA consists of ~3 billion base pairs (bp) of DNA per haploid genome, which is nearly 1000-fold greater than that of the Escherichia coli genome. If the human DNA sequence were printed out, it would correspond to about 120 volumes of Harrison’s Principles of Internal Medicine.

In addition to the human genome, the genomes of thousands of organisms have been sequenced completely or partially (Genomes Online Database [GOLD]; Table 479-1). They include, among others, eukaryotes such as the mouse (Mus musculus), Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster; bacteria (e.g., E. coli); and archaea, viruses, organelles (mitochondria, chloroplasts), and plants (e.g., Arabidopsis thaliana). Genomic information of infectious agents has a significant impact on the characterization of infectious outbreaks and epidemics. Other ramifications arising from the availability of genomic data include, among others, (1) the comparison of entire genomes (comparative genomics); (2) the study of large-scale expression of RNAs (functional genomics), proteins (proteomics), or protein families (e.g., the kinome, the complete set of protein kinases) to detect differences between various tissues in health and disease; (3) the characterization of the variation among individuals by establishing catalogues of sequence variations and SNPs; and (4) the identification of genes that play critical roles in the development of polygenic and multifactorial disorders.

CHROMOSOMES  The human genome is divided into 23 different chromosomes, including 22 autosomes (numbered 1–22) and the X and Y sex chromosomes (Fig. 479-1). Adult cells are diploid, meaning they contain two homologous sets of 22 autosomes and a pair of sex chromosomes. Females have two X chromosomes (XX), whereas males have one X and one Y chromosome (XY). As a consequence of meiosis, germ cells (sperm or oocytes) are haploid and contain one set of 22 autosomes and one of the sex chromosomes. At the time of fertilization, the diploid genome is reconstituted by pairing of the homologous chromosomes from the mother and father. With each cell division (mitosis), chromosomes are replicated, paired, segregated, and divided into two daughter cells.

FIGURE 479-1  Structure of chromatin and chromosomes. Chromatin is composed of double-strand DNA that is wrapped around histone and nonhistone proteins forming nucleosomes. The nucleosomes are further organized into solenoid structures. Chromosomes assume their characteristic structure, with short (p) and long (q) arms at the metaphase stage of the cell cycle.

STRUCTURE OF DNA  DNA is a double-stranded helix composed of four different bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Adenine is paired to thymine, and guanine is paired to cytosine, by hydrogen bond interactions that span the double helix (Fig. 479-1). DNA has several remarkable features that make it ideal for the transmission of genetic information. It is relatively stable, and the double-stranded nature of DNA and its feature of strict base-pair complementarity permit faithful replication during cell division. Complementarity also allows the transmission of genetic information from DNA → RNA → protein (Fig. 479-2). mRNA is encoded by the so-called sense or coding strand of the DNA double helix and is translated into proteins by ribosomes.

The presence of four different bases provides surprising genetic diversity. In the protein-coding regions of genes, the DNA bases are arranged into codons, a triplet of bases that specifies a particular amino acid. It is possible to arrange the four bases into 64 different triplet codons (4³). Each codon specifies 1 of the 20 different amino acids, or a regulatory signal such as initiation and stop of translation. Because there are more codons than amino acids, the genetic code is degenerate; that is, most amino acids can be specified by several different codons.
By arranging the codons in different combinations and in various lengths, it is possible to generate the tremendous diversity of primary protein structure.

DNA length is normally measured in units of 1000 bp (kilobases, kb) or 1,000,000 bp (megabases, Mb). In the human genome, only ~1% of DNA accounts for protein-coding sequences. The noncoding DNA has multiple functional and structural roles including (1) sequences that form introns; (2) regulatory elements (promoters, enhancers, silencers, insulators); (3) sequences that generate RNAs that do not code for proteins; (4) centromeres and telomeres; (5) regions defining chromatin structure and histone modifications; (6) various forms of repetitive sequences of variable length; and (7) pseudogenes and regions without currently discernible functional or structural roles (Fig. 479-1).
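The base pairing and codon arithmetic described in the section on the structure of DNA can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names (`reverse_complement`, `synonymous_codons`) are hypothetical, and the codon table is a deliberately small excerpt of the standard genetic code rather than the full 64-entry table.

```python
from itertools import product

BASES = "ACGT"

# Watson-Crick pairing: adenine with thymine, guanine with cytosine.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


# Every ordered triplet of the four bases: 4**3 = 64 possible codons.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Excerpt of the standard genetic code, written as coding-strand DNA codons.
# Only a handful of entries are listed here for illustration.
CODON_EXCERPT = {
    "ATG": "Met",  # also the usual initiation codon
    "TGG": "Trp",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def synonymous_codons(amino_acid):
    """List the codons in the excerpt that specify the same amino acid."""
    return sorted(c for c, aa in CODON_EXCERPT.items() if aa == amino_acid)
```

In this sketch, `len(CODONS)` is 64, `reverse_complement("GACATCG")` gives `"CGATGTC"`, and `synonymous_codons("Gly")` lists four codons, which is one concrete instance of the degeneracy of the genetic code.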

GENES  A gene is a functional unit that is regulated by transcription (see below) and encodes an RNA product, which is most commonly, but not always, translated into a protein that exerts activity within or outside the cell (Fig. 479-3). Historically, genes were identified because they conferred specific traits that are transmitted from one generation to the next. Now, they are frequently characterized based on expression in various tissues (transcriptome). Gene size varies widely; some genes are only a few hundred base pairs, whereas others are extraordinarily large (2.3 Mb). The number of genes greatly underestimates the complexity of genetic expression, because single genes can generate multiple spliced messenger RNA (mRNA) products (isoforms), which are translated into proteins that are subject to complex posttranslational modification such as phosphorylation. Exons are the portions of genes that are eventually spliced together to form mRNA. Introns are the spacing regions between the exons that are spliced out of precursor RNAs during RNA processing. The gene locus also includes regions that are necessary to control its expression (Fig. 479-2). Current estimates predict roughly 20,000 protein-coding genes in the human genome with an average of about four different coding transcripts per gene. Remarkably, the exome only constitutes 1.14% of the genome. Of note, the number of transcripts is close to 200,000 and includes thousands of noncoding transcripts (RNAs of various length such as microRNAs [miRNA] and long noncoding RNAs [lncRNA]). These noncoding RNAs are involved in numerous cellular processes such as transcriptional and posttranscriptional regulation of gene expression, chromatin remodeling, and protein trafficking, among others. Not surprisingly, aberrant expression and/or mutations in these RNAs play a pathogenic role in numerous diseases.
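The relationship between exons, introns, and mRNA isoforms described above can be sketched in a few lines of Python. The sequence, the exon coordinates, and the function name `splice` are made up for illustration; real splicing depends on splice-site recognition by the spliceosome, which this toy model does not attempt to represent.

```python
def splice(pre_mrna, exons):
    """Join exon segments in order, dropping the introns between them.

    `exons` is a list of (start, end) positions, 0-based and end-exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy precursor RNA: uppercase = exon, lowercase = intron (hypothetical).
PRE_MRNA = "AUGGCC" + "guaaag" + "UUUGGA" + "guccag" + "CCCUAA"
EXONS = [(0, 6), (12, 18), (24, 30)]

# Isoform 1: all three exons are retained.
full_length = splice(PRE_MRNA, EXONS)

# Isoform 2: the middle exon is skipped (one form of alternative splicing).
exon_skipped = splice(PRE_MRNA, [EXONS[0], EXONS[2]])
```

Here the same toy gene yields two different transcripts, `"AUGGCCUUUGGACCCUAA"` and `"AUGGCCCCCUAA"`, which is the sense in which the number of genes underestimates the number of gene products.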
SINGLE-NUCLEOTIDE POLYMORPHISMS  On average, a typical genome differs from the reference human genome at 4 to 5 million sites. Some of these variants have no impact on health, whereas others may increase or lower the risk for developing a specific disease. Remarkably, however, the primary DNA sequence of humans has ~99.9% similarity compared to that of any other human. An SNP is a variation of a single base pair in the DNA. Across human populations from distinct ethnic backgrounds, there are more than 1 billion validated SNPs (Fig. 479-3). SNPs are the most common type of sequence variation and account for 90% of all sequence variation. They occur on average every 100–300 bases and are the major source of genetic heterogeneity. SNPs that are in proximity are inherited together (i.e., they are linked) and are referred to as haplotypes (Fig. 479-4). Haplotype maps describe the nature and location of these SNP haplotypes and how they are distributed among individuals within and among populations, information that has been facilitating GWAS designed to elucidate the complex interactions among multiple genes and lifestyle factors in multifactorial disorders (see below). Moreover, haplotype analyses are useful to assess variations in responses to medications (pharmacogenomics) and environmental factors, as well as the prediction of disease predisposition.

COPY NUMBER VARIATIONS  Copy number variations (CNVs) are relatively large genomic regions (1 kb to several Mb) that have been duplicated or deleted on certain chromosomes and hence alter the diploid status of the DNA (Fig. 479-5). It has been estimated that 5–10% of the genome can display CNVs. When comparing the genomes of two individuals, ~0.4–0.8% of their genomes differ in terms of CNVs scattered throughout the genome. Some CNVs can increase or decrease gene dosage, potentially leading to detrimental effects if essential genes are impacted. Of note, de novo CNVs have been observed between monozygotic twins, who otherwise have identical genomes.

Replication of DNA and Mitosis  Genetic information in DNA is transmitted to daughter cells under two different circumstances: (1) somatic cells divide by mitosis, allowing the diploid (2n) genome to replicate itself completely in conjunction with cell division; and (2) germ cells (sperm and ova) undergo meiosis, a process that enables the reduction of the diploid (2n) set of chromosomes to the haploid state (1n).

FIGURE 479-2  Flow of genetic information. Multiple extracellular signals activate intracellular signal cascades that result in altered regulation of gene expression through the interaction of transcription factors with regulatory regions of genes. RNA polymerase transcribes DNA into RNA that is processed to mRNA by excision of intronic sequences. The mRNA is translated into a polypeptide chain to form the mature protein after undergoing posttranslational processing. CBP, CREB-binding protein; CoA, co-activator; COOH, carboxyterminus; CRE, cyclic AMP responsive element; CREB, cyclic AMP response element–binding protein; GTF, general transcription factors; HAT, histone acetyl transferase; NH2, aminoterminus; RE, response element; TAF, TBP-associated factors; TATA, TATA box; TBP, TATA-binding protein.

Prior to mitosis, cells exit the resting, or G0 state, and enter the cell cycle. After traversing a critical checkpoint in G1, cells undergo DNA synthesis (S phase), during which the DNA in each chromosome is replicated, yielding two pairs of sister chromatids (2n → 4n). The process of DNA synthesis requires stringent fidelity in order to avoid transmitting errors to subsequent generations of cells. Genetic abnormalities of DNA mismatch/repair include xeroderma pigmentosum, Bloom’s syndrome, ataxia telangiectasia, and hereditary nonpolyposis colon cancer (HNPCC), among others. Many of these disorders strongly predispose to neoplasia because of the rapid acquisition of additional mutations (Chap. 76). After completion of DNA synthesis, cells enter G2 and progress through a second checkpoint before entering mitosis. At this stage, the chromosomes condense and are aligned along the equatorial plate at metaphase. The two identical sister chromatids, held together at the centromere, divide and migrate to opposite poles of the cell.
After formation of a nuclear membrane around the two separated sets of chromatids, the cell divides and two daughter cells are formed, thus restoring the diploid (2n) state.

Assortment and Segregation of Genes During Meiosis  Meiosis occurs only in germ cells of the gonads. It shares certain features with mitosis but involves two distinct steps of cell division that reduce the chromosome number to the haploid state. In addition, there is active recombination that generates genetic diversity. During the first cell division, two sister chromatids (2n → 4n) are formed for each chromosome pair and there is an exchange of DNA between homologous paternal and maternal chromosomes. This process involves the formation of chiasmata, structures that correspond to the DNA segments that cross over between the maternal and paternal homologues (Fig. 479-6). Usually there is at least one crossover on each chromosomal arm; recombination occurs more frequently in female meiosis than in male meiosis. Subsequently, the chromosomes segregate randomly. Because there are 23 chromosome pairs, there exist 2²³ (>8 million) possible combinations of chromosomes. Together with the genetic exchanges that occur during recombination, chromosomal segregation generates tremendous diversity, and each gamete is genetically unique. The process of recombination and the independent segregation of chromosomes provide the foundation for performing linkage analyses, whereby one attempts to correlate the inheritance of certain chromosomal regions (or linked genes) with the presence of a disease or genetic trait (see below).

After the first meiotic division, which results in two daughter cells (2n), the two chromatids of each chromosome separate during a second meiotic division to yield four gametes with a haploid state (1n). When the egg is fertilized by sperm, the two haploid sets are combined, thereby restoring the diploid state (2n) in the zygote.

■ REGULATION OF GENE EXPRESSION

Regulation by Transcription Factors  The expression of genes is regulated by DNA-binding proteins that activate or repress transcription. The number of DNA sequences and transcription factors that regulate transcription is much greater than originally anticipated. Most genes contain at least 15–20 discrete regulatory elements within 300 bp of the transcription start site. This densely packed promoter region often contains binding sites for ubiquitous transcription factors. However, factors involved in cell-specific expression may also bind to these sequences. Key regulatory elements may also reside at a large distance from the proximal promoter. The globin and the immunoglobulin genes, for example, contain locus control regions that are several kilobases away from the structural sequences of the gene. Specific groups of transcription factors that bind to these promoter and enhancer sequences provide a combinatorial code for regulating transcription.
In this manner, relatively ubiquitous factors interact with more restricted factors to allow each gene to be expressed and regulated in a unique manner that is dependent on developmental state, cell type, and numerous extracellular stimuli. Regulatory factors also bind within the gene itself, particularly in the intronic regions. The transcription factors that bind to DNA represent only the first level of regulatory control. Other proteins (co-activators and co-repressors) interact with the DNA-binding transcription factors to generate large regulatory complexes. These complexes are subject to control by numerous cell-signaling pathways and enzymes, leading to phosphorylation, acetylation, sumoylation, and ubiquitination. Ultimately, the recruited transcription factors interact with, and stabilize, components of the basal transcription complex that assembles at the site of the TATA box and initiator region. This basal transcription factor complex consists of >30 different proteins. Gene transcription occurs when RNA polymerase begins to synthesize RNA from the DNA template. A large number of identified genetic diseases involve transcription factors (Table 479-2).

FIGURE 479-3  Chromosome 7 is shown with the density of single nucleotide polymorphisms (SNPs) and genes above. A 200-kb region in 7q31.2 containing the CFTR gene is shown below. The CFTR gene contains 27 exons. Close to 2000 mutations in this gene have been found in patients with cystic fibrosis. A 20-kb region encompassing exons 4–9 is shown further amplified to illustrate the SNPs in this region. [Panel labels: SNPs (612,977); known genes (1260). SNP categories shown: intronic; splice site; coding region, synonymous; coding region, nonsynonymous; coding region, frameshift.]

FIGURE 479-4  The origin of haplotypes is due to repeated recombination events occurring in multiple generations. Over time, this leads to distinct haplotypes. These haplotype blocks can often be characterized by genotyping selected Tag single nucleotide polymorphisms (SNPs), an approach that facilitates performing genome-wide association studies (GWAS).

The field of functional genomics is based on the concept that understanding alterations of gene expression under various physiologic and pathologic conditions provides insight into the underlying functional role of the gene. The ENCODE (Encyclopedia of DNA Elements) project aims at identifying and annotating all functional sequences in the human genome. By revealing specific gene expression profiles, this knowledge can be of diagnostic and therapeutic relevance. The large-scale study of expression profiles is referred to as transcriptomics because the complement of mRNAs transcribed by the cellular genome is called the transcriptome.

Most studies of gene expression have focused on the regulatory DNA elements of genes that control transcription. However, it must be emphasized that gene expression requires a series of steps, including mRNA processing, protein translation, and posttranslational modifications, all of which are actively regulated (Fig. 479-2).

FIGURE 479-5  Copy number variations (CNV) encompass relatively large regions of the genome that have been duplicated or deleted. Chromosome 8 is shown with a CNV detected by genomic hybridization (vertical axis: log2 ratio). An increase in the signal strength indicates a duplication, whereas a decrease reflects a deletion of the covered chromosomal regions.

Epigenetic Regulation of Gene Expression (see Chap. 497)

Epigenetics describes mechanisms and phenotypic changes that are not a result of variation in the primary DNA nucleotide sequence but are caused by secondary modifications of DNA or histones. These modifications include heritable changes such as X-inactivation and imprinting, but they can also result from dynamic posttranslational protein modifications in response to environmental influences such as diet, age, or drugs. The epigenetic modifications result in altered expression of individual genes or chromosomal loci encompassing multiple genes. The term epigenome describes the constellation of covalent modifications of DNA and histones that impact chromatin structure, as well as noncoding transcripts that modulate the transcriptional activity of DNA. Although the primary DNA sequence is usually identical in all cells of an organism, sex- and tissue-specific changes in the epigenome contribute to determining the transcriptional signature of a cell (transcriptome) and hence the protein expression profile (proteome).

Mechanistically, DNA and histone modifications can result in the activation or silencing of gene expression (Fig. 479-7). DNA methylation involves the addition of a methyl group to cytosine residues. This is usually restricted to cytosines of CpG dinucleotides, which are abundant throughout the genome. Methylation of these dinucleotides is thought to represent a defense mechanism that minimizes the expression of sequences that have been incorporated into the genome such as retroviral sequences. CpG dinucleotides also exist in so-called CpG islands, stretches of DNA characterized by a high CG content, which are found in the majority of human gene promoters. CpG islands in promoter regions are typically unmethylated, and the lack of methylation facilitates transcription. Histone methylation involves the addition of a methyl group to lysine residues in histone proteins (Fig. 479-7).
Depending on the specific lysine residue being methylated, this alters chromatin configuration, making it either more open or tightly packed. Acetylation of histone proteins is another well-characterized mechanism that results in an

open chromatin configuration, which favors active transcription. Acetylation is generally more dynamic than methylation, and many transcriptional activation complexes have histone acetylase activity, whereas repressor complexes often contain deacetylases and remove acetyl groups from histones. Other histone modifications include, among others, phosphorylation and sumoylation. Furthermore, noncoding RNAs and RNA regula­ tory networks that bind to DNA have a significant impact on transcriptional activity. Physiologically, epigenetic mechanisms play an important role in several instances. For example, X-inactivation refers to the relative silencing of one of the two X chromosome copies present in females. The inactivation process is a form of dosage compensation such that females (XX) do not generally express twice as many X-chromosomal gene products as males (XY). In a given cell, the choice of which chromo­ some is inactivated occurs randomly in humans. But once the maternal or paternal X chromosome is inactivated, it will remain inactive, and this infor­ mation is transmitted with each cell division. The X-inactive specific transcript (Xist) gene encodes a long non-coding RNA (lncRNA) that mediates gene silencing on one of the X chromosomes. The inac­ tive X chromosome is highly methylated and has low levels of histone acetylation. While the majority of X-chromosomal genes are silenced by X-inactivation, ~15% escape inactivation and are expressed. Epigenetic gene inactivation also occurs on selected chromosomal regions of autosomes, a phenomenon referred to as genomic imprinting. Through this mechanism, a small subset of genes is only expressed in a monoallelic fashion. 
Imprinting is heritable and leads to the preferential expression of one of the parental alleles, which deviates from the usual biallelic expression seen for the majority of genes. Remarkably, imprinting can be limited to a subset of tissues. Imprinting is mediated through DNA methylation of one of the alleles. The epigenetic marks on imprinted genes are maintained throughout life, but during zygote formation, they are activated or inactivated in a sex-specific manner (imprint reset) (Fig. 479-8), which allows a differential expression pattern in the fertilized egg and the subsequent mitotic divisions. Appropriate expression of imprinted genes is important for normal development and cellular functions. Imprinting defects and uniparental disomy, which is the inheritance of two chromosomes or chromosomal regions from the same parent, are the cause of several developmental disorders such as Beckwith-Wiedemann syndrome, Silver-Russell syndrome, Angelman’s syndrome, and Prader-Willi syndrome (see below). Monoallelic loss-of-function mutations in the GNAS1 gene lead to Albright’s hereditary osteodystrophy (AHO). Paternal transmission of GNAS1 mutations leads to an isolated AHO phenotype (pseudopseudohypoparathyroidism), whereas maternal transmission leads to AHO in combination with hormone resistance to parathyroid hormone, thyrotropin, and gonadotropins (pseudohypoparathyroidism type IA). These phenotypic differences are explained by tissue-specific imprinting of the GNAS1 gene, which is expressed primarily from the maternal allele in the thyroid, gonadotropes, and the proximal renal tubule. In most other tissues, the GNAS1 gene is expressed biallelically. In patients with isolated renal resistance to parathyroid hormone (pseudohypoparathyroidism type IB), defective imprinting of the GNAS1 gene results in decreased Gsα expression in the proximal renal tubules. Rett syndrome is an X-linked dominant disorder resulting in developmental regression and stereotypic hand movements in affected girls. It is caused by mutations in the MECP2 gene, which encodes a methyl-binding protein. The ensuing aberrant methylation results in abnormal gene expression in neurons, which are otherwise normally developed.

FIGURE 479-6  Crossing-over and genetic recombination. During chiasma formation, either of the two sister chromatids on one chromosome pairs with one of the chromatids of the homologous chromosome. Genetic recombination occurs through crossing-over and results in recombinant and nonrecombinant chromosome segments in the gametes. Together with the random segregation of the maternal and paternal chromosomes, recombination contributes to genetic diversity and forms the basis of the concept of linkage. Panels: crossover and double crossover (recombination in gametes); no crossover (no recombination in gametes).

TABLE 479-2  Selected Examples of Diseases Caused by Mutations and Rearrangements in Transcription Factors
TRANSCRIPTION FACTOR CLASS | EXAMPLE | ASSOCIATED DISORDER
Nuclear receptors | Androgen receptor | Complete or partial androgen insensitivity (recessive missense mutations); spinobulbar muscular atrophy (CAG repeat expansion)
Zinc finger proteins | WT1 | WAGR syndrome: Wilms’ tumor, aniridia, genitourinary malformations, mental retardation
Basic helix-loop-helix | MITF | Waardenburg’s syndrome type 2A
Homeobox | IPF1 | Maturity onset of diabetes mellitus type 4 (monoallelic mutation/haploinsufficiency); pancreatic agenesis (biallelic mutations)
Leucine zipper | Retina leucine zipper (NRL) | Autosomal dominant retinitis pigmentosa
High mobility group (HMG) proteins | SRY | Sex reversal
Forkhead | HNF4α, HNF1α, HNF1β | Maturity onset of diabetes mellitus types 1, 3, 5
Paired box | PAX3 | Waardenburg’s syndrome types 1 and 3
T-box | TBX5 | Holt-Oram syndrome (thumb anomalies, atrial or ventricular septum defects, phocomelia)
Cell cycle control proteins | P53 | Li-Fraumeni syndrome, other cancers
Co-activators | CREB binding protein (CREBBP) | Rubinstein-Taybi syndrome
General transcription factors | TATA-binding protein (TBP) | Spinocerebellar ataxia 17 (CAG expansion)
Transcription elongation factor | VHL | von Hippel–Lindau syndrome (renal cell carcinoma, pheochromocytoma, pancreatic tumors, hemangioblastomas); autosomal dominant inheritance, somatic inactivation of second allele (Knudson two-hit model)
Runt | RUNX1 | Familial thrombocytopenia with propensity to acute myelogenous leukemia
Chimeric proteins due to translocations | PML-RAR | Acute promyelocytic leukemia; t(15;17)(q22;q11.2-q12) translocation
Abbreviations: CREB, cAMP responsive element–binding protein; HNF, hepatocyte nuclear factor; PML, promyelocytic leukemia; RAR, retinoic acid receptor; SRY, sex-determining region Y; VHL, von Hippel–Lindau.

Remarkably, epigenetic differences also occur among monozygotic twins. Although twins are epigenetically indistinguishable during the early years of life, older monozygotic twins exhibit differences in the overall content and genomic distribution of DNA methylation and histone acetylation, which would be expected to alter gene expression in various tissues.

In cancer, the epigenome is characterized by simultaneous losses and gains of DNA methylation in different genomic regions, as well as repressive histone modifications. Hyper- and hypomethylation are associated with mutations in genes that control DNA methylation. Hypomethylation is thought to remove normal control mechanisms that prevent expression of repressed DNA regions. It is also associated with genomic instability. Hypermethylation, in contrast, results in the silencing of CpG islands in promoter regions of genes, including tumor-suppressor genes. Epigenetic alterations are more easily reversible compared to genetic changes; modification of the epigenome with demethylating agents and histone deacetylase inhibitors is being used in the treatment of various malignancies.

TRANSMISSION OF GENETIC DISEASE

Origins and Types of Mutations  The term mutation or variant is used to designate the process of generating genetic variations as well as the effect of these alterations. A mutation can be defined as any change in the primary nucleotide sequence of DNA regardless of its functional consequences, although it often has a negative connotation. There has been a shift towards using the more neutral term variant to describe sequence changes, and it is now recommended by several professional organizations and guidelines instead of mutation. Some variants may be lethal, others are less deleterious, and some may confer an evolutionary advantage. Variations can occur in the germline (sperm or oocytes); these can be transmitted to progeny. Alternatively, variants can occur during embryogenesis or in somatic tissues. Variations that occur during development lead to mosaicism, a situation in which tissues are composed of cells with different genetic constitutions. If the germline is mosaic, a mutation can be transmitted to some progeny but not others, which sometimes leads to confusion in assessing the pattern of inheritance. Somatic mutations that do not affect cell survival can sometimes be detected because of variable phenotypic effects in tissues (e.g., pigmented lesions in McCune-Albright syndrome). Other somatic mutations are associated with neoplasia because they confer a growth advantage to cells. Epigenetic events may also influence gene expression or facilitate genetic damage. With the exception of triplet nucleotide repeats, which can expand (see below), variations are usually stable.

Sequence variants are structurally diverse: they can involve the entire genome, as in triploidy (one extra set of chromosomes), or gross numerical or structural alterations in chromosomes or individual genes. Large deletions may affect a portion of a gene or an entire gene, or, if several genes are involved, they may lead to a contiguous gene syndrome. Unequal crossing-over between homologous genes can result in fusion gene mutations, as illustrated by color blindness. Variations involving single nucleotides are referred to as point mutations.

Substitutions are called transitions if a purine is replaced by another purine base (A ↔ G) or if a pyrimidine is replaced by another pyrimidine (C ↔ T). Changes from a purine to a pyrimidine, or vice versa, are referred to as transversions. If the DNA sequence change occurs in a coding region and alters an amino acid, it is called a missense mutation. Depending on the functional consequences of such a missense mutation, amino acid substitutions in different regions of the protein can lead to distinct phenotypes.

Variants can occur in all domains of a gene (Fig. 479-9). A point mutation occurring within the coding region leads to an amino acid substitution if the altered codon specifies a different amino acid (Fig. 479-10). Point mutations that introduce a premature stop codon result in a truncated or missing protein. Large deletions may affect a portion of a gene or an entire gene, whereas small deletions and insertions alter the reading frame if they do not represent a multiple of three bases. These “frameshift” mutations, also designated as amphigoric amino acid changes, lead to an entirely altered carboxy terminus. Mutations in intronic sequences or in exon junctions may destroy or create splice donor or splice acceptor sites. Variants may also be found in the regulatory sequences of genes, resulting in reduced or enhanced gene transcription.

Certain DNA sequences are particularly susceptible to mutagenesis. Successive pyrimidine residues (e.g., T-T or C-C) are subject to the formation of ultraviolet light–induced photoadducts. If these pyrimidine dimers are not repaired by the nucleotide excision repair pathway, mutations will be introduced after DNA synthesis. The dinucleotide C-G, or CpG, is also a hot spot for a specific type of alteration. In this case, methylation of the cytosine is associated with an enhanced rate of deamination, which converts the methylated cytosine to thymine. This C → T transition (or G → A on the opposite strand) accounts for at least one-third of point mutations associated with polymorphisms and mutations. In addition to the fact that certain types of mutations (C → T or G → A) are relatively common, the nature of the genetic code also results in overrepresentation of certain amino acid substitutions.

Polymorphisms are sequence variations that have a frequency of at least 1%. Usually, they do not result in a perceptible phenotype; the term variant is now preferred for the description of these sequence changes because allele frequency and functional consequences are often not known. Often, they consist of single base-pair substitutions that do not alter the protein coding sequence because of the degenerate nature of the genetic code (synonymous polymorphism), although it is possible that some might alter mRNA stability, translation, or the amino acid sequence (nonsynonymous polymorphism) (Fig. 479-10). The detection of sequence variants poses a practical problem because it is often unclear whether a given variant creates a change with functional consequences or is a benign variation. In this situation, the sequence alteration is also described as a variant of unknown significance (VUS).

FIGURE 479-7  Epigenetic modifications of DNA and histones. Methylation of cytosine residues is associated with gene silencing. Methylation of certain genomic regions is inherited (imprinting), and it is involved in the silencing of one of the two X chromosomes in females (X-inactivation). Alterations in methylation can also be acquired, e.g., in cancer cells. Covalent posttranslational modifications of histones play an important role in altering DNA accessibility and chromatin structure and hence in regulating transcription. Histones can be reversibly modified in their amino-terminal tails, which protrude from the nucleosome core particle, by acetylation of lysine, phosphorylation of serine, methylation of lysine and arginine residues, and sumoylation. Acetylation of histones by histone acetylases (HATs), e.g., leads to unwinding of chromatin and accessibility to transcription factors. Conversely, deacetylation by histone deacetylases (HDACs) results in a compact chromatin structure and silencing of transcription.

MUTATION RATES  Mutations represent an important cause of genetic diversity as well as disease. Mutation rates are difficult to determine in humans because many mutations are silent and because testing is often not adequate to detect the phenotypic consequences. Mutation rates vary among genes but are estimated at ~10⁻¹⁰/bp per cell division. Germline mutation rates (as opposed to somatic mutations) are relevant in the transmission of genetic disease. Because the population of oocytes is established very early in development, only ~20 cell divisions are required for completed oogenesis, whereas spermatogenesis involves ~30 divisions by the time of puberty and 20 cell divisions each year thereafter. Consequently, the probability of acquiring new point mutations is much greater in the male germline than the female germline, in which rates of aneuploidy are increased. Thus, the incidence of new point mutations in spermatogonia increases with paternal age (e.g., achondroplasia, Marfan’s syndrome, neurofibromatosis). It is estimated that about 1 in 10 sperm carries a new deleterious mutation. The rates for new mutations are calculated most readily for autosomal dominant and X-linked disorders and are ~10⁻⁵ to 10⁻⁶/locus per generation. Because most monogenic diseases are relatively rare, new mutations account for a significant fraction of cases.
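The arithmetic behind the paternal age effect can be made explicit with a short sketch. It uses only the approximate figures quoted above (~20 divisions for oogenesis, ~30 divisions by puberty plus ~20 per year for spermatogenesis, ~10⁻¹⁰ mutations/bp per cell division). The age at puberty (`PUBERTY_AGE`) and the function names are assumptions introduced for illustration, and the rate is treated as constant per division, which is a simplification.

```python
# Approximate figures quoted in the text; PUBERTY_AGE is an illustrative assumption.
OOGENESIS_DIVISIONS = 20          # ~20 divisions to complete oogenesis
SPERM_DIVISIONS_AT_PUBERTY = 30   # ~30 divisions by the time of puberty
SPERM_DIVISIONS_PER_YEAR = 20     # ~20 further divisions each year thereafter
MUTATION_RATE_PER_BP = 1e-10      # ~10^-10 per bp per cell division
PUBERTY_AGE = 15                  # assumption, not taken from the text


def sperm_divisions(paternal_age, puberty_age=PUBERTY_AGE):
    """Approximate number of germline cell divisions behind a sperm at a given age."""
    years_after_puberty = max(0, paternal_age - puberty_age)
    return SPERM_DIVISIONS_AT_PUBERTY + SPERM_DIVISIONS_PER_YEAR * years_after_puberty


def per_bp_mutation_probability(divisions, rate=MUTATION_RATE_PER_BP):
    """Expected new mutations per base pair, assuming a constant rate per division."""
    return divisions * rate


for age in (20, 30, 40):
    divisions = sperm_divisions(age)
    print(
        f"paternal age {age}: ~{divisions} divisions, "
        f"~{per_bp_mutation_probability(divisions):.1e} mutations/bp, "
        f"{divisions / OOGENESIS_DIVISIONS:.1f}x the divisions of oogenesis"
    )
```

Under these assumptions the number of divisions in the male germline grows linearly with age, whereas the figure for oogenesis stays fixed. The estimate only illustrates why new point mutations become more likely with paternal age; it does not change the observation that new mutations account for a significant fraction of cases of rare monogenic disease.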
The frequency of new mutations is important in the context of genetic counseling because a new mutation can be transmitted to the affected individual, but this does not necessarily imply that the parents are at risk to transmit the disease to other children. An exception to this is when the new mutation occurs early in germline development, leading to gonadal mosaicism.

UNEQUAL CROSSING-OVER  Normally, DNA recombination in germ cells occurs with remarkable fidelity to maintain the precise junction sites for the exchanged DNA sequences (Fig. 479-6). However, mispairing of homologous sequences leads to unequal crossover, with gene duplication on one of the chromosomes and gene deletion on the other chromosome. A significant fraction of growth hormone (GH) gene deletions, for example, involve unequal crossing-over (Chap. 391). The GH gene is a member of a large gene cluster that includes a GH variant gene as well as several structurally related chorionic somatomammotropin genes and pseudogenes (highly homologous but functionally inactive relatives of a normal gene). Because such gene clusters contain multiple homologous DNA sequences arranged in tandem, they are particularly prone to undergo recombination and, consequently, gene duplication or deletion. Duplication of the PMP22 gene because of unequal crossing-over results in increased gene dosage and type IA Charcot-Marie-Tooth disease. In contrast, unequal crossing-over resulting in deletion of PMP22 causes a distinct neuropathy called hereditary neuropathy with liability to pressure palsies (HNPP) (Chap. 457).

Glucocorticoid-remediable aldosteronism (GRA) is caused by a gene fusion or rearrangement involving the genes that encode aldosterone synthase (CYP11B2) and steroid 11β-hydroxylase (CYP11B1), normally arranged in tandem on chromosome 8q. These two genes are 95% identical, predisposing to gene duplication and deletion by unequal crossing-over. The rearranged gene product contains the regulatory regions of 11β-hydroxylase fused to the coding sequence of aldosterone synthase. Consequently, the latter enzyme is expressed in the adrenocorticotropic hormone (ACTH)–dependent zona fasciculata of the adrenal gland, resulting in overproduction of mineralocorticoids and hypertension (Chap. 398).

FIGURE 479-8  A few genomic regions are imprinted in a parent-specific fashion. The unmethylated chromosomal regions are actively expressed, whereas the methylated regions are silenced. In the germline, the imprint is reset in a parent-specific fashion: both chromosomes are unmethylated in the maternal (mat) germline and methylated in the paternal (pat) germline. In the zygote, the resulting imprinting pattern is identical with the pattern in the somatic cells of the parents. Panels: maternal and paternal somatic cells; germline development with imprint reset (maternal germline, paternal germline); zygote.

Gene conversion refers to a nonreciprocal exchange of homologous genetic information. It has been used to explain how an internal portion of a gene is replaced by a homologous segment copied from another allele or locus; these genetic alterations may range from a few nucleotides to a few thousand nucleotides. As a result of gene conversion, it is possible for short DNA segments of two chromosomes to be identical, even though these sequences are distinct in the parents. A practical consequence of this phenomenon is that nucleotide substitutions can occur during gene conversion between related genes, often altering the function of the gene. In disease states, gene conversion often involves intergenic exchange of DNA between a gene and a related pseudogene. For example, the 21-hydroxylase gene (CYP21A2) is adjacent to a nonfunctional pseudogene (CYP21A1P). Many of the nucleotide substitutions that are found in the CYP21A2 gene in patients with congenital adrenal hyperplasia correspond to sequences that are present in the CYP21A1P pseudogene, suggesting gene conversion as one cause of mutagenesis. In addition, mitotic gene conversion has been suggested as a mechanism to explain revertant mosaicism in which an inherited mutation is “corrected” in certain cells. For example, patients with autosomal recessive generalized atrophic benign epidermolysis bullosa have acquired reverse mutations in one of the two mutated COL17A1 alleles, leading to clinically unaffected patches of skin.
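The reciprocal outcome of unequal crossing-over described above (duplication on one chromosome, deletion on the other) can be illustrated with a deliberately simplified toy model. Each homolog is represented as a list of segments, and a crossover exchanges the arms distal to a breakpoint. The segment names `upstream` and `downstream`, the function name, and the breakpoint indices are hypothetical; the sketch ignores the homologous repeat sequences that mediate mispairing in real genomes.

```python
def crossover(homolog_a, homolog_b, break_a, break_b):
    """Exchange the distal arms of two homologs.

    break_a and break_b are list indices at which each homolog is cut.
    Equal crossing-over uses the same index on both homologs; unequal
    crossing-over (after mispairing) uses different indices.
    """
    recombinant_1 = homolog_a[:break_a] + homolog_b[break_b:]
    recombinant_2 = homolog_b[:break_b] + homolog_a[break_a:]
    return recombinant_1, recombinant_2


# Toy chromosome segment carrying a single copy of PMP22.
homolog = ["upstream", "PMP22", "downstream"]

# Equal crossover: both homologs are cut at the same position.
equal_1, equal_2 = crossover(homolog, homolog, 1, 1)

# Unequal crossover: one homolog is cut after PMP22, the other before it.
duplicated, deleted = crossover(homolog, homolog, 2, 1)

for name, chromosome in [("equal", equal_1), ("duplication", duplicated), ("deletion", deleted)]:
    print(name, chromosome, "PMP22 copies:", chromosome.count("PMP22"))
```

In this model the duplicated product carries two copies of PMP22 (the gene dosage change associated in the text with type IA Charcot-Marie-Tooth disease), and the reciprocal product carries none (the change associated with HNPP).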

INSERTIONS AND DELETIONS  Although many instances of insertions and deletions occur as a consequence of unequal crossing-over, there is also evidence for internal duplication, inversion, or deletion of DNA sequences. The fact that certain deletions or insertions appear to occur repeatedly as independent events indicates that specific regions within the DNA sequence predispose to these errors. For example, certain regions of the DMD gene, which encodes dystrophin, appear to be hot spots for deletions and result in muscular dystrophy (Chap. 460). Some regions within the human genome are rearrangement hot spots and lead to CNVs.

ERRORS IN DNA REPAIR  Because mutations caused by defects in DNA repair accumulate as somatic cells divide, these types of mutations are particularly important in the context of neoplastic disorders. Several genetic disorders involving DNA repair enzymes underscore their importance. Patients with xeroderma pigmentosum have defects in DNA damage recognition or in the nucleotide excision repair pathway (Chap. 81). Exposed skin is dry and pigmented and is extraordinarily sensitive to the mutagenic effects of ultraviolet irradiation. Variants in more than 10 different genes have been shown to cause the different forms of xeroderma pigmentosum.

Ataxia-telangiectasia is a multisystem disorder that includes progressive neurodegenerative cerebellar ataxia, immunologic defects, telangiectatic lesions, lymphomas and leukemias, and hypersensitivity to ionizing radiation (Chap. 450). The discovery of the ataxia-telangiectasia mutated (ATM) gene revealed that it is homologous to genes involved in DNA repair and control of cell cycle checkpoints. Mutations in the ATM gene give rise to defects in meiosis as well as increasing susceptibility to damage from ionizing radiation. Fanconi’s anemia is also associated with an increased risk of multiple acquired genetic abnormalities. It is characterized by diverse congenital anomalies and a strong predisposition to develop aplastic anemia and acute myelogenous leukemia (Chap. 109). Cells from these patients are susceptible to chromosomal breaks caused by a defect in genetic recombination. Fanconi’s anemia can be caused by mutations in the multiple genes forming the Fanconi’s anemia pathway, which is involved in DNA repair and replication.

Hereditary nonpolyposis colorectal cancer (HNPCC; Lynch syndrome) is characterized by autosomal dominant transmission of colon cancer, young age (<50 years) of presentation, predisposition to lesions in the proximal large bowel, and associated malignancies such as uterine cancer and ovarian cancer. HNPCC is predominantly caused by mutations in one of several different mismatch repair (MMR) genes including MutS homologue 2 (MSH2), MutL homologue 1 (MLH1), MutS homologue 6 (MSH6), PMS1, and PMS2 (Chap. 86). These proteins are involved in the detection of nucleotide mismatches and in the recognition of slipped-strand trinucleotide repeats. Germline mutations in these genes lead to microsatellite instability and a high mutation rate in colon cancer. Genetic screening tests for this disorder are now being used for families considered to be at risk. Recognition of HNPCC allows early screening with colonoscopy and the implementation of prevention strategies using nonsteroidal anti-inflammatory drugs.

FIGURE 479-9  Point mutations causing β thalassemia as example of allelic heterogeneity. The β-globin gene is located in the globin gene cluster. Point mutations occur in the promoter, the CAP site, the 5′-untranslated region, the initiation codon, each of the three exons, the introns, or the polyadenylation signal. Many mutations introduce missense or nonsense mutations, whereas others cause defective RNA splicing. Not shown here are deletion mutations of the β-globin gene or larger deletions of the globin locus that can also result in thalassemia. Symbols in the figure mark promoter mutations, the CAP site (*), the 5′UTR, the initiation codon, defective RNA processing, missense and nonsense mutations, and the poly A signal (A).

UNSTABLE DNA SEQUENCES  Trinucleotide repeats may be unstable and expand beyond a critical number. Mechanistically, the expansion is thought to be caused by unequal recombination and slipped mispairing. A premutation represents a small increase in trinucleotide copy number. In subsequent generations, the expanded repeat may increase further in length and result in an increasingly severe phenotype, a process called dynamic mutation (see below for discussion of anticipation).
Trinucleotide expansion was first recognized as a cause of the fragile X syndrome, one of the most common causes of intellectual disability. Other disorders arising from a similar mechanism include Huntington’s disease, X-linked spinobulbar muscular atrophy, and myotonic dystrophy. Malignant cells are also characterized by genetic instability, indicating a breakdown in mechanisms that regulate DNA repair and the cell cycle.

Functional Consequences of Mutations  Functionally, mutations can be broadly classified as gain-of-function and loss-of-function mutations. Gain-of-function mutations are typically dominant (i.e., they result in phenotypic alterations when a single allele is affected). Inactivating mutations are usually recessive, and an affected individual is homozygous or compound heterozygous (i.e., carrying two different mutant alleles of the same gene) for the disease-causing mutations. Alternatively, mutation in a single allele can result in haploinsufficiency, a situation in which one normal allele is not sufficient to maintain a normal phenotype. Haploinsufficiency is a commonly observed mechanism in diseases associated with mutations in transcription factors (Table 479-2). Remarkably, the clinical features among patients with an identical mutation often vary significantly. One mechanism underlying this variability is the influence of modifying genes. Haploinsufficiency can also affect the expression of rate-limiting enzymes. For example, haploinsufficiency in enzymes involved in heme synthesis can cause porphyrias (Chap. 428).

An increase in dosage of a gene product may also result in disease, as illustrated by the duplication of the DAX1 (NR0B1) gene in dosage-sensitive sex reversal (Chap. 402). Mutation in a single allele can also result in loss of function due to a dominant-negative effect. In this case, the mutated allele interferes with the function of the normal (wild type) gene product by one of several different mechanisms: (1) a mutant protein may interfere with the function of a multimeric protein complex, as illustrated by mutations in type 1 collagen (COL1A1, COL1A2) genes in osteogenesis imperfecta (Chap. 425); (2) a mutant protein may occupy binding sites on proteins or promoter response elements, as illustrated by thyroid hormone resistance β, a disorder in which inactivated thyroid hormone receptor β binds to target genes and functions as an antagonist of normal receptors (Chap. 394); or (3) a mutant protein can be cytotoxic as in α1 antitrypsin deficiency (Chap. 303) or autosomal dominant neurohypophyseal diabetes insipidus (Chap. 393), in which the abnormally folded proteins are trapped within the endoplasmic reticulum and ultimately cause cellular damage.

FIGURE 479-10  A. Examples of mutations (now commonly referred to as variations). The coding strand is shown with the encoded amino acid sequence. B. Chromatograms of sequence analyses after amplification of genomic DNA by polymerase chain reaction. Panels: (A) wild type, silent mutation, missense mutation, nonsense mutation, and 1-bp deletion with frameshift; (B) wild type, heterozygous point mutation, and homozygous point mutation.

Genotype and Phenotype  •  ALLELES, GENOTYPES, AND HAPLOTYPES  An observed trait is referred to as a phenotype; the genetic information defining the phenotype is called the genotype. Alternative forms of a gene or a genetic marker are referred to as alleles. Alleles may be polymorphic variants of nucleic acids that have no apparent effect on gene expression or function. In other instances, these variants may have subtle effects on gene expression, thereby conferring adaptive advantages associated with genetic diversity. On the other hand, allelic variants may reflect mutations that clearly alter the function of a gene product. The common Glu6Val (E6V) sickle cell mutation in the β-globin gene and the ΔF508 deletion of phenylalanine (F) in the CFTR gene are examples of allelic variants of these genes that result in disease. Because each individual has two copies of each chromosome (one inherited from the mother and one inherited from the father), an individual can have only two alleles at a given locus. However, there can be many different alleles in the population. The normal or common allele is usually referred to as wild type. When alleles at a given locus are identical, the individual is homozygous. Inheriting identical copies of a mutant allele occurs in many autosomal recessive disorders, particularly in circumstances of consanguinity or isolated populations. If the alleles are different on the maternal and the paternal copy of the gene, the individual is heterozygous at this locus (Fig. 479-10).
If two different mutant alleles are inherited at a given locus, the individual is said to be a compound heterozygote. Hemizygous is used to describe males with a mutation in an X chromosomal gene or a female with a loss of one X chromosomal locus.

Genotypes describe the specific alleles at a particular locus. For example, there are three common alleles (E2, E3, E4) of the apolipoprotein E (APOE) gene. The genotype of an individual can therefore be described as APOE3/4 or APOE4/4 or any other variant. These designations indicate which alleles are present on the two chromosomes in the APOE gene at locus 19q13.2. In other cases, the genotype might be assigned arbitrary numbers (e.g., 1/2) or letters (e.g., B/b) to distinguish different alleles.

A haplotype refers to a group of alleles that are closely linked together at a genomic locus (Fig. 479-4). Haplotypes are useful for tracking the transmission of genomic segments within families and for detecting evidence of genetic recombination if the crossover event occurs between the alleles (Fig. 479-6). As an example, various alleles of the human leukocyte antigens (HLA) at the major histocompatibility complex (MHC) on chromosome 6 are used to establish haplotypes associated with certain disease states. Thus, 21-hydroxylase deficiency, complement deficiency, and hemochromatosis are each associated with specific HLA haplotypes. It is now recognized that these genes lie in close proximity to the HLA locus, which explains why HLA associations were identified even before the disease genes were cloned and localized. In other cases, specific HLA associations with diseases such as ankylosing spondylitis (HLA-B27) or type 1 diabetes mellitus (HLA-DR4) reflect the role of specific HLA allelic variants in susceptibility to these autoimmune diseases.
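The zygosity terms defined above (homozygous, heterozygous, compound heterozygous, hemizygous) follow a simple decision rule, which can be sketched as a small function. The function name, the `mutant_alleles` parameter, and the placeholder allele name `other_mutation` are assumptions made for illustration; allele labels are treated as opaque strings.

```python
def classify_zygosity(allele_1, allele_2=None, mutant_alleles=frozenset()):
    """Classify the genotype at one locus from its allele labels.

    allele_2 is None when only one copy of the locus is present
    (e.g., an X chromosomal gene in a male).
    mutant_alleles lists the labels regarded as disease-causing.
    """
    if allele_2 is None:
        return "hemizygous"
    if allele_1 == allele_2:
        return "homozygous"
    if allele_1 in mutant_alleles and allele_2 in mutant_alleles:
        return "compound heterozygous"
    return "heterozygous"


# APOE genotypes written in the text as APOE3/4 and APOE4/4.
print(classify_zygosity("E3", "E4"))
print(classify_zygosity("E4", "E4"))

# Two different mutant alleles of the same gene; "other_mutation" is a placeholder.
cftr_mutants = {"ΔF508", "other_mutation"}
print(classify_zygosity("ΔF508", "other_mutation", mutant_alleles=cftr_mutants))

# A single X chromosomal allele in a male.
print(classify_zygosity("mutant_X_allele"))
```

The distinction between heterozygous and compound heterozygous depends entirely on which alleles are listed as mutant, which mirrors the practical difficulty of deciding whether a given variant is disease-causing.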
The characterization of common SNP haplotypes in numerous populations from different parts of the world has provided the necessary tools for association studies designed to detect genes involved in the pathogenesis of complex disorders (Table 479-1). The presence or absence of certain haplotypes can also be relevant for the customized choice of medical therapies (pharmacogenomics) or may have value for preventive strategies.

Genotype-phenotype correlation describes the association of a specific mutation and the resulting phenotype. The phenotype may differ depending on the location or type of the mutation in some genes. For example, in von Hippel–Lindau disease, an autosomal dominant multisystem disease that can include renal cell carcinoma, hemangioblastomas, and pheochromocytomas, among others, the phenotype varies greatly, and the identification of the specific mutation can be clinically useful in order to predict the spectrum of disease manifestations.
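The terms homozygous, heterozygous, compound heterozygous, and hemizygous defined above can be summarized as a small decision rule. The following Python sketch is illustrative only: the function name `classify_zygosity`, the label `"WT"` for the wild-type allele, and the allele `"variantX"` are hypothetical and do not come from the text.

```python
def classify_zygosity(alleles, wild_type):
    """Classify the genotype at one locus.

    alleles   -- tuple with one allele (hemizygous locus, e.g., an X-linked
                 gene in a male) or two alleles (maternal and paternal copy)
    wild_type -- label used for the normal (common) allele
    """
    if len(alleles) == 1:
        return "hemizygous"
    if len(alleles) != 2:
        raise ValueError("expected one or two alleles at a locus")
    first, second = alleles
    if first == second:
        # Identical alleles, whether wild type or mutant
        return "homozygous"
    if wild_type in alleles:
        # One wild-type and one variant allele
        return "heterozygous"
    # Two different mutant alleles at the same locus
    return "compound heterozygous"


print(classify_zygosity(("E6V", "E6V"), "WT"))         # homozygous
print(classify_zygosity(("WT", "ΔF508"), "WT"))        # heterozygous
print(classify_zygosity(("ΔF508", "variantX"), "WT"))  # compound heterozygous
print(classify_zygosity(("E6V",), "WT"))               # hemizygous
```

The rule deliberately ignores whether a variant is disease-causing; it only encodes the naming of allele combinations.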

ALLELIC HETEROGENEITY  Allelic heterogeneity refers to the fact that different mutations in the same genetic locus can cause an identical or similar phenotype. For example, many different mutations of the β-globin locus can cause β thalassemia (Table 479-3) (Fig. 479-9). In essence, allelic heterogeneity reflects the fact that many different mutations can alter protein structure and function. For this reason, maps of inactivating mutations in genes usually show a near-random distribution. Exceptions include (1) a founder effect, in which a particular mutation that does not affect reproductive capacity can be traced to a single individual; (2) “hot spots” for mutations, in which the nature of the DNA sequence predisposes to a recurring mutation; and (3) localization of mutations to certain domains that are particularly critical for protein function.

Allelic heterogeneity creates a practical problem for genetic testing: because the mutations can differ in each patient, one must often examine the entire genetic locus. For example, ~2000 variants have been identified in the CFTR gene to date, although some of them are very rare and some may not be disease-causing (Fig. 479-3). Mutational analysis may initially focus on a panel of mutations that are particularly frequent (often taking the ethnic background of the patient into account), but a negative result does not exclude the presence of a mutation elsewhere in the gene. Until recently, mutational analyses tended to focus on the coding region of a gene without considering regulatory and intronic regions. However, disease-causing mutations may be located outside the coding regions, so negative results need to be interpreted with caution. The increasingly widespread access to comprehensive sequencing technologies, whole-exome sequencing (WES) and whole-genome sequencing (WGS), greatly facilitates unbiased mutational analyses.
However, comprehensive sequencing can result in significant diagnostic challenges because the detection of a sequence alteration is not always sufficient to establish that it has a causal role (variant of uncertain significance [VUS]).

PHENOTYPIC HETEROGENEITY  Phenotypic heterogeneity occurs when more than one phenotype is caused by allelic mutations (i.e., by different mutations in the same gene) (Table 479-3). For example, laminopathies are monogenic multisystem disorders that result from mutations in the LMNA gene, which encodes the nuclear lamins A and C. Multiple autosomal dominant and recessive disorders are caused by mutations in the LMNA gene. They include several forms of lipodystrophies, Emery-Dreifuss muscular dystrophy, progeria syndromes, a form of neuronal Charcot-Marie-Tooth disease (type 2B1), and a group of overlapping syndromes. Remarkably, hierarchical cluster analysis has revealed that the phenotypes vary depending on the position of the mutation (genotype-phenotype correlation). Similarly, identical mutations in the FGFR2 gene can result in very distinct phenotypes: Crouzon’s syndrome (craniofacial synostosis) or Pfeiffer’s syndrome (acrocephalopolysyndactyly).

LOCUS OR NONALLELIC HETEROGENEITY AND PHENOCOPIES  Nonallelic or locus heterogeneity refers to the situation in which a similar disease phenotype results from mutations at different genetic loci (Table 479-3). This often occurs when more than one gene product produces different subunits of an interacting complex or when different genes are involved in the same genetic cascade or physiologic pathway. For example, osteogenesis imperfecta can arise from mutations in two different procollagen genes (COL1A1 or COL1A2) that are located on different chromosomes and can involve multiple other genes (Chap. 425). The effects of inactivating mutations in these two genes are similar because the protein products comprise different subunits of the helical collagen fiber.
Similarly, muscular dystrophy syndromes can be caused by mutations in various genes, consistent with the fact that they can be transmitted in an X-linked (Duchenne or Becker), autosomal dominant (limb-girdle muscular dystrophy type 1), or autosomal recessive (limb-girdle muscular dystrophy type 2) manner (Chap. 460). Mutations in the X-linked DMD gene, which encodes dystrophin, are the most common cause of muscular dystrophy. This feature reflects the large size of the gene (2.3 Mb, 79 exons), as well as the fact that the phenotype is expressed in hemizygous males because they have only a single copy of the X chromosome. Dystrophin is associated with a large protein complex linked to the membrane-associated cytoskeleton in muscle. Mutations in several different components of this protein complex can also cause muscular dystrophy syndromes. Although the phenotypic features of some of these disorders are distinct, the phenotypic spectrum caused by mutations in different genes overlaps, thereby leading to nonallelic heterogeneity. It should be noted that mutations in dystrophin are also associated with allelic heterogeneity. For example, mutations in the DMD gene can cause either Duchenne’s or the less severe Becker’s muscular dystrophy, depending on the severity of the protein defect.

TABLE 479-3  Selected Examples of Phenotypic Heterogeneity and Locus Heterogeneity

Phenotypic Heterogeneity
GENE, PROTEIN      PHENOTYPE                                  INHERITANCE
LMNA, Lamin A/C    Emery-Dreifuss muscular dystrophy (AD)     AD
                   Familial partial lipodystrophy Dunnigan    AD
                   Hutchinson-Gilford progeria                AD
                   Atypical Werner’s syndrome                 AD
                   Dilated cardiomyopathy 1A                  AD
                   Familial atrial fibrillation 3             AD
                   Charcot-Marie-Tooth type 2B1               AR
KRAS               Noonan’s syndrome                          AD
                   Cardio-facio-cutaneous syndrome 1          AD

Locus Heterogeneity
PHENOTYPE / GENE   CHROMOSOMAL LOCATION   PROTEIN
Familial hypertrophic cardiomyopathy
  Genes encoding sarcomeric proteins
    MYH7           14q11.2                Myosin heavy chain beta
    TNNT2          1q32.1                 Troponin-T2
    TPM1           15q22.2                Tropomyosin alpha
    MYBPC3         11p11.2                Myosin-binding protein C
    TNNC1          19q13.4                Troponin 1
    MYL2           12q24.11               Myosin light chain 2
    MYL3           3p21.31                Myosin light chain 3
    TTN            2q31.2                 Cardiac titin
    ACTC           15q14                  Cardiac alpha actin
    MYH6           14q11.2                Myosin heavy chain alpha
    MYLK2          20q11.21               Myosin light-peptide kinase
    CAV3           3p25                   Caveolin 3
  Genes encoding nonsarcomeric proteins
    MT-TI                                 Mitochondrial tRNA isoleucine
    MT-TG                                 Mitochondrial tRNA glycine
    PRKAG2         7q36.1                 AMP-activated protein kinase γ2 subunit
    DMPK           19q13.32               Myotonin protein kinase (myotonic dystrophy)
    FRDA           9q21.11                Frataxin (Friedreich’s ataxia)
Polycystic kidney disease
    PKD1           16p13.3                Polycystin 1 (AD)
    PKD2           4q22.1                 Polycystin 2 (AD)
    PKHD1          6p21.1-p12.2           Fibrocystin/polyductin (AR)
Noonan’s syndrome
    PTPN11         12q24.13               Protein-tyrosine phosphatase 2c
    KRAS           12p12.1                KRAS

Abbreviations: AD, autosomal dominant; AR, autosomal recessive; OMIM, Online Mendelian Inheritance in Man.
Recognition of nonallelic heterogeneity is important for several reasons: (1) the ability to identify disease loci in linkage studies is reduced by including patients with similar phenotypes but different genetic disorders; (2) genetic testing is more complex because several different genes need to be considered along with the possibility of different mutations in each of the candidate genes; and (3) novel information is gained about how genes or proteins interact, providing unique insights into molecular physiology.

Phenocopies refer to circumstances in which nongenetic conditions mimic a genetic disorder. For example, features of toxin- or drug-induced neurologic syndromes can resemble those seen in Huntington’s disease, and vascular causes of dementia share phenotypic features with familial forms of Alzheimer’s dementia (Chap. 442). As in nonallelic heterogeneity, the presence of phenocopies has the potential to confound linkage studies and genetic testing. Patient history and subtle differences in phenotype can often provide clues that distinguish these disorders from related genetic conditions.

VARIABLE EXPRESSIVITY AND INCOMPLETE PENETRANCE  The same genetic mutation may be associated with a phenotypic spectrum in different affected individuals, thereby illustrating the phenomenon of variable expressivity. This may include different manifestations of a disorder variably involving different organs (e.g., multiple endocrine neoplasia [MEN]), the severity of the disorder (e.g., cystic fibrosis), or the age of disease onset (e.g., Alzheimer’s dementia). MEN 1 illustrates several of these features. In this autosomal dominant tumor syndrome, affected individuals carry an inactivating germline mutation. After somatic inactivation of the alternate allele (loss of heterozygosity; Knudson two-hit model), patients can develop tumors of the parathyroid gland, endocrine pancreas, and pituitary gland, as well as dermatologic lesions (Chap. 400). However, the pattern of tumors in the different glands, the age at which tumors develop, and the types of hormones produced vary among affected individuals, even within a given family. In this example, the phenotypic variability arises, in part, because of the requirement for a second somatic mutation in the normal copy of the MEN1 gene, as well as the large array of different cell types that are susceptible to the effects of MEN1 gene mutations. In part, variable expression reflects the influence of modifier genes, or genetic background, on the effects of a particular mutation. Even in identical twins, in whom the genetic constitution is essentially the same, one can occasionally see variable expression of a genetic disease.

Interactions with the environment can also influence the course of a disease. For example, the manifestations and severity of hemochromatosis can be influenced by iron intake (Chap. 426), and the course of phenylketonuria is affected by exposure to phenylalanine in the diet (Chap. 431). Other metabolic disorders, such as hyperlipidemias and porphyria, also fall into this category. Many mechanisms, including genetic effects and environmental influences, can therefore lead to variable expressivity. In genetic counseling, it is particularly important to recognize this variability, because one cannot always predict the course of disease, even when the mutation is known.

Penetrance refers to the proportion of individuals with a mutant genotype who express the phenotype. If all carriers of a mutant allele express the phenotype, penetrance is complete, whereas it is said to be incomplete or reduced if some individuals do not exhibit features of the phenotype. Dominant conditions with incomplete penetrance are characterized by skipping of generations with unaffected carriers transmitting the mutant gene. For example, hypertrophic obstructive cardiomyopathy (HCM) caused by mutations in the myosin-binding protein C gene is a dominant disorder with clinical features in only a subset of patients who carry the mutation (Chap. 267).
Patients who have the mutation, but no evidence of the disease, can still transmit the disorder to subsequent generations. In many conditions with postnatal onset, the proportion of gene carriers who are affected varies with age. Thus, when describing penetrance, one must specify age. For example, for disorders such as Huntington’s disease or familial amyotrophic lateral sclerosis, which present later in life, the rate of penetrance is influenced by the age at which the clinical assessment is performed. Imprinting can also modify the penetrance of a disease. For example, in patients with Albright’s hereditary osteodystrophy (AHO), mutations in the Gsα subunit (GNAS1 gene) are expressed clinically only in individuals who inherit the mutation from their mother (Chap. 422).

SEX-INFLUENCED PHENOTYPES  Certain mutations affect males and females quite differently. In some instances, this is because the gene resides on the X or Y sex chromosomes (X-linked disorders and Y-linked disorders). As a result, the phenotype of mutated X-linked genes will be expressed fully in males but variably in heterozygous females, depending on the degree of X-inactivation and the function of the gene. For example, most heterozygous female carriers of factor VIII deficiency (hemophilia A) are asymptomatic because sufficient factor VIII is produced to prevent a defect in coagulation (Chap. 121). On the other hand, some females heterozygous for the X-linked lipid storage defect caused by α-galactosidase A deficiency (Fabry’s disease) experience mild manifestations of painful neuropathy, as well as other features of the disease (Chap. 429). Because only males have a Y chromosome, mutations in genes such as SRY, which cause male-to-female sex reversal, or DAZ (deleted in azoospermia), which cause abnormalities of spermatogenesis, are unique to males (Chap. 402).

Other diseases are expressed in a sex-limited manner because of the differential function of the gene product in males and females.
Activating mutations in the luteinizing hormone receptor cause dominant male-limited precocious puberty in boys (Chap. 403). The phenotype is unique to males because activation of the receptor induces testosterone production in the testis, whereas it is functionally silent in the immature ovary. Biallelic inactivating mutations of the follicle-stimulating hormone (FSH) receptor cause primary ovarian failure in females because the follicles do not develop in the absence of FSH action. In contrast, affected males have a more subtle phenotype, because testosterone production is preserved (allowing sexual maturation) and spermatogenesis is only partially impaired (Chap. 403). In congenital adrenal hyperplasia, most commonly caused by 21-hydroxylase deficiency, cortisol production is impaired and ACTH stimulation of the adrenal gland leads to increased production of androgenic precursors (Chap. 398). In females, the increased androgen level causes ambiguous genitalia, which can be recognized at the time of birth. In males, the diagnosis may be made on the basis of adrenal insufficiency at birth, because the increased adrenal androgen level does not alter sexual differentiation, or later in childhood, because of the development of precocious puberty. Hemochromatosis is more common in males than in females, presumably because of differences in dietary iron intake and losses associated with menstruation and pregnancy in females (Chap. 426).
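Because penetrance is a proportion that depends on age and may differ between the sexes, it can help to see it computed for explicit strata. The following Python sketch is a simplified illustration: the `Carrier` record, the `penetrance` function, and the eight-person cohort are all hypothetical and are not taken from any study or from this chapter.

```python
from dataclasses import dataclass


@dataclass
class Carrier:
    sex: str        # "M" or "F"
    age: int        # age at clinical assessment, in years
    affected: bool  # True if the phenotype is expressed


def penetrance(carriers, min_age=0, sex=None):
    """Proportion of mutation carriers who express the phenotype.

    Only carriers assessed at min_age or older (and of the given sex,
    if specified) are counted. Returns None for an empty stratum.
    """
    group = [c for c in carriers
             if c.age >= min_age and (sex is None or c.sex == sex)]
    if not group:
        return None
    return sum(c.affected for c in group) / len(group)


# Hypothetical cohort of carriers of the same mutation
cohort = [
    Carrier("M", 25, False), Carrier("M", 45, True),
    Carrier("M", 60, True),  Carrier("M", 70, True),
    Carrier("F", 30, False), Carrier("F", 50, False),
    Carrier("F", 65, True),  Carrier("F", 72, False),
]

print(penetrance(cohort))          # 0.5 (all carriers, any age)
print(penetrance(cohort, sex="M")) # 0.75
print(penetrance(cohort, sex="F")) # 0.25
```

Restricting the same cohort to carriers assessed at 40 years or older (`min_age=40`) gives 4 of 6 affected, which illustrates why a penetrance estimate is only meaningful when the age at assessment is stated.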

Chromosomal Disorders  Chromosomal disorders and the techniques used for their characterization have been discussed in detail in previous editions of this textbook. Chromosomal or cytogenetic disorders are caused by numerical (aneuploidy) or structural aberrations (deletions, duplications, translocations, inversions, dicentric and ring chromosomes, Robertsonian translocations) in chromosomes. They occur in ~1% of the general population, in 8% of stillbirths, and in close to 50% of spontaneously aborted fetuses. Indications for cytogenetic and cytogenomic chromosome analyses are summarized in Table 479-4. Contiguous gene syndromes (e.g., large deletions affecting several genes) have been useful for identifying the location of new disease-causing genes. Because of the variable size of gene deletions in different patients, a systematic comparison of phenotypes and locations of deletion breakpoints allows positions of particular genes to be mapped within the critical genomic region.

Monogenic Mendelian Disorders  Monogenic human diseases are frequently referred to as Mendelian disorders because they obey the principles of genetic transmission originally set forth in Gregor Mendel’s classic work. The continuously updated OMIM catalogue lists several thousand of these disorders and provides information about the clinical phenotype, molecular basis, allelic variants, and pertinent animal models (Table 479-1). The mode of inheritance for a given phenotypic trait or disease is determined by pedigree analysis. All affected and unaffected individuals in the family are recorded in a pedigree using standard symbols (Fig. 479-11). The principles of allelic segregation, and the transmission of alleles from parents to children, are illustrated in Fig. 479-12. One dominant (A) allele and one recessive (a) allele can display three Mendelian modes of inheritance: autosomal dominant, autosomal recessive, and X-linked.
About 65% of human monogenic disorders are autosomal dominant, 25% are autosomal recessive, and 5% are X-linked. Genetic testing is now readily available for the characterization of monogenic disorders and plays an important role in clinical medicine (Chap. 480).

TABLE 479-4  Indications for Cytogenetic and Cytogenomic Analysis across the Life Span

TIMING OF TESTING         INDICATIONS FOR TESTING
Prenatal                  Advanced maternal age
                          Abnormalities on ultrasound
                          Increased risk for genetic disorder on maternal serum screen
Neonatal and childhood    Multiple congenital anomalies
                          Intellectual disability
                          Autism spectrum disorders
                          Developmental delay
                          Failure to thrive
                          Short stature
                          Disorders of sexual development
                          History of familial chromosomal alteration
                          Cancer
Adult                     Infertility
                          Recurrent miscarriage
                          Familial cancer
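The principle of allelic segregation for one dominant (A) and one recessive (a) allele can be worked through by enumerating the gametes of both parents, as in a Punnett square. The Python sketch below is a minimal illustration under the usual assumptions of random segregation and complete penetrance; the function names `offspring_genotypes` and `risk_affected` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the expected genotype distribution among offspring.

    Each parent is a two-letter genotype such as "Aa". Every gamete of one
    parent is paired with every gamete of the other (Punnett square).
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def risk_affected(parent1, parent2, mode):
    """Probability that a child is affected.

    mode="dominant":  the disease allele is A; one copy is sufficient.
    mode="recessive": the disease allele is a; two copies are required.
    """
    dist = offspring_genotypes(parent1, parent2)
    if mode == "dominant":
        return sum((p for g, p in dist.items() if "A" in g), Fraction(0))
    if mode == "recessive":
        return dist.get("aa", Fraction(0))
    raise ValueError("mode must be 'dominant' or 'recessive'")


print(offspring_genotypes("Aa", "Aa"))         # AA 1/4, Aa 1/2, aa 1/4 (25:50:25)
print(offspring_genotypes("Aa", "aa"))         # Aa 1/2, aa 1/2 (50:50)
print(risk_affected("Aa", "aa", "dominant"))   # 1/2
print(risk_affected("Aa", "Aa", "recessive"))  # 1/4
```

The two crosses reproduce the 25:50:25 and 50:50 ratios shown in Fig. 479-12. X-linked inheritance is not modeled here because it requires tracking the sex chromosomes of each parent separately.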

FIGURE 479-11  Standard pedigree symbols. Symbols shown: female; male; unknown sex; multiple siblings; spontaneous abortion; deceased male; affected male; proband; affected female; heterozygous male; heterozygous female; female carrier of X-linked trait; mating; consanguineous union; monozygotic twins; dizygotic twins; generation labels I and II.

AUTOSOMAL DOMINANT DISORDERS  In autosomal dominant disorders, mutations in a single allele are sufficient to cause the disease. In contrast to recessive disorders, in which disease pathogenesis is relatively straightforward because there is a biallelic loss of gene function, dominant disorders can be caused by various disease mechanisms, many of which are unique to the function of the genetic pathway involved. Mechanistically, the mutation may confer constitutive activation (gain of function), exert a dominant negative effect, or result in loss of function and haploinsufficiency.

In autosomal dominant disorders, individuals are affected in successive generations; the disease does not occur in the offspring of unaffected individuals. Males and females are affected with equal frequency because the defective gene resides on one of the 22 autosomes (Fig. 479-13A). Autosomal dominant mutations alter one of the two alleles at a given locus. Because the alleles segregate randomly at meiosis, the probability that an offspring will be affected is 50%. Unless there is a new germline mutation, an affected individual has an affected parent. Children with a normal genotype do not transmit the disorder. Due to differences in penetrance or expressivity (see above), the clinical manifestations of autosomal dominant disorders may be variable. Because of these variations, it is sometimes challenging to determine the pattern of inheritance.

It should be recognized, however, that some individuals acquire a mutated gene from an unaffected parent due to de novo germline mutations. These mutations occur more frequently during later cell divisions in gametogenesis, which explains why siblings are rarely affected. As noted, new germline mutations occur more frequently in fathers of advanced age.
For example, the average age of fathers with new germline mutations that cause Marfan’s syndrome is ~37 years, whereas fathers who transmit the disease by inheritance have an average age of ~30 years.

FIGURE 479-12  Segregation of alleles. Segregation of genotypes in the offspring of parents with one dominant (A) and one recessive (a) allele. The distribution of the parental alleles to their offspring depends on the combination present in the parents. Filled symbols = affected individuals. Offspring ratios shown in the panels: 50:50 and 25:50:25.

FIGURE 479-13  A. Dominant, B. recessive, C. X-linked, and D. mitochondrial (matrilineal) inheritance. Panel titles: autosomal dominant (A); autosomal recessive and autosomal recessive with pseudodominance (B); X-linked (C); mitochondrial (D).

AUTOSOMAL RECESSIVE DISORDERS  In recessive disorders, the mutated alleles result in a complete or partial loss of function. They frequently involve enzymes in metabolic pathways, receptors, or proteins in signaling cascades. In an autosomal recessive disease, the affected individual, who can be of either sex, is a homozygote or compound heterozygote for a single-gene defect. With a few important exceptions, autosomal recessive diseases are rare and occur more often in the context of parental consanguinity. The relatively high frequency of certain recessive disorders, such as sickle cell anemia, cystic fibrosis, and thalassemia, is partially explained by a selective biologic advantage for the heterozygous state (see below). Although heterozygous carriers of a defective allele are usually clinically normal, they may display subtle differences in phenotype that only become apparent with more precise testing or in the context of certain environmental influences. In sickle cell anemia, for example, heterozygotes are normally asymptomatic. However, in situations of dehydration or diminished oxygen pressure, sickle cell crises can also occur in heterozygotes (Chap. 103).

In most instances, an affected individual is the offspring of heterozygous parents. In this situation, there is a 25% chance that the offspring will have a normal genotype, a 50% probability of a heterozygous state, and a 25% risk of homozygosity for the recessive alleles (Figs. 479-12 and 479-13B). In the case of one unaffected heterozygous and one affected homozygous parent, the probability of disease increases to 50% for each child. In this instance, the pedigree analysis mimics an autosomal dominant mode of inheritance (pseudodominance). In contrast to autosomal dominant disorders, new mutations in recessive alleles are rarely manifest because they usually result in an asymptomatic carrier state.

X-LINKED DISORDERS  Males have only one X chromosome; consequently, a daughter always inherits her father’s X chromosome in addition to one of her mother’s two X chromosomes. A son inherits the Y chromosome from his father and one maternal X chromosome. Thus, the characteristic features of X-linked inheritance are (1) the absence of father-to-son transmission and (2) the fact that all daughters of an affected male are obligate carriers of the mutant allele (Fig. 479-13C). The risk of developing disease due to a mutant X-chromosomal gene differs in the two sexes. Because males have only one X chromosome, they are hemizygous for the mutant allele; thus, they are more likely to develop the mutant phenotype, regardless of whether the mutation is dominant or recessive. A female may be either heterozygous or homozygous for the mutant allele, which may be dominant or recessive. The terms X-linked dominant and X-linked recessive are therefore only applicable to expression of the mutant phenotype in women. In addition, the expression of X-chromosomal genes is influenced by X chromosome inactivation.

Y-LINKED DISORDERS  The Y chromosome has a relatively small number of genes. One such gene, the sex-determining region Y gene (SRY), which encodes the testis-determining factor (TDF), is crucial for normal male development. Normally, there is infrequent exchange of sequences on the Y chromosome with the X chromosome. The SRY region is adjacent to the pseudoautosomal region, a chromosomal segment on the X and Y chromosomes with a high degree of homology. A crossing-over event occasionally involves the SRY region with the distal tip of the X chromosome during meiosis in the male.
Translocations can result in XY females with the Y chromosome lacking the SRY gene or XX males harboring the SRY gene on one of the X chromosomes (Chap. 402). Point mutations in the SRY gene may also result in individuals with an XY genotype and an incomplete female phenotype. Most of these mutations occur de novo. Men with oligospermia/azoospermia frequently have microdeletions on the long arm of the Y chromosome that involve one or more of the azoospermia factor (AZF) genes.

Exceptions to Simple Mendelian Inheritance Patterns  •

MITOCHONDRIAL DISORDERS  Mendelian inheritance refers to the transmission of genes encoded by DNA contained in the nuclear chromosomes. In addition, each mitochondrion contains several copies of a small circular chromosome (Chap. 481). The mitochondrial DNA (mtDNA) is ~16.5 kb and encodes transfer and ribosomal RNAs and 13 core proteins that are components of the respiratory chain involved in oxidative phosphorylation and ATP generation. The mitochondrial genome does not recombine and is inherited through the maternal line because sperm does not contribute significant cytoplasmic components to the zygote. A noncoding region of the mitochondrial chromosome, referred to as D-loop, is highly polymorphic. This property, together with the absence of mtDNA recombination, makes it a valuable tool for studies tracing human migration and evolution, and it is also used for specific forensic applications.

Inherited mitochondrial disorders are transmitted in a matrilineal fashion; all children from an affected mother will inherit the disease, but it will not be transmitted from an affected father to his children (Fig. 479-13D). Alterations in the mtDNA that involve enzymes required for oxidative phosphorylation lead to reduction of ATP supply, generation of free radicals, and induction of apoptosis. Several syndromic disorders arising from mutations in the mitochondrial genome are known in humans, and they affect both protein-coding and tRNA genes. The broad clinical spectrum often involves (cardio)myopathies and encephalopathies because of the high dependence of these tissues on oxidative phosphorylation. The age of onset and the clinical course are highly variable because of the unusual mechanisms of transmission of mtDNA, which replicates independently from nuclear DNA. During cell replication, the proportion of wild-type and mutant mitochondria can drift among different cells and tissues. The resulting heterogeneity in the proportion of mitochondria with and without a mutation is referred to as heteroplasmy and underlies the phenotypic variability that is characteristic of mitochondrial diseases.
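The drift in the proportion of mutant mitochondrial genomes between cells (heteroplasmy) can be illustrated with a toy simulation. The Python sketch below is a deliberately simplified model, not a description of actual mtDNA segregation: it assumes a fixed mtDNA copy number per cell, replication of every copy before division, and random partitioning to the daughter cell. The function names and all parameter values are hypothetical.

```python
import random


def divide(mutant, copies, rng):
    """One cell division: every mtDNA copy replicates, then the daughter
    cell receives a random half of the doubled pool."""
    pool = [True] * (2 * mutant) + [False] * (2 * (copies - mutant))
    return sum(rng.sample(pool, copies))  # number of mutant copies inherited


def simulate_lineages(initial_mutant, copies, generations, n_lineages, seed=0):
    """Follow several independent cell lineages that start from the same
    mutant load and return the final mutant fraction of each lineage."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_lineages):
        mutant = initial_mutant
        for _ in range(generations):
            mutant = divide(mutant, copies, rng)
        fractions.append(mutant / copies)
    return fractions


# Ten lineages, each starting with 50% mutant mtDNA (50 of 100 copies)
final = simulate_lineages(initial_mutant=50, copies=100,
                          generations=30, n_lineages=10)
print(sorted(final))
```

Although every lineage starts from the same mutant fraction, the final fractions differ between lineages, which mirrors the variable mutant load in different cells and tissues. Lineages that start with no mutant copies, or only mutant copies, cannot change in this model.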

Acquired somatic mutations in mitochondria are thought to be involved in several age-dependent degenerative disorders affecting predominantly muscle and the peripheral and central nervous system (e.g., Alzheimer’s and Parkinson’s diseases). Establishing that an mtDNA alteration is causal for a clinical phenotype is challenging because of the high degree of polymorphism in mtDNA and the phenotypic variability characteristic of these disorders. Certain pharmacologic treatments may have an impact on mitochondria and/or their function. For example, treatment with the antiretroviral compound azidothymidine (AZT) causes an acquired mitochondrial myopathy through depletion of muscular mtDNA.

MOSAICISM  Mosaicism refers to the presence of two or more genetically distinct cell lines in the tissues of an individual. It results from a mutation that occurs during embryonic, fetal, or extrauterine development. The developmental stage at which the mutation arises will determine whether germ cells and/or somatic cells are involved. Chromosomal mosaicism results from nondisjunction at an early embryonic mitotic division, leading to the persistence of more than one cell line, as exemplified by some patients with Turner’s syndrome (Chap. 402).

Somatic mosaicism is characterized by a patchy distribution of genetically altered somatic cells. The McCune-Albright syndrome, for example, is caused by activating mutations in the stimulatory G protein α (Gsα) that occur postzygotically in early development (Chap. 422). The clinical phenotype varies depending on the tissue distribution of the mutation; manifestations include ovarian cysts that secrete sex steroids and cause precocious puberty, polyostotic fibrous dysplasia, café-au-lait skin pigmentation, GH-secreting pituitary adenomas, and hypersecreting autonomous thyroid nodules.

X-INACTIVATION, IMPRINTING, AND UNIPARENTAL DISOMY  According to traditional Mendelian principles, the parental origin of a mutant gene is irrelevant for the expression of the phenotype. There are, however, important exceptions to this rule. X-inactivation prevents the expression of most genes on one of the two X chromosomes in every cell of a female. Gene inactivation through genomic imprinting occurs on selected chromosomal regions of autosomes and leads to inheritable preferential expression of one of the parental alleles. It is of pathophysiologic importance in disorders where the transmission of disease is dependent on the sex of the transmitting parent and, thus, plays an important role in the expression of certain genetic disorders. Two classic examples are the Prader-Willi syndrome and Angelman’s syndrome. Prader-Willi syndrome is characterized by diminished fetal activity, obesity, hypotonia, intellectual disability, short stature, and hypogonadotropic hypogonadism. Deletions of the paternal copy of the Prader-Willi locus located on the proximal long arm of chromosome 15 result in a contiguous gene syndrome involving missing paternal copies of the necdin and SNRPN genes, among others.
In contrast, patients with Angelman’s syndrome, characterized by intellectual disability, seizures, ataxia, and hypotonia, have deletions involving the maternal copy of this region on chromosome 15. These two syndromes may also result from uniparental disomy. In this case, the syndromes are not caused by deletions on chromosome 15 but by the inheritance of either two maternal chromosomes (Prader-Willi syndrome) or two paternal chromosomes (Angelman’s syndrome). Lastly, the two distinct phenotypes can also be caused by an imprinting defect that impairs the resetting of the imprint during zygote development (defect in the father leads to Prader-Willi syndrome; defect in the mother leads to Angelman’s syndrome).

Imprinting and the related phenomenon of allelic exclusion may be more common than currently documented because it is difficult to examine levels of mRNA expression from the maternal and paternal alleles in specific tissues or in individual cells. Genomic imprinting, or uniparental disomy, is involved in the pathogenesis of several other disorders and malignancies. For example, hydatidiform moles contain a normal number of diploid chromosomes, but they are exclusively of paternal origin. The opposite situation occurs in ovarian teratomata, with 46 chromosomes of maternal origin. Expression of the imprinted gene for insulin-like growth factor 2 (IGF-2) is involved in the pathogenesis of the cancer-predisposing Beckwith-Wiedemann syndrome (BWS). These children show somatic overgrowth with organomegalies and hemihypertrophy, and they have an increased risk of embryonal malignancies such as Wilms’ tumor. Normally, only the paternally derived copy of the IGF2 gene is active, and the maternal copy is inactive. BWS can be caused by several genetic defects that result in overactivity of IGF-2 or in a missing active copy of CDKN1C, an inhibitor of cell proliferation. They include paternal uniparental disomy (UPD) of chromosome 11, aberrant methylation of this region, maternal chromosomal rearrangements, or deletions within the locus.

Alterations of the epigenome through gain and loss of DNA methylation and altered histone modifications play an important role in the pathogenesis of malignancies.

SOMATIC MUTATIONS  Cancer can be considered a genetic disease at the cellular level (Chap. 76). Cancers are monoclonal in origin, indicating that they have arisen from a single precursor cell with one or several mutations in genes controlling growth (proliferation or apoptosis) and/or differentiation. These acquired somatic mutations are restricted to the tumor and its metastases and are not found in the surrounding normal tissue. The molecular alterations include dominant gain-of-function mutations in oncogenes, recessive loss-of-function mutations in tumor-suppressor genes and DNA repair genes, gene amplification, and chromosome rearrangements. Chromothripsis refers to a mutational process that produces multiple clustered chromosomal rearrangements in close vicinity, for example, after injury by ionizing radiation. Rarely, a single mutation in certain genes may be sufficient to transform a normal cell into a malignant cell. In most cancers, however, the development of a malignant phenotype requires several genetic alterations for the gradual progression from a normal cell to a cancerous cell, a phenomenon termed multistep carcinogenesis. Genome-wide analyses of cancers using deep sequencing often reveal somatic rearrangements resulting in fusion genes and mutations in multiple genes (Table 479-1 and Fig. 479-14). Comprehensive sequence analyses, now also possible through single-cell sequencing (SCS), provide insight into the evolution and genetic heterogeneity within malignancies; these include intratumoral heterogeneity among the cells of the primary tumor, intermetastatic and intrametastatic heterogeneity, and interpatient differences. These analyses further support the notion of cancer as an ongoing process of clonal evolution, in which successive rounds of clonal selection within the primary tumor and metastatic lesions result in diverse genetic and epigenetic alterations that require targeted (personalized) therapies (precision medicine). The heterogeneity of mutations within a tumor can also lead to resistance to targeted therapies because cells with mutations that are resistant to the therapy, even if they are a minor part of the tumor population, will be selected as the more sensitive cells are eliminated.

FIGURE 479-14  Somatic alterations in cervical cancer. A. Cervical carcinoma samples ordered by histology and mutation frequency; B. clinical and molecular platform features; C. significantly mutated genes (SMGs); and D. select somatic copy number alterations. SMGs are ordered by the overall mutation frequency and color-coded by mutation type. Adeno, adenocarcinomas; Adenosq, adenosquamous cancers; CN, copy number; SCNAs, somatic copy number alterations; Squamous, squamous cell carcinomas. (Reproduced from The Cancer Genome Atlas Research Network. Integrated genomic and molecular characterization of cervical cancer. Nature 543:378–384, 2017.)
[Panel labels. A: mutations per Mb. B: histology, HPV clade, HPV integration, APOBEC mutagenesis, UCEC-like, EMT score, purity, iCluster. C: PIK3CA (26%), EP300 (11%), FBXW7 (11%), PTEN (8%), HLA-A (8%), ARID1A (7%), NFE2L2 (7%), HLA-B (6%), KRAS (6%), ERBB3 (6%), MAPK1 (5%), CASP8 (4%), TGFBR2 (3%), SHKBP1 (2%); mutation types: synonymous, in-frame indel, other non-synonymous, missense, splice site, frameshift, nonsense. D: gene-level SCNAs: 3q (66%), CD274 (8%), PTEN (8%), YAP1 (16%), BCAR4 (16%); scale: log2[CN] ≥ 0.4, 0.1 ≤ log2[CN] < 0.4, –0.4 < log2[CN] ≤ –0.1, log2[CN] ≤ –0.4.]

Telomeres, repeats of conserved sequences, protect the ends of the chromosomes from DNA damage or fusion with neighboring chromosomes. Telomere length shortens with age. Most human tumors express telomerase, an enzyme formed of a protein and an RNA component, which adds telomere repeats at the ends of chromosomes during replication. This mechanism impedes shortening of the telomeres and is associated with enhanced replicative capacity in cancer cells. Telomerase inhibitors provide a strategy for treating advanced human cancers.

In many cancer syndromes, there is an inherited predisposition to tumor formation. In these instances, a germline mutation is inherited in an autosomal dominant fashion inactivating one allele of an autosomal tumor-suppressor gene. If the second allele is inactivated by a somatic mutation or by epigenetic silencing in a given cell, this will lead to neoplastic growth (Knudson two-hit model). Thus, the defective allele in the germline is transmitted in a dominant mode, although tumorigenesis results from a biallelic loss of the tumor-suppressor gene in an affected tissue. The classic example to illustrate this phenomenon is retinoblastoma, which can occur as a sporadic or hereditary tumor. In sporadic retinoblastoma, both copies of the retinoblastoma (RB) gene are inactivated through two somatic events.
In hereditary retinoblastoma, one mutated or deleted RB allele is inherited in an autosomal dominant manner and the second allele is inactivated by a subsequent somatic mutation. This two-hit model applies to other inherited cancer syndromes such as MEN 1 (Chap. 400) and neurofibromatosis types 1 and 2 (Chap. 95). In contrast, in the autosomal dominant MEN 2 syndrome, the predisposition for tumor formation in various organs is caused by a gain-of-function mutation in a single allele of the RET gene (Chap. 400).

NUCLEOTIDE REPEAT EXPANSION DISORDERS  Several diseases are associated with an increase in the number of nucleotide repeats above a certain threshold (Table 479-5). The repeats are sometimes located within the coding region of the genes, as in Huntington’s disease or the X-linked form of spinal and bulbar muscular atrophy (SBMA; Kennedy’s syndrome). In other instances, the repeats probably alter gene regulatory sequences. If an expansion is present, the DNA fragment is unstable and tends to expand further during cell division. The length of the nucleotide repeat often correlates with the severity of the disease. When repeat length increases from one generation to the next, disease manifestations may worsen or be observed at an earlier age; this phenomenon is referred to as anticipation. In Huntington’s disease, for example, there is an inverse correlation between age of onset and length of the triplet codon expansion (Chap. 435). Anticipation has also been documented in other diseases caused by dynamic mutations in trinucleotide repeats (Table 479-5). The repeat number may also vary in a tissue-specific manner. In myotonic dystrophy, the CTG repeat may be 10-fold greater in muscle tissue than in lymphocytes (Chap. 460).

TABLE 479-5  Selected Trinucleotide Repeat Disorders

| Disease | Locus | Repeat | Triplet length (normal/disease) | Inheritance | Gene product |
|---|---|---|---|---|---|
| X-chromosomal spinobulbar muscular atrophy (SBMA) | Xq12 | CAG | 11–34/40–62 | XR | Androgen receptor |
| Fragile X syndrome (FRAXA) | Xq27.3 | CGG | 6–50/200–300 | XR | FMR-1 protein |
| Fragile X syndrome (FRAXE) | Xq28 | GCC | 6–25/>200 | XR | FMR-2 protein |
| Dystrophia myotonica (DM) | 19q13.32 | CTG | 5–30/200–1000 | AD, variable penetrance | Myotonin protein kinase |
| Huntington’s disease (HD) | 4p16.3 | CAG | 6–34/37–180 | AD | Huntingtin |
| Spinocerebellar ataxia type 1 (SCA1) | 6p22.3 | CAG | 6–39/40–88 | AD | Ataxin 1 |
| Spinocerebellar ataxia type 2 (SCA2) | 12q24.12 | CAG | 15–31/34–400 | AD | Ataxin 2 |
| Spinocerebellar ataxia type 3 (SCA3); Machado-Joseph disease (MD) | 14q32.12 | CAG | 13–36/55–86 | AD | Ataxin 3 |
| Spinocerebellar ataxia type 6 (SCA6, CACNA1A) | 19p13 | CAG | 4–16/20–33 | AD | Alpha 1A voltage-dependent L-type calcium channel |
| Spinocerebellar ataxia type 7 (SCA7) | 3p14.1 | CAG | 4–19/37 to >300 | AD | Ataxin 7 |
| Spinocerebellar ataxia type 12 (SCA12) | 5q32 | CAG | 6–26/66–78 | AD | Protein phosphatase 2A |
| Dentatorubral pallidoluysian atrophy (DRPLA) | 12p13.31 | CAG | 7–23/49–75 | AD | Atrophin 1 |
| Friedreich’s ataxia (FRDA1) | 9q21.11 | GAA | 7–22/200–900 | AR | Frataxin |

Abbreviations: AD, autosomal dominant; AR, autosomal recessive; XR, X-linked recessive.

Complex Genetic Disorders  The expression of many common diseases such as cardiovascular disease, hypertension, diabetes, asthma, psychiatric disorders, and certain cancers is determined by a combination of genetic background, environmental factors, and lifestyle.
A trait is called polygenic if multiple genes contribute to the phenotype or multifactorial if multiple genes are assumed to interact with environmental factors. Genetic models for these complex traits need to account for genetic heterogeneity and interactions with other genes and the environment. Complex genetic traits may be influenced by modifier genes that are not linked to the main gene involved in the pathogenesis of the trait. This type of gene-gene interaction, or epistasis, where the expression of a gene is altered by the expression of one or several independently inherited genes, plays an important role in polygenic traits. In aggregate, variants in multiple genes need to be present simultaneously to result in a pathologic phenotype.

Type 2 diabetes mellitus provides a paradigmatic example of a multifactorial disorder, because genetic, nutritional, and lifestyle factors are intimately interrelated in disease pathogenesis (Table 479-6) (Chap. 415). The identification of genetic variations and environmental factors that either predispose to or protect against disease is essential for predicting disease risk, designing preventive strategies, and developing novel therapeutic approaches. The study of rare monogenic diseases may provide insight into some of the genetic and molecular mechanisms important in the pathogenesis of complex diseases. For example, the identification of the genes causing monogenic forms of permanent neonatal diabetes mellitus or maturity-onset diabetes defined them as candidate genes in the pathogenesis of diabetes mellitus type 2 (Tables 479-2 and 479-6) (Fig. 479-15). Genome scans have identified numerous genes and loci that may be associated with susceptibility to development of diabetes mellitus in certain populations (Fig. 479-16). Efforts to identify susceptibility genes require very large sample sizes, and positive results may depend on ethnicity, ascertainment criteria, and statistical analysis. Association studies analyzing the potential influence of (biologically functional) SNPs and SNP haplotypes on a particular phenotype have revealed new insights into the genes involved in the pathogenesis of these common disorders. Large variants ([micro]deletions, duplications, and inversions) present in the human population also contribute to the pathogenesis of complex disorders, but their contributions remain poorly understood.

Linkage and Association Studies  There are two primary strategies for mapping genes that cause or increase susceptibility to human disease: (1) classic linkage can be performed based on a known genetic model or, when the model is unknown, by studying pairs of affected relatives; or (2) disease genes can be mapped using allelic association studies (Table 479-7).

GENETIC LINKAGE  Genetic linkage refers to the fact that genes are physically connected, or linked, to one another along the chromosomes. Two fundamental principles are essential for understanding the concept of linkage: (1) when two genes are close together on a chromosome, they are usually transmitted together, unless a recombination event separates them (Fig. 479-6); and (2) the odds of a crossover, or recombination event, between two linked genes are proportional to the distance that separates them. Thus, genes that are farther apart are more likely to undergo a recombination event than genes that are very close together.
The detection of chromosomal loci that segregate with a disease by linkage can be used to identify the gene responsible for the disease and to predict the odds of disease gene transmission in genetic counseling.

Polymorphic variants are essential for linkage studies because they provide a means to distinguish the maternal and paternal chromosomes in an individual. On average, 1 out of every 1000 bp varies from one person to the next. Although this degree of variation seems low (99.9% identical), it means that >3 million sequence differences exist between any two unrelated individuals, and the probability that the sequence at such loci will differ on the two homologous chromosomes is high (often >70–90%). These sequence variations include variable number of tandem repeats (VNTRs), short tandem repeats (STRs), and SNPs. Most STRs, also called polymorphic microsatellite markers, consist of di-, tri-, or tetranucleotide repeats that can be characterized readily using the polymerase chain reaction (PCR). Characterization of SNPs, using DNA chips or beads, permits comprehensive analyses of genetic variation, linkage, and association studies. Although these sequence variations often have no apparent functional consequences, they provide much of the basis for variation in genetic traits.
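Evidence that a marker and a disease locus are linked is conventionally summarized as a lod (logarithm of odds) score: the base-10 logarithm of the likelihood of the observed meioses at a recombination fraction θ below 0.5, divided by their likelihood under free recombination (θ = 0.5). The sketch below is illustrative only. It assumes phase-known, fully informative meioses, which real pedigrees rarely provide, and the function name and the counts are hypothetical rather than taken from this chapter.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    theta is the assumed recombination fraction between marker and
    disease locus (0 = complete linkage, 0.5 = no linkage).
    """
    if recombinants < 0 or nonrecombinants < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        # A single recombinant is impossible under complete linkage.
        return float("-inf")

    total = recombinants + nonrecombinants
    # log10 likelihood of the data if the loci are linked at theta
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if nonrecombinants:
        log_linked += nonrecombinants * math.log10(1.0 - theta)
    # log10 likelihood of the data if the loci are unlinked (theta = 0.5)
    log_unlinked = total * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical example: ten informative meioses, none recombinant.
print(round(lod_score(0, 10, theta=0.0), 2))
# Hypothetical example: one recombinant among ten meioses, theta = 0.1.
print(round(lod_score(1, 9, theta=0.1), 2))
```

With these invented counts, ten non-recombinant meioses give a lod score just above 3 (10 × log10 2 ≈ 3.01), which shows why small families seldom reach the conventional threshold for linkage on their own.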

TABLE 479-6  Examples of Genes and Loci Involved in Mono- and Polygenic Forms of Diabetes

| Disorder | Genes or susceptibility locus | Chromosomal location | Other factors |
|---|---|---|---|
| Monogenic permanent neonatal diabetes mellitus | KCNJ11 (inwardly rectifying potassium channel Kir6.2) | 11p15.1 | AD |
| | GCK (glucokinase) | 7p13 | AR |
| | INS (insulin) | 11p15.5 | AR, hyperproinsulinemia |
| | ABCC8 (ATP-binding cassette, subfamily c, member 8; sulfonylurea receptor) | 11p15.1 | AD or AR |
| | GLIS3 (GLIS family zinc finger protein 3) | 9p24.2 | AR, diabetes, congenital hypothyroidism |
| Maturity-onset diabetes of the young (MODY): monogenic forms of diabetes mellitus | | | |
| MODY 1 | HNF4α (hepatocyte nuclear factor 4α) | 20q13.12 | AD inheritance |
| MODY 2 | GCK (glucokinase) | 7p13 | |
| MODY 3 | HNF1α (hepatocyte nuclear factor 1α) | 12q24.31 | |
| MODY 4 | IPF1 (insulin promoter factor 1) | 13q12.2 | |
| MODY 5 (renal cysts, diabetes) | HNF1β (hepatocyte nuclear factor 1β) | 17q12 | |
| MODY 6 | NeuroD1 (neurogenic differentiation factor 1) | 2q31.3 | |
| MODY 7 | KLF1 (Kruppel-like factor 1) | 19p13.13 | |
| MODY 8 | CEL (carboxyl ester lipase) | 9q34.13 | |
| MODY 9 | PAX4 (paired box transcription factor 4) | 7q32.1 | |
| MODY 10 | INS (insulin) | 11p15.5 | |
| MODY 11 | BLK (B-lymphocyte-specific tyrosine kinase) | 8p23.1 | |
| MODY 12 | ABCC8 (ATP-binding cassette, subfamily c, member 8; sulfonylurea receptor) | 11p15.1 | |
| MODY 13 | KCNJ11 (inwardly rectifying potassium channel Kir6.2) | 11p15.1 | |
| Diabetes mellitus type 2; loci and genes linked and/or associated with susceptibility for diabetes mellitus type 2 | Genes and loci identified by linkage/association studies: PPARG, KCNJ11/ABCC8, TCF7L2, HNF1B, WFS1, SLC30A8, FTO, HHEX, IGF2BP2, CDKN2A/B, CDKAL1, TSPAN8, ADAMTS9, CDC123/CAMK1D, JAZF1, NOTCH2, THADA, KCNQ1, DUSP8, MTNR1B, IRS1, SPRY2, SRR, ZFAND6, GCK, KLF14, TP53INP1, PROX1, PRC1, BCL11A, ZBED3, RBMS1, HNF1A, DGKB/TMEM195, CCND2, C2CD4A/C2CD4B, PTPRD, ARAP1/CENTD2, HMGA2, TLE4/CHCHD9, ADCY5, UBE2E2, DUSP9, GCKR, COBLL1/GRB14, HMG20A, VPS26A, ST6GAL1, AP3S2, HNF4A, BCL2, LAMA1, GIPR, MC4R, TLE1, KCNK16, ANK1, KLHDC5, ZMIZ1, PSMD6, FITM2/R3HDML/HNF4A, CILP2, ANKRD55, GLIS3, PEPD, GCC1/PAX4, ZFAND3, MAEA, BCAR1, RBM43/RND3, MACF1, RASGRP1, GRK5, TMEM163, SGCG, LPP, FAF1, TMEM154, MPHOSPH9, ARL15, POU5F1/TCF19, SSR1/RREB1, HLA-B, INS-IGF2, GPSM1, LEP, SLC16A13, PAM/PPIP5K2, SLC16A11, CCDC63, C12orf51, CCND2, HNF1A, TBC1D4, CCDC85A, INAFM2, ASB3, FAM60A, ATP8B2, MIR4686, MTMR3, DMRTA1, SLC35D3, GLP2R, GIP, MAP3K11, PLEKHA1, HSD17B12, NRXN3, CMIP, ZZEF1, MNX1, ABO, ACSL1, HLA-DQA1 | | Heavily influenced by diet, energy expenditure, obesity |

Abbreviations: AD, autosomal dominant; AR, autosomal recessive; MODY, maturity-onset diabetes of the young.

In order to identify a chromosomal locus that segregates with a disease, it is necessary to characterize polymorphic DNA markers from affected and unaffected individuals of one or several pedigrees. One can then assess whether certain marker alleles cosegregate with the disease. Markers that are closest to the disease gene are less likely to undergo recombination events and therefore receive a higher linkage score. Linkage is expressed as a lod (logarithm of odds) score, the base-10 logarithm of the ratio of the probability that the disease and marker loci are linked to the probability that they are unlinked. Lod scores of +3 (odds of 1000:1) are generally accepted as supporting linkage, whereas a score of –2 is consistent with the absence of linkage.

ALLELIC ASSOCIATION, LINKAGE DISEQUILIBRIUM, AND HAPLOTYPES  Allelic association refers to a situation in which the frequency of an allele is significantly increased or decreased in individuals affected by a particular disease in comparison to controls. Linkage and association differ in several aspects. Genetic linkage is demonstrable in families or sibships. Association studies, on the other hand, compare a population of affected individuals with a control population. Association studies can be performed as case-control studies that include unrelated affected individuals and matched controls or as family-based studies that compare the frequencies of alleles transmitted or not transmitted to affected children. Allelic association studies are particularly useful for identifying susceptibility genes in complex diseases. When alleles at two loci occur more frequently in combination than would be predicted (based on known allele frequencies and recombination fractions), they are said to be in linkage disequilibrium. Evidence for linkage disequilibrium can be helpful in mapping disease genes because it suggests that the two loci are tightly linked.

Detecting the genetic factors contributing to the pathogenesis of common complex disorders is challenging. In many instances, these are low-penetrance alleles (i.e., variations that individually have a subtle effect on disease development), and they can only be identified by unbiased GWAS (Catalog of Published Genome-Wide Association Studies; Table 479-1) (Fig. 479-16). Most variants occur in noncoding or regulatory sequences and do not alter protein structure. The analysis of complex disorders is further complicated by ethnic differences in disease prevalence, differences in allele frequencies in known susceptibility genes among different populations, locus and allelic heterogeneity, gene-gene and gene-environment interactions, and the possibility of phenocopies. Catalogues of human variation and genotype data (HapMap, International Genome Sample Resource) have greatly facilitated GWAS for the characterization of complex disorders. Adjacent SNPs are inherited together as blocks, and these blocks can be identified by genotyping selected marker SNPs, so-called Tag SNPs, thereby reducing cost and workload (Fig. 479-4). The availability of this information permits the characterization of a limited number of SNPs to identify the set of haplotypes present in an individual (e.g., in cases and controls). This, in turn, permits performing GWAS by searching for associations of certain haplotypes with a disease phenotype of interest, an essential step for unraveling the genetic factors contributing to complex disorders.

FIGURE 479-15  Relationship between allele frequency and effect size in monogenic and polygenic disorders. In classic Mendelian disorders, the allele is typically rare but has a high impact (single-gene disorder). This contrasts with polygenic disorders that require the combination of multiple low-impact alleles that are frequently quite common in the general population.
[Plot labels. Axes: effect size (low, modest, intermediate, high; ticks at 1.1, 1.5, 3.0) versus allele frequency (very rare, rare, low frequency, common; ticks at 0.001, 0.005, 0.05). Regions: rare alleles causing Mendelian disease; low-frequency variants with intermediate effect; common variants with low effect on complex disease (typical); common variants with high effect on complex disease (rare); rare variants with small effect (difficult to identify).]

POPULATION GENETICS  In population genetics, the focus changes from alterations in an individual’s genome to the distribution pattern of different genotypes in the population. In a case where there are only two alleles, A and a, with allele frequencies p and q (p + q = 1), the genotype frequencies under random mating (Hardy-Weinberg equilibrium) satisfy p² + 2pq + q² = 1, with p² corresponding to the frequency of AA, 2pq to the frequency of Aa, and q² to the frequency of aa. When the frequency of an allele is known, the frequency of the genotype can be calculated. Alternatively, one can determine an allele frequency if the genotype frequency has been determined. Allele frequencies vary among ethnic groups and geographic regions.
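As a worked illustration of the two-allele relation above, the short sketch below converts between allele and genotype frequencies. It assumes Hardy-Weinberg proportions (random mating, no selection, migration, or new mutation); the function names and the 1-in-2500 incidence are hypothetical example values, not data from this chapter.

```python
import math


def genotype_frequencies(p: float) -> dict:
    """Genotype frequencies for alleles A (frequency p) and a (frequency q = 1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def carrier_frequency_from_incidence(incidence: float) -> float:
    """Heterozygote frequency (2pq) for a recessive disorder whose
    incidence is taken to equal q squared."""
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must lie between 0 and 1")
    q = math.sqrt(incidence)
    return 2.0 * (1.0 - q) * q


# Allele frequency known -> genotype frequencies.
print(genotype_frequencies(0.98))

# Genotype (disease) frequency known -> allele and carrier frequency.
# Hypothetical recessive disorder affecting 1 in 2500 births.
print(round(carrier_frequency_from_incidence(1 / 2500), 4))
```

With these example numbers, an incidence of 1 in 2500 corresponds to q = 0.02 and a heterozygote frequency of about 0.039, or roughly 1 carrier in 25 individuals. The arithmetic is the same for any population; only the input allele frequency differs.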
Heterozygous mutations in the CFTR gene, for example, are relatively common in populations of European origin but are rare in African populations. Allele frequencies may vary because certain allelic variants confer a selective advantage. For example, heterozygotes for the sickle cell mutation, which is particularly common in West Africa, are more resistant to malaria infection because the erythrocytes of heterozygotes provide a less favorable environment for Plasmodium parasites. Although homozygosity for the sickle cell mutation is associated with severe anemia and sickle crises, heterozygotes have a higher probability of survival because of the reduced morbidity and mortality from malaria; this phenomenon has led to an increased frequency of the mutant allele. Recessive conditions are more prevalent in geographically isolated populations because of the more restricted gene pool.

APPROACH TO THE PATIENT
Inherited Disorders
For the practicing clinician, the family history remains an essential step in recognizing the possibility of a hereditary predisposition to disease. When taking the history, it is useful to draw a detailed pedigree of the first-degree relatives (e.g., parents, siblings, and children), because they share, on average, 50% of their genes with the patient. Standard symbols for pedigrees are depicted in Fig. 479-11. The family history should include information about ethnic background, age, health status, and deaths, including infants. Next, the physician should explore whether there is a family history of the same illness as the current problem or of related illnesses. An inquiry focused on commonly occurring disorders such as cancers, heart disease, and diabetes mellitus should follow. Because of the possibility of age-dependent expressivity and penetrance, the family history will need intermittent updating. If the findings suggest a genetic disorder, the clinician should assess whether some of the patient’s relatives may be at risk of carrying or transmitting the disease. In this circumstance, it is useful to confirm and extend the pedigree based on input from several family members. Emerging artificial intelligence tools analyzing facial features can aid the clinician in diagnosing patients with genetic conditions. In aggregate, this information may form the basis for genetic counseling, carrier detection, early intervention, and disease prevention in relatives of the index patient (Chap. 480).

In instances where a diagnosis at the molecular level may be relevant, it is important to identify a laboratory that can perform the appropriate test. Genetic testing is available for a large number of monogenic disorders through commercial laboratories. For uncommon disorders, the test may only be performed in a specialized research laboratory.
Approved laboratories offering testing for inherited disorders can be identified in continuously updated online resources (e.g., Genetic Testing Registry; Table 479-1). If genetic testing is considered, the patient and the family should be counseled about the potential implications of positive results, including psychological distress and the possibility of discrimination. The patient or caretakers should be informed about the meaning of a negative result, technical limitations, and the possibility of false-negative and inconclusive results. For these reasons, genetic testing should only be performed after obtaining informed consent. Published ethical guidelines address the specific aspects that should be considered when testing children and adolescents.

FIGURE 479-16  Genome-wide association studies (GWAS) across ancestries and discovery of loci over time. The pie charts represent type 2 diabetes GWAS, as well as candidate gene or sequencing studies. The x axis shows the year of publication, and the y axis shows discovery sample size. The inner circles are scaled in proportion to discovery sample size, and the outer circles are scaled in proportion to total (discovery + replication) sample size. Significant loci are defined by a p value threshold of 5 × 10⁻⁸. At the end of 2022, 534 distinct type 2 diabetes intervals (520 autosomal, 14 X chromosomal) were defined. (Reproduced with permission from BF Voight.)
[Plot labels. Ancestry: African and African-American, East Asian, European, Hispanic/Native American, South Asian. Circle size: initial sample size, replication sample size. Study type: linkage or candidate gene, GWAS or Metabochip, exome array, genome or exome sequencing. Axes: year; total sample size (1000s). Significant loci labeled: PPARG, KCNJ11, TCF7L2, SLC30A8, MC4R, SLC16A11, PNPLA3, LPL, POC5, ANKH, TBC1D4, PAM.]

IDENTIFYING THE DISEASE-CAUSING GENE
Precision medicine aims to enhance the quality of medical care through the use of genotypic analysis (DNA testing) to identify genetic predisposition to disease, to select more specific pharmacotherapy, and to design individualized medical care based on genotype. Genotype can be deduced by analysis of protein (e.g., hemoglobin, apoprotein E), mRNA, or DNA. Many (pathogenic) variants can be readily identified by DNA analyses; technical advances in RNA sequencing now add increasing depth to genetic and genomic investigations (e.g., for the detection of gene fusions or aberrant gene expression patterns). DNA testing is performed by mutational analysis or linkage studies in individuals at risk for a genetic disorder known to be present in a family. Mass screening programs require tests of high sensitivity and specificity to be cost-effective. The benefits and risks of screening newborns with genomic sequencing, and the potential impact on surveillance, preventive health care, and personalized treatment options, are topics of current research (BabySeq Project). Prerequisites for the success of genetic screening programs include the following: that the disorder is potentially serious; that it can be influenced at a presymptomatic stage by changes in behavior, diet, and/or pharmaceutical manipulations; and that the screening does not result in any harm or discrimination.
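The requirement for high sensitivity and specificity in mass screening can be made concrete with a small calculation. The sketch below applies Bayes' rule to a screening test and reports the positive and negative predictive values. All numbers (prevalence, sensitivity, specificity) are hypothetical and chosen only for illustration, and the function name is not from any library.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float):
    """Return (PPV, NPV) of a screening test applied to a population."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")

    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    true_neg = (1.0 - prevalence) * specificity

    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical disorder affecting 1 in 10,000 screened individuals.
ppv_99, _ = predictive_values(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99)
ppv_9999, _ = predictive_values(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.9999)
print(f"PPV at 99% specificity:    {ppv_99:.4f}")
print(f"PPV at 99.99% specificity: {ppv_9999:.4f}")
```

Under these assumed values, fewer than 1 in 100 positive results would be true positives at 99% specificity, compared with about 1 in 2 at 99.99% specificity. For rare disorders, the false-positive rate therefore dominates the cost and the potential harm of a screening program.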
Screening in Jewish populations for Tay-Sachs disease, an autosomal recessive neurodegenerative storage disease, has reduced the number of affected individuals. In contrast, screening for sickle cell trait/disease in African Americans has led to unanticipated problems of discrimination by health insurers and employers. Mass screening programs harbor additional potential problems. For example, screening for the most common genetic alteration in cystic fibrosis, the ΔF508 mutation with a frequency of ~70% in northern Europe, is feasible and seems to be effective. One has to keep in mind, however, that there is pronounced allelic heterogeneity and that the CFTR gene can be affected by >2000 other mutations. While the search for less common mutations has been challenging in the past, next-generation genome sequencing now permits comprehensive and cost-effective mutational analyses. However, the bioinformatic analysis and the classification of the detected variants as pathogenic or benign alterations remain challenging. Occupational screening programs aim to detect individuals with increased risk for certain professional activities (e.g., α1 antitrypsin deficiency and smoke or dust exposure). Integrating genomic data into electronic medical records is evolving and can provide significant decision support at the point of care, for example, by providing the clinician with genomic data and decision algorithms for the prescription of drugs that are subject to pharmacogenetic influences.

Mutational Analyses  DNA sequence analysis is widely used as a diagnostic tool and has significantly enhanced diagnostic accuracy. It is used for determining carrier status and for prenatal testing in monogenic disorders. Numerous techniques, discussed in previous versions of this chapter, are available for the detection of mutations. Analyses of large alterations in the genome are possible using classic methods such as karyotype analysis, cytogenetics, fluorescent in situ hybridization (FISH), and array- or bead-based techniques that search for multiple single exon deletions or duplications.
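The limits of testing for a single common mutation, as in the cystic fibrosis screening example earlier in this section, can be estimated with simple arithmetic. The sketch below assumes that the ~70% figure is the share of disease-causing CFTR alleles that are ΔF508 and that the two alleles of an affected person are drawn independently. Both are simplifying assumptions, and the function name is hypothetical.

```python
def single_mutation_test_yield(allele_share: float) -> dict:
    """Expected results in affected individuals when a test detects only
    one mutation that accounts for allele_share of disease alleles."""
    if not 0.0 <= allele_share <= 1.0:
        raise ValueError("allele_share must lie between 0 and 1")
    missed = 1.0 - allele_share
    return {
        "both_alleles_detected": allele_share ** 2,
        "one_allele_detected": 2.0 * allele_share * missed,
        "no_allele_detected": missed ** 2,
    }


# Assumed share of disease alleles carrying the tested mutation: 0.70
for outcome, fraction in single_mutation_test_yield(0.70).items():
    print(f"{outcome}: {fraction:.2f}")
```

Under these assumptions, a test restricted to the common mutation would identify both alleles in about half of affected individuals and neither allele in roughly 1 in 11, which illustrates why allelic heterogeneity matters for screening design.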

TABLE 479-7  Genetic Approaches for Identifying Disease Genes

| Method | Indications and advantages | Limitations |
|---|---|---|
| Linkage Studies | | |
| Classical linkage analysis (parametric methods) | Analysis of monogenic traits; suitable for genome scan; control population not required; useful for multifactorial disorders in isolated populations | Difficult to collect large informative pedigrees; difficult to obtain sufficient statistical power for complex traits |
| Allele-sharing methods (nonparametric methods): affected sib and relative pair analyses; sib pair analysis | Suitable for identification of susceptibility genes in polygenic and multifactorial disorders; suitable for genome scan; control population not required if allele frequencies are known; statistical power can be increased by including parents and relatives | Difficult to collect sufficient number of subjects; difficult to obtain sufficient statistical power for complex traits; reduced power compared to classical linkage, but not sensitive to specification of genetic mode |
| Association Studies | | |
| Case-control studies; linkage disequilibrium; transmission disequilibrium test (TDT); whole-genome association studies | Suitable for identification of susceptibility genes in polygenic and multifactorial disorders; suitable for testing specific allelic variants of known candidate loci; facilitated by comprehensive catalogs of genotypes and variants; does not necessarily need relatives | Requires large sample size and matched control population; false-positive results in the absence of suitable control population; candidate gene approach does not permit detection of novel genes and pathways; susceptibility genes can vary among different populations |
| Next-Generation Sequencing Technologies | | |
| Whole exome or genome sequencing | Unbiased approach; analysis can be performed without reference sequences from parents or siblings | Requires appropriate bioinformatics; may have low sensitivity if CNV analysis is not included; detects numerous VUS; can lead to the detection of unrelated deleterious alleles |
| Targeted sequencing of gene panels (captures multiple candidate genes and loci with hybridization techniques followed by deep sequencing) | Permits analyses of multiple candidate genes in parallel; facilitates molecular characterization of disorders with locus heterogeneity | |

Abbreviations: CNV, copy number variation; VUS, variants of unknown significance.

The analysis of more discrete sequence alterations often relies on the use of PCR, which allows rapid gene amplification and analysis. Moreover, PCR makes it possible to perform genetic testing and mutational analysis with small amounts of DNA extracted from leukocytes or even from single cells, buccal cells, or hair roots. DNA sequencing can be performed directly on PCR products. The advent of comprehensive sequencing technologies analyzing the whole exome or genome, selected chromosomes, or numerous candidate genes in a single run with NGS platforms is now fundamentally transforming the characterization of patients with rare disorders and advanced malignancies. These techniques have the advantage of an unbiased comprehensive approach, and they are increasingly cost-effective. Analysis of cell-free DNA (cfDNA; also referred to as “liquid biopsy”) present in body fluids is playing a growing role in minimally invasive diagnostics and disease monitoring. Genomic tests are also widely used for the detection of pathogens and for the identification of viral or bacterial sequence variations.

The integration of genomic tests into clinical medicine is associated with a number of ongoing challenges related to variable sensitivities of the tests, bioinformatics analyses, storage and sharing of data, and the difficulty of interpreting all genetic variants identified with comprehensive testing. The discovery of incidental (or secondary) findings that are unrelated to the indication for the sequencing analysis, but are indicators of other disorders of potential relevance for patient care, can pose a difficult ethical dilemma. It can lead to the detection of undiagnosed medically actionable genetic conditions but can also reveal deleterious mutations that cannot be influenced; moreover, numerous sequence variants are of unknown significance.

A general algorithm for the approach to mutational analysis in patients with a suspected genetic disorder and (advanced) malignancies is outlined in Fig. 479-17. The importance of a detailed characterization of the clinical phenotype cannot be overemphasized. This is the step where one should also consider the possibility of genetic heterogeneity and phenocopies. If obvious candidate genes are suggested by the phenotype, they can be analyzed directly.
After identification of a mutation, it is essential to demonstrate that it segregates with the phenotype. The functional characterization of novel mutations remains labor intensive and may require analyses in vitro or in transgenic models in order to document the relevance of the genetic alteration.

Prenatal diagnosis of numerous genetic diseases in instances with a high risk for certain disorders is possible by direct DNA analysis. Amniocentesis involves the removal of a small amount of amniotic fluid, usually at 16 weeks of gestation. Cells can be collected and submitted for karyotype analyses, FISH, and mutational analysis of selected genes (Table 479-4). The main indications for amniocentesis include advanced maternal age (>35 years), presence of an abnormality of the fetus on ultrasound examination, an abnormal serum “quad” test (α-fetoprotein, β human chorionic gonadotropin, inhibin-A, and unconjugated estriol), a family history of chromosomal abnormalities, or a Mendelian disorder amenable to genetic testing. Prenatal diagnosis can also be performed by chorionic villus sampling (CVS), in which a small amount of the chorion is removed by a transcervical or transabdominal biopsy. Chromosomes and DNA obtained from these cells can be submitted for cytogenetic and mutational analyses. CVS can be performed earlier in gestation (weeks 9–12) than amniocentesis, an aspect that may be of relevance when termination of pregnancy is a consideration. Later in pregnancy, beginning at ~18 weeks of gestation, percutaneous umbilical blood sampling (PUBS; cordocentesis) permits collection of fetal blood for analysis. Prenatal cfDNA testing allows analysis of maternal and fetal DNA from a maternal blood sample to screen for certain chromosomal abnormalities and to determine fetal sex.
These approaches enable screening for clinically relevant and deleterious alleles inherited from the parents, as well as for de novo germline mutations, and they have the potential to identify genetic disorders in the prenatal setting. In combination with in vitro fertilization (IVF) techniques, it is possible to perform genetic diagnoses in a single cell removed from the four- to eight-cell embryo or to analyze the first polar body from an oocyte. Preconceptual diagnosis thereby avoids therapeutic abortions but is costly and labor intensive. It should be emphasized that excluding a specific disorder by any of these approaches is never equivalent to the assurance of having a normal child.
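The gestational windows quoted above for the invasive sampling procedures can be summarized as a small lookup. The following Python sketch is illustrative only and is not a clinical tool. The names `SamplingWindow`, `WINDOWS`, and `procedures_for_week` are hypothetical. The text states that amniocentesis is "usually" performed at 16 weeks and that PUBS begins at ~18 weeks; treating those figures as lower bounds with no upper bound is a simplifying assumption made here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SamplingWindow:
    """Gestational window for one prenatal sampling procedure (hypothetical type)."""
    procedure: str
    earliest_week: int
    latest_week: Optional[int]  # None: no upper bound is stated in the text


# Weeks taken from the text. Assumption: "usually at 16 weeks" and "~18 weeks"
# are treated as lower bounds; the text gives no upper bounds for these two.
WINDOWS = (
    SamplingWindow("chorionic villus sampling (CVS)", 9, 12),
    SamplingWindow("amniocentesis", 16, None),
    SamplingWindow("percutaneous umbilical blood sampling (PUBS)", 18, None),
)


def procedures_for_week(week: int) -> List[str]:
    """Return the procedures whose stated window includes the given gestational week."""
    matches = []
    for window in WINDOWS:
        after_start = week >= window.earliest_week
        before_end = window.latest_week is None or week <= window.latest_week
        if after_start and before_end:
            matches.append(window.procedure)
    return matches


if __name__ == "__main__":
    for week in (10, 14, 16, 20):
        print(week, procedures_for_week(week))
```

Under these assumptions, week 10 falls only in the CVS window, whereas week 14 falls in none of the listed windows. That gap reflects the simplification rather than a statement about clinical practice.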

FIGURE 479-17  Approach to genetic disease. [Flowchart; box labels in order of appearance: Characterization of phenotype; Familial or sporadic genetic disorder; Pedigree analysis; Gene unknown; Gene known or candidate genes; Targeted sequencing; Deep sequencing of DNA; Deep sequencing of RNA (RNAseq); Deep sequencing (Linkage analysis and sequencing of linked region); Mutational analysis; Determine functional properties of identified mutations in vitro and in vivo; Genetic counseling; Testing of other family members; Therapy integrating genetic and genomic information; Treatment based on pathophysiology.]

Postnatal indications for cytogenetic analyses in infants or children include multiple congenital anomalies, suspicion of a known cytogenetic syndrome, developmental delay, dysmorphic features, autism, short stature, and disorders of sexual development, among others (Table 479-4).

Mutations in certain cancer susceptibility genes such as BRCA1 and BRCA2 may identify individuals with an increased risk for the development of malignancies and result in risk-reducing interventions. The detection of cytogenetic alterations and mutations is an important diagnostic and prognostic tool in leukemias, and it has also transformed the management of solid tumors. In addition to providing diagnostic information, mutational analysis can inform the choice of targeted therapies (“actionable mutations”), characterize the mutational load, identify gene signatures associated with effective immunotherapies, and be used for surveillance.

The demonstration of the presence or absence of mutations and polymorphisms is also relevant for the field of pharmacogenomics, including the identification of differences in drug treatment response or metabolism as a function of genetic background.

Gene therapy through the introduction of a normal gene or the ability to make site-specific modifications to the human genome has, so far, limited clinical application.
However, several gene transfer methods have now been approved for clinical use, for example, for the treatment of Leber congenital amaurosis, B-cell acute lymphoblastic leukemia, spinal muscular atrophy, and hereditary transthyretin-mediated amyloidosis. Genome editing (or gene editing) with CRISPR-Cas9 is a promising novel approach for the treatment of various diseases, for example, cystic fibrosis, certain cancers, hemophilia, and sickle cell disease. The first therapies using this technology for the treatment of sickle cell disease were approved by the U.S. Food and Drug Administration in 2023 (Chap. 483).

[Figure panel; box labels in order of appearance: Patient with (advanced) cancer; Tumor biopsy: Somatic analysis; Peripheral cells: Germline analysis; DNA and RNA extraction; Bioinformatics; Tumor board.]

ETHICAL ISSUES

Determination of the association of genetic defects with disease, comprehensive data of an individual’s genome, and studies of genetic variation raise many ethical and legal issues. Genetic information is generally regarded as sensitive information that should not be readily accessible without explicit consent (genetic privacy). The disclosure of genetic information may risk possible discrimination by insurers or employers. The scientific components of the Human Genome Project have been paralleled by efforts to examine ethical, social, and legal implications. An important milestone emerging from these endeavors is the Genetic Information Nondiscrimination Act (GINA), signed into law in 2008, which aims to protect asymptomatic individuals against the misuse of genetic information for health insurance and employment. It does not, however, protect the symptomatic individual. Provisions of the U.S. Patient Protection and Affordable Care Act, effective in 2014, have, in part, filled this gap and prohibit exclusion from, or termination of, health insurance based on personal health status. Potential threats to the maintenance of genetic privacy include the increasing integration of genomic data into electronic medical records, compelled disclosures of health records, and direct-to-consumer genetic testing.

It is widely accepted that identifying disease-causing genes can lead to improvements in diagnosis, treatment, and prevention. However, the information gleaned from genotypic results can have quite different impacts, depending on the availability of strategies to modify the course of disease. For example, the identification of mutations that cause MEN 2 or hemochromatosis allows specific interventions for affected family members. On the other hand, at present, the identification of an Alzheimer’s or Huntington’s disease gene does not alter therapy and outcomes. Most genetic disorders are likely to fall into an intermediate category where the opportunity for prevention or treatment is significant but limited.
However, the progress in this area is unpredictable, as underscored by the finding that angiotensin II receptor blockers appear to slow disease progression in Marfan’s syndrome. Genetic test results can generate anxiety in affected individuals and family members. Comprehensive sequence analyses are particularly challenging because most individuals can be expected to harbor several serious recessive gene mutations. Moreover, the sensitivity of comprehensive sequence analyses is not always greater, for example, if CNV analysis is not integrated. Genetic manipulation and patient selection for gene therapy approaches have raised ethical controversy and safety concerns that remain unresolved.