Genetics: Elementary human genetics
Genetics is the study of the biological information that contributes to making an organism, and how this information is transmitted from one generation to the next. Inherited biological information is of course not the only influence that affects how an organism comes to be the way it is: the environment in which the organism develops and lives interacts with its genetic make-up to determine, for example, its lifespan and its susceptibility to disease as well as many behavioural characteristics. Only genetic information, however, is directly passed on from one generation to the next.
The chemicals of life
Genetic information is stored inside the nucleus of each cell of the body as deoxyribonucleic acid (DNA). The essential functions of cells are performed by proteins, which are also key components of cellular structure. DNA has two primary features: it is a code for directing the formation of proteins, and it is also reproducible. The special structure of DNA is essential for both these functions (figure 1). DNA has a long helical sugar-phosphate 'backbone' with nucleotide bases attached. There are four different bases: adenine (A), cytosine (C), guanine (G) and thymine (T). Specific combinations of bases can bind together reversibly to form base pairs: A binds with T, and C with G. The DNA molecule consists of two strands or chains held together by the bonds between base pairs, to form a double-helical structure. Because only specific combinations of bases can pair, the sequence of bases along one strand of the DNA exactly specifies the sequence on the other strand and the sequences are said to be 'complementary'.
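The pairing rule can be illustrated with a minimal sketch. The function name `complement` and the example sequence are invented for illustration, and the sketch deliberately ignores the direction in which each strand is read.

```python
# Pairing rules from the text: A binds with T, and C binds with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)


strand = "ATGGCA"  # made-up example sequence
partner = complement(strand)
print(partner)              # TACCGT
print(complement(partner))  # ATGGCA
```

Applying the rule twice recovers the original sequence, which is why either strand carries enough information to rebuild the other.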
The information in a gene is first copied into a messenger molecule, ribonucleic acid (RNA), which carries the information to the cytoplasm of the cell, where proteins are made. RNA (figure 1) is very similar to DNA except that it has a slightly different sugar making up its backbone and it has the base uracil (U) instead of thymine. RNA does not form double helices but usually exists as short single chains.
Genes, genomes and chromosomes
A sequence of DNA that contains the information to code for a protein is called a gene. The code (figure 3) is embodied in the sequence of bases along the DNA strand of the gene: a set of three bases (known as a codon) specifies an amino acid.
Some amino acids are specified by more than one codon, and some codons stand for 'stop' signals indicating the end of the protein. The protein-coding sequence of most genes is interrupted by non-coding sequences called introns; the protein-coding sections are called exons (figure 4). Genes also contain regulatory sequences (usually located outside the coding region) that control whether and when that protein is made.
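As a rough illustration of how the triplet code is read, the sketch below builds the standard genetic code as a lookup table and reads a made-up coding sequence three bases at a time. The one-letter amino acid abbreviations, the `*` symbol for 'stop', the function name `translate` and the example sequence are conventions or assumptions chosen for this example, not taken from the text.

```python
from itertools import product

# Standard genetic code, written for DNA codons in the conventional T, C, A, G order.
# Each letter is a one-letter amino acid abbreviation; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Read codons from the first base onward and stop at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTTTTAA"))  # MAF  (ATG=M, GCC=A, TTT=F, TAA=stop)

# Several codons can specify the same amino acid, for example leucine (L).
leucine_codons = sorted(c for c, a in CODON_TABLE.items() if a == "L")
print(len(leucine_codons))        # 6

# Three codons act as stop signals.
print(sorted(c for c, a in CODON_TABLE.items() if a == "*"))  # ['TAA', 'TAG', 'TGA']
```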
Every cell in the body contains a complete set of DNA instructions for all the millions of different proteins the body needs; this is the genome. The human genome contains about 3 billion base pairs and an estimated 22,000-25,000 genes, which represent no more than a few per cent of the total DNA sequence. The functions of the remaining 'junk' DNA are largely unknown. The discrepancy between the number of genes and the number of proteins is explained partly by mechanisms such as alternative splicing, where different combinations of exons within the same DNA sequence encode different proteins, and partly by production of proteins with different functional properties by variations in post-transcriptional and post-translational processing.
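To see how one gene can give rise to several products through alternative splicing, consider the deliberately simplified sketch below. It assumes that the first and last exons are always kept, that any internal exon may be skipped, and that exon order never changes. Real splicing is more constrained than this, and the exon labels are hypothetical.

```python
from itertools import combinations


def splice_variants(exons):
    """List every exon combination that keeps the first and last exon in order."""
    first, *internal, last = exons
    variants = []
    for kept_count in range(len(internal) + 1):
        # combinations() preserves the original order of the internal exons.
        for kept in combinations(internal, kept_count):
            variants.append([first, *kept, last])
    return variants


for variant in splice_variants(["E1", "E2", "E3", "E4"]):
    print("-".join(variant))
# E1-E4
# E1-E2-E4
# E1-E3-E4
# E1-E2-E3-E4
```

Under these assumptions a gene with n internal exons could yield up to 2^n different messages, which gives a feel for how a few tens of thousands of genes might encode a much larger number of proteins.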
Within the cell nucleus, DNA is wound up and packaged with proteins to form a set of structures called chromosomes (figure 5).
Each species has a characteristic set of chromosomes, the karyotype (figure 6). Corresponding chromosomes from different individuals of the same species in general carry the same sets of genes in the same order.
All but two of the chromosomes occur in pairs whose members are very similar to each other in size and shape: one member of each pair is inherited from the mother and the other from the father, with the result that every individual has two copies of each gene. Humans have 22 pairs of these similar (homologous) chromosomes, which are known as the autosomes. In addition, humans have two sex chromosomes: females have two X chromosomes, while males have one X and one Y. The X and Y chromosomes contain different sets of genes; the fact that males have only one X chromosome has implications for some genetic conditions.
A small amount of DNA is found outside the nucleus, in structures called mitochondria, which are concerned with the cell's energy metabolism. All of an individual's mitochondrial DNA is inherited from their mother, in the cytoplasm of the egg.
Gene expression and the genetic 'programme'
In order to make use of the information stored in its DNA, a cell needs to express that information - that is, to produce the proteins encoded by the genes. There are several steps in the process of gene expression (figure 7):
transcription (making an RNA molecule using the DNA sequence of the gene as a template), removing introns from the RNA (known as splicing), exporting the 'messenger' RNA from the nucleus, translation (making protein using the messenger RNA sequence to direct the assembly of a specific protein chain), modification of the protein (for example, adding sugar or lipid molecules), and translocation (moving the protein to where it is needed in the cell). The cell can exert control over the process at any of these levels.
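The first two steps, transcription and splicing, can be sketched as simple string operations. This is a toy model under stated assumptions: strand direction is ignored, intron positions are supplied by hand as index pairs, and the sequence, function names and coordinates are all made up for illustration.

```python
# RNA base that pairs with each DNA template base (uracil replaces thymine in RNA).
RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_dna: str) -> str:
    """Build the RNA copy that pairs with the DNA template, base by base."""
    return "".join(RNA_PARTNER[base] for base in template_dna)


def splice(pre_mrna: str, introns) -> str:
    """Remove introns given as (start, end) index pairs, end not included."""
    pieces = []
    position = 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the intron
    pieces.append(pre_mrna[position:])           # keep the final exon
    return "".join(pieces)


template = "TACCGGCAGGTCAAAATT"        # hypothetical template strand
pre_mrna = transcribe(template)
print(pre_mrna)                        # AUGGCCGUCCAGUUUUAA
print(splice(pre_mrna, [(6, 12)]))     # AUGGCCUUUUAA
```

The spliced message is what would then be exported from the nucleus and translated.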
Different types of cells (for example blood, liver or skin cells) contain the same DNA but they have different functional characteristics because they have expressed different sets of genes. Specialised cell types maintain their gene expression patterns through modifications to the DNA, such as methylation and binding of regulatory proteins that are retained when the cells divide. These 'epigenetic' modifications do not change the primary sequence of the DNA.
The process by which a single fertilised egg grows, divides and develops into an organism with a wide variety of cell and tissue types, integrated into a functional whole, involves far more than a simple, linear decoding of the DNA. The decoding process itself is regulated and carried out by proteins, which in this way feed information back to the genome, selecting the next batches of genes to be decoded as development proceeds. As cell numbers grow, groups of cells communicate with one another, sending and receiving signals that again help to direct the unfolding of the genetic programme.
This complex web of regulatory processes continues throughout the life of the organism. It follows that it is usually over-simplistic to speak in terms of a 'gene for' a particular characteristic (or disease).
Mutation and variation
A mutation is an alteration in the normal sequence of a DNA molecule. Mutations most commonly arise from mistakes made by the cellular machinery that copies DNA, but they can also result from environmental agents such as radiation or hazardous chemicals. Most mutations are repaired by the cell; those that are not may lead to loss or alteration of the protein encoded by the DNA.
Mutations can occur on either a small or a large scale (box 1). Small-scale mutations usually affect only a single protein but can have drastic consequences for the organism. For example, a change to a single base can alter a normal codon to a STOP codon. The protein produced from that gene will then be truncated and will probably not work properly. If one or two bases are deleted or duplicated, the triplet genetic code will be put out of register, and again a normal protein will probably not be produced. Some mutations fall outside the coding regions of genes but can still affect protein production. Mutations in the regulatory regions of genes, for example, can affect the binding of regulatory proteins, causing the gene to be expressed in the wrong type of cell or at an inappropriate time or level. Mutations in the sequences at the boundaries between exons and introns can cause the RNA message to be spliced incorrectly, again leading to an altered protein product.
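The two small-scale effects described above, a premature stop codon and a shift in the reading frame, can be illustrated with a short sketch. The 15-base 'gene' and the helper names are hypothetical; only the three stop codons are taken from the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(sequence: str):
    """Split a sequence into complete triplets, starting at the first base."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]


def first_stop(sequence: str):
    """Return the position of the first in-frame stop codon, or None."""
    for index, codon in enumerate(codons(sequence)):
        if codon in STOP_CODONS:
            return index
    return None


normal = "ATGTACGGCTTATGA"                 # made-up gene ending in the stop codon TGA
nonsense = normal[:5] + "A" + normal[6:]   # single base change: TAC becomes TAA
frameshift = normal[:3] + normal[4:]       # one base deleted after the first codon

print(codons(normal), first_stop(normal))
# ['ATG', 'TAC', 'GGC', 'TTA', 'TGA'] 4
print(codons(nonsense), first_stop(nonsense))
# ['ATG', 'TAA', 'GGC', 'TTA', 'TGA'] 1
print(codons(frameshift), first_stop(frameshift))
# ['ATG', 'ACG', 'GCT', 'TAT'] None
```

The single base change truncates the protein after one amino acid, while the deletion alters every codon downstream of it and removes the original stop signal from the reading frame.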
Large-scale mutations (deletions, duplications, insertions, inversions and translocations) often cause severe problems. For example, large deletions may completely remove one or more genes, and large duplications may cause too much of a protein to be produced, which can make cells grow abnormally. The effects of inversions can be subtle, because the number of genes remains unchanged; however, changing their arrangement on the DNA molecule can change their relationship to their control sequences. Similarly, translocations, in which segments of DNA are transferred from one chromosome to another, can change the way genes are expressed and may lead to an incomplete chromosome complement when sex cells are produced. In addition, the genes at the location of the cuts in the DNA can be damaged.
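A chromosome can be pictured, very crudely, as an ordered list of labelled segments, which makes the difference between these rearrangements easy to see. The segment labels and function names below are hypothetical, and the model ignores everything except gene order and gene number.

```python
def delete(segments, start, end):
    """Remove the segments from index start up to (not including) end."""
    return segments[:start] + segments[end:]


def duplicate(segments, start, end):
    """Insert a second copy of segments[start:end] directly after the original."""
    return segments[:end] + segments[start:end] + segments[end:]


def invert(segments, start, end):
    """Reverse the order of segments[start:end] in place on the chromosome."""
    return segments[:start] + segments[start:end][::-1] + segments[end:]


chromosome = ["control", "geneA", "geneB", "geneC", "geneD"]

print(delete(chromosome, 1, 3))
# ['control', 'geneC', 'geneD']
print(duplicate(chromosome, 1, 3))
# ['control', 'geneA', 'geneB', 'geneA', 'geneB', 'geneC', 'geneD']
print(invert(chromosome, 1, 4))
# ['control', 'geneC', 'geneB', 'geneA', 'geneD']
```

After the inversion every gene is still present, but geneA no longer sits next to the control sequence, which is the kind of change in relationship the text describes.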
If a mutation occurs in the cells that give rise to the sex (sperm or egg) cells it can be passed on to the next generation. Not all changes in the DNA sequence are harmful, however. Mutation provides the raw material for evolution: the variation on which natural selection acts. Mutations that increase reproductive fitness (or at least do not decrease it) will tend to persist and spread in a population. As a result of aeons of evolution, all species have accumulated many different genetic variants that explain why individuals in these species are not identical. This phenomenon is known as polymorphism - the existence within a population of several subtly different normal variants of a DNA sequence.
Generally, the term 'mutation' is used to refer to a rare (present in less than 1% of the population) and deleterious genetic change, whereas 'polymorphism' refers to a normal variant (present in at least 1% of the population). However, the degree of polymorphism is small compared with the total amount of DNA: human DNA varies at about 1 base in every 300.
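Taking the figures quoted in this document at face value (a genome of about 3 billion base pairs, varying at roughly 1 base in 300), a quick calculation gives the approximate number of variable positions. The frequency-based classifier is a simplification that uses only the 1% threshold and ignores whether a change is harmful; its name and the sample frequencies are invented for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000   # approximate size of the human genome, from the text
BASES_PER_VARIABLE_SITE = 300    # "about 1 base in every 300"

variable_sites = GENOME_SIZE_BP // BASES_PER_VARIABLE_SITE
print(f"{variable_sites:,}")     # 10,000,000


def classify_variant(population_frequency: float) -> str:
    """Label a variant using only the conventional 1% frequency threshold."""
    return "polymorphism" if population_frequency >= 0.01 else "mutation"


print(classify_variant(0.005))   # mutation
print(classify_variant(0.20))    # polymorphism
```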
Alleles, genotype and phenotype
Most of the variation in DNA is in the non-coding regions, but some falls within genes. Different versions of the same gene at the same position on corresponding chromosomes (i.e. at the same genetic locus) are known as alleles. Some alleles are identical in their action but others produce quite different effects; for example, sickle-cell disease is caused by a mutant allele of a haemoglobin gene.
The set of alleles that a particular person has is known as their genotype. The set of observable characteristics that they have is known as their phenotype. The phenotype is the result of the interaction between the genotype and environmental factors.
Everyone has two alleles of each gene in their autosomes, one inherited from their mother and the other from their father. If someone has two copies of the same allele then they are said to be homozygous for that allele. If they have different alleles of that gene then they are heterozygous at that genetic locus. An individual who is homozygous will show the characteristic associated with that form of the gene. For example, a person with two copies of the "A" allele encoding a particular blood protein has type A blood. A person who has one copy of the A allele and one of the "O" allele also has the A blood type: in this case the A trait is said to be dominant over the O trait and the O trait is said to be recessive. If both traits are expressed (as in the AB blood group) the alleles are called co-dominant.
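The blood group example can be written out as a small genotype-to-phenotype lookup. This is a sketch, not a clinical rule: it assumes only the three alleles A, B and O, treats B as dominant over O in the same way as A (the text mentions B only in the AB group), and uses invented function names.

```python
VALID_ALLELES = {"A", "B", "O"}


def is_homozygous(allele_1: str, allele_2: str) -> bool:
    """True when both copies of the gene are the same allele."""
    return allele_1 == allele_2


def abo_blood_type(allele_1: str, allele_2: str) -> str:
    """Return the blood group produced by a pair of ABO alleles."""
    if not {allele_1, allele_2} <= VALID_ALLELES:
        raise ValueError("alleles must be 'A', 'B' or 'O'")
    # O is recessive: it shows only when no A or B allele is present.
    expressed = {allele_1, allele_2} - {"O"}
    if not expressed:
        return "O"
    # A and B are co-dominant: if both are present, both are expressed.
    return "".join(sorted(expressed))


print(abo_blood_type("A", "A"), is_homozygous("A", "A"))  # A True
print(abo_blood_type("A", "O"), is_homozygous("A", "O"))  # A False
print(abo_blood_type("A", "B"))                           # AB
print(abo_blood_type("O", "O"))                           # O
```

The genotypes AA and AO give the same phenotype, which is why a person's blood group alone does not reveal which alleles they carry.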
© Public Health Genetics Unit 2006