Population Genetics: An Introduction

by David Warmflash, MD, Nathan H Lents, Ph.D.

Throughout a region in the US called Pennsylvania Dutch country, where there is a large Amish population, there is also an unusually high proportion of people with a condition called Ellis-van Creveld syndrome. Bearers of this condition are short in stature and have extra fingers (Figure 1), poorly formed teeth and nails, and heart defects that can shorten their lives significantly. Although Ellis-van Creveld syndrome is extremely rare globally, affecting less than 0.1 percent of people, it afflicts over seven percent of Amish people in the United States. Rates in the specific Amish communities in Pennsylvania Dutch country are even higher.

Figure 1: People with Ellis-van Creveld Syndrome often have shorter forearms and lower legs, plus extra fingers and toes (polydactyly), malformed fingernails and toenails, and dental abnormalities. image © Darryl Leja, NHGRI

Variants of a gene are called alleles (Figure 2). Some alleles of genes that code for particular enzymes produce a defective gene product and can lead to genetic diseases that are recessive. This means that only individuals that receive defective copies from both parents are affected. Individuals with only one copy of the abnormal allele often experience no symptoms whatsoever. Occasionally, a disease-causing allele can actually confer a benefit. The classic example is the allele for sickle cell disease, which is devastating in individuals with two copies of the allele, but protective against malaria in individuals with only one copy. In human populations plagued with a high presence of malaria, the sickle cell allele has thus persisted in the gene pool, a term to describe the collection of genes in the population. (For more on alleles and genes, see our module Gene Expression: An Overview.)

Some disease-causing alleles have persisted in the human population because they provide some benefit, but the allele that causes Ellis-van Creveld syndrome helps nobody. It only kills. So why is it present in 7% of Amish people when it is so rare in the general population? Is there something unusual in the environment of central Pennsylvania that gives carriers of the Ellis-van Creveld allele some advantage, like those with one sickle cell allele who are protected against malaria?

Figure 2: Illustration of genes, chromosomes, and DNA components. image © National Institute on Aging/National Institutes of Health

Reginald Punnett and early research

Questions like this puzzled early geneticists, particularly Reginald Punnett, a British researcher in the early 20th century. Charles Darwin had explained how natural selection worked as an evolutionary force, but Darwin had thought children were simply a blend of their parents. By Punnett’s time, the rediscovery of Gregor Mendel’s work (see our module Mendel and Independent Assortment) had led to the understanding that genes are the carriers of inherited traits (Figure 3).

Figure 3: Reginald Punnett developed a visual method to understand inherited traits. Called a Punnett square, this example shows the probability of each eye color in the offspring. Here a brown-eyed parent who carries one blue-eye allele and a blue-eyed parent produce 50% children with brown eyes (a dominant trait) and 50% children with blue eyes (a recessive trait). image © Purpy Pupple

Nobody knew what genes actually were, but during a lecture in 1908, Punnett was asked why a harmful recessive trait would not simply disappear over time. If a healthy gene were dominant over the version that caused a genetic disease, why would those diseases still be present in the population? Punnett realized that the answer must have something to do with genes, but he was unable to give a comprehensive answer.

Stumped, he explained the problem to his friend and cricket partner, Godfrey Harold Hardy, a mathematician who in June of 1908 published what came to be called Hardy’s Law. Decades later, it was realized that German physician Wilhelm Weinberg had figured out and published the same rule in January 1908, five months before Hardy. It thus became the Hardy-Weinberg Principle, although it turns out that an American, William Castle, actually had figured out the same thing, even earlier, in 1903. Calling it the Castle-Weinberg-Hardy Principle might be more accurate, but it’s quite a mouthful, so today biologists usually say the Hardy-Weinberg Principle, or the Hardy-Weinberg Equilibrium.

The Hardy-Weinberg Equilibrium

The Hardy-Weinberg Equilibrium describes how alleles behave in a given population, meaning a population’s gene pool. It’s called an equilibrium because the idea is that the frequencies of alleles (the variations of genes), genotypes (the alleles an individual possesses), and phenotypes (the characteristics an individual expresses due to the alleles, see Figure 4) in a population will remain constant unless the population is acted upon by a force. If this reminds you of Newton’s First Law of Motion, you have the right idea.

Figure 4: Using the example of the eye color Punnett square, the alleles, or variations of genes, are B for the dominant brown color and b for the recessive blue color. These combine to form the genotypes, the alleles an individual possesses, in BB, Bb, or bb combinations. Those with at least one dominant allele, B, have the phenotype, or expressed characteristic, of brown eyes; those with two recessive b alleles have the phenotype of blue eyes. image © Based on Punnett square image by Purpy Pupple

The Hardy-Weinberg Equilibrium is usually understood in reference to one specific gene at a time. To understand the rules of Hardy-Weinberg, it is easiest to begin by considering the case of a gene with only two possible alleles in the population, a dominant one and a recessive one. Each individual can be homozygous for the recessive or the dominant allele (i.e., having two copies of the same allele), or can be heterozygous, having one copy of each.

The equation of Hardy-Weinberg requires that we express the abundance of each allele as a frequency, written as a decimal rather than a percentage. By convention, the frequency of the dominant allele is called p and the frequency of the recessive allele is called q. If the dominant allele has an abundance of 35% in the population, then p = 0.35. Because these are the only two alleles, their frequencies must add up to 100%. Therefore, p + q = 1. For example, if there’s an island with 50 dogs, that’s 100 alleles (two alleles per dog) for a certain gene, say one that determines whether their tails are short or long. And if 30 of those alleles are the recessive type (a short tail), that’s q = 0.3, which means that the frequency of the dominant type (a long tail) is p = 0.7.
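The dog example can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `allele_frequencies` and its parameters are illustrative choices, not part of the Hardy-Weinberg notation, and it assumes a gene with exactly two alleles in a population where every individual carries two copies.

```python
def allele_frequencies(recessive_count, total_individuals):
    """Return (p, q) for a gene with two alleles in a diploid population."""
    total_alleles = 2 * total_individuals  # two alleles per individual
    q = recessive_count / total_alleles    # frequency of the recessive allele
    p = 1 - q                              # p + q = 1, so p is whatever remains
    return p, q


# 50 dogs carry 100 alleles; 30 of those are the short tail (recessive) type.
p, q = allele_frequencies(recessive_count=30, total_individuals=50)
print(f"p = {p:.2f}, q = {q:.2f}")
```

Counting alleles rather than dogs is the key step: the denominator is 100, not 50.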

The equation p + q = 1 speaks only about the frequencies of the individual alleles. However, each individual has two alleles, one inherited from each parent. The probability of inheriting a particular pair is the product of the two allele frequencies, because the two draws are independent. For an individual to end up with two dominant alleles, like a dog with two of the long tail alleles from the example above, we multiply $p \times p$. This gives us $p^2$. The frequency of the homozygous recessive genotype, the dog with two short tail alleles from the example, would thus be $q \times q$, or $q^2$.

Calculating the frequency of heterozygotes (those having a copy of each allele, like the dog with both a long and a short tail allele) requires one extra step. We must multiply the frequency of the dominant allele, p, by that of the recessive allele, q, but we must also multiply this by a factor of 2. Why? Because there are two possible ways to become heterozygous. An individual can receive the dominant allele from one parent and the recessive allele from the other, or can receive the recessive allele from the first parent and the dominant allele from the other. Therefore, the frequency of heterozygotes in the population is $p \times q \times 2$, or simply 2pq.

This leads us to the Hardy-Weinberg Equilibrium equation. If we add up all of the homozygote dominants, plus all the homozygote recessives, plus all the heterozygotes, we should get 100%. Therefore:

$$p^2 + 2pq + q^2 = 1$$
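As a quick check on the equation, the three genotype frequencies can be computed from p alone. The sketch below is illustrative only; the function name `genotype_frequencies` and the dictionary keys are assumptions made for this example, and the input p = 0.7 comes from the dog example above.

```python
def genotype_frequencies(p):
    """Return the expected genotype frequencies for dominant allele frequency p."""
    q = 1 - p  # frequency of the recessive allele
    return {
        "homozygous_dominant": p * p,    # p squared
        "heterozygous": 2 * p * q,       # 2pq
        "homozygous_recessive": q * q,   # q squared
    }


freqs = genotype_frequencies(0.7)
for name, value in freqs.items():
    print(f"{name}: {value:.2f}")
print(f"total: {sum(freqs.values()):.2f}")
```

Whatever value of p is supplied between 0 and 1, the three frequencies add up to 1, which is exactly what the equation states.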

Evolutionary forces

Saying that the Hardy-Weinberg principle describes an “equilibrium” is misleading, however, because the values remain constant only in a population that is not evolving. But real-life populations are always evolving. The frequencies of alleles, and thus genotypes and phenotypes, do not stay the same for long because there are always forces acting upon them. Some of the forces acting on the allele frequencies are mutation and natural selection, along with two other phenomena: gene flow and genetic drift.

Now let’s consider some of the interesting things that can happen to gene frequencies in a population.

Natural Selection occurs when one allele confers some benefit to the individuals that bear it, so that those individuals survive and reproduce more successfully. This violates Hardy-Weinberg Equilibrium because the frequency of the beneficial allele will increase over time. The opposite will be true for an allele that harms the individuals that get it: The frequency will tend to decline over time, and the allele may eventually be eliminated.

Gene Flow refers to the movement of genes or alleles into or out of a gene pool. This can happen when members of a population migrate out, or members of another population migrate in and interbreed.

Genetic Drift refers to changes in gene frequencies due to random events, which can happen very quickly, producing dramatic and sudden effects. Drift can occur when a small group becomes isolated from the larger population. This is often called the Founder Effect. Drift can also occur when a catastrophic event reduces a large population to a very small size. Genetic drift means that the gene pool shrinks and becomes less diverse, which is often the opposite of what happens during gene flow when interbreeding expands the gene pool and increases genetic diversity.

How genetic drift works

Genetic drift is faster and more powerful in small populations, and this is best explained by considering the statistics of coin flipping. For each toss, you know that the chances of getting heads or tails are 50:50, but if you perform only ten flips, you may well not get exactly five heads and five tails. It might come out 4:6 or 3:7, simply due to the randomness of how the coin lands. You could also get 2:8, 1:9, or even 0:10. Odds are against this, but it’s certainly possible.

However, if you increase to 100 flips, you will probably get very close to a 50:50 ratio, even closer if you go up to 200, 400, or 1,000 flips. This is because the random factors causing heads or tails increasingly cancel each other out. The larger the number of coin flips, the more accurate the ideal prediction of 50:50 becomes. The lower the number of flips, the higher the chance of getting a strange ratio like 2:8 or 1:9.
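The coin-flipping argument is easy to try out. The following sketch uses Python's standard `random` module; the function name `heads_fraction` and the seed value are arbitrary choices for illustration, and the exact fractions printed will depend on the seed.

```python
import random


def heads_fraction(n_flips, rng):
    """Flip a fair coin n_flips times and return the fraction that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(1908)  # fixed seed so that the run is repeatable
for n in (10, 100, 1000, 10000):
    print(f"{n:>6} flips: fraction of heads = {heads_fraction(n, rng):.3f}")
```

Running this several times with different seeds typically shows the 10-flip result jumping around, while the 10,000-flip result stays close to 0.5.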

For essentially the same reason, the frequencies of alleles are subject to wide swings when a population gets very small. Consequently, if we use the Hardy-Weinberg Equilibrium equation to calculate allele frequencies in a large population at one moment in time, the answer will be pretty accurate and will hold over several generations. However, when a population gets very small, little differences can have big impacts on the population frequencies after a few generations. This is the essence of genetic drift: The gene frequencies change over time because of random effects due to small population size. One allele may become far more frequent than another for no reason other than chance, like flipping 8 heads out of 10 flips.
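Drift can be illustrated with a small simulation. The sketch below assumes a simplified model that this module does not itself describe: each generation, every allele in the next generation is drawn at random from the current gene pool, with no selection, mutation, or migration (a scheme often called the Wright-Fisher model). The function name `simulate_drift`, the population sizes, and the seed are all illustrative assumptions.

```python
import random


def simulate_drift(p_start, n_individuals, generations, rng):
    """Track one allele's frequency when chance alone decides what is inherited."""
    n_alleles = 2 * n_individuals  # diploid: two alleles per individual
    p = p_start
    history = [p]
    for _ in range(generations):
        # Each allele in the next generation is a random draw from the current pool.
        copies = sum(rng.random() < p for _ in range(n_alleles))
        p = copies / n_alleles
        history.append(p)
    return history


rng = random.Random(42)
for size in (10, 1000):
    final = simulate_drift(0.5, size, 20, rng)[-1]
    print(f"{size:>5} individuals: frequency after 20 generations = {final:.2f}")
```

In repeated runs, the population of 10 usually wanders far from the starting frequency of 0.5, and the allele can even disappear or take over completely, while the population of 1,000 tends to stay near 0.5.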

In nature, it’s tempting to assume that some alleles become more frequent through natural selection, because they bring some benefit to survival or reproduction, but that may not be the case. It could be a case of pure genetic drift. While the Hardy-Weinberg Equilibrium equation can help us detect that a population has undergone some kind of change, such as genetic drift, it cannot say how or why. For that, we have to look closer.

Types of genetic drift

The two main types of genetic drift are "bottleneck events" and "founder effects," each referring to a different mechanism by which a small population becomes reproductively isolated. Simply picturing how the neck of a bottle allows just a small fraction of the bottle’s contents into the limited space in a finite amount of time gives you a clue of what bottleneck means in genetics. If you imagine that the bottle contains a gene pool, you get a still better idea.

Bottleneck events

When a population suffers a sudden catastrophic decline and is then repopulated by a small group of survivors, that’s a bottleneck event (Figure 5). The gene pool shrinks and the new frequency of alleles for each gene is different from what it was in the larger population prior to the event.

Figure 5: When a catastrophic event kills off a large portion of a population and a small group of survivors is left to repopulate, it is a type of genetic drift known as the bottleneck effect. It results in a smaller gene pool and a different mix of allele frequencies. image © OpenStax, Rice University

In this way, previously rare alleles can suddenly become common, purely by chance. This happens in nature all the time. A good example is the Northern elephant seal, which thrived in the Northern Pacific Ocean, along the continental coast and islands from Mexico to Alaska, but was hunted to near extinction by the 1880s. In the 1920s, however, Mexico designated an island called Guadalupe as a sanctuary for the animals. Beginning with fewer than 100 seals, the population started expanding again so that today it numbers more than 127,000. While the population rebound is great news, this is an extreme bottleneck effect and the seals have lost a great deal of their original genetic diversity.

Furthermore, these Northern elephant seals are now very different from their counterparts in the South Pacific that did not undergo a bottleneck. For example, Northern elephant seals have an asymmetric-looking face that is extremely rare among Southern elephant seals. This facial anomaly in the Northern seals might remind you of the Ellis-van Creveld syndrome seen in America’s Amish. Genetically, the two cases are similar; both are examples of genetic drift. Neither the asymmetric face nor the allele for Ellis-van Creveld syndrome offers a survival benefit, yet they have increased in abundance in these specific populations.

Founder effects

The type of genetic drift experienced by North American Amish communities is not a bottleneck. It is a founder effect, because the Amish are not rare survivors of a large population that was mostly destroyed. Rather, they are descended from a small group of founders (Figure 6), people who left their roots in German lands and crossed the ocean. By pure chance, the small group that left Europe had a higher frequency of the Ellis-van Creveld allele than the larger population from which they came. When they became the founders of the new population of Amish in America, their descendants also exhibited that higher frequency.

Figure 6: When a portion of a population is separated, as when settlers leave for a new location, a type of genetic drift called the founder effect occurs. The separated population's genetic makeup starts to change and, over time, matches that of the founding men and women. image © Tsaneda
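To see how a small founding group can carry an unrepresentative share of a rare allele, consider the sketch below. Every number in it is hypothetical: the source-population allele frequency of 0.01, the group of 25 founders, and the 1,000 repeated trials are chosen only for illustration and are not estimates for the Amish or any real population. The function name `founder_allele_frequency` is likewise an assumption of this example.

```python
import random


def founder_allele_frequency(q_source, n_founders, rng):
    """Draw a founding group at random and return the allele frequency it carries."""
    n_alleles = 2 * n_founders  # each founder carries two alleles
    copies = sum(rng.random() < q_source for _ in range(n_alleles))
    return copies / n_alleles


rng = random.Random(7)
q_source = 0.01  # hypothetical frequency of a rare allele in the source population
samples = [founder_allele_frequency(q_source, 25, rng) for _ in range(1000)]

print(f"source population frequency: {q_source}")
print(f"lowest frequency among founding groups:  {min(samples):.2f}")
print(f"highest frequency among founding groups: {max(samples):.2f}")
```

Many of the simulated founding groups carry no copies of the allele at all, while a few carry it at several times the source frequency. Descendants of one of those few groups would inherit the elevated frequency with no selection involved.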

Over time, we would expect that the Ellis-van Creveld condition would reduce survival rates of those that bear it, mostly because of the heart defects. This would result in a reduction in the allele frequency and this could be detected using the Hardy-Weinberg Equilibrium equation. However, the symptoms associated with Ellis-van Creveld syndrome don't impair survival until after the person has likely reproduced, making it hard for natural selection to eliminate the allele, since it has already been passed on before it hurts anyone.

Founder effects have been implicated for numerous recessive diseases, such as Tay-Sachs in Ashkenazi Jews, who didn’t abandon Europe but became reproductively isolated, due to anti-Semitism in the Middle Ages. Similar to the sickle cell gene, it’s also possible that the Tay-Sachs gene conferred some health benefit in heterozygous individuals many centuries ago. This makes matters more complex, but that’s a common characteristic of nature. Evolution results from the combined effect of many forces. Genetic drift is an important one but does not operate in a vacuum.

Calculating the frequencies of alleles

You might be asking, “So if Hardy-Weinberg Equilibrium only holds for populations that are not evolving, and all populations are always evolving, what is it good for?” The value of the equation is two-fold. First, it is useful for calculating allele and genotype frequencies for populations at a certain point in time. It may not predict the future, but it can at least help describe the present. Second, the Hardy-Weinberg principle helps us discover when a certain gene is subject to natural selection or some other evolutionary force. If the Hardy-Weinberg Equilibrium predictions do not hold, then we know that something interesting is happening to that gene in the population.
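The second use can be sketched in code. The example below assumes that all three genotypes can be counted directly, which is not the case when a dominant allele masks heterozygotes; the observed counts are invented purely for illustration, and the function name `expected_genotype_counts` is hypothetical.

```python
def expected_genotype_counts(n_dom_hom, n_het, n_rec_hom):
    """Return the genotype counts Hardy-Weinberg predicts from the observed counts."""
    total = n_dom_hom + n_het + n_rec_hom
    # Each homozygous dominant individual carries two dominant alleles,
    # and each heterozygote carries one.
    p = (2 * n_dom_hom + n_het) / (2 * total)
    q = 1 - p
    return {
        "homozygous_dominant": p * p * total,
        "heterozygous": 2 * p * q * total,
        "homozygous_recessive": q * q * total,
    }


# Hypothetical observed counts in a sample of 100 individuals.
observed = {"homozygous_dominant": 50, "heterozygous": 20, "homozygous_recessive": 30}
expected = expected_genotype_counts(50, 20, 30)
for name in observed:
    print(f"{name}: observed {observed[name]}, expected {expected[name]:.1f}")
```

In this made-up sample there are far fewer heterozygotes than the equation predicts, which is the kind of mismatch that signals something is acting on the gene. The comparison alone cannot say what that something is.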

Those who remember their basic algebra will recognize the equation, $p^2 + 2pq + q^2 = 1$, as a quadratic equation, more often expressed as $x^2 + 2xy + y^2 = 1$. This equation comes from the expansion of $(x + y) = 1$. When you square both sides, you get $(x + y)^2 = 1^2$. While $1^2$ is just 1, $(x + y)^2$ expands to $x^2 + 2xy + y^2$.

When it comes to gene frequencies, the squaring of both sides of the equation represents fertilization – the fusion of sperm and egg. The sperm or egg cells are gametes, reproductive cells having half the number of chromosomes of a mature cell. When the two gametes are joined in the creation of a new organism, the frequency of each allele is multiplied. The genotype frequency of the resulting individual is the frequency of the maternal allele times the frequency of the paternal allele.

Frequency of the two alleles in the population: $p + q = 1$

Fertilization brings two alleles together: $(p + q)^2 = 1^2$

Performing the square: $p^2 + 2pq + q^2 = 1$

Remember that p and q represent the allele frequencies, the proportion of all copies of the gene in the population that are of each type. For example, earlier we noted that in the population of 50 dogs (100 alleles), 70 of the alleles were the long tail type (p = 0.7) and 30 were the short tail type (q = 0.3). By contrast, $p^2$, $q^2$, and 2pq represent the genotype frequencies, the proportion of individuals in the population with each genotype (homozygous dominant, homozygous recessive, or heterozygous).

This quadratic relationship is also the mathematical expression of the monohybrid cross of two heterozygous individuals (a monohybrid cross follows the two alleles of a single gene), but applied to a random population: with p = q = 0.5, it gives the familiar 1:2:1 ratio of genotypes. It’s the same concept: the fusion of gametes brings together two alleles and so their individual frequencies are multiplied together. (For more on Mendelian crosses, including dihybrids, see our module Mendel and Independent Assortment.)

Because p is the frequency of the dominant allele, $p^2$ represents the frequency of homozygous dominant individuals. In the dog example above, p, the frequency of the allele for a long tail, equals 0.7. Therefore the frequency of homozygous dominant dogs would be $(0.7)^2 = 0.7 \times 0.7 = 0.49$. And because q is the frequency of the recessive allele, a short tail, $q^2$ represents the frequency of homozygous recessive dogs, which is $(0.3)^2 = 0.3 \times 0.3 = 0.09$.

Finally, 2pq represents the frequency of heterozygous dogs, those with both the long and short tail allele, which is $2 \times 0.7 \times 0.3 = 0.42$. We can check our math by ensuring the frequencies of the three genotypes add up to one: 0.49 + 0.09 + 0.42 = 1

Example: rabbit fur

Let’s look at an example. Suppose that we have 100 rabbits. 88 of them have an agouti fur coat, a kind of blended color, which is a dominant trait. The other 12 have black fur, which is a recessive trait, so we know that those 12 are homozygous for the black fur allele. Recall that q represents the frequency of the recessive allele. We can calculate q because we know $q^2$, the frequency of homozygous recessive individuals. Since 12 rabbits out of 100 have black fur, $q^2 = 12/100 = 0.12$. To find q, we calculate the square root of 0.12, which is 0.35 (rounding to two significant figures).

Using this knowledge, we can calculate the other frequencies using the Hardy-Weinberg Equilibrium equation. If the frequency of the black fur allele is 0.35 and the frequencies of both alleles must add up to one, this means that p = 1 - 0.35 = 0.65. That’s the frequency of the dominant allele that produces agouti fur, and by squaring that frequency we get the frequency of homozygous dominant rabbits: $p^2 = (0.65)^2 = 0.42$. Since there are 100 rabbits total, that means about 42 of them are homozygous dominant for an agouti coat.

If 42 are homozygous dominant (agouti fur) and 12 are homozygous recessive (black fur), how many are heterozygous (agouti fur with only one agouti allele)? That’s 100 - (12 + 42) = 100 - 54 = 46, or 0.46 of the population of 100. The Hardy-Weinberg Equilibrium equation predicts that this frequency should equal 2pq. Let’s make sure that it does: $2 \times 0.65 \times 0.35 = 0.455$, which rounds to 0.46. The math is correct!
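The rabbit calculation can also be run without rounding along the way. The sketch below is illustrative; the function name `frequencies_from_recessive_count` is an assumption of this example, and the approach relies on the population being in Hardy-Weinberg Equilibrium, since that is what lets us read $q^2$ directly from the count of black rabbits.

```python
import math


def frequencies_from_recessive_count(n_recessive, n_total):
    """Estimate (p, q) from the number of homozygous recessive individuals."""
    q = math.sqrt(n_recessive / n_total)  # q squared is the recessive genotype frequency
    p = 1 - q
    return p, q


p, q = frequencies_from_recessive_count(12, 100)
print(f"q = {q:.4f}, p = {p:.4f}")
print(f"homozygous dominant (p^2):  {p * p:.4f}")
print(f"heterozygous (2pq):         {2 * p * q:.4f}")
print(f"homozygous recessive (q^2): {q * q:.4f}")
```

Carrying the extra digits gives roughly 0.43 for the homozygous dominant rabbits and 0.45 for the heterozygotes, slightly different from the 0.42 and 0.46 obtained above. The gap comes entirely from rounding q to 0.35 early in the hand calculation.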


Changes in the genetic makeup of a population affect the incidence of certain traits and diseases within the population. Beginning with a look at the abnormally high rate of a dangerous health condition in US Amish communities, this module explores forces that affect a population's gene pool. Among them are natural selection, gene flow, and two types of genetic drift: founder effects and bottleneck events. The Hardy-Weinberg Equilibrium equation is presented along with sample problems that show how to calculate the frequency of specific alleles in a population.

Key Concepts

  • Variants in genes are called alleles. Alleles can be dominant, meaning they are expressed even when only one copy is present, or recessive, meaning they are expressed only in individuals that receive copies from both parents.

  • The work of Gregor Mendel on genes and inherited traits was important in the development of early genetic theories of traits.

  • In a population, the frequencies of alleles (the variations of genes), genotypes (the alleles an individual possesses), and phenotypes (the characteristics an individual expresses due to the alleles) will remain constant, or at equilibrium, unless acted upon by a force.

  • The Hardy-Weinberg Equilibrium equation ($p^2 + 2pq + q^2 = 1$) describes how alleles behave in a given population, also known as a population’s gene pool.

  • Genetic drift refers to changes in gene frequencies due to random events, which can happen very quickly, producing dramatic and sudden effects.

  • There are two main types of genetic drift: bottleneck events (when a population suffers a sudden catastrophic decline and is repopulated by a small group of survivors) and founder effects (when a new population is started by just a few members of the original population).