January 25, 2017

Variation Within and Between Races

A common argument against the taxonomic validity of race is that there is more genetic variation within races than between them, and so races must not be genetically different enough to be subspecies. This argument comes from a 1972 paper by the Harvard geneticist Richard Lewontin (Lewontin 1972). As will be shown, Lewontin’s argument fails because the metric of genetic difference he used has no obvious relevance to subspecies and because human races are as genetically differentiated as, or more differentiated than, recognized subspecies in other species.

To understand Lewontin’s argument you need a conceptual grasp of a metric used in population genetics called an Fst value. Say we take two random individuals from a species and look at which variant each has for some specific gene. There will be some probability, called the species’ total heterozygosity, that these gene variants will not be the same. Now say we do the same thing, but this time the two individuals are picked from the same sub-population within the species. The probability that their gene variants will not be the same is now called the sub-population heterozygosity. To calculate an Fst value you subtract the sub-population heterozygosity from the total heterozygosity and then divide by the total heterozygosity:

Fst = (Ht-Hs)/Ht

In other words, an Fst value tells us how much the probability of picking different gene variants increases if the gene variants are picked at random from the entire species instead of from the same sub-population. When calculating an Fst value, geneticists run this analysis for many genes and then find the average increase in heterozygosity.
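
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular study's method: the function names, the two allele-frequency lists, and the assumption of two equally sized sub-populations are all hypothetical.

```python
def heterozygosity(freqs):
    """Probability that two randomly drawn gene copies carry different variants,
    given the variant frequencies (assumed to sum to 1)."""
    return 1.0 - sum(p * p for p in freqs)


def fst(h_total, h_sub):
    """Fst = (Ht - Hs) / Ht, as in the formula above."""
    return (h_total - h_sub) / h_total


# Hypothetical variant frequencies for one gene in two equally sized sub-populations.
pop_a = [0.8, 0.2]
pop_b = [0.4, 0.6]

# Sub-population heterozygosity: the average over the two sub-populations.
h_sub = (heterozygosity(pop_a) + heterozygosity(pop_b)) / 2

# Total heterozygosity: computed from the pooled variant frequencies.
pooled = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]
h_total = heterozygosity(pooled)

print(round(h_sub, 3), round(h_total, 3), round(fst(h_total, h_sub), 3))
```

A real analysis would repeat this over many genes and average the results, as the paragraph above notes.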

When an Fst value is calculated for a species with multiple proposed sub-populations, the values are averaged. So, for instance, if we conducted a study and found that two people having different gene variants was 10% less likely if they were both picked randomly from the Asian population instead of humanity at large, 8% less likely if they were both from the European population instead of humanity at large, and 6% less likely if they were both from the African population rather than humanity at large, we might assign humanity an Fst value of (10%+8%+6%)/3 = 8% under this three-race model. And this is what we would mean if we said something like “Only 8% of human genetic variation is between races while 92% is within them”. (The proportion of variation within groups is just 1 minus the Fst value.)
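
The arithmetic in this example can be checked with a short sketch. The three percentages are the illustrative figures from the paragraph above, not measured values, and the variable names are made up for the example.

```python
# Illustrative per-population reductions in heterozygosity (from the example above).
reductions = {"Asian": 0.10, "European": 0.08, "African": 0.06}

# Simple unweighted average across the three proposed sub-populations.
fst_value = sum(reductions.values()) / len(reductions)

# Share of variation "within" groups is 1 minus the Fst value.
within_share = 1 - fst_value

print(f"between: {fst_value:.0%}, within: {within_share:.0%}")
```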

In 1972, Richard Lewontin became the first person to empirically measure the human Fst value and found it to be 6.3%. Based on this finding, Lewontin declared that categorizing humans racially has no “genetic or taxonomic significance”.

Unfortunately, Lewontin never explained why an Fst value of 6.3% should mean races have no taxonomic or genetic significance. And it isn’t obvious that it should. In fact, Sewall Wright, a founder of population genetics and the man who invented Fst values, thought that they had nothing to do with measuring taxonomic significance and continued to believe in human races long after Lewontin’s famous article (Wright 1984).

That Lewontin’s idea never took hold in the world of biology can be seen by looking at a 2006 report by the U.S. Geological Survey which reviewed more than a century of popular proposed criteria for when a population counts as a subspecies. It never mentioned Fst values, let alone Lewontin’s paper (Haig et al. 2006).

Since Lewontin’s paper, research has suggested that the human Fst value is actually about 12%, roughly twice what Lewontin reported (Elhaik 2012). This has not altered Lewontin’s stance on races. Indeed, it isn’t obvious that his stance is open to change, because he has never said how high an Fst value would need to be in order for a population to be of taxonomic significance. Instead, he has just said that the human Fst value is too low.

Furthermore, Lewontin has never addressed the fact that there are many species with recognized subspecies which have Fst values lower than that of humans. As can be seen below, I was easily able to find 8 other species with recognized subspecies which have Fst values no higher than that of humans. In fact, it isn’t hard to find researchers in the nonhuman literature taking any Fst value greater than zero as evidence that a population is a subspecies. See, for instance, Lorenzen et al. 2007 and Williams, Homan, Johnston, and Linz, 2004. Given this, it is clear that most biologists do not use Lewontin’s criterion for subspecies, whatever exactly it is. And given that he has never made any argument for using it, neither should we.


Sources: Jackson et al. 2014; Elhaik 2012; Lorenzen, Arctander, and Siegismund, 2008; Pierpaoli et al. 2003; Lorenzen et al. 2007; Jordana et al. 2003; Hooft, Groen, and Prins, 2009; Schwartz et al. 2002; and Williams, Homan, Johnston, and Linz, 2004.

Instead, many biologists use a criterion for subspecies based, in part, on the idea that a population can only be a subspecies if you can analyze the traits of an organism in that species and accurately predict whether or not it is a member of the proposed subspecies.

Based on this traditional understanding of subspecies taxonomy, multiple geneticists have pointed out that an Fst value of 6% is just the average increased probability of a single gene being different and that, by combining data from multiple genes in one analysis, we can very accurately predict whether or not someone is a member of a given race (Mitton 1977). To get a conceptual understanding of what this means, imagine that you were told to guess whether a person was male or female based on whether they were taller or shorter than average, or hairier or less hairy than average, or whether their voice was higher or lower pitched than average, etc. If only one of these facts were told to you, you could make an educated guess, but there would be a decent chance that you would be wrong. But if you combined data on, say, 20 such sex differences, your chances of correctly guessing the person’s sex would become quite high. By the same principle, a single gene might not be a very good predictor of someone’s race, but that doesn’t mean that the combined data of many genes won’t be. It was on this basis that the famed population geneticist A. W. F. Edwards dubbed this argument against race “Lewontin’s Fallacy” (Edwards 2002).
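
The principle of combining many weak predictors can be sketched numerically. The toy model below is an assumption made purely for illustration and is not the method used by Mitton or Edwards: it supposes that each gene, independently, carries the variant more typical of a person's own group with probability 0.6, and that we guess the group by majority vote across genes. The function name and the 0.6 figure are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_loci, p_match=0.6):
    """Chance that a majority vote over n_loci independent genes picks the right group.

    Hypothetical model: each gene shows the variant more typical of the person's
    own group with probability p_match. Use an odd n_loci so ties cannot occur.
    """
    need = n_loci // 2 + 1  # smallest number of "matching" genes that wins the vote
    return sum(
        comb(n_loci, k) * p_match**k * (1 - p_match) ** (n_loci - k)
        for k in range(need, n_loci + 1)
    )


# Accuracy with a single gene equals p_match; it rises as more genes are combined.
for n in (1, 21, 101, 301):
    print(n, round(majority_vote_accuracy(n), 4))
```

Under these assumptions a single gene is right only 60% of the time, while a vote over a few hundred genes is right almost always, even though no individual gene has become more informative.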

Furthermore, an Fst value is not even a good measure of genetic differentiation. Consider the work done in Long and Kittles 2003, which provided a powerful demonstration of just how ridiculous an Fst subspecies criterion really is. Long and Kittles calculated the Fst value of the global human population at 11%, which is pretty typical of modern studies. They then calculated the Fst value of the global human population plus a population of chimpanzees to be 16%. Thus, the inclusion of chimpanzees in the calculation only raised the Fst value by 5 percentage points, and most Fst-based subspecies criteria would therefore conclude that a population of humans and chimps has no significantly different sub-populations within it!

This work is not only amusing, but illustrative of the primary problem with Fst values as a measure of genetic differentiation. Recall that an Fst value tells us how much more likely it is that two gene variants will be different if they are picked out of the entire species instead of from members of the same race. Well, what if the probability that they will be different is really high even when the genes are picked from the same race? Say, 85%, for instance. In that case the most that the probability of picking different genes could increase would be 15 percentage points, which gives an Fst value of only .15.

More generally, the table below makes two points. First, for simple mathematical reasons, an Fst value can never be larger than one minus the sub-population heterozygosity. Second, because an Fst value is a measure of how much heterozygosity increases when gene variants are picked from the entire population rather than from the same sub-population, expressed as a percentage of the total heterozygosity, the same absolute difference between total and sub-population heterozygosity can lead to radically different Fst values depending on what the absolute values of these variables are:
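
Both points can also be checked with a small sketch. The heterozygosity pairs below are hypothetical values chosen for illustration, not figures taken from the original table; each pair has the same absolute gap of 0.05 between total and sub-population heterozygosity.

```python
def fst(h_total, h_sub):
    """Fst = (Ht - Hs) / Ht."""
    return (h_total - h_sub) / h_total


def fst_upper_bound(h_sub):
    """Largest Fst possible for a given sub-population heterozygosity.

    Ht is a probability, so Ht <= 1, and Fst = 1 - Hs/Ht is at most 1 - Hs.
    """
    return 1.0 - h_sub


# Hypothetical (Ht, Hs) pairs, all with the same absolute difference of 0.05.
examples = [(0.10, 0.05), (0.50, 0.45), (0.90, 0.85)]

for h_total, h_sub in examples:
    print(
        f"Ht={h_total:.2f} Hs={h_sub:.2f} "
        f"Fst={fst(h_total, h_sub):.3f} max possible Fst={fst_upper_bound(h_sub):.2f}"
    )
```

The same 0.05 gap yields an Fst of one half when heterozygosity is low and under 6% when it is high, and in every case the value stays at or below one minus the sub-population heterozygosity.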


To connect this back to humans, our sub-population heterozygosity levels range from .70 to .76 (Jorde et al. 1997). Thus, no matter how different the races were, our Fst value could never be greater than roughly 24% to 30%, depending on which end of that range is used. Each race could literally be as different, genetically speaking, as dogs are from cats. It wouldn’t matter. Our Fst value would never seem intuitively high, and most of our genetic variation would still be contained “within races”.
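
As a worked version of this ceiling, writing Ht and Hs as H_T and H_S and using only the fact that the total heterozygosity is a probability and so cannot exceed 1:

```latex
\[
F_{ST} = \frac{H_T - H_S}{H_T} = 1 - \frac{H_S}{H_T} \le 1 - H_S
\qquad \text{since } H_T \le 1 .
\]
\[
H_S = 0.70 \;\Rightarrow\; F_{ST} \le 0.30,
\qquad
H_S = 0.76 \;\Rightarrow\; F_{ST} \le 0.24 .
\]
```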

For these and other reasons, geneticists are increasingly recognizing that Fst values cannot be meaningfully compared across species, which have different total heterozygosities, and so, beyond testing whether an Fst value is greater than zero, Fst cannot possibly be the foundation for a subspecies criterion (Jost 2008).

Appendix 1: Alan Templeton and Fst > .25

A highly cited 1999 paper by the geneticist Alan Templeton claimed that requiring a subspecies to have an Fst value of at least 25%-30% is “standard in the nonhuman literature” (Templeton 1999). Templeton, who uses this claim to argue against the existence of human races, cites the 1997 paper “Subspecies and Classification” by Smith, Chiszar, and Montanucci to substantiate that this Fst standard is commonplace in biology (Smith, Chiszar, and Montanucci, 1997). But Smith et al. 1997 never even mentions Fst values! It appears that Templeton assumed that this is what Smith et al. 1997 meant when they wrote that subspecies cannot “overlap in variation of their differentiae” by more than 25%-30%. This is almost surely not a reference to Fst values. Instead, the paper was referencing the so-called “75% rule”, a criterion for subspecies which stated that a population counts as a subspecies if you can analyze the traits of organisms in the species and, on this basis, predict whether or not they are members of the proposed subspecies with an error rate of 25% or less. There are several reasons for thinking that Smith et al. 1997 were referring to the 75% rule and not an Fst-based criterion for subspecies:

  1. They referred to “differentiae”, implying that multiple traits can be used to differentiate subspecies. This is consistent with the 75% rule, under which several observable traits were the norm, and not with an Fst value criterion.
  2. Smith et al. 1997 goes on to state “A subspecies name draws attention to a geographic segment of a species that in some way is recognizably different”. This appeal to recognizable differences clearly implies that subspecies are differentiated based on observable traits, as in the 75% rule, and not by a molecular genetic analysis.
  3. As demonstrated by Haig et al. 2006, a large team of researchers reviewing the subspecies literature made no mention of Templeton’s Fst criterion. Haig et al. do, however, spend several paragraphs talking about the 75% rule.
  4. As is evidenced above, an Fst criterion is not, in fact, commonly used. But the 75% rule was. Given that Smith is an expert in subspecies taxonomy who has been writing on the topic for decades, it is therefore far more likely that he was talking about the 75% rule than about Templeton’s contrived criterion, which can’t be found anywhere else in the literature.

Thus, Templeton’s paper is based on an extremely misleading reading of Smith et al. 1997 and fails to establish any Fst criterion for subspecies.

Appendix 2: Joseph Graves and Sewall Wright 

Joseph Graves is a biologist who has written several books and countless articles arguing against the biological existence of races. In his writings he often says something such as this about Sewall Wright, the inventor of Fst values:

“Wright felt the latter, measured by Fst was equivalent to the subspecies used by taxonomists (also called biological or geographical race.) Population subdivision can be calculated at individual genetic loci or for numerous genetic loci simultaneously. Wright’s statistic can range between 0 and 1.00. He arbitrarily suggested that the minimal threshold for the existence of great variation was Fst = 0.250 and moderate variation Fst = 0.15 to 0.250. He examined individual loci derived from protein electrophoresis from a variety of species, finding a range of differentiation from 0.023 to 0.501 (average Fst= 0.168).

Subsequent studies of multiple loci, including whole genome analyses, have generally shown human Fst at much less than Wright’s critical value.” –Graves 2006

As we have already seen, Sewall Wright did not think that Fst values should be a criterion for subspecies. He literally dedicates an entire chapter of the fourth volume of his X to race and never mentions Fst values, nor does he anywhere else state that they should be used as a criterion for subspecies. In fact, on page 85 Wright cautions readers against using Fst values as a straightforward measure of genetic differentiation:

“We will take F = 0.25 as an arbitrary value above which there is very great differentiation, the range of 0.15 to 0.25 as indicating moderately great differentiation. Differentiation is, however, by no means negligible if F is as small as 0.05 or even less” – Wright 1984

Thus, Graves is misleading readers by separating these two sentences, only showing his readers the first, and thus stripping it of its proper context. Wright’s views do not, in fact, lend credence to the idea that human races do not exist.

Facebook Comments
  • Greg Yudenko

    I think I might have spotted a spelling error. “In other words, an Fst value tells us how much the probability of
    picking different gene variants increases is (should be if) the gene variants are
    picked at random…”

  • Emil Kirkegaard

    “For example, when the analysis includes only humans, F(ST) = 0.119, but
    adding the chimpanzees increases it only a little, F(ST) = 0.183.”

    The number given in the article are incorrect. The increase is 6.4%points, or 54%. So, adding chimps does matter a good deal, but perhaps not as much as some would think.

    See also discussion by Fuerst.