July 26, 2017

Regression to the Mean

I am taller than the average American male. This can only be for a combination of two reasons: genes and the environment. I have no idea which environmental factors might have made me taller than average, so if my height advantage is part genetic and part environmental in origin I will probably only pass on the genetic portion to my children. Unless they, by sheer luck, also get the same environmental advantages I did, they will be more like the average person, in terms of height, than I am, but still somewhat taller than average due to my genes. They will have regressed towards the population mean.

Assuming that their environment is about as lucky, with respect to height, as the average American’s, their height advantage will be 100% genetic. So long as they marry someone with an equally strong genetic advantage in height, they will probably have kids about as tall as they are.

This simple concept is called “regression to the mean”. In this article, I am going to add various complexities to this concept and then discuss some arguments about how this phenomenon relates to immigration and the race & IQ debate.

Decomposing behavioral variation

Before, I talked about “genes” and “environment” rather loosely. To be more precise, phenotypic differences between people have four possible causes: additive genetics, non-additive genetics, shared environmental factors, and non-shared environmental factors.

(A phenotype is any observable trait, physical or behavioral.)

Additive genes are simply genes that have the same effect on your phenotype regardless of what other genes you have in your genome. Non-additive genes will have a different effect on a phenotype depending on what other genes are in your genome.

For instance, the gene variant “A” would be additive if, no matter what other genes you had, it increased your height by .1 inches. On the other hand, it would be non-additive if it increased your height by .1 inches if, at some other location in your genome, you had gene variant “B” and by .2 inches if you had gene variant “C”.

Thus, non-additive genes exert their effect on you based on the combination of genes you have across multiple locations on your genome. In this example, the relevant combination only included 2 gene variants, but it can include many more.
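To make the distinction concrete, here is a minimal sketch in Python. The variant names and effect sizes come from the example above; the function names are made up for illustration, and the behavior when neither “B” nor “C” is present (or when both are) is an assumption, since the example does not cover those cases.

```python
# Toy illustration of additive vs. non-additive gene effects.
# Variant names and effect sizes follow the example in the text.


def additive_effect(genome: set[str]) -> float:
    """Variant "A" adds 0.1 inches regardless of the rest of the genome."""
    return 0.1 if "A" in genome else 0.0


def non_additive_effect(genome: set[str]) -> float:
    """Variant "A" adds 0.1 inches alongside "B" and 0.2 inches alongside "C"."""
    if "A" not in genome:
        return 0.0
    if "C" in genome:
        return 0.2  # assumption: "C" takes precedence if both "B" and "C" are present
    if "B" in genome:
        return 0.1
    return 0.0  # assumption: the text does not say what "A" does without "B" or "C"


for genome in ({"A", "B"}, {"A", "C"}):
    print(sorted(genome), additive_effect(genome), non_additive_effect(genome))
```

The additive function never looks at the rest of the genome, while the non-additive one cannot be evaluated without it.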

The proportion of phenotypic variation between individuals within a given population which is explained by additive genetics is called the trait’s “narrow sense” heritability. The proportion of variance explained by additive and non-additive genetic factors together is called the trait’s “broad sense” heritability.
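As a rough sketch, the two definitions can be written as ratios of variance components. The function name and the example variances below are hypothetical, and the sketch assumes the components are independent, so that they simply add up to the total phenotypic variance.

```python
def heritabilities(v_additive: float, v_non_additive: float, v_environment: float):
    """Return (narrow sense, broad sense) heritability from variance components.

    Assumes the components are independent, so the total phenotypic
    variance is just their sum.
    """
    v_total = v_additive + v_non_additive + v_environment
    narrow = v_additive / v_total
    broad = (v_additive + v_non_additive) / v_total
    return narrow, broad


# Hypothetical variances in arbitrary units.
narrow, broad = heritabilities(v_additive=6.0, v_non_additive=1.0, v_environment=3.0)
print(round(narrow, 2), round(broad, 2))
```

Broad sense heritability can never be smaller than narrow sense heritability, because it counts the additive variance plus the non-additive variance.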

Now that we’ve looked more closely at genetic variation, let’s turn to the environment. Environmental influences come in two types: shared and unshared. Shared environment refers to those environmental stimuli which make members of the same home more similar to one another compared to two people raised in separate homes.

For instance, socio-economic status (SES) is typically shared by all members of a family. At the very least, the difference in SES between two members of the same family will, on average, be far less than the difference in SES between two randomly selected people from different families. Thus, being in the same family decreases variation in SES between people and so will make them more similar on traits impacted by SES.

Unshared environment is anything not covered in the categories listed thus far. The kinds of friends you have, random events in your life, and measurement error, are commonly given examples of the unshared environment.

It is important to note that the unshared environment, by definition, does not cause parents and children to correlate with each other. In other words, given a parent’s experience with the unshared environment, you can predict absolutely nothing about a child’s experience with the unshared environment. These two values, their experiences with the unshared environment, are random with respect to one another.

Returning to regression to the mean: before, speaking about my height advantage, I said: “I will probably only pass on the genetic portion to my children”. That is not strictly true: I will probably only pass on the portion of my height advantage which is caused by additive genetics.

This is because my children will only get a random half of my genes. Because of that, unless their mother happens to have the exact same combinations of genes that I do, they are very unlikely to have the same combinations of genes as me.

For instance, if I have gene variants “A”, “B”, and “C”, which when combined give people a big height boost, while, at the same loci, their mother has the gene variants “A”, “D”, and “E”, our kids will get our shared “A” variant, but will probably get some mixed combination of “D”, “E”, “B”, and “C”. Thus, our children will probably not have the same 3-variant combination that either I or their mother has.

Thus, only additive genetics can be passed on to your kids. Well, that and shared environment.

Two kinds of shared environment

However, we need to make a distinction between two kinds of shared environmental factors. Before I said that shared environmental factors made members of the same home more similar than average. But which members?

Consider parenting style. It could be considered a shared environmental factor. After all, siblings experience parenting styles which are more similar than two people growing up in separate homes would. But do parents and children in the same home experience highly similar “parenting styles”?

Only to the degree that the parents have the same parenting style that the grandparents did. If they have a parenting style which differs from that of the grandparents, their children will not experience the same parenting style that they did. Consider also that the two parents probably experienced different parenting styles growing up. The kids would need to experience a perfect midpoint between the parenting styles of the two sets of grandparents in order for them to “share” their experience of parenting style with their parents.

Thus, some shared environmental factors will tend to make siblings, but not necessarily parents and children, more similar than average. Other shared environmental factors, like SES, will make all members of the home more similar than average. There are probably even factors that make parents and children more similar than average but not siblings, though I can’t think of any.

So, back to regression to the mean: people can pass on only that aspect of the shared environment which makes parents and children more similar than average.

Using variance components to predict regression

Before, I noted that the portion of phenotypic variation explained by additive genetics is called that trait’s narrow sense heritability. Similarly, we can say that a certain proportion of variation in a trait is explained by non-additive genetics, shared environmental factors, and non-shared environmental factors.

If a proportion X of the variation in a trait is explained by additive genetics, that is, if its narrow sense heritability is X%, then, on average, a given person’s deviation from the population’s mean will be X% due to additive genetics.

The same logic applies to the other components of phenotype variance. If a trait’s variance is Z% due to the shared environment then, on average, the difference between that trait in a particular person and the population’s mean value for that trait will be Z% due to the shared environment.
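This “on average” claim can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed, independent components and a total variance of 1, it lumps everything that is not additive genetics into a single “rest” term, and the function name is made up. It estimates how much of a person’s deviation from the mean is, on average, additive, as the slope of additive value on phenotype.

```python
import random


def additive_share_of_deviation(h2: float, n: int = 20_000, seed: int = 0) -> float:
    """Estimate, by simulation, the average share of a deviation that is additive.

    h2 is the narrow sense heritability; total phenotypic variance is set to 1.
    """
    rng = random.Random(seed)
    additive, phenotype = [], []
    for _ in range(n):
        a = rng.gauss(0.0, h2 ** 0.5)             # additive genetic value, variance h2
        rest = rng.gauss(0.0, (1.0 - h2) ** 0.5)  # everything else, variance 1 - h2
        additive.append(a)
        phenotype.append(a + rest)
    mean_a = sum(additive) / n
    mean_p = sum(phenotype) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(additive, phenotype)) / n
    var_p = sum((p - mean_p) ** 2 for p in phenotype) / n
    return cov / var_p  # least-squares slope of additive value on phenotype


print(round(additive_share_of_deviation(0.7), 2))
```

With a heritability of 0.7 the estimate should land close to 0.7, up to sampling noise.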

Before I said the following: “I have no idea which environmental factors might have made me taller than average, so if my height advantage is part genetic and part environmental in origin I will probably only pass on the genetic portion to my children.”

We can now be much more precise: on average, the proportion of a tall person’s height advantage that is due to additive genetics will be equal to the narrow sense heritability of height. Let’s call this X. Due to additive genetics, the children of the average tall person will be X% as far from the population mean as that person is.

Similarly, if height’s variation is Z% due to environmental factors shared between parents and children, then a tall person’s child will be Z% as far away from the mean as they are because of shared environmental factors.

The total degree to which children will be as distant from the mean as their parents will equal Z% (parent-offspring shared environment) + X% (narrow sense heritability).
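As a minimal sketch of this rule (the function name and the example numbers are illustrative assumptions, and the sketch keeps the earlier assumption that the other parent is just as far from the mean):

```python
def expected_child_deviation(
    parent_deviation: float,
    narrow_heritability: float,
    parent_child_shared_env: float,
) -> float:
    """Expected distance of a child from the population mean.

    Assumes both parents are equally far from the mean and that only additive
    genetics (X) and parent-offspring shared environment (Z) are passed on.
    """
    return (narrow_heritability + parent_child_shared_env) * parent_deviation


# Hypothetical example: a parent 3 inches above the mean, X = 0.7, Z = 0.1.
print(round(expected_child_deviation(3.0, 0.7, 0.1), 2))
```

Under these made-up numbers the child is expected to be about 2.4 inches above the mean.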

Assortative mating

Previously, when speaking about my hypothetical kids and their hypothetical mates, I said: “So long as they marry someone with an equally strong genetic advantage in height, they will probably have kids about as tall as they are.”

This assumption, that my children will marry people as distant from the population mean as they are, is probably false.

For most traits, the correlation between parents is in the range of .1 to .4. This means that people’s partners only tend to be 10%-40% as far from the mean as they are for any given trait.

This correlation between the traits of parents is called an assortative mating coefficient. Assortative mating simply refers to the degree to which people tend to mate with people who are similar to themselves.

Given this, the person I have children with will probably not be as abnormally tall (for their sex) as I am. If the assortative mating coefficient is .3, then they will probably only be 30% as far from their sex mean as I am.

To predict how far my children will regress to the mean relative to me, we would take the narrow sense heritability of height, say .7, and the parent-offspring shared environmental component, say .1, and add them together to get .8. With an assortative mating coefficient of .3, my children’s mother is expected to be .3 as far from her sex’s mean as I am from mine, so the average of our two deviations is (1 + .3) / 2 = .65 of my own. Multiplying .8 by .65, we ultimately estimate that my children will only be .52, or 52%, as far from the population mean in height as I am.

Alternatively, if the mother’s actual height is known, we could find the average distance from the mean in height of me and the children’s mother (the midparent value) and multiply that by the narrow sense heritability plus the shared environment component; that too would tell us how far back my kids will regress.
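The midparent calculation can be sketched as follows. The function names are hypothetical, deviations are assumed to be measured relative to each parent’s own sex mean in comparable units, and the numbers are the illustrative ones used above.

```python
def midparent_deviation(own_deviation: float, mate_deviation: float) -> float:
    """Average of the two parents' deviations from their respective sex means."""
    return (own_deviation + mate_deviation) / 2.0


def expected_child_deviation_from_midparent(
    own_deviation: float,
    mate_deviation: float,
    narrow_heritability: float,
    parent_child_shared_env: float,
) -> float:
    """Expected child deviation: transmitted fraction times the midparent value."""
    transmitted = narrow_heritability + parent_child_shared_env
    return transmitted * midparent_deviation(own_deviation, mate_deviation)


own = 1.0         # my deviation, used as the unit of measurement
mate = 0.3 * own  # expected mate deviation with an assortative mating coefficient of .3
child = expected_child_deviation_from_midparent(own, mate, 0.7, 0.1)
print(round(child, 2))
```

With a mate expected to be .3 as far from the mean, the midparent value is .65 and the expected child deviation is about .52 of the parent’s own. If the mate’s actual deviation is known, it can be passed in directly instead of the expected value.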

Sibling Regression

It may have occurred to you that parents and offspring are not the only pairs of people who share 50% of their additive genetics, 100% of their shared environment, and 0% of their unshared environment. Biological siblings do as well. Because of this, we can apply everything I have said thus far about regression to the mean to siblings as well.

A sibling with an extreme score in some trait will have siblings with less extreme traits. How extreme their traits will be, compared to their siblings, will be a simple function of the proportion of variance in that trait accounted for by additive genetics and the shared environment. Only this time the focus will be on the aspects of the shared environment which make siblings more alike than average.

For most of this post, I am going to continue talking about parents and offspring. But this is for the sake of consistency. Regression to the mean happens between siblings as well.

A Simplification: the shared environment often doesn’t matter

Until now, I have been acting as if additive genetics and the shared environment are both really important factors in explaining human variation. This is largely untrue. As counter-intuitive as it may seem, the shared environment often explains little, or literally zero, variation in human traits, especially in adult populations.

The simplest way to measure this is with adoption studies that look at the correlation in a trait between adoptive pairs of parents and children. Unlike most families, these pairs share a home, and so a shared environment, but are no more alike genetically than average.

Such studies often find correlations near zero. That is, adoptive parents and children are no more similar than average for many traits.

More complicated twin designs can be used to measure the influence of the shared environment as well. Using these methods, these are the conclusions that researchers have come to about the proportion of phenotypic variation explained by the shared environment:

[Chart 1: proportion of phenotypic variation explained by the shared environment, by trait]

Bouchard and McGue (2003); Rhee, Hyun, and Irwin (2002); Branigan, McCallum, and Freese (2013); Hyytinen et al. (2013)

As can be seen, for many traits widely considered to be of high importance, the shared environment exerts a rather small influence. Because of this, it is often the case that regression to the mean is purely, or almost purely, a function of the narrow sense heritability of a trait.

Regression Happens Once

For whatever reason, a lot of people think that regression to the mean will keep happening to someone’s descendants until they reach the original population’s mean. This is not true. Once the next generation inherits an additive genetic advantage in some trait, it keeps it. Their kids and their grandkids will continue to have it.

Were this not true, evolution would be impossible. Every time a mutation resulted in a genetic change in a trait the next few generations would simply regress back to where the trait was before. Evolution happens, and so regression to the mean clearly does not work this way.

Let’s imagine that my height is 100% the result of a new additive mutation that makes people 50% further, in a positive direction, from their sex’s mean height than average. Let’s also assume that my mate has the same new mutation. We are the only two people who have it. We will both give that mutation to our kids. They will be just as tall as us. After all, they have the same genes that made us tall. There is nowhere else for the genes to go. Thus, no regression to the mean happens, right?

Well, yes and no. Yes, my kids are just as far from the original population’s height mean as I am. However, the population’s mean height has changed. Thanks to the new mutation being given to my kids, the population’s mean height has risen. Thus, the distance between my kids’ height and the population mean is less than the initial distance between my height and the population mean was. So, there is a sense in which, as the mutation continues to be spread across the population, everyone will return to “the mean”. But it will be a new mean, and that is not regression to the mean.
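A toy calculation shows the difference between keeping the advantage and moving closer to a shifting mean. It assumes a single purely additive mutation with a fixed effect and no other source of variation; the function name and the effect size are made up for illustration.

```python
def carrier_distance_from_mean(carrier_fraction: float, mutation_effect: float) -> float:
    """Distance between a carrier's trait value and the current population mean."""
    baseline_mean = 0.0  # trait mean before the mutation appeared (arbitrary origin)
    population_mean = baseline_mean + carrier_fraction * mutation_effect
    carrier_value = baseline_mean + mutation_effect  # carriers keep the full effect
    return carrier_value - population_mean


# As the mutation spreads, carriers stay exactly as tall as before,
# but the mean rises toward them.
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, carrier_distance_from_mean(fraction, mutation_effect=2.0))
```

The carrier’s own value never changes in this sketch; only the mean it is compared against does.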

Regression goes both ways

A final point to make is this: regression to the mean goes both ways. That is, if a pair of parents have a trait value that is x% above, or x% below, the mean, their children will regress to the same degree back towards the mean. In this post, I have used a lot of examples having to do with me being tall. All the same logic would apply to someone who is abnormally short.

Implications for immigration policy

Okay, so, why should we care about this, other than for the inherent fun of behavioral genetics? Well, it has an important implication for immigration policy: if we have a criterion for immigration which requires people to deviate from their population’s mean value of a trait, then their offspring will not deviate from their population to the same degree; they will regress towards the mean.

That sounds pretty simple, but there is a complication: what mean will they regress towards?

Suppose we had a requirement such that you had to make at least $70,000 a year to immigrate to the U.S. Let’s also suppose that this is 10 standard deviations above the median income of India. An Indian who makes $70,000 comes here. Added together, additive genetics and the shared environment account for about 40% of variation in income. Given this, what will the Indian’s kids’ incomes look like?

What we have covered thus far tells us that his kids’ incomes will only be 40% as far from the mean as his income is. But what mean? The mean income in India? Obviously not. This would require second generation Indian immigrants to make third-world incomes in the United States.

Perhaps the mean income of the United States then? Will the Indian immigrant’s children be 40% as far from the American mean income as he is? There is no reason to think so.

The issue is this: the Indian immigrant has genes and a shared environment which caused him to make $70,000 given his (random) experience with the distribution of unshared environmental stimuli in India. His children will be exposed to a different distribution of unshared environmental factors in America. Had the Indian been exposed to this distribution, his income may have ended up being significantly greater, or lesser, (but probably greater) than it ended up being in India. His children will end up having incomes 40% as far from the American mean as his income would have been had he been raised in the distribution of unshared environmental factors present in America.

Let’s suppose that, in India, an important unshared environmental factor which influences income is the degree to which people inhale certain pollutants in the air while they are growing up. Let’s further suppose that our hypothetical Indian immigrant experienced the average value of this stimulus, and so it did not cause his income to deviate from the Indian mean.

Next, let’s make the plausible assumption that people in America inhale far fewer of these chemicals. Thus, though he had an average experience with the unshared environment in India, his experience with the unshared environment is abnormally poor by American standards.

He won’t pass this experience on. His children will have a random experience with the unshared environment of America. It will probably be much better than his experience was in India in terms of stimulating income. They will, therefore, have a higher income than what we would expect based on the amount of income variation explained by shared environment and additive genetics.

Formally, this happens because regression to the mean only works in the way I have explained it if the unshared environmental experiences of parents and offspring are random with respect to each other. (That is, one does not correlate with the other). In general, this is true by definition because environmental stimuli that cause parents and offspring to be alike are shared environmental factors.

But in this case, the unshared environment, or what would normally be called the unshared environment, differs radically between parents and offspring in a way that will predictably make the offspring have higher incomes than their parents. This could cause a positive or negative correlation between parent and offspring income depending on the effect size. Standard behavioral genetic models “break down” (require re-thinking) because they are meant to be applied to a single population.

Thus, to predict the trait value of an immigrant’s offspring, you will need to estimate what the immigrant’s own trait value (their IQ, say) would have been had they been raised in the country they are immigrating to. Then, you can apply the normal regression to the mean model to their offspring.

This creates a complication for meritocratic immigration systems. In the long run, regression to the mean will be an important factor in determining the impact that immigrants have on a society. This makes it important to estimate what an immigrant’s offspring will look like, either by studying the children of past immigrants from the relevant country or by somehow estimating the difference that America’s unshared environment would have made on their IQ had they been raised here, when determining if an immigrant is “good enough” to allow into the US or any other country.
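A minimal sketch of this two-step procedure follows. All numbers are hypothetical, the function name is made up, and the hard part (estimating the counterfactual value itself) is simply taken as an input.

```python
def expected_offspring_value(
    host_mean: float,
    parent_counterfactual_value: float,
    transmitted_fraction: float,
) -> float:
    """Expected offspring trait value in the host country.

    parent_counterfactual_value is the value the parent is estimated to have
    had if raised in the host country (an input here, not something this
    sketch estimates). transmitted_fraction is narrow sense heritability plus
    the parent-offspring shared environment component.
    """
    return host_mean + transmitted_fraction * (parent_counterfactual_value - host_mean)


# Hypothetical incomes: host mean of 50,000, a counterfactual parental income
# of 90,000, and the 40% transmitted fraction used in the income example.
print(round(expected_offspring_value(50_000, 90_000, 0.4)))
```

With these made-up inputs the expected offspring income is about 66,000. Plugging in the parent’s observed value from the origin country instead of the counterfactual one is exactly the mistake the text warns against.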

Race and IQ

Another argument some people make is that a genetic cause of racial IQ differences is implied by the fact that members of different races regress to their respective racial population means. This is simply not true.

Studies in this area typically find that children (or siblings) are about 50% as far from their respective racial mean as their parents (or siblings) are. (Note: such studies typically involve children and the heritability/shared environmental components for IQ vary with age.) A point of importance is that the races regress to different means.

For instance, a Black with an IQ of 115 would be 2 standard deviations above the Black mean. Their children, on average, will have IQs of 100, or 1 standard deviation above the Black mean.

By contrast, a White with an IQ of 115 has an IQ 1 standard deviation above the white mean and their children will have IQs, on average, of 107.5, or .5 standard deviations above the White mean.

This necessarily follows from the fact that regression is 50% and the races have different mean IQs. The fact that they have different mean IQs tells us that they differ with respect to some set of factors which influence IQ: additive genetics, non-additive genetics, the shared environment, or the unshared environment. None of this tells us anything about which factor it is.

Let’s think for a moment about an IQ 115 Black. Half of their advantage comes from factors they can pass on to their children. The other half doesn’t. Let’s suppose that their IQ advantage was 50% due to their having a better unshared environment than the average Black. This would tell us nothing about group differences. This is consistent with the average Black having a worse unshared environment than the average White. The IQ 115 Black had a much better unshared environment than the average Black but, by definition, was not able to pass this advantage on. This is also consistent with any other imaginable cause of the B/W IQ gap.

Suppose instead that the IQ 115 Black has a strong advantage in non-additive genetics. Again, this tells us nothing about group differences. It just says that this Black has “better” non-additive genes than the average Black. This doesn’t mean that the average Black has worse non-additive genes than the average White. If Blacks’ IQs are being depressed by an environmental factor and they have the exact same genes as Whites, their IQs will be lower than Whites’, and Blacks with abnormally good genes (better than those of the average Black or White) will still have higher IQs.

I could go on, but I think you get the idea. Nothing about regression to the mean differing between races implies a cause of racial IQ differences. That being said, let’s look at a few sources that say otherwise:

Wouldn’t one expect that some black families in some more environmentally propitious situations would enable their children to escape, or at least significantly to avoid, any factor X that might depress IQ scores? How is it, then, that even for black children with relatively high IQs of 120, their siblings should average only 100, rather than 110, as with the siblings of white children with IQs of 120? Consider the parents of a black child with the relatively high IQ of 120. Wouldn’t one expect that that family, which had managed to find and develop an environment congenial enough to the intellectual development of one of their children that he or she achieved an IQ of 120, might likewise have secured an environment as well suited for the intellectual development of their other children? – Liberal Bio Realism

You would only expect this if you didn’t know that shared environment explains very little IQ variation in adulthood. Adults typically do not pass on much in the way of environmental advantages to their children. (Not ones that last into adulthood, anyway). This is a surprising fact about reality, but it has nothing to do with race. Moreover, the same basic phenomenon happens to Whites. While to a lesser degree, their children regress as well.

Regression toward the mean provides still another method of testing if the group differences are genetic. Regression toward the mean is seen, on average, when individuals with high IQ scores mate and their children show lower scores than their parents. This is because the parents pass on some, but not all, of their genes to their offspring. The converse happens for low IQ parents; they have children with somewhat higher IQs. Although parents pass on a random half of their genes to their offspring, they cannot pass on the particular combinations of genes that cause their own exceptionality. – Rushton and Jensen (2005)

This is a half-truth. The unshared environment is not inherited either.

“Genetic theory predicts the magnitude of the regression effect to be smaller the closer the degree of kinship between the individuals being compared (e.g., identical twin > full-sibling or parent–child > half-sibling). Culture-only theory makes no systematic or quantitative predictions.” – Rushton and Jensen (2005)

This prediction is actually made by any theory that accepts that shared environmental factors and additive genetics explain IQ variation within each race and are shared to a greater degree the closer the kinship. As long as the same account of individual differences holds within each race, the groups will be predicted to regress to an equal extent to different means.

They regress to different means because they have a different mean value of some factor in IQ variation. This is true regardless of what that factor is.

To see that this is so, let’s suppose that Blacks have lower IQ than Whites because they have lower levels of self-esteem caused by Whites being racist towards them. (It’s not.) Some Black just so happens to get lucky and experience very little racism in his life and so gets an elevated IQ. Will he be able to pass this environmental advantage on to his kids? No. His kids will only inherit the proportion of his IQ advantage due to genes and the shared environment. Let’s say that is 50%. So, his kids will regress 50% back to the Black mean, and this is predicted by any theory which states that the unshared environment causes the B/W IQ gap.
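The logic of this thought experiment can be sketched as a simulation with two generic groups. Everything here is an illustrative assumption: scores are in standard-deviation units, the transmissible component has the same distribution in both groups and is passed on unchanged (the simplification used throughout this post), and the groups differ only in the mean of their unshared environment. The function name and the size of the gap are made up.

```python
import random


def simulate_group(unshared_env_mean, transmitted_fraction=0.5, n=20_000, seed=0):
    """Return (slope, fixed_point) of offspring score regressed on parent score.

    The fixed point is the score that offspring regress toward. The group's
    mean differs from zero only through the mean of its unshared environment.
    """
    rng = random.Random(seed)
    sd_t = transmitted_fraction ** 0.5
    sd_u = (1.0 - transmitted_fraction) ** 0.5
    parents, children = [], []
    for _ in range(n):
        transmitted = rng.gauss(0.0, sd_t)  # same distribution in every group
        # Parent and child draw their unshared environments independently.
        parents.append(transmitted + rng.gauss(unshared_env_mean, sd_u))
        children.append(transmitted + rng.gauss(unshared_env_mean, sd_u))
    mean_p = sum(parents) / n
    mean_c = sum(children) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(parents, children)) / n
    var_p = sum((p - mean_p) ** 2 for p in parents) / n
    slope = cov / var_p
    intercept = mean_c - slope * mean_p
    return slope, intercept / (1.0 - slope)


for env_mean in (0.0, -1.0):
    slope, fixed_point = simulate_group(env_mean)
    print(env_mean, round(slope, 2), round(fixed_point, 2))
```

Both groups should show a slope near 0.5, and each should regress toward its own mean (near 0 and near -1 respectively), even though by construction the gap between them is entirely due to the unshared environment.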

It could also easily be the case that a given Black with an abnormally high IQ has a noninheritable advantage in factors which have nothing to do with the Black-White IQ gap at all.

Thus, the hereditarian view on racial intelligence differences makes no predictions on how races will regress other than those made by basic behavioral genetic theory, which proponents of both sides of the debate may accept. Regression to the mean is a problem for egalitarians who insist that the shared environment has a huge effect on IQ, but regression to the mean is the least of their problems since they are denying a foundational finding in the behavioral genetics of intelligence to begin with.

That being said, some people may be persuaded of hereditarianism by the simple realization that the cause of the Black-White IQ gap is not something which is passed on from parents to offspring or shared by siblings. However, this shouldn’t be surprising given what we know about intelligence in general.

Alternatively, some people have argued that the hereditarian view on race and IQ is falsified by the fact that the offspring of immigrants don’t always regress back as far as one would expect given their parents’ IQ. The mistake in this reasoning is to calculate the predicted degree of regression based on the parents’ IQ as opposed to the IQ they would have had if they grew up in a first world Western nation. As discussed above, these two figures can be very different and conflating them could lead someone to expect the children of immigrants to be far less intelligent than they actually are.

Comments
  • blackacidlizzard

    “Regression Happens Once”

    When we look at studies which have selected high-IQ Black parents, haven’t we already selected a “breeding population” with its own average IQ? And yet, the children do not regress towards the median IQ of that group, but towards a lower median, probably somewhere between the median Black IQ and the median IQ of the “parent” group. (at least according to what I’ve found on the topic, if not, I’d love to see the contrary data)

    “Every time a mutation resulted in a genetic change in a trait the next few generations would simply regress back to where the trait was before.”

    This doesn’t seem to answer the issue. Are mutations not relatively rare? Is it not the case that the most likely explanation for most instances of regression towards mean has to do with polygenic traits?

  • Emil Kirkegaard

    You should use the better term: regression towards the mean. Some people do get the misconception that regression goes to the mean. That would imply a heritability of 0 and make evolution impossible.

    See also: http://emilkirkegaard.dk/understanding_statistics/?app=regression_towards_the_mean