If we look at SNP alleles associated with educational attainment, we see differences
between Europeans and sub-Saharan Africans. Is genetic drift the cause? Or natural selection?
IQ has long been the yardstick of cognitive ability. As such, it describes
phenotype, not genotype: it measures how your inborn potential has developed in
your environment. Genotype is the inborn component of IQ. It can be inferred
from twin studies, family studies, and adoption studies, but those approaches
are indirect and far from perfect.
To measure genotype directly, we need to identify the alleles that affect the
development of cognitive ability. We also need to measure the size of each allele’s
effect. Recently, much progress has been made. By using genome-wide association
studies (GWAS), researchers have identified many alleles that are associated
with educational attainment (EA). EA is not quite the same as IQ—it also
includes things like sitting still in class and brownnosing the teacher—but
it's a good approximation.
In the most recent study of this sort, Lee et al. (2018) identified 1,271
single-nucleotide polymorphisms (SNPs) that are significantly associated with
high EA in a sample of over one million people of European ancestry. Together,
the SNPs can explain 11-13% of the variance in EA among individuals. This new
yardstick is called the "polygenic score."
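As a rough illustration of what a polygenic score is, the sketch below sums, over SNPs, the number of copies of the effect allele multiplied by its estimated effect size. The SNP names, dosages, and effect sizes are invented for the example, and Lee et al. (2018) may weight or standardize their scores differently.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, int], effect_sizes: Dict[str, float]) -> float:
    """Sum of (copies of the effect allele) x (estimated effect size).

    Only SNPs present in both dictionaries contribute to the score.
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(dosages[snp] * effect_sizes[snp] for snp in shared)


# Hypothetical SNP identifiers and effect sizes, for illustration only.
effects = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.015}
# Copies (0, 1, or 2) of the effect allele carried by one hypothetical person.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(person, effects)
```

A population's mean polygenic score can be obtained the same way by replacing each dosage with twice the allele frequency in that population.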
The polygenic score is more accurate for populations than for individuals. If we
compare the mean polygenic score of a population with its mean IQ, the
correlation is 0.9 (Piffer 2019). This high correlation is due to the logic of
sampling: to estimate the mean cognitive ability of a population, we don't have
to identify all of the relevant SNPs, just a large enough sample.
Like mean IQ, the mean polygenic score differs among human populations. It seems to
have increased during the northward spread of modern humans out of Africa and
into the temperate zone of Europe and Asia, with East Asians having the highest
scores. This geographic pattern is in line with IQ data. The mean polygenic
score is also very high among Ashkenazi Jews and Finns, again in line with IQ
data (Piffer 2019).
Kevin Bird’s paper
The above findings have been disputed by the American researcher Kevin Bird in a
recent paper (Bird 2021). Although Europeans and sub-Saharan Africans have different
alleles at genes associated with educational attainment, he argues that these
differences correspond to small differences in cognitive ability. In fact, they
are more consistent with genetic drift than with natural selection.
To support his argument, he performed two analyses of the data: an Fst calculation and a test
for polygenic selection. In my opinion, both analyses have serious problems.
The Fst
The Fst (fixation index) is the most common measure of genetic differentiation between populations. If the Fst is low,
differentiation is trivial and consistent with genetic drift. If it is high, differentiation
is significant and consistent with natural selection.
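To make the measure concrete, here is a minimal sketch of Fst for a single biallelic SNP in two populations, using the textbook definition (total heterozygosity minus mean within-population heterozygosity, divided by total heterozygosity). It assumes equal population weights and is not necessarily the estimator Bird used; the function name is my own.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Fst at one biallelic SNP from the allele frequencies in two populations.

    Assumes the two populations are weighted equally.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity if pooled
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total
```

With identical frequencies the function returns 0, and with an allele fixed in one population and absent in the other it returns 1.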
For SNPs associated with EA, Kevin Bird reports an Fst of 0.111. Is that low or
high? When Sewall Wright (1978, pp. 82-85) created this measure, he defined
four categories of differentiation:
0 to 0.05 - little genetic differentiation
0.05 to 0.15 - moderate genetic differentiation
0.15 to 0.25 - great genetic differentiation
0.25 to 1 - very great genetic differentiation
Those categories are widely cited in the literature. A search in Google Scholar for "moderate genetic differentiation" and
"0.05 - 0.15" shows over two hundred papers.
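Wright's four categories can be written as a simple lookup. This is only a sketch: the function name is hypothetical, and assigning each boundary value to the higher category is my assumption, since the ranges as listed share their endpoints.

```python
def wright_category(fst: float) -> str:
    """Map an Fst value to one of Wright's (1978) four descriptive categories."""
    if not 0 <= fst <= 1:
        raise ValueError("Fst must lie between 0 and 1")
    if fst < 0.05:
        return "little genetic differentiation"
    if fst < 0.15:
        return "moderate genetic differentiation"
    if fst < 0.25:
        return "great genetic differentiation"
    return "very great genetic differentiation"


label = wright_category(0.111)
```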
So does an Fst of 0.111 mean moderate genetic differentiation? Not according to
Kevin Bird, who sees no meaningful differentiation below a benchmark of 0.118. That benchmark may
be valid, but it cannot be easily verified and does not appear elsewhere in the
literature. Nor does Kevin explain why it is better than the ones put forward
by Sewall Wright. In fact, he makes no reference to them.
One may also question the Fst of 0.111. For the data source, the reader is referred
to Lee et al. (2018), but that study was done only with European subjects.
Moreover, Kevin Bird used 1,259 SNPs to calculate that Fst, even though he found
only 685 SNPs that had data on both Africans and Europeans.
The Fst of 0.111 seems to be the diversification of those SNPs in Europeans. That value is what would be expected,
but it says nothing about diversification between Europeans and sub-Saharan
Africans.
The polygenic selection analysis
The other analysis is more to the point. Kevin Bird compared European data with African
data as follows:
1. First, he looked through the 1000 Genomes Project for SNP data on Europeans and
sub-Saharan Africans. He found data on five European-descended populations
(Utah residents, Tuscans, Finns, British, Iberians) and five African
populations (Yoruba, Luhya, Gambians, Mende, Esan). The two datasets had
information on 685 of the 1,271 SNPs associated with educational attainment.
2. For each SNP, he noted the allele frequencies in Europeans and the allele
frequencies in sub-Saharan Africans.
3. He calculated the differences in allele frequencies between the two groups. He
then weighted the differences for the allele's effect size (its estimated
positive or negative effect on educational attainment). For each allele, he
used two different estimates of effect size: one from between-family data and
the other from within-family data.
4. Alongside this list of weighted alleles, he created a second list to simulate
genetic drift by randomly flipping the sign of each effect size over 10,000
permutations.
5. When effect size was calculated from between-family data, the two lists clearly
differed from each other. When it was calculated from within-family data, the
overall difference was much smaller and easily explained by genetic drift.
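The five steps above can be sketched as a sign-flip permutation test. This is my reading of the procedure as described here, not Bird's actual code: the function names, the use of a summed weighted difference as the test statistic, the two-sided p-value, and the example numbers are all assumptions.

```python
import random
from typing import List, Tuple


def weighted_difference(freq_a: List[float], freq_b: List[float],
                        effects: List[float]) -> float:
    """Sum over SNPs of (frequency difference) x (effect size)."""
    return sum((fa - fb) * e for fa, fb, e in zip(freq_a, freq_b, effects))


def sign_flip_test(freq_a: List[float], freq_b: List[float],
                   effects: List[float], n_perm: int = 10_000,
                   seed: int = 0) -> Tuple[float, float]:
    """Compare the observed statistic with a null built by random sign flips.

    Each permutation flips the sign of every effect size with probability 0.5,
    which is how drift is simulated in the description above.
    """
    rng = random.Random(seed)
    observed = weighted_difference(freq_a, freq_b, effects)
    extreme = 0
    for _ in range(n_perm):
        flipped = [e if rng.random() < 0.5 else -e for e in effects]
        if abs(weighted_difference(freq_a, freq_b, flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Made-up frequencies and effect sizes for three SNPs, for illustration only.
group_a = [0.6, 0.3, 0.8]
group_b = [0.4, 0.5, 0.7]
effect_sizes = [0.02, -0.01, 0.01]
observed, p_value = sign_flip_test(group_a, group_b, effect_sizes)
```

Running the same function twice, once with between-family effect sizes and once with within-family ones, would reproduce the comparison in step 5. With noisier effect sizes, the observed statistic is harder to tell apart from the permuted ones.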
Bird (2021) prefers the second dataset to the first, whereas Piffer (2019) prefers
the first. Who is right? All things being equal, data should come from within
families. There is less statistical noise because siblings have similar upbringings.
With less noise, group differences can more easily be identified.
Yet here we have the opposite. We see a significant difference between Europeans
and Africans in the between-family data, but not in the within-family data.
Why? The reason is that the between-family data came from over a million
subjects whereas the within-family data came from 20,000 sibling pairs. Being
smaller, the second dataset had a lot more noise. Sure, there should have been
less noise, all things being equal. But some things weren't.
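To get a feel for the size of that handicap, here is a back-of-the-envelope sketch. It assumes that the standard error of an estimated effect shrinks with the square root of the sample size and treats each sibling pair as one observation. Both are simplifications of mine, and the real difference in precision depends on the study designs.

```python
import math


def standard_error_ratio(n_large: int, n_small: int) -> float:
    """How many times larger the standard error is in the smaller sample,
    assuming standard error is proportional to 1 / sqrt(n)."""
    return math.sqrt(n_large / n_small)


# Sample sizes as given in the text: about a million subjects vs. 20,000 sibling pairs.
ratio = standard_error_ratio(1_000_000, 20_000)
```

Under those assumptions, effect sizes from the sibling sample would be roughly seven times noisier than those from the full sample.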
Doing the comparison again but better
I suspect Kevin Bird still prefers within-family data. Fine. Let's repeat the
comparison with a much larger sample of sibling pairs. There would then be less
noise and probably a significant difference between African and European
alleles in their effect on educational attainment. Kevin seems to anticipate
this eventuality:
While the results presented here are more consistent with neutral evolution rather than divergent natural selection, it is not possible to rule out that data sets with more power could present different results. Additionally, although within-family effect sizes are recommended over between-family effect sizes, if the within-family effect sizes are re-estimated for SNPs ascertained by a between-family GWAS, there is still likely to be some level of confounding from population structure. (Bird 2021, p. 7)
He elaborates on the last point:
[...] the [polygenic] scores might be biased by a variety of factors, including the nonrandom ways that society is geographically structured [...]. For instance, Black people in the US, for reasons unrelated to genetics, live in areas with poorer air quality and more exposure to environmental toxins (Bird 2021, p. 8)
Yet, as he notes further on, these SNP alleles were identified only in European
subjects, and their effects on educational attainment were estimated only from
European data. So how could different alleles among Europeans be spuriously associated with differences in
educational attainment among Europeans
because of socioeconomic deprivation among Black Americans? Where and when do
the latter come into this presumably spurious association?
Kevin Bird is right to point out that the allele effects were calculated from
European data and may be less applicable to people of other origins. In fact,
there is growing evidence that the genetic architecture of cognition is
different in sub-Saharan Africans (Frost 2019). By ignoring that factor,
however, we introduce even more noise into the data and muddle even more any
differences that may exist between Africans and Europeans. The data may indeed
be of low quality, but that shortcoming would, if anything, obscure group
differences. Again, Kevin is making a coherent point within an incoherent
argument.
Other ways?
There are other ways to distinguish between genetic drift and natural selection. One
way is to measure the ratio of nonsynonymous alleles to synonymous alleles. If
a trait has little functional value and is thus vulnerable to genetic drift,
nonsynonymous alleles will tend to proliferate and become as numerous as
synonymous alleles (Ohta 1995). Of course, if nonsynonymous alleles greatly
outnumber synonymous alleles, there may be natural selection for diversity
(Rana et al. 1999).
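In its simplest form, that ratio is just a count of one class of alleles divided by the other, as in the sketch below. The function name is hypothetical, and formal analyses normalize by the number of nonsynonymous and synonymous sites rather than using raw counts.

```python
def nonsyn_to_syn_ratio(n_nonsynonymous: int, n_synonymous: int) -> float:
    """Raw ratio of nonsynonymous to synonymous alleles at a locus."""
    if n_synonymous == 0:
        raise ValueError("ratio is undefined when there are no synonymous alleles")
    return n_nonsynonymous / n_synonymous
```

A ratio near 1 would fit the drift scenario described above, whereas a ratio well above 1 would point to selection for diversity.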
An SNP, by its very nature, has alleles that differ from each other by only one
base substitution, and this fact limits our ability to distinguish between
genetic drift and natural selection. It would thus be interesting to identify
genetic polymorphisms that are associated with educational attainment but span
several nucleotides.
If such a polymorphism is undergoing genetic drift, the most frequent alleles will
be the ancestral allele and those that differ from it by one base substitution.
The less frequent ones will be those that differ by two or more base
substitutions. In short, the frequency of an allele will be inversely related
to the number of base substitutions that separate it from the ancestral allele.
The picture is different with natural selection. The most frequent alleles will not
necessarily be the ones that differ the least from the ancestral allele. If
allele frequency is graphed as a function of base substitutions, the result
will not be a smoothly decreasing exponential curve. The most successful allele
may differ from the ancestral one by several base substitutions.
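The contrast between the two scenarios can be sketched as follows: count the base substitutions separating each allele from the ancestral one, average allele frequency at each distance, and check whether frequency falls as distance rises. The sequences and frequencies are invented, and the helper names are mine. A real analysis would also need to identify the ancestral allele and test the shape of the curve statistically.

```python
from typing import Dict, List, Tuple


def substitutions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def frequency_by_distance(ancestral: str,
                          freqs: Dict[str, float]) -> List[Tuple[int, float]]:
    """Mean allele frequency at each substitution distance from the ancestral allele."""
    by_distance: Dict[int, List[float]] = {}
    for allele, freq in freqs.items():
        by_distance.setdefault(substitutions(ancestral, allele), []).append(freq)
    return [(d, sum(v) / len(v)) for d, v in sorted(by_distance.items())]


def decreases_with_distance(profile: List[Tuple[int, float]]) -> bool:
    """True if mean frequency never rises as distance from the ancestral allele grows."""
    means = [mean for _, mean in profile]
    return all(earlier >= later for earlier, later in zip(means, means[1:]))


# Hypothetical four-base polymorphism, ancestral allele "ACGT".
drift_like = {"ACGT": 0.5, "ACGA": 0.2, "TCGT": 0.15, "TCGA": 0.1, "TGGA": 0.05}
selection_like = {"ACGT": 0.1, "ACGA": 0.1, "TGGA": 0.8}

drift_profile = frequency_by_distance("ACGT", drift_like)
selection_profile = frequency_by_distance("ACGT", selection_like)
```

In the first made-up example frequency declines steadily with distance, as expected under drift. In the second the most common allele sits three substitutions away from the ancestral one.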
References
Bird, K.A. (2021). No support for the hereditarian hypothesis of the Black-White
achievement gap using polygenic scores and tests for divergent selection. American Journal of Physical Anthropology.
Feb. 1-12, DOI: 10.1002/ajpa.24216.
https://www.gwern.net/docs/genetics/selection/2021-bird.pdf
Frost, P. (2019). Differences in the genetic architecture of cognition? Evo and Proud, September 25.
https://evoandproud.blogspot.com/2019/09/differences-in-genetic-architecture-of.html
Lee, J.J., Wedow, R., Okbay, A., Kong, E., Maghzian, O., Zacher, M., et al. (2018).
Gene discovery and polygenic prediction from a genome-wide association study of
educational attainment in 1.1 million individuals. Nature Genetics 50(8): 1112-1121.
https://academicworks.medicine.hofstra.edu/cgi/viewcontent.cgi?article=5038&context=articles
Ohta, T. (1995). Synonymous and nonsynonymous substitutions in mammalian genes and
the nearly neutral theory. Journal of Molecular Evolution 40(1): 56-63.
Piffer, D. (2019). Evidence for Recent Polygenic Selection on Educational Attainment
and Intelligence Inferred from GWAS Hits: A Replication of Previous Findings
Using Recent Data. Psych 1(1): 55-75.
https://doi.org/10.3390/psych1010005
Rana, B.K., Hewett-Emmett, D., Jin, L., Chang, B.H.J., Sambuughin, N., Lin, M., et al.
(1999). High polymorphism at the human melanocortin 1 receptor locus. Genetics 151(4): 1547-1557.
Wright, S. (1978). Evolution and the Genetics of Populations,
Volume 4. University of Chicago Press, Chicago, IL.
1 comment:
This is an interesting post, but quite technical so I hesitated to comment on it. Would consanguineous marriage affect the result of a test for drift?