Tuesday, April 27, 2021

Polygenic scores and Black Americans


Sunday Morning in Virginia (1877), by Winslow Homer. 

The ability to acquire language may be the mental domain where people of sub-Saharan African descent have undergone the most cognitive evolution since their separation from other humans.




If we look at SNP alleles associated with educational attainment, we see differences between Europeans and sub-Saharan Africans. In a previous post I asked whether the cause was genetic drift or natural selection (Frost 2021).


That post brought a comment on Twitter:


Or maybe the fact that educational attainment is based on whiteness and familial wealth in the USA but not in Africa? And familial wealth tends to be concentrated in specific closed groupings of people who only breed with each other?


I don’t think so. First, the alleles were identified in subjects from the Netherlands Twin Registry, the Finnish Twin Cohort, the Swedish Twin Registry, the Avon Longitudinal Study of Parents and Children, the UK Biobank and 23andMe. Of those sources, only 23andMe had American subjects.


Second, let's suppose that those alleles are incidentally related to educational attainment. Maybe they are just something that wealthy Europeans share with each other through inbreeding, a bit like the Habsburg jaw. Those alleles should therefore be useless for predicting educational attainment in other populations. Are they?


Let me answer that question by discussing two recent studies:



The Guo et al. study


Guo et al. (2019) used the same alleles to predict performance on a test of verbal ability among 8,078 Americans of different ethnic backgrounds. Two polygenic scores (PGS) were calculated: one based on alleles associated with educational attainment (education PGS) and the other based on alleles associated with IQ (IQ PGS).


The polygenic scores significantly correlated with test results for all major ethnic backgrounds, except one:


The education PGS was significantly predictive of verbal ability in all estimated models and its coefficients were similar in size except for the black sample in which the coefficient was much smaller. The IQ PGS significantly predicted verbal ability in all samples except the black sample. (Guo et al. 2019)


[...] The incremental R²s or the R²s of "pure" PGS effects were 1.8%, 0.1%, 1%, 1.8%, 1.7%, and 1% for whites, blacks, Asians, Hispanic whites, the combined sample and the overall sample, respectively.


The literature is showing a consistent trend: polygenic scores have much less power to predict cognitive ability in people of sub-Saharan African descent than in people of European or Asian descent. In this case, the polygenic scores were ten to eighteen times worse at predicting verbal ability in Black Americans than they were at predicting verbal ability in White, Asian, and Hispanic White Americans.
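The "ten to eighteen times" figure follows directly from the incremental R² values quoted above. Here is a minimal sketch of that arithmetic; the dictionary and function names are mine, and only the four single-group figures from the quotation are used.

```python
# Incremental R^2 values (in percent) as quoted from Guo et al. (2019).
incremental_r2 = {
    "whites": 1.8,
    "blacks": 0.1,
    "Asians": 1.0,
    "Hispanic whites": 1.8,
}


def fold_difference(r2_by_group, reference="blacks"):
    """Ratio of each group's incremental R^2 to the reference group's."""
    base = r2_by_group[reference]
    return {
        group: round(value / base, 1)
        for group, value in r2_by_group.items()
        if group != reference
    }


ratios = fold_difference(incremental_r2)
print(ratios)
```

The smallest ratio is for the Asian sample (tenfold) and the largest is for the White and Hispanic White samples (eighteenfold).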


Why? The reason may be that Eurasians and sub-Saharan Africans have different gene pools. Some alleles for higher cognitive ability are available in one gene pool but not in the other. There is undoubtedly overlap between the two, but not total overlap. Intelligent Nigerians, for instance, may owe their intelligence to alleles that exist only in sub-Saharan Africa.


To return to the Twitter comment, it seems clear that polygenic scores are predicting something that correlates with cognitive ability, and that "something" is not an artefact of wealthy people being related to each other and sharing the same genes. It's already a stretch to believe that close family ties are shared by high achievers throughout the United Kingdom, the Netherlands, Sweden, and Finland. Does the same family clique also include high achievers of Asian American origin? 



The Rabinowitz et al. study


Rabinowitz et al. (2019) used an education PGS to predict cognitive ability in Black American participants, specifically three cohorts from first grade to young adulthood (at which point their DNA was collected and analyzed).


The results? The PGS significantly correlated with pursuit of postsecondary education. The correlation was weak or insignificant, however, for performance on school tests. The PGS did not predict performance on a standardized reading test for any of the three cohorts, and it predicted performance on a standardized math test for only one of them. In addition, the PGS negatively correlated with having a criminal record (but only in the male subjects).


A problem here may be the young age of the participants. Cognitive ability seems to become less malleable and more hardwired with age. We can help children do better on IQ tests, but the improvement tends to disappear by adulthood (Frost 2008). Consequently, academic success in childhood may be too clouded by environmental factors to show a significant correlation with genetic factors.


On the other hand, the PGS did predict some things better than others. It predicted general academic success (pursuit of postsecondary education) and compliance with rules (absence of a criminal record). For actual school tests, it had some power to predict success on the math test but none at all on the reading test. The ability to acquire language may be the mental domain where people of sub-Saharan African descent have undergone the most cognitive evolution since their separation from other humans. The PGS cannot predict superior reading ability among Black Americans because too many of the relevant alleles are exclusive to the sub-Saharan African gene pool and remain to be identified by scientific studies.


The take-home message? At present, we can create polygenic scores that provide a rough idea of cognitive ability in people of sub-Saharan African descent. To get more than a rough idea, we need to identify the relevant alleles specific to that population.





Frost, P. (2008). IQ: Interaction between race and age. The Unz Review, May 20.



Frost, P. (2021). The mismeasure of genetic differentiation. Evo and Proud, April 13.



Guo, G., M.J. Lin, and K.M. Harris. (2019). Socioeconomic and Genomic Roots of Verbal Ability. bioRxiv, 544411.



Rabinowitz, J.A., S.I.C. Kuo, W. Felder, R.J. Musci, A. Bettencourt, K. Benke, ... and A. Kouzis. (2019). Associations between an educational attainment polygenic score with educational attainment in an African American sample. Genes, Brain and Behavior, e12558.


Tuesday, April 20, 2021

Selection for fair skin in Europeans and North Asians


Selection for fair skin in different human populations (Huang et al. 2021)

Selection for fair skin was about four times stronger among ancestral Europeans than it was among ancestral North Asians or the earlier shared ancestors of both groups. So says a recent genome study.


Huang et al. (2021) examined genes that influence skin pigmentation to calculate the strength of selection for lighter skin among the ancestors of today’s Europeans and North Asians. They concluded that selection for lighter skin was strongest among the unique ancestors of present-day Europeans, with a selection pressure of 25.9 (in units of 10⁻³ per generation, i.e., 0.0259 per generation). On the same scale, it was about four times weaker among the unique ancestors of North Asians (5.61) and the earlier shared ancestors of both groups (6.5). East Asians actually became darker after they split from North Asians, with a selection pressure of -5.53.


Our estimate shows that the modern European lineage had the largest selective pressure (s4=0.0259/generation) on light pigmentation than the other branches, suggesting that recent natural selection favoured light pigmentation in Europeans. Recent studies using ancient DNA could support our observation of recent directional selection in Europeans (Huang et al. 2021, p. 3)


This finding supports earlier findings. Modern humans remained dark-skinned in Europe long after they had spread into northern latitudes some 45,000 years ago. It was not until 20,000 years ago that alleles for white skin made their appearance (Beleza et al. 2013; Canfield et al. 2014; Norton and Hammer 2007). As a Science correspondent concluded: "The implication is that our European ancestors were brown-skinned for tens of thousands of years" (Gibbons 2007).


Those ancestors were initially proto-Eurasians, and it was only later that they differentiated to become respectively Europeans and North Asians. Only then, and only in the European lineage, did skin color begin to lighten at a fast rate. This rapid evolution seems to have been confined to a relatively small area that stretched from the Baltic to central Siberia. Elsewhere, in western and southern Europe, people remained dark-skinned until almost the dawn of history, as shown by DNA dated to 11,000 years ago from England, 8,000 years ago from Luxembourg, and 7,000 years ago from Spain (Brace et al. 2019; Lazaridis et al. 2014; Olalde et al. 2014).


The fair skin phenotype, together with a variety of hair and eye colors, would later spread throughout all of Europe, while eventually going extinct east of the Urals. In the latter region it nonetheless persisted into historic times. At sites in south-central Siberia, dating from the third millennium BC to the fourth century AD, genetic analysis has shown that most of the buried individuals had blue or green eyes, light hair (blond, red, light brown), and light skin (Bouakaze et al. 2009). South Siberian peoples were, in fact, described as having "green eyes" and "red hair" in old Chinese records (Keane 1886, p. 703).


It seems that Europeans acquired their current appearance very fast, perhaps ten to twenty thousand years ago during the last ice age. Initially confined to northeastern Europe and parts of Siberia, the new phenotype would in time spread to the rest of the continent ... on the eve of recorded history. Only then did all Europeans come to look “European” (Frost 2014; Frost 2020).




Beleza, S., A.M. Santos, B. McEvoy, I. Alves, C. Martinho, E. Cameron, et al. (2013). The timing of pigmentation lightening in Europeans. Molecular Biology and Evolution 30(1): 24-35. https://doi.org/10.1093/molbev/mss207


Bouakaze, C., C. Keyser, E. Crubézy, D. Montagnon, and B. Ludes. (2009). Pigment phenotype and biogeographical ancestry from ancient skeletal remains: inferences from multiplexed autosomal SNP analysis. International Journal of Legal Medicine 123(4): 315-325.



Brace, S., Y. Diekmann, T.J. Booth, Z. Faltyskova, N. Rohland, S. Mallick, et al. (2019). Ancient genomes indicate population replacement in Early Neolithic Britain. Nature Ecology & Evolution 3(5): 765-771. https://doi.org/10.1038/s41559-019-0871-9


Canfield, V.A., A. Berg, S. Peckins, S.M. Wentzel, K.C. Ang, S. Oppenheimer, and K.C. Cheng. (2014). Molecular phylogeography of a human autosomal skin color locus under natural selection. G3, 3(11): 2059-2067. https://doi.org/10.1534/g3.113.007484


Frost, P. (2014). The puzzle of European hair, eye, and skin color. Advances in Anthropology 4(2): 78-88. http://www.scirp.org/journal/PaperInformation.aspx?PaperID=46104


Frost, P. (2020). White Skin Privilege: Modern Myth, Forgotten Past. Evolutionary Studies in Imaginative Culture 4(2): 63-82. https://doi.org/10.26613/esic/4.2.190



Gibbons, A. (2007). American Association of Physical Anthropologists Meeting: European skin turned pale only recently, gene suggests. Science 20 April 2007, 316(5823): 364.



Huang, X., S. Wang, L. Jin, and Y. He. (2021). Dissecting dynamics and differences of selective pressures in the evolution of human pigmentation. Biology Open 15 February 2021; 10(2): bio056523. https://doi.org/10.1242/bio.056523


Keane, A.H. (1886). Asia with Ethnological Appendix. London: Edward Stanford.


Lazaridis, I., N. Patterson, A. Mittnik, G. Renaud, S. Mallick, K. Kirsanow, et al. (2014). Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513(7518): 409-413. https://doi.org/10.1038/nature13673


Norton, H.L., and M.F. Hammer. (2007). Sequence variation in the pigmentation candidate gene SLC24A5 and evidence for independent evolution of light skin in European and East Asian populations. Program of the 77th Annual Meeting of the American Association of Physical Anthropologists, p. 179.


Olalde, I., M.E. Allentoft, F. Sanchez-Quinto, G. Santpere, C.W.K. Chiang, M. DeGiorgio, et al. (2014). Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature 507(7491): 225-228. https://doi.org/10.1038/nature12960


Tuesday, April 13, 2021

The mismeasure of genetic differentiation


Red Tree, Piet Mondrian (1908-10)

If we look at SNP alleles associated with educational attainment, we see differences between Europeans and sub-Saharan Africans. Is genetic drift the cause? Or natural selection?



IQ has long been the yardstick of cognitive ability. As such, it describes phenotype, not genotype: it measures how your inborn potential has developed in your environment. Genotype is the inborn component of IQ. It can be inferred from twin studies, family studies, and adoption studies, but those approaches are indirect and far from perfect.


To measure genotype directly, we need to identify the alleles that affect the development of cognitive ability. We also need to measure the size of each allele’s effect. Recently, much progress has been made. By using genome-wide association studies (GWAS), researchers have identified many alleles that are associated with educational attainment (EA). EA is not quite the same as IQ—it also includes things like sitting still in class and brownnosing the teacher—but it's a good approximation.


In the most recent study of this sort, Lee et al. (2018) identified 1,271 single-nucleotide polymorphisms (SNPs) that are significantly associated with high EA in a sample of over one million people of European ancestry. Together, the SNPs can explain 11-13% of the variance in EA among individuals. This new yardstick is called the "polygenic score."
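In its simplest additive form, a polygenic score is a weighted sum: for each SNP, the number of copies of the effect allele a person carries (0, 1, or 2) is multiplied by that allele's estimated effect size, and the products are added up. The sketch below shows only that general idea. The effect sizes and genotypes are made up for illustration, and the actual construction in Lee et al. (2018) involves further steps that are not described here.

```python
def polygenic_score(dosages, effect_sizes):
    """Additive polygenic score: sum of (effect-allele count x effect size).

    dosages      -- copies of the effect allele at each SNP (0, 1 or 2)
    effect_sizes -- estimated per-allele effect for the same SNPs, same order
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per SNP")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical example: three SNPs with invented effect sizes.
effect_sizes = [0.02, -0.01, 0.03]
person_a = [2, 0, 1]  # genotype expressed as effect-allele counts
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, effect_sizes)
score_b = polygenic_score(person_b, effect_sizes)
print(score_a, score_b)
```

With these invented numbers, person A scores higher than person B. A population's mean score is simply the average of such individual scores.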


The polygenic score is more accurate for populations than for individuals. If we compare the mean polygenic score of a population and its mean IQ, the correlation is 0.90 (Piffer 2019). This high correlation is due to the logic of sampling: to estimate the mean cognitive ability of a population, we don't have to identify all of the relevant SNPs, just a large enough sample.


Like mean IQ, the mean polygenic score differs among human populations. It seems to have increased during the northward spread of modern humans out of Africa and into the temperate zone of Europe and Asia, with East Asians having the highest scores. This geographic pattern is in line with IQ data. The mean polygenic score is also very high among Ashkenazi Jews and Finns, again in line with IQ data (Piffer 2019).



Kevin Bird’s paper


The above findings have been disputed by the American researcher Kevin Bird in a recent paper (Bird 2021). Although Europeans and sub-Saharan Africans have different allele frequencies at genes associated with educational attainment, he argues that these differences correspond to small differences in cognitive ability and that they are more consistent with genetic drift than with natural selection.


To support his argument, he performed two analyses of the data: an Fst analysis and a test for polygenic selection. In my opinion, both analyses have serious problems.


The Fst


This is the most common measure of genetic differentiation. If the Fst is low, differentiation is trivial and consistent with genetic drift. If it is high, differentiation is significant and consistent with natural selection.


For SNPs associated with EA, Kevin Bird reports an Fst of 0.111. Is that low or high? When Sewall Wright (1978, pp. 82-85) created this measure, he defined four categories of differentiation:


0 to 0.05 - little genetic differentiation

0.05 to 0.15 - moderate genetic differentiation

0.15 to 0.25 - great genetic differentiation

0.25 to 1 - very great genetic differentiation
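To make the measure concrete, here is a minimal sketch that computes Fst for a single two-allele SNP in two equally weighted populations, using the textbook heterozygosity form Fst = (HT - HS) / HT, and then places the result in one of the four categories above. This is an assumption on my part about a reasonable way to illustrate the idea. It is not necessarily the estimator Kevin Bird used, and the handling of values that fall exactly on a category boundary is also my own choice.

```python
def fst_biallelic(p1, p2):
    """Fst for one biallelic SNP, two populations given equal weight.

    p1, p2 -- frequency of the same allele in each population.
    Uses Fst = (HT - HS) / HT, where HT is the expected heterozygosity of
    the pooled population and HS is the mean within-population value.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:  # allele fixed or absent in both populations
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def wright_category(fst):
    """Wright's four categories; each boundary goes to the higher category
    (an assumption, since the quoted ranges share their endpoints)."""
    if not 0 <= fst <= 1:
        raise ValueError("Fst must lie between 0 and 1")
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"


print(wright_category(0.111))             # the value reported by Bird
example = fst_biallelic(0.2, 0.6)         # hypothetical allele frequencies
print(round(example, 3), wright_category(example))
```

Under this scheme, the reported value of 0.111 falls in the "moderate" band.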


Those categories are widely cited in the literature. A search in Google Scholar for "moderate genetic differentiation" and "0.05 - 0.15" shows over two hundred papers.


So does an Fst of 0.111 mean moderate genetic differentiation? Not according to Kevin Bird, who sees nothing at all below a benchmark of 0.118. That benchmark may be valid, but it cannot be easily verified and does not appear elsewhere in the literature. Nor does Kevin explain why it is better than the ones put forward by Sewall Wright. In fact, he makes no reference to them.


One may also question the Fst of 0.111. For the data source, the reader is referred to Lee et al. (2018), but that study was done only with European subjects. Moreover, Kevin Bird used 1,259 SNPs to calculate that Fst, even though he found only 685 SNPs that had data on both Africans and Europeans.


The Fst of 0.111 seems to be the differentiation of those SNPs among Europeans. That value is what would be expected, but it says nothing about differentiation between Europeans and sub-Saharan Africans.


The polygenic selection analysis


The other analysis is more on subject. Kevin Bird compared European data with African data as follows:


1. First, he looked through the 1000 Genomes Project for SNP data on Europeans and sub-Saharan Africans. He found data on five European-descended populations (Utah residents, Tuscans, Finns, British, Iberians) and five African populations (Yoruba, Luhya, Gambians, Mende, Esan). The two datasets had information on 685 of the 1,271 SNPs associated with educational attainment.


2. For each SNP, he noted the allele frequencies in Europeans and the allele frequencies in sub-Saharan Africans.


3. He calculated the differences in allele frequencies between the two groups. He then weighted the differences for the allele's effect size (its estimated positive or negative effect on educational attainment). For each allele, he used two different estimates of effect size: one from between-family data and the other from within-family data.


4. Alongside this list of weighted alleles, he created a second list to simulate genetic drift by randomly flipping the sign of effect size for 10,000 permutations.


5. When effect size was calculated from between-family data, the two lists clearly differed from each other. When it was calculated from within-family data, the overall difference was much smaller and easily explained by genetic drift.
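The five steps above can be sketched in a few lines of code. This is only my reading of the procedure as summarized here, not Kevin Bird's actual implementation: the function names are mine, the allele frequencies and effect sizes are invented, and the two-sided p-value is one reasonable convention among several.

```python
import random


def weighted_difference(freq_a, freq_b, effects):
    """Step 3: allele-frequency differences weighted by effect size, summed."""
    return sum((a - b) * e for a, b, e in zip(freq_a, freq_b, effects))


def sign_flip_test(freq_a, freq_b, effects, n_perm=10_000, seed=0):
    """Steps 4-5: compare the observed statistic with a null distribution
    obtained by randomly flipping the sign of each effect size."""
    rng = random.Random(seed)
    observed = weighted_difference(freq_a, freq_b, effects)
    extreme = 0
    for _ in range(n_perm):
        flipped = [e if rng.random() < 0.5 else -e for e in effects]
        if abs(weighted_difference(freq_a, freq_b, flipped)) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: 8 SNPs, frequencies in two groups, invented effect sizes.
group_a = [0.55, 0.30, 0.62, 0.48, 0.71, 0.25, 0.40, 0.66]
group_b = [0.45, 0.35, 0.50, 0.52, 0.60, 0.30, 0.33, 0.58]
effects = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, 0.01]

observed, p_value = sign_flip_test(group_a, group_b, effects)
print(observed, p_value)
```

A small p-value means that random sign assignments rarely produce a weighted difference as large as the observed one. A large p-value means the observed difference is of the size that random signs produce, which is the pattern attributed to genetic drift.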


Bird (2021) prefers the within-family estimates, whereas Piffer (2019) prefers the between-family estimates. Who is right? All things being equal, data should come from within families. There is less statistical noise because siblings have similar upbringings. With less noise, group differences can more easily be identified.


Yet, here, we have the opposite. We see a significant difference between Europeans and Africans in the between-family data, but not in the within-family data. Why? The reason is that the between-family data came from over a million subjects whereas the within-family data came from 20,000 sibling pairs. Being smaller, the second dataset had a lot more noise. Sure, there should have been less noise, all things being equal. But some things weren't.
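A back-of-the-envelope calculation gives a sense of the gap. It assumes, as a simplification of mine, that the standard error of an effect-size estimate shrinks with the square root of the number of observations and that each sibling pair counts as one observation. The real designs differ in other ways too, so the number is only indicative.

```python
import math


def relative_standard_error(n_small, n_large):
    """How many times larger the standard error is in the smaller sample,
    assuming error shrinks in proportion to 1 / sqrt(n)."""
    return math.sqrt(n_large / n_small)


n_between = 1_000_000  # "over a million subjects" (lower bound used here)
n_within = 20_000      # sibling pairs, each treated as one observation

ratio = relative_standard_error(n_within, n_between)
print(round(ratio, 1))
```

Under those assumptions, the within-family estimates are roughly seven times noisier than the between-family ones.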



Doing the comparison again but better


I suspect Kevin Bird still prefers within-family data. Fine. Let's repeat the comparison with a much larger sample of sibling pairs. There would then be less noise and probably a significant difference between African and European alleles in their effect on educational attainment. Kevin seems to anticipate this eventuality:


While the results presented here are more consistent with neutral evolution rather than divergent natural selection, it is not possible to rule out that data sets with more power could present different results. Additionally, although within-family effect sizes are recommended over between-family effect sizes, if the within-family effect sizes are re-estimated for SNPs ascertained by a between-family GWAS, there is still likely to be some level of confounding from population structure. (Bird 2021, p. 7)


He elaborates on the last point:


[...] the [polygenic] scores might be biased by a variety of factors, including the nonrandom ways that society is geographically structured [...]. For instance, Black people in the US, for reasons unrelated to genetics, live in areas with poorer air quality and more exposure to environmental toxins (Bird 2021, p. 8)


Yet, as he notes further on, these SNP alleles were identified only in European subjects, and their effects on educational attainment were estimated only from European data. So how could different alleles among Europeans be spuriously associated with differences in educational attainment among Europeans because of socioeconomic deprivation among Black Americans? Where and when do the latter come into this presumably spurious association?


Kevin Bird is right to point out that the allele effects were calculated from European data and may be less applicable to people of other origins. In fact, there is growing evidence that the genetic architecture of cognition is different in sub-Saharan Africans (Frost 2019). By ignoring that factor, however, we introduce even more noise into the data and muddle even more any differences that may exist between Africans and Europeans. The data may indeed be of low quality, but that shortcoming would, if anything, obscure group differences. Again, Kevin is making a coherent point within an incoherent argument.



Other ways?


There are other ways to distinguish between genetic drift and natural selection. One way is to measure the ratio of nonsynonymous alleles to synonymous alleles. If a trait has little functional value and is thus vulnerable to genetic drift, nonsynonymous alleles will tend to proliferate and become as numerous as synonymous alleles (Ohta 1995). Of course, if nonsynonymous alleles greatly outnumber synonymous alleles, there may be natural selection for diversity (Rana et al. 1999).


An SNP, by its very nature, has alleles that differ from each other by only one base substitution, and this fact limits our ability to distinguish between genetic drift and natural selection. It would thus be interesting to identify genetic polymorphisms that are associated with educational attainment but span several nucleotides.


If such a polymorphism is undergoing genetic drift, the most frequent alleles will be the ancestral allele and those that differ from it by one base substitution. The less frequent ones will be those that differ by two or more base substitutions. In short, the frequency of an allele will be inversely related to the number of base substitutions that separate it from the ancestral allele.


The picture is different with natural selection. The most frequent alleles will not necessarily be the ones that differ the least from the ancestral allele. If allele frequency is graphed as a function of base substitutions, the result will not be a smoothly decreasing exponential curve. The most successful allele may differ from the ancestral one by several base substitutions.
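The proposed check can be sketched as follows. Each allele is written as a short sequence, its distance from the ancestral allele is counted in base substitutions, and we ask whether the mean frequency falls steadily as that distance grows. The sequences and frequencies are invented, and the monotonic-decline test is a crude stand-in of my own for a proper statistical test.

```python
def substitutions(allele, ancestral):
    """Number of positions at which an allele differs from the ancestral one."""
    if len(allele) != len(ancestral):
        raise ValueError("alleles must have the same length")
    return sum(a != b for a, b in zip(allele, ancestral))


def mean_frequency_by_distance(allele_freqs, ancestral):
    """Mean allele frequency for each substitution distance, sorted by distance."""
    grouped = {}
    for allele, freq in allele_freqs.items():
        grouped.setdefault(substitutions(allele, ancestral), []).append(freq)
    return {d: sum(f) / len(f) for d, f in sorted(grouped.items())}


def declines_with_distance(freq_by_distance):
    """True if mean frequency never rises as distance from the ancestor grows."""
    values = list(freq_by_distance.values())
    return all(x >= y for x, y in zip(values, values[1:]))


ancestral = "ACGT"  # hypothetical four-nucleotide polymorphism
drift_like = {"ACGT": 0.60, "ACGA": 0.25, "TCGA": 0.10, "TGGA": 0.05}
selection_like = {"ACGT": 0.20, "ACGA": 0.10, "TGGA": 0.70}

print(declines_with_distance(mean_frequency_by_distance(drift_like, ancestral)))
print(declines_with_distance(mean_frequency_by_distance(selection_like, ancestral)))
```

In the first invented dataset, frequency falls with every added substitution, as expected under drift. In the second, the most common allele is three substitutions away from the ancestral one, which is the pattern that would point to selection.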





Bird, K.A. (2021). No support for the hereditarian hypothesis of the Black-White achievement gap using polygenic scores and tests for divergent selection. American Journal of Physical Anthropology, Feb.: 1-12. https://doi.org/10.1002/ajpa.24216



Frost, P. (2019). Differences in the genetic architecture of cognition? Evo and Proud, September 25.



Lee, J.J., R. Wedow, A. Okbay, E. Kong, O. Maghzian, M. Zacher, et al. (2018). Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics 50(8): 1112-1121.



Ohta, T. (1995). Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. Journal of Molecular Evolution 40(1): 56-63.


Piffer, D. (2019). Evidence for Recent Polygenic Selection on Educational Attainment and Intelligence Inferred from GWAS Hits: A Replication of Previous Findings Using Recent Data. Psych 1(1): 55-75. https://doi.org/10.3390/psych1010005


Rana, B.K., D. Hewett-Emmett, L. Jin, B.H.J. Chang, N. Sambuughin, M. Lin, et al. (1999). High polymorphism at the human melanocortin 1 receptor locus. Genetics 151(4): 1547-1557.



Wright, S. (1978). Evolution and the Genetics of Populations, Volume 4. Chicago: University of Chicago Press.