# Forbidden Regressions: Testing Racialism as a Scientific Hypothesis

It is important to distinguish between racism and racialism. The word ‘racism’ emerged in the late 1930s with the first batch of Boasian antiracists, in what may be thought of as the heroic age of Boasian antiracism, when they began to agitate against National Socialism. In the popular mind it did the same work that ‘race prejudice’ had done earlier. For intellectuals, at least for the small minority of Boasians who popularized it, it did a lot more work. The difference was that ‘race prejudice’ singled out particularly hateful or enthusiastic individuals or instances for censure, whereas ‘racism’ pointed towards the systematic nature of racial discrimination and domination. The modern way of thinking about racism simply did not exist in that foreign country, the past; race prejudice did not mean what racism would come to mean. The term took off with the antisystemic turn of the 1960s and achieved its present hegemonic status in the 1990s, marking the rise of Boasian antiracism as the hegemonic ideology of Western elites in general and American elites in particular. The Google Ngram graph displayed below ends in 2008. Since then there has been a third surge, which Zach Goldberg and others are documenting under the rubric of ‘the Great Awokening’.

Racialism is not racism. Racialism is actually a composite of two separate scientific hypotheses. The first asserts that there are discrete geographic races, or allopatric subspecies, in man. This is a scientific hypothesis in the strictest sense. The concept of allopatric subspecies goes to the very heart of the process of speciation in the modern evolutionary synthesis. In Ernst Mayr’s now classic account, speciation occurs due to the fission of extant species as a result of prolonged genetic isolation, usually as a consequence of geographic isolation or allopatry. In this frame, subspecies are incipient species that are more or less far along the road to full speciation (which occurs when they can coexist sympatrically, i.e., in the same space, without interbreeding in the wild), depending on how long ago they split. The crucial question for raciation or subspeciation in any species is thus splitting time, or time to last common ancestor. That is the basis for phylogeny across the taxonomic hierarchy, including at the intraspecific level.

Until 1987, it was believed by all scientific authorities that continental human populations had been genetically isolated from each other for a million years during the Pleistocene. In that year, two scientific milestones drastically reduced the estimated splitting time of continental populations. First, Rebecca Louise Cann, Mark Stoneking, and Allan Charles Wilson’s paper “Mitochondrial DNA and Human Evolution” appeared in Nature. They showed that the coalescence time for mitochondrial DNA could not be much more than 200,000 years. Second, in the same year, the arrival of thermoluminescence dating placed a similar upper bound on the oldest remains of Homo sapiens in Africa. The splitting time of continental populations would continue to fall over the following decades as tighter bounds were obtained from both radiometric dating and molecular estimates of time to last common ancestor. Current estimates suggest that the splitting time of present-day continental populations cannot be much more than 50 thousand years, and in most cases is considerably less.

These developments immediately called into question the hitherto strongly held assumption that geographic races exist in man, although it would be a while before biologists and anthropologists abandoned the idea. As late as 2002, we find the grand old man himself, Ernst Mayr, attesting to the reality of geographic races in man. In Mayr’s defense, however, we must point out his extremely enthusiastic review of Coon’s The Origin of Races in 1962. Mayr may have been personally responsible for saving the subspecies concept after Wilson and Brown’s assault on it in 1954, which included a proposal to abolish formal trinomial nomenclature and found wide agreement among systematists, as attested by the responses it drew in Systematic Zoology. (More on the 1950s subspecies fight in systematics another day.)

The upshot of the recency of continental populations was that not enough time had passed for them to have traveled very far along the road to speciation. They split too recently to have undergone much derivation at all. Continental populations can therefore hardly be described as incipient species or subspecies. Despite mankind’s global range, there are no subspecies in man, precisely because continental populations split so recently. Put simply, we all belong to the same geographic race: the African one. The scientific discovery of recency warmed the hearts of Boasian antiracists. The Out-of-Africa consensus played no small part in the self-assurance of Boasian antiracism in the bright new antiracist morning of the 1990s. The scientific fact of recency, however, had genocidal implications for our deep past. For there were, in fact, geographic races in man, races that were largely eliminated by our own during the Pleistocene. This story would be repeated again and again throughout our deep history. The extermination and replacement of indigenous populations in the New World and Australasia, most completely in Tasmania, is just the most recent iteration.

Within the parameters of the evolving ideology of Boasian antiracism, any discussion of biological differences between populations became increasingly taboo. Sometime in the 2000s, I believe, the claim that biological races exist in our species became inadmissible in polite company. Races were to be regarded as no more than social constructs; indeed, real abstractions with often violent and nefarious consequences. This is no doubt true. But the discovery of the nonexistence of races in man was not what motivated the Boasian assault on the race concept! That was prompted by the second hypothesis of racialism.

This second hypothesis is racialism sensu stricto. It too is a scientific hypothesis. It says that the large-scale pattern of world order (the manifest asymmetries of competence, income, wealth and power among populations; in short, the polarization of the world) is all a consequence of a simple fact. This simple fact, which Boasian antiracism has placed beyond the Overton window, is that not only are there discrete anthropological races in man, but they are differentially endowed, above all in cognitive ability and in behavioral propensities like patience, risk aversion, altruism, reciprocity and trust. And this differential endowment is what explains world order. The explanatory work performed by racialism is the most important reason why it enjoyed such a hold on men’s minds throughout the twentieth century.

At the same time that Boasian antiracism emerged as hegemonic, another set of scientific developments was afoot that would reinscribe racialism as a scientific hypothesis, not as an explicit theory of the natural hierarchy of the races but, more generally, in terms of genetic differences between populations. This was the upgrading of racialism with the revolutions in molecular anthropology. It would say: it does not matter whether or not there are geographic races in man; as long as populations are genetically differentiated, world order may still be explained by biological differences between them. While many neoracialists clung to old-school racial categories, a new breed of biologically inclined economists would explicitly theorize racialism in its new guise as biological reductionism. I had not realized the scale of this literature when I wrote about my failure to replicate Galor’s hump-shaped relationship between genetic variation (a function of distance from Africa) and societal achievement.

The work of Becker, Enke and Falk, and others in this genre, is still half-baked. We shall see how laying out the racialist hypothesis explicitly allows us to test it properly. Since I am a “ronin scholar” (as an anonymous commenter described me in some other corner of the Internet), I am free to publish otherwise forbidden regressions.

The good thing about racialism is that it is a straightforward hypothesis to test. For if racialism holds as a scientific hypothesis, then phylogenetic distance between populations should explain their differential socioeconomic performance. This is the crux of racialism; everything else is secondary. Instead of mucking around in mediating variables like propensities for patience and suchlike, let us test whether phylogenetic distances between populations track differences in socioeconomic development indices better than … better than what?

The Policy Tensor has long argued that large-scale structure in the human world is due to the large-scale structure of our lifeworld. Specifically, what explains world order — above all, the polarization of the world between the developed north and the underdeveloped south — is the vicious cycle between disease burdens and nutrition. Health insults steal the nutrients that would otherwise be channeled into greater adult height and higher cognitive ability. What population differences in cognitive test scores measure is net nutritional status — the combined effect of nutrition and disease on health status. It’s the same with height, modulo the African puzzle.

We therefore propose a null model that explains population differences in both outcome variables (income per capita, sociodemographic index, state capacity) and mediating health variables (mean adult height, cognitive test scores, and BMI) as functions of differences in nutrition and disease burdens. Nutrition will be proxied by protein intake per capita, obtained from FAO’s Food Balances. Disease burdens will be proxied by life expectancy (itself a function of disease mortality rates), obtained from GHDx, where we also obtain data on per capita income, the sociodemographic index and state capacity. For cognitive test scores, we use the infamous international dataset of Lynn and Vanhanen (2009).

Racialism, or the reductionist hypothesis, says that population differences in both the mediating and outcome variables are a function of phylogenetic distance. We use the measure of genetic distance (Fst) compiled by Spolaore and Wacziarg (2017), which is based on the microsatellite data of Pemberton, DeGiorgio and Rosenberg (2013). From these raw genetic distances we obtain a measure of phylogenetic distance. (Using raw genetic distance yields similar results.) Phylogenetic, or patristic, distance is distance measured along the phylogenetic tree: our best estimate of splitting time between populations. We use the UPGMA algorithm and the Euclidean distance metric to obtain the following ultrametric phylogenetic tree.
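
The tree step can be sketched in a few lines. This is a minimal sketch, not the author's code: it assumes each population is represented by its row of pairwise Fst values (one reading of "the Euclidean distance metric"), it omits the ethnic-composition weighting, and the function names and the four-population Fst matrix are hypothetical. Because UPGMA places the node joining two clusters at half their merge distance, the patristic distance between two populations equals the distance at which their clusters first merge, which makes the result ultrametric.

```python
import itertools
import math


def euclidean_rows(matrix):
    """Euclidean distance between every pair of rows of a square matrix (e.g. an Fst matrix)."""
    n = len(matrix)
    return [[math.dist(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


def upgma_patristic(dist):
    """Build a UPGMA tree over `dist` and return the patristic distance between leaves.

    When UPGMA merges two clusters at distance d, it places their common node at
    height d / 2, so the path length (patristic distance) between any leaf of one
    cluster and any leaf of the other is exactly d. The result is ultrametric.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member leaves
    d = {(i, j): dist[i][j] for i, j in itertools.combinations(range(n), 2)}
    patristic = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        for x in members_a:
            for y in members_b:
                patristic[x][y] = patristic[y][x] = d_ab
        # Average linkage: distance to the merged cluster is the size-weighted mean.
        size_a, size_b = len(members_a), len(members_b)
        updated = {
            c: (size_a * d[(min(a, c), max(a, c))] + size_b * d[(min(b, c), max(b, c))])
            / (size_a + size_b)
            for c in clusters
        }
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        for c, v in updated.items():
            d[(c, next_id)] = v  # c < next_id always, so keys stay ordered
        clusters[next_id] = members_a + members_b
        next_id += 1
    return patristic


# Hypothetical 4-population Fst matrix, for illustration only (not real data).
fst = [
    [0.00, 0.02, 0.12, 0.15],
    [0.02, 0.00, 0.11, 0.14],
    [0.12, 0.11, 0.00, 0.05],
    [0.15, 0.14, 0.05, 0.00],
]
patristic = upgma_patristic(euclidean_rows(fst))
for row in patristic:
    print(" ".join(f"{x:.4f}" for x in row))
```

In practice, `scipy.cluster.hierarchy.linkage(..., method="average", metric="euclidean")` followed by `scipy.cluster.hierarchy.cophenet` should yield the same ultrametric distances; the hand-rolled loop just keeps each step visible.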

We know that the mediating variables (mean adult height, mean test scores, mean BMI) are heritable, so we expect at least some effect of phylogenetic distance. Indeed, twin studies show that the heritability of adult height is around 0.9, that of adult BMI around 0.8, and that of adult cognitive ability around 0.6. We show that, despite this heritability at the individual level, distance in health status swamps phylogenetic distance for the mediating variables and renders it irrelevant, or worse, for the outcome variables.

Our econometric strategy is as follows. We have data on 72 populations (adjusted for ethnic composition in the computation of phylogenetic distances). For each dyad of countries $i$ and $j$, we take the absolute difference of their observations, $p_{ij}=|p_{i}-p_{j}|$. We center and rescale all predictors to have mean 0 and variance 1. We then estimate mixed-effects models of the following form:
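
The dyadic construction can be sketched as follows. This is a minimal sketch under stated assumptions: the country labels and life expectancies are hypothetical, the differencing is applied to a single country-level predictor, and standardization uses the population variance after differencing (the text does not say which variance or in which order). A set of $n$ countries yields $n(n-1)/2$ unordered dyads.

```python
import itertools
import statistics


def dyadic_abs_diffs(values):
    """Turn {country: value} into {(i, j): |value_i - value_j|} over all unordered pairs."""
    return {
        (i, j): abs(values[i] - values[j])
        for i, j in itertools.combinations(sorted(values), 2)
    }


def standardize(column):
    """Center a {key: value} column to mean 0 and rescale it to (population) variance 1."""
    data = list(column.values())
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return {key: (v - mean) / sd for key, v in column.items()}


# Hypothetical life expectancies (the disease-burden proxy) for four countries.
life_expectancy = {"A": 55.0, "B": 62.0, "C": 71.0, "D": 80.0}
raw_ij = dyadic_abs_diffs(life_expectancy)
disease_ij = standardize(raw_ij)
for dyad, z in disease_ij.items():
    print(dyad, round(z, 3))
```

Keying each dyad by a sorted pair of country labels keeps the pairing unambiguous, which matters later when each dyad has to be linked to the two country effects $\mu_i$ and $\mu_j$.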

$p_{ij}=\beta_{0}+\beta_{1}\text{Disease}_{ij}+\beta_{2}\text{Nutrition}_{ij}+\beta_{3}\text{PhyloDist}_{ij}+\mu_{i}+\mu_{j}+\epsilon_{ij}$,

where $\mu_{i} \sim N(0,\sigma^2_{\mu})$ and $\mu_{j} \sim N(0,\sigma^2_{\mu})$ are random effects for country $i$ and $j$, and the disturbance term $\epsilon_{ij} \sim N(0,\sigma^2_{\epsilon})$ is orthogonal to the country random effects. As a robustness check, we also control for absolute latitude, geodesic distance, and an interaction term for nutrition and disease. (Results available upon request.)
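
A short derivation shows why the country effects matter. Assuming, beyond what is stated above, that the country effects are independent across countries and that the disturbances are independent across dyads, the specification implies the following covariance structure for the dyadic outcomes, conditional on the predictors:

$$
\begin{aligned}
\operatorname{Var}(p_{ij}) &= 2\sigma^2_{\mu} + \sigma^2_{\epsilon},\\
\operatorname{Cov}(p_{ij}, p_{ik}) &= \sigma^2_{\mu} \qquad (j \neq k),\\
\operatorname{Cov}(p_{ij}, p_{kl}) &= 0 \qquad (\{i, j\} \cap \{k, l\} = \varnothing).
\end{aligned}
$$

Any two dyads that share a country are therefore correlated, with intraclass correlation $\sigma^2_{\mu}/(2\sigma^2_{\mu}+\sigma^2_{\epsilon})$. Treating the dyads as independent observations would typically overstate the precision of the slope estimates; the country random effects are one way of modeling this dependence.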

We begin with the proximate responses, starting with mean adult stature. We find that nutrition is the strongest predictor of mean adult height. The slope for phylogenetic distance is indistinguishable from zero. Even though adult height has a heritability coefficient of 0.9, phylogenetic distance is an extremely poor predictor of population differences in height.

Table 1. Mean adult height.

| | Slope | Std error | F-stat | P |
|---|---|---|---|---|
| Intercept | -0.04 | 0.05 | 0.7 | 0.40 |
| Disease | **0.10** | 0.02 | 25.8 | 0.00 |
| Nutrition | **0.65** | 0.02 | 1773.4 | 0.00 |
| PhyloDist | 0.03 | 0.03 | 1.1 | 0.30 |

Source: Spolaore and Wacziarg (2017), GHDx, author’s computations. All predictors are centered and rescaled to have mean 0 and variance 1. Maximum likelihood estimates with N=2,622 dyads. We display the F-statistic and the p-value for the likelihood ratio test of the null that the slope vanishes. Estimates in bold are significant at the 5 percent level.
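
The p-values in these tables can be recovered from the test statistics. Assuming the tabulated "F-stat" is the likelihood-ratio statistic referred to a chi-square distribution with one degree of freedom (one slope restricted to zero), the p-value needs only the complementary error function, since $P(Z^2 > x) = \operatorname{erfc}(\sqrt{x/2})$. As a check, this reproduces the displayed p-values to within the rounding of the statistics (for example, 0.7 gives about 0.40 and 4.0 about 0.046). The function names and log-likelihood values are hypothetical.

```python
import math


def chi2_1df_p_value(stat):
    """Upper-tail p-value of a chi-square(1) statistic: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def lr_test(loglik_full, loglik_restricted):
    """LR test of one restriction (here: a single slope fixed at zero).

    Both models must be fit by maximum likelihood (not REML) on the same dyads.
    Returns the statistic 2 * (l_full - l_restricted) and its chi-square(1) p-value.
    """
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_1df_p_value(stat)


# Statistics as displayed in Table 1 (rounded there to one decimal place).
for label, stat in [("Intercept", 0.7), ("Disease", 25.8), ("PhyloDist", 1.1)]:
    print(f"{label}: p = {chi2_1df_p_value(stat):.3f}")
```

The `max(stat, 0.0)` guard covers the case where numerical optimization leaves the restricted log-likelihood fractionally above the full one.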

Table 2 reports the ML estimates for body mass index (BMI). We find that phylogenetic distance is the most significant predictor of differences in BMI. As we have noted before, BMI is strongly correlated with bi-iliac (pelvic bone) width and is more deeply canalized than femur length or stature. It is therefore reassuring to find that phylogenetic distance is a stronger predictor of BMI than disease burdens and nutrition. BMI contains information on climatic adaptation and population history during the Pleistocene, which confounds the economic interpretation of cross-sectional variation in this variable.

Table 2. Body Mass Index.

| | Slope | Std error | F-stat | P |
|---|---|---|---|---|
| Intercept | **0.22** | 0.04 | 26.9 | 0.00 |
| Disease | **0.07** | 0.02 | 11.8 | 0.00 |
| Nutrition | **0.25** | 0.02 | 221.2 | 0.00 |
| PhyloDist | **0.36** | 0.03 | 168.1 | 0.00 |

Source: Spolaore and Wacziarg (2017), GHDx, author’s computations. All predictors are centered and rescaled to have mean 0 and variance 1. Maximum likelihood estimates with N=2,622 dyads. We display the F-statistic and the p-value for the likelihood ratio test of the null that the slope vanishes. Estimates in bold are significant at the 5 percent level.

Table 3 reports the estimates for mean cognitive test scores. We find that differences in disease burdens are the strongest predictor of differences in cognitive test scores, followed by nutrition and phylogenetic distance. Together, the health status variables have a combined standardized slope of 0.85 (0.53 + 0.32), compared to 0.30 for phylogenetic distance. These relative magnitudes are much more asymmetric than we should expect given the heritability estimates of cognitive ability, suggesting that the environment at least partially swamps heritable differences in cognitive ability.

Table 3. Cognitive test scores.

| | Slope | Std error | F-stat | P |
|---|---|---|---|---|
| Intercept | **0.17** | 0.05 | 10.7 | 0.00 |
| Disease | **0.53** | 0.02 | 805.0 | 0.00 |
| Nutrition | **0.32** | 0.01 | 462.4 | 0.00 |
| PhyloDist | **0.30** | 0.02 | 150.3 | 0.00 |

Source: Spolaore and Wacziarg (2017), Lynn and Vanhanen (2009), GHDx, author’s computations. All predictors are centered and rescaled to have mean 0 and variance 1. Maximum likelihood estimates with N=2,622 dyads. We display the F-statistic and the p-value for the likelihood ratio test of the null that the slope vanishes. Estimates in bold are significant at the 5 percent level.

So there is some evidence of the relevance of phylogenetic distance in understanding population differences in the mediating variables. We now move on to outcome variables, starting with per capita income.

Table 4 reports our estimates for population differences in log per capita GDP. We find that the gradient of phylogenetic distance is small and only marginally significant. Meanwhile, the gradients for disease and nutrition are large and highly significant. Biological reductionism thus fails to account for the most important explanandum of them all.

Table 4. Log per capita income.

| | Slope | Std error | F-stat | P |
|---|---|---|---|---|
| Intercept | -0.02 | 0.06 | 0.1 | 0.71 |
| Disease | **0.43** | 0.02 | 595.1 | 0.00 |
| Nutrition | **0.40** | 0.01 | 861.4 | 0.00 |
| PhyloDist | **0.05** | 0.02 | 4.0 | 0.05 |

Source: Spolaore and Wacziarg (2017), GHDx, author’s computations. All predictors are centered and rescaled to have mean 0 and variance 1. Maximum likelihood estimates with N=2,622 dyads. We display the F-statistic and the p-value for the likelihood ratio test of the null that the slope vanishes. Estimates in bold are significant at the 5 percent level.

Table 5 reports our estimates for population differences in the Sociodemographic Index (SDI), a composite of income per person, fertility, and educational attainment. It is probably the best proxy we have for the modernization process. Astonishingly, not only is phylogenetic distance not a good predictor, it bears the wrong sign! Meanwhile, both nutrition and disease burdens sport extremely large and robust gradients.

Table 5. Sociodemographic index.

| | Slope | Std error | F-stat | P |
|---|---|---|---|---|
| Intercept | **-0.10** | 0.04 | 5.3 | 0.02 |
| Disease | **0.47** | 0.02 | 799.4 | 0.00 |
| Nutrition | **0.51** | 0.01 | 1541.9 | 0.00 |
| PhyloDist | **-0.10** | 0.02 | 20.6 | 0.00 |

Source: Spolaore and Wacziarg (2017), GHDx, author’s computations. All predictors are centered and rescaled to have mean 0 and variance 1. Maximum likelihood estimates with N=2,622 dyads. We display the F-statistic and the p-value for the likelihood ratio test of the null that the slope vanishes. Estimates in bold are significant at the 5 percent level.

Table 6 reports our estimates for the state capacity index. Again, phylogenetic distance turns out to have the wrong sign. Meanwhile, both nutrition and especially disease burdens sport extremely large and robust gradients. Is this really all there is to the hegemonic ideology of the twentieth century?

Table 6. State capacity index.

| | Slope | Std error | F-stat | P |
|---|---|---|---|---|
| Intercept | **-0.14** | 0.04 | 11.5 | 0.00 |
| Disease | **0.63** | 0.02 | 1185.7 | 0.00 |
| Nutrition | **0.32** | 0.01 | 497.5 | 0.00 |
| PhyloDist | **-0.23** | 0.02 | 97.3 | 0.00 |

Source: Spolaore and Wacziarg (2017), GHDx, author’s computations. All predictors are centered and rescaled to have mean 0 and variance 1. Maximum likelihood estimates with N=2,622 dyads. We display the F-statistic and the p-value for the likelihood ratio test of the null that the slope vanishes. Estimates in bold are significant at the 5 percent level.

The last three tables are devastating for racialism as a scientific hypothesis. Of course, we already suspected that we would find something like this, since parts of the world that are now rich and powerful are actually genetically disadvantaged relative to regions that are poor and weak. But these results raise an interesting question. If reality has an antiracist bias, then why are these regressions forbidden by antiracism? It is almost as if Boasian antiracists are themselves afraid that they would not like what we would find if we were to shine a bright light into this dark corner. Indeed, intellectual oppression of any sort is a backhanded compliment of sorts, since it is premised on the suspicion that the outcome of investigation will be unattractive to the suppressor’s agenda. This suggests that underneath the thin façade of antiracist discourse, Boasians themselves are convinced that there is empirical support for racialism. When I first showed some of my early, superficial results to someone I greatly respect, he told me in horror: Perhaps it is right that we not talk of such things. Maybe the world is awful like that. But we are under no obligation to talk about it. Perhaps it’s best to shut up when reality is so politically unpalatable. I did not listen to him then and kept investigating. What I have found is that there is nothing to hide. We have nothing to lose but our own unfounded suspicions.

All of the above underscores the recency and superficiality of Boasian antiracism. Hegemonic it may be. But right underneath it lurk racialist assumptions about the way the world actually is.

Appendix. We report the results of our robustness tests. We test for two possibilities. One is that phylogenetic distance contains a weaker signal than genetic distance. The other is that our econometric strategy yields spurious results. In all tables below, we report, in that order, OLS estimates without controlling for country effects, OLS estimates controlling for country fixed effects (the preferred strategy of Becker, Enke, and Falk), and maximum likelihood estimates with country random effects, as reported above; all three are reported first with phylogenetic distance and then with genetic distance. Except in the models with country fixed effects, the intercept is included in the regression but not displayed. Our estimation strategy and choice of distance measure have only a marginal effect on the gradient estimates. Our results are robust to both possibilities.
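
The fixed-effects variant can be sketched as follows. This is a minimal sketch, assuming that "country fixed effects" in a dyadic regression means one dummy per country that equals 1 whenever that country is either member of the dyad (a multi-membership design); we do not know that this is exactly how the tables above or Becker, Enke, and Falk implement it. The helper names and the simulated, noise-free data are hypothetical, and a small hand-rolled solver keeps the block within the standard library. It also shows why no intercept appears in those models: each row has exactly two dummies switched on, so the dummy columns sum to twice a constant column.

```python
import itertools
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


def dyadic_fe_design(dyads, predictors, countries):
    """Design matrix with one row per dyad: the predictors, then one dummy per country.

    A country's dummy is 1 whenever it is either member of the dyad. No intercept
    column is added: every row has exactly two dummies equal to 1, so the dummy
    columns sum to 2 x (a constant column) and an intercept would be collinear.
    """
    index = {c: k for k, c in enumerate(countries)}
    rows = []
    for (i, j), x in zip(dyads, predictors):
        dummies = [0.0] * len(countries)
        dummies[index[i]] = 1.0
        dummies[index[j]] = 1.0
        rows.append(list(x) + dummies)
    return rows


# Simulated, noise-free data (hypothetical): 7 countries, 21 dyads, 2 predictors.
rng = random.Random(0)
countries = list("ABCDEFG")
dyads = list(itertools.combinations(countries, 2))
alpha = {c: rng.gauss(0.0, 1.0) for c in countries}  # country effects
predictors = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in dyads]
true_beta = (0.5, -0.2)
y = [true_beta[0] * x1 + true_beta[1] * x2 + alpha[i] + alpha[j]
     for (i, j), (x1, x2) in zip(dyads, predictors)]

coef = ols(dyadic_fe_design(dyads, predictors, countries), y)
beta_hat, alpha_hat = coef[:2], coef[2:]
print("slopes:", [round(b, 6) for b in beta_hat])
```

With no noise, the slopes and country effects are recovered exactly. With real data one would use a numerical library (for example `numpy.linalg.lstsq`) rather than the normal equations, and standard errors would need to account for the dependence between dyads that share a country.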

Table A1. Height. Each cell reports slope (std error) [P].

| Distance measure | Estimator | Disease | Nutrition | Distance |
|---|---|---|---|---|
| Phylogenetic | OLS | 0.11 (0.02) [0.00] | 0.67 (0.02) [0.00] | -0.07 (0.02) [0.00] |
| Phylogenetic | OLS, country fixed effects | 0.10 (0.02) [0.00] | 0.65 (0.02) [0.00] | 0.04 (0.03) [0.17] |
| Phylogenetic | ML, country random effects | 0.10 (0.02) [0.00] | 0.65 (0.02) [0.00] | 0.03 (0.03) [0.30] |
| Genetic | OLS | 0.11 (0.02) [0.00] | 0.67 (0.02) [0.00] | -0.05 (0.02) [0.00] |
| Genetic | OLS, country fixed effects | 0.10 (0.02) [0.00] | 0.65 (0.02) [0.00] | 0.04 (0.02) [0.14] |
| Genetic | ML, country random effects | 0.10 (0.02) [0.00] | 0.66 (0.02) [0.00] | 0.02 (0.02) [0.28] |

Table A2. Body Mass Index. Each cell reports slope (std error) [P].

| Distance measure | Estimator | Disease | Nutrition | Distance |
|---|---|---|---|---|
| Phylogenetic | OLS | 0.08 (0.02) [0.00] | 0.26 (0.02) [0.00] | 0.32 (0.02) [0.00] |
| Phylogenetic | OLS, country fixed effects | 0.07 (0.02) [0.00] | 0.25 (0.02) [0.00] | 0.37 (0.03) [0.00] |
| Phylogenetic | ML, country random effects | 0.07 (0.02) [0.00] | 0.25 (0.02) [0.00] | 0.36 (0.03) [0.00] |
| Genetic | OLS | 0.09 (0.02) [0.00] | 0.27 (0.02) [0.00] | 0.24 (0.02) [0.00] |
| Genetic | OLS, country fixed effects | 0.03 (0.02) [0.20] | 0.27 (0.02) [0.00] | 0.38 (0.03) [0.00] |
| Genetic | ML, country random effects | 0.04 (0.02) [0.04] | 0.27 (0.02) [0.00] | 0.35 (0.02) [0.00] |

Table A3. Cognitive Test Scores. Each cell reports slope (std error) [P].

| Distance measure | Estimator | Disease | Nutrition | Distance |
|---|---|---|---|---|
| Phylogenetic | OLS | 0.60 (0.02) [0.00] | 0.26 (0.02) [0.00] | 0.29 (0.02) [0.00] |
| Phylogenetic | OLS, country fixed effects | 0.53 (0.02) [0.00] | 0.32 (0.01) [0.00] | 0.30 (0.03) [0.00] |
| Phylogenetic | ML, country random effects | 0.53 (0.02) [0.00] | 0.32 (0.01) [0.00] | 0.30 (0.02) [0.00] |
| Genetic | OLS | 0.59 (0.02) [0.00] | 0.27 (0.02) [0.00] | 0.27 (0.02) [0.00] |
| Genetic | OLS, country fixed effects | 0.48 (0.02) [0.00] | 0.33 (0.01) [0.00] | 0.33 (0.02) [0.00] |
| Genetic | ML, country random effects | 0.49 (0.02) [0.00] | 0.33 (0.01) [0.00] | 0.32 (0.02) [0.00] |

Table A4. Log per capita GDP. Each cell reports slope (std error) [P].

| Distance measure | Estimator | Disease | Nutrition | Distance |
|---|---|---|---|---|
| Phylogenetic | OLS | 0.45 (0.02) [0.00] | 0.37 (0.02) [0.00] | 0.11 (0.02) [0.00] |
| Phylogenetic | OLS, country fixed effects | 0.43 (0.02) [0.00] | 0.40 (0.01) [0.00] | 0.04 (0.02) [0.12] |
| Phylogenetic | ML, country random effects | 0.43 (0.02) [0.00] | 0.40 (0.01) [0.00] | 0.05 (0.02) [0.05] |
| Genetic | OLS | 0.46 (0.02) [0.00] | 0.37 (0.02) [0.00] | 0.07 (0.02) [0.00] |
| Genetic | OLS, country fixed effects | 0.42 (0.02) [0.00] | 0.40 (0.01) [0.00] | 0.04 (0.02) [0.06] |
| Genetic | ML, country random effects | 0.42 (0.02) [0.00] | 0.40 (0.01) [0.00] | 0.05 (0.02) [0.02] |

Table A5. Sociodemographic Index. Each cell reports slope (std error) [P].

| Distance measure | Estimator | Disease | Nutrition | Distance |
|---|---|---|---|---|
| Phylogenetic | OLS | 0.46 (0.02) [0.00] | 0.47 (0.01) [0.00] | -0.04 (0.02) [0.06] |
| Phylogenetic | OLS, country fixed effects | 0.47 (0.02) [0.00] | 0.51 (0.01) [0.00] | -0.11 (0.02) [0.00] |
| Phylogenetic | ML, country random effects | 0.47 (0.02) [0.00] | 0.51 (0.01) [0.00] | -0.10 (0.02) [0.00] |
| Genetic | OLS | 0.46 (0.02) [0.00] | 0.47 (0.01) [0.00] | -0.03 (0.02) [0.05] |
| Genetic | OLS, country fixed effects | 0.46 (0.02) [0.00] | 0.50 (0.01) [0.00] | -0.08 (0.02) [0.00] |
| Genetic | ML, country random effects | 0.46 (0.02) [0.00] | 0.50 (0.01) [0.00] | -0.07 (0.02) [0.00] |

Table A6. State Capacity. Each cell reports slope (std error) [P].

| Distance measure | Estimator | Disease | Nutrition | Distance |
|---|---|---|---|---|
| Phylogenetic | OLS | 0.62 (0.02) [0.00] | 0.28 (0.02) [0.00] | -0.16 (0.02) [0.00] |
| Phylogenetic | OLS, country fixed effects | 0.64 (0.02) [0.00] | 0.33 (0.01) [0.00] | -0.26 (0.03) [0.00] |
| Phylogenetic | ML, country random effects | 0.63 (0.02) [0.00] | 0.32 (0.01) [0.00] | -0.23 (0.02) [0.00] |
| Genetic | OLS | 0.61 (0.02) [0.00] | 0.28 (0.02) [0.00] | -0.11 (0.02) [0.00] |
| Genetic | OLS, country fixed effects | 0.61 (0.02) [0.00] | 0.32 (0.01) [0.00] | -0.16 (0.02) [0.00] |
| Genetic | ML, country random effects | 0.61 (0.02) [0.00] | 0.31 (0.01) [0.00] | -0.14 (0.02) [0.00] |
