The American Economic Association, which is supposed to be the most prestigious professional association of economists in the world, just published Becker, Enke and Falk’s “Ancient Origins of the Global Variation in Economic Preferences”. They hypothesize that ‘differential historical experiences that have accumulated over thousands of years of separation might have given rise to different preferences as of today’ so that some populations are more thrifty, patient, risk-averse, altruistic or trusting than others. They expect behavioral differences between populations to be at least in part genetic.
Given that attitudes like risk aversion, trust, and altruism are transmitted across generations (Dohmen et al. 2012) and that part of this transmission appears to be genetic in nature (Cesarini et al. 2009), the different genetic endowments induced by long periods of separation could also generate differences in preferences.
This is classic biological reductionism. It is Ashraf and Galor (whom we have debunked before) on steroids. Anke Becker is starting a tenure track position at Harvard Business School in the fall. Benjamin Enke is already a tenure-track professor at the Department of Economics at Harvard. Armin Falk is Professor of Economics at the University of Bonn and Chief Executive Officer of the BRIQ institute on behavior and inequality. Surely these are serious folk? And the paper has already been peer reviewed and published in AEA Papers and Proceedings. Surely, one would think, it cannot contain obvious mistakes. One would be wrong.
I checked their work with my own hands. So what exactly are they doing? Well, they have a measure of genetic distance (Fst) compiled by Spolaore and Wacziarg (2017) that is based on the microsatellite data of Pemberton, DeGiorgio and Rosenberg (2013). And they have phenotypic data compiled for BRIQ in the Global Preferences Survey by Gallup. What they are trying to show is that genetic distance tracks variation in phenotypic differences in patience, risk-aversion, thriftiness, altruism and trust in strangers et cetera. In order to show this, they regress absolute differences in each of these survey variables on genetic distance via regressions of the form:
$$|y(i) - y(j)| = \alpha + \beta \, d(i,j) + f(i) + f(j) + \varepsilon(i,j)$$
where y(i) is the mean of the survey variable in country i, d(i,j) is the genetic distance between countries i and j, and f(i) and f(j) are country fixed-effects. They always include country fixed effects and report standardized gradients for genetic distance. It is found to be highly significant for all behavioral phenotypes.
I found it very curious that they always include country fixed effects. That’s a lot of parameters. Alternatively, each of these estimates corresponds to very many regressions; something that is completely obscured by the presentation. Why don’t they report straightforward estimates alongside these? I got even more curious when I read their footnote.
The empirical results suggest that such country fixed effects indeed go a long way in addressing omitted variable concerns. For instance, in the analyses presented below, for patience and negative reciprocity we sometimes observe statistically significant negative coefficients on ancestral distance if country fixed effects are not included, which we find very hard to interpret. These results entirely disappear with country fixed effects. [Emphasis mine.]
Do country fixed effects go a long way towards addressing omitted variable concerns? Or do they swamp the underlying patterns? How can we tell if you do not give us the basic regression estimates? Table 1 presents both together.
Table 1. Gradient estimates.
Response is absolute differences in means of: | Genetic distance (feature) | std error | Country FE | R^2 |
Patience | -0.053 | 0.008 | No | 0.016 |
Patience | 0.170 | 0.020 | Yes | 0.477 |
Risk-taking | 0.090 | 0.009 | No | 0.071 |
Risk-taking | 0.328 | 0.018 | Yes | 0.622 |
Positive reciprocity | 0.044 | 0.009 | No | 0.015 |
Positive reciprocity | 0.216 | 0.020 | Yes | 0.490 |
Negative reciprocity | -0.020 | 0.006 | No | 0.004 |
Negative reciprocity | 0.114 | 0.021 | Yes | 0.446 |
Altruism | -0.040 | 0.008 | No | 0.011 |
Altruism | 0.101 | 0.019 | Yes | 0.532 |
Trust | 0.050 | 0.008 | No | 0.028 |
Trust | 0.301 | 0.020 | Yes | 0.419 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. The feature is genetic distance. The response is absolute differences in country means of survey variables denoted in the first column. We report Newey-West robust standard errors for the OLS estimates without country fixed-effects. Estimates in bold are significant at the 5 percent level.
A number of observations are in order. First, without country fixed-effects, the standardized slope coefficients are negative and significant not just for negative reciprocity and patience, but also for altruism. This may be a minor breach of scholarly trust by itself, but along with all the rest, it is damning. Second, the inclusion of covariates in a regression almost always attenuates the gradient estimate of interest. But here we find a systematic pattern of the slope coefficient being attenuated in the simple linear regressions relative to the ones where we admit fixed-effects by country. This is very peculiar. Third, the overall pattern that emerges cannot be straightforwardly interpreted in the manner suggested by the authors. Three of the coefficients are negative and three positive, casting considerable doubt on the theory that genetic distance predicts differences in behavioral propensities. Fourth, without country fixed-effects, the percentage of variation explained is very small: from 0.4 percent for negative reciprocity at the lower end to 7.1 percent for risk-taking at the upper end. Fifth, the standardized slopes without the country fixed effects are very small. The largest, 0.09 for risk-taking, corresponds to an increase of 9 percent of a standard deviation in the response for a one standard deviation increase in genetic distance. That’s trivial. The statistical significance of the coefficient of genetic distance is very likely due to the explosion of sample size in the dyadic regressions.
Moreover, it is not simply a matter of whether or not one should always include country fixed effects. At a minimum, one should report estimates both with and without controls like country dummies. Notice that the fixed effects account for the vast bulk of explained variation. How do we know that the signal is not swamped by the inclusion of these dummies?
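To make the two specifications concrete, here is a minimal Python sketch on synthetic data. It is not the code behind Table 1. The function names `standardize` and `dyadic_slope`, the random inputs, and the choice to use each unordered country pair once are all assumptions made for illustration, and standard errors are left out entirely.

```python
import numpy as np

def standardize(v):
    """Rescale a vector to mean 0 and variance 1."""
    return (v - v.mean()) / v.std()

def dyadic_slope(y, D, country_fe=False):
    """Slope of standardized |y_i - y_j| on standardized D[i, j], plus R^2.

    y is a vector of country means, D a symmetric distance matrix.
    Each unordered dyad is used once (an assumption of this sketch).
    """
    n = len(y)
    i, j = np.triu_indices(n, k=1)
    resp = standardize(np.abs(y[i] - y[j]))
    dist = standardize(D[i, j])
    if country_fe:
        # f(i) + f(j): dummy k is 1 whenever country k belongs to the dyad.
        # The dummies sum to a constant, so no separate intercept is needed.
        dummies = np.zeros((len(i), n))
        rows = np.arange(len(i))
        dummies[rows, i] = 1.0
        dummies[rows, j] = 1.0
        X = np.column_stack([dist, dummies])
    else:
        X = np.column_stack([dist, np.ones(len(i))])
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    resid = resp - X @ beta
    r2 = 1.0 - (resid @ resid) / (resp @ resp)
    return beta[0], r2

# Synthetic example: 30 hypothetical countries, unrelated trait and distance.
rng = np.random.default_rng(0)
n = 30
coords = rng.normal(size=(n, 2))
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
y = rng.normal(size=n)

slope_plain, r2_plain = dyadic_slope(y, D)
slope_fe, r2_fe = dyadic_slope(y, D, country_fe=True)
print(f"no FE:   slope={slope_plain:.3f}  R^2={r2_plain:.3f}")
print(f"with FE: slope={slope_fe:.3f}  R^2={r2_fe:.3f}")
```

Because the dummies nest the intercept-only model, R^2 can only go up when they are added, whatever the data. That is why a high R^2 in the fixed-effects rows says little by itself about genetic distance.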
Furthermore, one of the basic things we do when we compare distances is to carry out Mantel tests. These are much more reliable than dyadic regressions since they are designed to be used to test the similarity of symmetric distance matrices. Table 2 reports the results for Mantel tests. We compute cultural distance as the Euclidean distance of national populations in the six-dimensional space spanned by the z-scores for patience, risk-taking, positive reciprocity, negative reciprocity, altruism and trust.
Table 2. Mantel Tests.
Pearson | | h | P |
Geodesic distance | Genetic distance | 0.474 | 0.000 |
Geodesic distance | Cultural distance | -0.030 | 0.687 |
Genetic distance | Cultural distance | 0.070 | 0.145 |
Spearman | | h | P |
Geodesic distance | Genetic distance | 0.497 | 0.000 |
Geodesic distance | Cultural distance | -0.018 | 0.626 |
Genetic distance | Cultural distance | 0.085 | 0.099 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Geodesic distances are computed via the Haversine formula. Estimates in bold are significant at the 5 percent level.
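For reference, the Haversine formula mentioned in the note to Table 2 can be sketched in Python as follows. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this sketch, and the post does not say which coordinates (capitals, centroids or something else) were used for each country.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against h creeping above 1 through rounding error
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

# Two points on the equator, a quarter of the way around the globe
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```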
We strongly reject the null of zero correlation between genetic distance and geodesic distance. This is to be expected. But it serves as a good check that the genetic distance data of Spolaore and Wacziarg is reliable. We cannot reject the null of zero correlation between cultural distance on the one hand, and genetic and geodesic distances on the other. The empirical evidence is rather consistent with the notion that cultural characters are weakly and discordantly correlated with genetic distance between populations.
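For readers who have not met the Mantel test, here is a minimal permutation version in Python. It is a sketch under stated assumptions, not the code that produced the tables: the names `mantel`, `rank_matrix` and `average_ranks` are made up for illustration, the p-value is one-sided in the upper tail (which is what the P values close to 1 for negative statistics in the later tables suggest, though the post does not say so), and 999 permutations is an arbitrary default.

```python
import numpy as np

def average_ranks(v):
    """Ranks 1..len(v), with tied values sharing their average rank."""
    order = np.argsort(v, kind="stable")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]

def rank_matrix(M, iu):
    """Symmetric matrix whose off-diagonal entries are ranks of M's entries."""
    R = np.zeros_like(M)
    R[iu] = average_ranks(M[iu])
    return R + R.T

def mantel(A, B, permutations=999, method="pearson", seed=0):
    """Mantel statistic and one-sided (upper-tail) permutation p-value."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    if method == "spearman":
        A, B = rank_matrix(A, iu), rank_matrix(B, iu)
    a = A[iu]
    r_obs = np.corrcoef(a, B[iu])[0, 1]
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute rows and columns of B together, so that whole
        # populations are relabelled rather than individual dyads.
        r_perm = np.corrcoef(a, B[np.ix_(p, p)][iu])[0, 1]
        exceed += int(r_perm >= r_obs)
    return r_obs, (exceed + 1) / (permutations + 1)

# Toy check: D2 is a monotone but non-linear transform of D1.
rng = np.random.default_rng(1)
pts = rng.normal(size=(15, 2))
D1 = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
D2 = D1 ** 2
r_p, p_p = mantel(D1, D2, method="pearson")
r_s, p_s = mantel(D1, D2, method="spearman")
print(r_p, p_p, r_s, p_s)
```

The point of permuting rows and columns jointly is that the dyads are not independent observations: each population appears in many of them. That dependence is what the dyadic regressions ignore when they treat every pair as a fresh data point.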
But perhaps we are expecting too much concordance from behavioral phenotypic characters? If we follow the authors’ lead in defining distance for each phenotype as the absolute value of the mean difference in each measure of behavioral propensities, we can subject them to the much more appropriate Mantel tests separately. Table 3 displays our test results.
Table 3. Mantel tests for similarity with genetic distance matrix.
Response | Pearson | P | Spearman | P |
Patience | -0.128 | 0.980 | -0.108 | 0.971 |
Risk-taking | 0.266 | 0.001 | 0.263 | 0.000 |
Positive reciprocity | 0.122 | 0.051 | 0.117 | 0.032 |
Negative reciprocity | -0.066 | 0.835 | -0.067 | 0.882 |
Altruism | -0.104 | 0.927 | -0.087 | 0.922 |
Trust | 0.168 | 0.006 | 0.162 | 0.005 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level.
A couple of observations are again in order. First, using Pearson’s test statistic, we cannot reject the null for four out of six behavioral phenotypes. Spearman’s is more powerful. Still, we can only reject the null in three out of six if we use Spearman’s version of the Mantel test. Second, the Mantel test statistic for only one of the phenotypes, risk-taking, is moderately large; for trust, and even more so for positive reciprocity, the test statistic is very modest and only marginally significant.
The overall pattern is strongly suggestive of at best a very weak relationship between some behavioral phenotypic characters and genetic distance. More likely than a Pleistocene origin of genetically-determined differences in behavioral propensities, i.e. racialism sensu stricto, is the hypothesis that genetic distance is somewhat correlated with cultural institutions since both vary by continent. And it is cultural institutions that causally condition behavioral propensities by structuring the risks and rewards associated with different behavioral phenotypes in the recent past.
Despite how neat it sounds, not everything can be reduced to biology. In fact, the rich and powerful regions of the world today are genetically disadvantaged compared to the poor and weak regions. Not only does biology not explain world order, we are actually working against the evidence from genetics when we try to explain the polarization of the world by appealing to genetic differences between populations.
Postscript. Just a quick test of one more claim in their paper. They construct an aggregate measure of cultural distance by summing up the absolute mean differences for the six cultural traits. As before, we compute their measure and subject it to the much more appropriate Mantel test. Table 4 reports the results. The results are devastating to their theory. Even averaged over the six traits, we cannot reject the null of zero correlation between genetic distance and cultural distance.
Table 4. Mantel tests for mean cultural distance and genetic distance.
Pearson | P | Spearman | P |
0.102 | 0.054 | 0.078 | 0.121 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level.
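Two aggregate measures of cultural distance appear in this post: the Euclidean distance in the space of trait z-scores, and the authors' aggregate of absolute differences across traits. A minimal Python sketch of both follows. The function names and the toy numbers are hypothetical, and whether the traits were standardized before the absolute differences were aggregated is an assumption here; summing instead of averaging only rescales the matrix and leaves a Mantel correlation unchanged.

```python
import math
from statistics import mean, pstdev

def zscores(rows):
    """Standardize each trait (column) to mean 0 and variance 1 across countries."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def euclidean_distances(rows):
    """Pairwise Euclidean distance between countries in trait space."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def mean_abs_differences(rows):
    """Pairwise mean over traits of the absolute difference in country means."""
    return [[mean([abs(x - y) for x, y in zip(a, b)]) for b in rows] for a in rows]

# Toy data: three hypothetical countries, two traits
toy = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
Z = zscores(toy)
E = euclidean_distances(Z)
M = mean_abs_differences(Z)
print(E[0][1], M[0][1])
```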
As their footnote reveals, they find such negative results ‘very hard to interpret’. But such negative results are to be expected if cultural traits are themselves discordant. For if they are discordant then averaging over them does not get you very far at all. Indeed, we find near-total discordance between the cultural traits themselves. Of the fifteen possible pairs, using either Spearman or Pearson, we find that we can reject the null in only one case, that of altruism and positive reciprocity.
Table 5. Mantel tests for concordance of cultural traits.
Pearson | Risktaking | Posrecip | Negrecip | Altruism | Trust |
Patience | -0.042 | -0.066 | 0.006 | -0.052 | 0.024 |
Risktaking | | 0.116 | 0.005 | -0.086 | 0.053 |
Posrecip | | | -0.045 | 0.491 | 0.079 |
Negrecip | | | | -0.041 | -0.051 |
Altruism | | | | | 0.089 |
P (Pearson) | Risktaking | Posrecip | Negrecip | Altruism | Trust |
Patience | 0.715 | 0.872 | 0.416 | 0.783 | 0.315 |
Risktaking | | 0.058 | 0.442 | 0.891 | 0.192 |
Posrecip | | | 0.762 | 0.000 | 0.085 |
Negrecip | | | | 0.732 | 0.821 |
Altruism | | | | | 0.075 |
Spearman | Risktaking | Posrecip | Negrecip | Altruism | Trust |
Patience | -0.030 | -0.041 | 0.021 | -0.033 | 0.031 |
Risktaking | | 0.082 | -0.010 | -0.071 | 0.078 |
Posrecip | | | -0.034 | 0.428 | 0.065 |
Negrecip | | | | -0.033 | -0.042 |
Altruism | | | | | 0.065 |
P (Spearman) | Risktaking | Posrecip | Negrecip | Altruism | Trust |
Patience | 0.700 | 0.793 | 0.309 | 0.721 | 0.226 |
Risktaking | | 0.076 | 0.548 | 0.893 | 0.064 |
Posrecip | | | 0.754 | 0.000 | 0.081 |
Negrecip | | | | 0.731 | 0.831 |
Altruism | | | | | 0.103 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level.
We call on the authors to withdraw their paper from AEA Papers and Proceedings. We call on the editors of AEA Papers and Proceedings to publish an erratum. And we call on the American Economic Association to examine the quality of the peer review process at the journals published under its name.
Postpostscript. In private communication, the authors clarified that they are interested in the predictive information contained in “temporal distance”. This term is suggestive of distance to the last common ancestor (LCA), which is not captured by the metric they are using, Fst. A closer analog of what they have in mind is patristic distance or phylogenetic distance, that is, distance between populations measured along the phylogenetic tree. This is a much better proxy ‘for the length of time since two populations shared common ancestors’. Here we compute the phylogeny implied by the pairwise genetic distances and obtain the phylogenetic distance from it. We use the MATLAB function seqlinkage and the UPGMA algorithm to obtain the phylogenetic tree displayed below.
We then compute pairwise phylogenetic distances from the phylogenetic tree using MATLAB’s phytree/pdist function. Finally, we redo the Mantel tests and reestimate the standardized slope coefficients with and without controlling for country fixed effects. Table 6 displays the results of the Mantel tests. These estimates should be compared with those in Table 3. We can see that the results are similar and the coefficients bear the same signs. But the estimates for risk-taking are slightly smaller than before, while those for positive reciprocity and trust are larger. The negative coefficients for patience and altruism are modestly attenuated, while that for negative reciprocity is slightly larger in magnitude. These patterns suggest that phylogenetic distance contains more information than pairwise genetic distances; that additional information is being extracted by the hierarchical clustering of the different clades.
Table 6. Mantel tests for similarity with pairwise phylogenetic distance matrix.
Response | Pearson | P | Spearman | P |
Patience | -0.113 | 0.971 | -0.094 | 0.964 |
Risktaking | 0.249 | 0.001 | 0.261 | 0.000 |
Positive reciprocity | 0.156 | 0.016 | 0.150 | 0.009 |
Negative reciprocity | -0.078 | 0.891 | -0.086 | 0.947 |
Altruism | -0.082 | 0.878 | -0.073 | 0.880 |
Trust | 0.206 | 0.001 | 0.187 | 0.002 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level.
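The step from pairwise genetic distances to phylogenetic distances can be illustrated with a small UPGMA implementation in Python. This is a sketch of the general technique and not a port of the MATLAB calls used above: the function name `upgma_patristic` is hypothetical, and it assumes each internal node sits at half the average distance between the clusters it joins, so that two leaves are separated along the tree by exactly that average distance. A different branch-length convention would at most rescale the matrix.

```python
import numpy as np

def upgma_patristic(D):
    """Leaf-to-leaf path lengths on the UPGMA tree built from distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    clusters = {k: [k] for k in range(n)}   # cluster id -> member leaves
    # Keys are always (smaller id, larger id)
    dist = {(a, b): D[a, b] for a in range(n) for b in range(a + 1, n)}
    P = np.zeros((n, n))
    next_id = n
    while len(clusters) > 1:
        # Join the two closest clusters
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        # Every leaf in a is d_ab away, along the tree, from every leaf in b
        for u in clusters[a]:
            for v in clusters[b]:
                P[u, v] = P[v, u] = d_ab
        na, nb = len(clusters[a]), len(clusters[b])
        new_dist = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted average: the defining step of UPGMA
            new_dist[(c, next_id)] = (na * d_ac + nb * d_bc) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = merged
        next_id += 1
    return P

# Toy example: three populations whose distances are not tree-like
T = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 6.0],
              [4.0, 6.0, 0.0]])
print(upgma_patristic(T))
```

In the toy example the raw distances from the third population to the first two differ (4 and 6), while on the tree both become 5. This averaging within clades is what distinguishes the phylogenetic distance matrix from the raw pairwise one.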
What does this imply about slope coefficients? We redo the computations and collect the results in Table 7. These should be compared with the estimates in Table 1, but I made a coding error there: in the estimates reported in Table 1, I did not standardize the response variable to have mean 0 and variance 1. Apologies for that. Table 8 reports the corrected estimates for the gradients of both genetic distance (Fst) and phylogenetic distance.
Table 7. Gradients of phylogenetic distance.
Response | SLOPE (no controls) | std error (no controls) | R^2 (no controls) | SLOPE (country FE) | std error (country FE) | R^2 (country FE) |
Patience | -0.172 | 0.029 | 0.013 | 0.206 | 0.022 | 0.478 |
Risktaking | 0.410 | 0.042 | 0.062 | 0.295 | 0.021 | 0.613 |
Positive reciprocity | 0.239 | 0.038 | 0.025 | 0.270 | 0.022 | 0.493 |
Negative reciprocity | -0.122 | 0.030 | 0.006 | 0.131 | 0.024 | 0.446 |
Altruism | -0.122 | 0.032 | 0.007 | 0.131 | 0.021 | 0.533 |
Trust | 0.310 | 0.039 | 0.042 | 0.382 | 0.023 | 0.426 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level. Both response and feature have been standardized to have mean 0 and variance 1. We report Newey-West standard errors for the straightforward gradients.
These gradient estimates make a lot more sense. They are comparable in order of magnitude to those obtained from the double-counting dyadic regressions while controlling for country fixed-effects. But otherwise, the story remains unchanged. Table 8 compares the gradients of our two features. We can see that, where the gradients are positive (risk-taking, positive reciprocity, and trust), those for phylogenetic distance are strictly larger. This means that the additional information contained in phylogenetic distances is significant; it contains a much stronger signal than genetic distances.
Table 8. Gradients of genetic distance and phylogenetic distance.
Response | SLOPE (genetic distance) | std error (genetic distance) | R^2 (genetic distance) | SLOPE (phylogenetic distance) | std error (phylogenetic distance) | R^2 (phylogenetic distance) |
Patience | -0.157 | 0.023 | 0.016 | -0.172 | 0.029 | 0.013 |
Risktaking | 0.355 | 0.036 | 0.071 | 0.410 | 0.042 | 0.062 |
Positive reciprocity | 0.151 | 0.030 | 0.014 | 0.239 | 0.038 | 0.025 |
Negative reciprocity | -0.084 | 0.026 | 0.004 | -0.122 | 0.030 | 0.006 |
Altruism | -0.126 | 0.027 | 0.010 | -0.122 | 0.032 | 0.007 |
Trust | 0.205 | 0.031 | 0.028 | 0.310 | 0.039 | 0.042 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level. Both response and feature have been standardized to have mean 0 and variance 1. We report Newey-West standard errors.
Finally, we redo the Mantel test for the average of population differences in means of the six cultural characters and present it in Table 9. This can be compared with the results reported in Table 4. We can see that the correlations, while still small, are significant at the 5 percent level. This is yet another indication that phylogenetic distance contains more information than genetic distance per se. This particular result may be of interest to the authors.
Table 9. Mantel tests for mean cultural distance and phylogenetic distance.
Pearson | P | Spearman | P |
0.138 | 0.014 | 0.109 | 0.045 |
Source: Global Preferences Survey, Spolaore and Wacziarg (2017), Pemberton, DeGiorgio and Rosenberg (2013), author’s computations. Estimates in bold are significant at the 5 percent level.