Musings on the Microstructure of the Market for Risk


Margin Call (2011)

In closing the previous dispatch I offered that we may be missing a theoretical piece of the puzzle. Here I offer some musings on what sort of structure I think we need to get an even better handle on asset prices.

My understanding of the microstructure of the dealer ecosystem suggests that there are three kinds of players in the market for risk: the sell-side, the buy-side, and noise traders. US securities broker-dealers on the sell-side make markets by trading at quoted prices. They also provide funding for the trades, which consumes balance sheet capacity (the risk-bearing capacity of the sell-side relative to the scale of the buy-side, which I have argued is the right pricing kernel in intermediary asset pricing). Noise traders are needed to close the model; more on them later.

Balance sheet capacity is a joint function of the relative ease of funding in the wholesale funding market on the one hand and the market clearing price of risk in the over-the-counter derivatives market on the other. When the price of risk is low (ie when asset valuations are high) more funding can be secured against the same collateral than when the price of risk is high (ie asset valuations are low). This generates a dangerous feedback loop between the market price of risk and the ease of funding.
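The feedback loop can be made concrete with a toy iteration. This is a minimal sketch under assumed functional forms and parameter values of my own (the linear haircut schedule and the coefficients are illustrative assumptions, not anything estimated in this dispatch): a shock raises the price of risk, haircuts widen, funding per unit of collateral shrinks, and the reduced balance sheet capacity feeds back into a still-higher price of risk.

```python
# Toy sketch of the valuation-funding feedback loop. All functional forms
# and parameters are illustrative assumptions, not estimates.

def haircut(price_of_risk):
    """Haircuts widen as the market price of risk rises (assumed linear)."""
    return 0.02 + 0.10 * price_of_risk

def simulate(shock, steps=20):
    """Trace the price of risk as tighter funding feeds back on itself."""
    p = 0.10 + shock          # price of risk after the initial shock
    path = [p]
    for _ in range(steps):
        h = haircut(p)        # wider haircut ...
        funding = 1.0 / h     # ... means less funding per unit of collateral
        # reduced balance sheet capacity pushes the price of risk up further
        p = 0.10 + shock + 0.5 / funding
        path.append(p)
    return path

path = simulate(shock=0.05)   # the path ratchets up before converging
```

With these (stabilizing) parameters the loop converges to a level above the initial shock; with a steeper haircut schedule the same loop diverges, which is the margin-spiral case.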

To be sure, default-remote bonds serve as collateral in the rapidly spinning rehypothecation flywheel because the stability of the flywheel requires the absence of default risk. The proximate cause of the GFC was the fateful introduction of private-label RMBS into the flywheel. And it was the great sucking sound of the wholesale funding market that generated the housing finance boom. Once debt burdens triggered a massive wave of defaults and credit risk reached the flywheel it tottered and shrank, but continued to spin rapidly in its shrunken state on public collateral. But the crunch of the wholesale funding market generated a massive seizure in the machine of global credit creation, sending a massive shockwave that propagated worldwide. Only those with autonomous financial systems insulated by thick regulatory firewalls and those too remote to have been penetrated by global finance managed to come out in one piece.

In the aftermath of the GFC, an intrusive enforcement regime of limits on bank leverage, balance sheet surveillance, risk assessment, and other regulations has reduced the elasticity of dealer balance sheets. The sharply reduced risk-bearing capacity of the system is reflected in the breakdown of the iron law of covered interest parity, volatility spikes, and the risk-on-risk-off behavior of asset prices. Due to the upper bound on the leverage of global banks, the ease of funding has become a function of US monetary policy, with the result that the strength of the dollar has emerged as a barometer of the price of balance sheets. Indeed, the strength of the dollar is now priced in the cross-section of US stock returns.

With the dealers pinned down, fluctuations in the market price of risk can be expected to be driven by developments on the buy side. Investment strategies of large asset managers are variations on a small number of themes. Big institutional investors like pension funds and insurance companies (‘real money investors’ in the finance jargon) are bound by regulation and governed by similar investment philosophies to maintain asset allocations in certain definite proportions, which requires periodic and tactical rebalancing of their portfolios. When strategists speak of rotation in and out of asset classes, it is these real money investors that they usually have in mind. Also on the buy side are less constrained hedge funds, which make up for their smaller size ($3 trillion AUM in the aggregate) with their tactical agility and willingness to make lots of leveraged bets funded by the dealers. Somewhere between the two are leveraged bond portfolios like Pimco’s, which seek positions with ‘equity-like returns with bond-like volatility’ (Bill Gross: “Holy Cow Batman, these bonds can outperform stocks!”). That’s your buy side.

Then we have the noise traders. We can think of them as low-information small retail investors, or plainly speaking, the small fry whose herd behavior is driven by sentiment. They kick asset prices away from fundamentals by randomly bidding asset prices too far up or down, thereby generating positive risk premia that are then harvested by the big fish. More generally, the game is subtly rigged towards the house by the structural advantages of the dealers. In particular, privileged access to order flow information puts dealers in a position of tactical advantage. Apart from trading for the house, traders at dealer firms share order flow information (and therefore the information premium) with their networks on the buy side in exchange for a larger volume of trades with their attendant commissions. Moreover, since exposure to fluctuations in balance sheet capacity comes with a juicy risk premium, dealers and their counterparties in the market for risk enjoy higher risk-adjusted returns than the small fry even without exploiting order flow information. Furthermore, traders on the sell-side are not beyond generating handsome profits by pumping asset prices up and down or otherwise loading the dice. Although the real scandal is what is perfectly legal.



Stock Market Fluctuations Are Driven by Investor Herd Behavior

FT AlphaVille linked to an interesting blog post by Nick Maggiulli on Dollars and Data that examined the long-run stock return predictability in terms of equity allocations. Nick shows that high allocations predict lower ten-year returns. Here’s a replication of the main result.


The result must be taken with a pinch of salt. Is it a feature or a bug? The cause for concern is that overlapping regressions generate spurious correlations. There is good reason to be skeptical of the extremely high coefficient estimate (r=-0.897, p<0.001). It likely reflects the medium-term cycle in Equity Allocation. (We use the same metric as Nick and the original blog post at PhilosophicalEconomics.) Econometrically, regression estimates rely on the assumption that the series is stationary (roughly, that its mean and variance do not drift over time, with no trends or cycles), which is manifestly violated here. See next figure.
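The overlapping-window problem is easy to demonstrate by simulation. The sketch below is a hypothetical illustration of the mechanism, not a replication of Nick's data: even when underlying monthly returns are iid (so there is literally nothing to predict), overlapping ten-year forward returns are mechanically almost perfectly autocorrelated, so a handful of effectively independent observations masquerade as hundreds, and correlations with any slow-moving series get inflated.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 480                                  # 40 years of monthly data (synthetic)
r = rng.normal(0.01, 0.04, T)            # iid monthly returns: no predictability

# overlapping 10-year (120-month) forward cumulative returns
window = 120
fwd = np.array([r[t:t + window].sum() for t in range(T - window)])

# an unrelated but persistent predictor (a random walk), standing in for a
# slow-moving allocation-like series
x = np.cumsum(rng.normal(0, 1, T - window))

corr = np.corrcoef(x, fwd)[0, 1]         # can look "large" purely by chance

# the overlapping series is mechanically near-unit-root at lag one
acf1 = np.corrcoef(fwd[:-1], fwd[1:])[0, 1]
```

With a 120-month window the theoretical lag-one autocorrelation of the overlapping series is 119/120, so the 360 observations carry roughly three independent data points' worth of information.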


What is required for kosher statistical inference is to transform the series so that it is at least roughly stationary. The best way to do that is to difference the series. Here we look at changes in the natural logarithm (ie, compounded rate of return) of the SP500 Index and Equity Allocation. The two series are manifestly stationary and appear to be strongly contemporaneously correlated.
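The transformation is a one-liner. This sketch uses a synthetic trending series standing in for the SP500 level (the synthetic data and parameters are assumptions for illustration; the actual exercise uses the SP500 and Equity Allocation series):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the SP500 level: exponential trend plus noise.
rng = np.random.default_rng(1)
level = pd.Series(100 * np.exp(np.linspace(0, 2, 120) + rng.normal(0, 0.05, 120)))

# The level series trends (non-stationary); its log differences are roughly
# stationary and interpretable as continuously compounded quarterly returns.
log_ret = np.log(level).diff().dropna()
```

The same transform applied to the Equity Allocation share yields its percentage changes, which is the series used in the regressions below.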


Indeed, contemporaneous percentage changes in Equity Allocation strongly predict quarterly returns on the SP500. Our gradient estimate (b=1.25, t-Stat=30.7) implies that 1 percent higher allocation to equities predicts a 1.25 percent quarterly return on the SP500 over and above the unconditional mean of 1.79 percent per quarter. Equity Allocation explains 78 percent of the variation in stock market returns. See next figure.
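The regression itself is ordinary least squares of quarterly returns on contemporaneous percentage changes in allocation. The sketch below runs it on synthetic data generated to match the reported coefficients (b=1.25, intercept 1.79 percent); the actual estimates of course come from the real series, not from this toy:

```python
import numpy as np

# Synthetic data built around the reported estimates, for illustration only.
rng = np.random.default_rng(2)
d_alloc = rng.normal(0, 0.03, 200)                  # pct change in allocation
ret = 0.0179 + 1.25 * d_alloc + rng.normal(0, 0.02, 200)

# OLS intercept and slope via least squares
X = np.column_stack([np.ones_like(d_alloc), d_alloc])
beta, *_ = np.linalg.lstsq(X, ret, rcond=None)
a_hat, b_hat = beta

# R^2: share of return variance explained by allocation changes
resid = ret - X @ beta
r2 = 1 - resid.var() / ret.var()
```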


The empirical evidence is rather consistent with the idea that fluctuations in the stock market reflect investor herd behavior. Specifically, the stock market goes up when investors rebalance to equities and goes down when investors rotate out of equities to bonds and cash. This is not only an important amplifier of dealer risk appetite and monetary policy shocks but also an important source of fluctuations in its own right. So stocks are getting culled across the board as we speak precisely due to investor rebalancing prompted by higher yields. (In turn, higher yields reflect either the expectation that the Fed will hike faster, a higher term risk premium, or both. The two can be disentangled using the ACM term-structure model, as I illustrated not too long ago. P.S. It’s the risk premium; although Matt Klein doesn’t seem to buy the ACM decomposition.)

Tying market fluctuations empirically to investor herd behavior goes some way towards explaining the excess volatility of the stock market that has long puzzled economists. My wager is that stock markets fluctuate dramatically more than reassessments of underlying fundamentals could possibly warrant because of fluctuations driven by investor rebalancing.

The question is whether this is due to the herd behavior of small investors, or whether it is due to the inadvertently-coordinated rebalancing among large asset managers because they face similar mandates. If the former, that leads us to questions of investor sentiment. If the latter, it leads us straight back to market structure. In particular, it draws our attention to the buy side. Instead of paying exclusive attention to dealers and wholesale funding markets, perhaps we should also interrogate the investor behavior of large asset managers as an independent source of fluctuations in the price of risk.

In either case, knowing that rebalancing investor herds drive stock market fluctuations is not very useful, since data on equity allocation only become available at the end of the quarter. Or is it? Can we not think of Equity Allocation (and hence, implicitly, investor herd behavior) as a risk factor for pricing the cross-section of stock excess returns? Indeed we can. It turns out that percentage changes in Equity Allocation are priced in the cross-section of expected excess returns. We illustrate this with the 100 Size-Value portfolios from Kenneth French’s library.


What we find is that instead of a linear pricing relationship whereby higher betas imply monotonically higher expected returns in excess of the risk-free rate, the relationship is quadratic. Portfolios whose equity allocation betas are moderately high outperform portfolios with extreme betas in either direction. So an easy way to make money is to hold portfolios that are, depending on your risk appetite, long or overweight moderate-beta stocks, and short or underweight extreme-beta stocks.
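The second-pass cross-sectional fit behind this claim can be sketched as follows. The betas and returns here are synthetic, generated under an assumed hump-shaped relation purely to illustrate the fitting step; the actual exercise estimates betas for the 100 Size-Value portfolios first and then fits the quadratic:

```python
import numpy as np

# Synthetic cross-section: 100 portfolio betas and mean excess returns with an
# assumed hump shape (moderate betas earn the highest premium).
rng = np.random.default_rng(3)
betas = rng.uniform(-2, 2, 100)                     # equity-allocation betas
mean_excess = 0.06 - 0.01 * (betas - 0.5) ** 2 + rng.normal(0, 0.005, 100)

# Fit expected excess return as a quadratic in beta; a negative leading
# coefficient confirms the hump: extreme betas underperform moderate betas.
coeffs = np.polyfit(betas, mean_excess, deg=2)      # [c2, c1, c0]
```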

Note that stock portfolios that are more sensitive to tidal investor flows are generally more volatile. See next figure.


The big puzzle that thus emerges is why these frontier assets (stock portfolios that are highly sensitive to investor rebalancing) don’t sport high expected returns. For the fundamental insight of modern asset pricing is that risk premia (expected returns in excess of the risk-free rate) exist because investors require compensation to hold systematic risk (but not idiosyncratic risk since that can be easily diversified away). In other words, assets that pose a greater risk to investors’ balance sheets ought to sport higher returns. We have shown that the tidal effect of inadvertently-coordinated investor rebalancing is a significant and systematic risk factor for all investors. So why isn’t there a monotonic relationship between the sensitivity of portfolio returns to investor rebalancing and the risk premium embedded in the cross-section? Why is the price of risk quadratic and not linear in beta? Clearly, we are missing a theoretical piece of the puzzle.



Wage Growth Predicts Productivity Growth

Tip of the hat to Ted Fertik for bringing Servaas Storm’s unpacking of total factor productivity (TFP) growth to my attention. Storm shows that TFP growth can be regarded mechanically as a weighted sum of the growth rates of labor and capital productivity, roughly in a 3:1 ratio in that order. TFP, of course, is a measure of our ignorance. It nudges us to look inside firms, ie the supply side. This leads down the path to situated communities of skilled practice, ie Crawford’s ‘ecologies of attention.’ But perhaps it is better to work directly with labor productivity, certainly if Storm is right. Some say that high wages incentivize firms to invest in labor-saving innovations, thereby increasing labor productivity. This is certainly consistent with the standard microeconomic view of firm behavior, in which firms are expected to do whatever it takes to gain a competitive advantage in the market. A straightforward implication of this hypothesis is that real wage growth ought to predict productivity growth. We’ll see what the evidence has to say about this presently. But let us first note the policy implications of the theory.
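My reconstruction of the identity behind Storm's decomposition is the standard growth-accounting one. With output Y, labor L, capital K, labor share α, and hats denoting growth rates, the Solow residual rearranges exactly into a weighted sum of labor and capital productivity growth:

```latex
\widehat{TFP} \;=\; \hat{Y} - \alpha\,\hat{L} - (1-\alpha)\,\hat{K}
\;=\; \alpha\,\widehat{\left(\frac{Y}{L}\right)} + (1-\alpha)\,\widehat{\left(\frac{Y}{K}\right)}
```

With a labor share of α ≈ 0.75, this delivers the roughly 3:1 weighting of labor productivity growth to capital productivity growth.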

The fundamental challenge of contemporary Western political economy is how to restore economic dynamism. The first-best solution to the rise of China is for the West to maintain its technical and economic lead. Similarly, the first-best solution to political instability and the crisis of legitimacy is a revival in the underlying pace of economic growth. So far no one has offered a credible solution; Trump’s tariffs, nationalist socialism à la Streeck, restoration of high neoliberalism à la Macron, are all small bore. But if the hypothesis that a significant causal vector points from real wage growth to productivity growth holds, then a bold new Social Democratic solution to the fundamental challenge of Western political economy immediately becomes available.

What I have in mind is a new mandate for central bankers. To wit, Congress should mandate the Federal Reserve to maximize real median wage growth subject to monetary and labor market stability. Until now central banks have targeted labor market slack as understood in terms of employment and inflation. But the real price of labor (more precisely, productivity-adjusted real median wage) is also an excellent measure of labor market slack. The hypothesis implies that targeting productivity-adjusted real median wage growth could restore productivity growth; perhaps dramatically. My suggestion is consistent with social democracy’s concern with distributional questions as well as with standard central banking practice. So if the result holds, it’s very useful indeed.

We start off by checking that real wage growth predicts productivity growth in the United States. The correlation is large and significant (r=0.531, p<0.001). This is suggestive.

In order to systematically investigate this question we interrogate the data from the International Labor Organization (ILO). The ILO provides estimates of real output per worker, unemployment rate, and the growth rate of real wages. We restrict our sample to N=30 industrial countries since wage growth has diverged so significantly between the slow-growing advanced economies and fast-growing developing countries. We estimate a number of linear models and collect our gradient estimates in Table 1.

The first column reports estimates for the simple linear model that explains productivity growth by one-year-lagged real wage growth. In the second column, we introduce controls for a temporal trend and lagged productivity growth. This sharply reduces our estimate of the gradient, suggesting that the estimate reported in column 1 was inflated by autocorrelation. We introduce country fixed effects (ie country dummies) in column 3, which modestly reduces our estimate of the gradient. Instead of country fixed effects, in the fourth column we control for the unemployment rate, which turns out to be significant and which very modestly increases our gradient estimate. In the last two columns we introduce random effects for country and year. What this means is that instead of dummies for each country and year, which is equivalent to having fixed intercepts by country and year, we admit the possibility that the intercept for a given country and year is random.

Table 1. Linear mixed-effects model estimates.

| | (1) | (2) | (3) | (4) | (5) | (6) |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | Yes | Yes | Yes | Yes | No | No |
| Trend | No | Yes | Yes | Yes | Yes | Yes |
| AR(1) | No | Yes | Yes | Yes | Yes | Yes |
| Country fixed effect | No | No | Yes | No | No | Yes |
| Unemployment rate (lagged) | No | No | No | Yes | Yes | Yes |
| Real wage growth (lagged) | **0.233** | **0.136** | **0.104** | **0.140** | **0.108** | **0.100** |
| standard error | 0.037 | 0.042 | 0.046 | 0.042 | 0.037 | 0.044 |
| Country random effect | No | No | No | No | Yes | Yes |
| Year random effect | No | No | No | No | Yes | Yes |

Source: ILO. Estimates in bold are significant at the 5 percent level. Dependent variable is real output per worker at market exchange rates. The number of observations is 480.

We note that the gradient for lagged real wage growth remains significant across our linear models even after controlling for a temporal trend, lagged term for the dependent variable, lagged unemployment rate, country fixed-effects, and random effects for country and year. We can thus be fairly confident that real wage growth predicts productivity growth across the industrial world. The next step would be to embed this in a macro model to interrogate the viability of real median wage growth targeting by central banks.
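The fixed-effects specification (column 3 of Table 1) can be sketched with the classic within transformation: demean both variables by country, then run pooled OLS on the demeaned data. The data below are synthetic, generated with an assumed true gradient of 0.10 merely to illustrate the estimator; the actual estimates use the ILO panel of 30 industrial countries:

```python
import numpy as np
import pandas as pd

# Synthetic balanced panel: 30 countries x 16 years, with country-level
# intercepts and an assumed true wage-growth gradient of 0.10.
rng = np.random.default_rng(4)
countries = np.repeat(np.arange(30), 16)
country_fe = rng.normal(0, 0.5, 30)[countries]
wage_g = rng.normal(2.0, 1.5, countries.size)          # lagged real wage growth
prod_g = country_fe + 0.10 * wage_g + rng.normal(0, 1.0, countries.size)

df = pd.DataFrame({"country": countries, "wage_g": wage_g, "prod_g": prod_g})

# Within transformation: demeaning by country sweeps out the fixed effects,
# so pooled OLS on the demeaned data recovers the gradient.
dm = df.groupby("country")[["wage_g", "prod_g"]].transform(lambda s: s - s.mean())
b_fe = (dm["wage_g"] * dm["prod_g"]).sum() / (dm["wage_g"] ** 2).sum()
```

The random-effects columns (5 and 6) instead treat the country and year intercepts as draws from a distribution, which requires a mixed-model estimator rather than demeaning.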


The Arrow of Time in World Politics

Structures don’t live out there in the wild; they are explanatory schemas that live in the discourse and in men’s minds as mental maps. Temporal structure is made of diachronic patterns (roughly, time-variation) as opposed to synchronic patterns (roughly, cross-sectional variation). An example of the former would be the relative decline of Britain in 1895-1905. An instance of the latter would be the pattern of diplomacy during the July Crisis. What historians in the tradition of Braudel are interested in is slow-moving temporal structure. Synchronic patterns are of concern to historians interested in the history of events, which Braudel characterized as mere ‘surface disturbances’ that ‘the tides of history carry on their strong backs.’

Koselleck says every concept has its own “internal temporal structure.” I think, let’s try this at home. Here I locate the variable that’s doing most of the work in explaining the diachronic pattern of international politics in some IR theories I find interesting, and represent it as an internal temporal structure inherited from the logic of the theory. It turns out to be surprisingly useful in thinking about the Chinese question.

In the discourse of realist IR, the explanandum is the historical pattern of relations among great powers at the center of the world-system beginning in Europe c. 1494 or c. 1648, depending on the scholar. Waltz’ main achievement was to isolate the systemic security interaction of great powers as a separate level of analysis and successfully claim disciplinary autonomy for international relations within political science. He did that by importing the trick from microeconomics, where the market interaction of firms had been isolated as an independent disciplinary domain of enquiry within economics. In neorealism, systemic security interaction between homogeneous units differentiated solely by a scale parameter called power is posited as a theoretical model of an international system. Waltz identified the structure of an international system with the distribution of power among the units. In the familiar metaphor, great powers are considered to be like billiard balls differentiated only by size. Time has no place in Waltz’ admissible class of abstract international systems. The result, predictably, is ‘rigor mortis’ (Walt, 1999).


Kenneth Waltz (1924-2013)

But Waltz was not done yet in painting himself into a corner. No sir, he proceeds to throw out almost all the information in the distribution of power, which he had identified as the structure of the system. Polarity, the number of great powers in the system, is, Waltz argued, an efficient explanation of the stability properties of the international system, in that bipolar systems are stable whereas multipolar systems are unstable. The de facto structure in Waltz’ theory then is not what he had declared to be the structure of the international system, the distribution of power. It is not even polarity per se. In his main arguments about system stability, all the work is done by a Boolean variable. Systems are either bipolar or multipolar. Moreover, the record of great power relations from c. 1648 is a single sample path; according to Waltz, only two system structures have ever existed. Waltz thus cornered himself into an explanatory impasse that proved impossible to escape, which would be impressive if getting stuck in degenerate paradigms were uncommon.

Waltz withdrew into a sullen silence as the bipolar world, whose stability he had explained with great authority, evaporated into thin air on account of the unanticipated capitulation of the weaker party. With great conviction, he then proceeded to pronounce the unipolar world to be so unstable as to be unworthy of the attention of a serious student of world politics. But the unipolar world simply refused to lie down and die. It was 24 years old and going strong when Waltz passed away in 2013. The poor fellow must have been painfully conscious of his lost wagers with world history.

In contrast to Waltz’s one-dimensional explanatory scheme or structure, Gilpin’s is two-dimensional. The distribution of power is projected onto a time-axis and allowed to evolve. This decisive discursive move reveals a cyclic, punctuated equilibrium-type pattern in world politics. Hegemonic wars punctuate long periods of system stability. These periods of system stability exhibit stable patterns of international politics that are reenacted over and over again, and tend to change very slowly. For example, the identity of the maritime hegemon (a natural monopoly); the identity of the dominant politico-military actors in different regions; the territorial order; the diplomatic rank ordering; national questions (eg, the Kurdish question); and, with apologies to WG Sebald, no doubt the cabinet of zombie curiosities: dead treaties, agreements, multilateral coordinating bodies, aid programs, and peace processes et cetera; they stand frozen in the instant they came to grief, waiting to be toppled over by strong winds and consigned to the storage rooms of museums and libraries. The reproduction of world order is founded on the absence of a reconsideration of the world question, not merely on the military position of the dominant power. To be precise, the stability of world order rests on the fear of world war. It rests on the unwillingness of potential insurgents to reopen the world question.


Robert Gilpin (1930-2018)

Gilpin’s motor of world history, the law of uneven growth, is a particular instance of the Second Law which states that the entropy of any system increases over time. For international systems, entropy is defined as the dispersion of power. Hegemonic war eliminates some great powers, weakens others, and strengthens only a few, so that when world time resets to zero at the close of the hegemonic war, power in the system is highly concentrated, ie entropy is low. The system deconcentrates over time in accordance with the Second Law. The expected dispersion of power in the system is thus maximal on the eve of a hegemonic war. The Second Law in the form of the law of uneven growth thus endows the system with a temporal structure. We define world time in terms of this temporal structure. More precisely, world time can be identified with the entropy of the international system.
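One way to make this precise (my formalization, not Gilpin's) is to measure the entropy of an international system as the Shannon entropy of power shares:

```latex
S_t \;=\; -\sum_{i=1}^{n} p_{i,t}\,\ln p_{i,t},
\qquad
p_{i,t} \;=\; \frac{\text{capabilities of great power } i \text{ at time } t}{\sum_{j} \text{capabilities of great power } j \text{ at time } t}
```

On this reading, S_t is near its minimum at the close of a hegemonic war, when power is concentrated in the victor, and approaches its maximum of ln n on the eve of the next one, when power is evenly dispersed; world time is then simply S_t.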

Victorious powers forge a world order from scratch. But underneath the stable patterns of world order, pace Foucault, world time continues its relentless march. World time can be thought of as the clock of the rise and fall of the great powers. World time measures the erosion of the dominant power’s relative power and influence. World time enters world history at two levels of causation: at the level of the discourse and in extradiscursive reality. Great power statesmen are in effect trying to read world time when they try to ascertain the changing balance of world power. World time also enters history through brute facts about the relative distribution of war potential. The tick-tock echoes between the steel factories and the minds of statesmen. The motor of hegemonic war in Gilpin is identical to the one identified by Thucydides 2400 years ago. Under the tick-tock of world time, the declining hegemon discovers merit in the logic of preventive war. Perhaps it is best to reach a decision while time is still on one’s side. Knowing that the dominant power might succumb to the seduction of preventive war, the rising power too is alarmed by the tick-tock of world time. Rising powers know that they must accumulate power fast or they will be crushed.

In Gilpin’s theory, world time is cyclical. The tick-tock of world time announces the approach of the time of the great reckoning when the fates of great civilizations will once again be determined. World time is then quite literally set to zero, and starts afresh with the creation of a new world order forged by the victors from the smoking ruins of the hegemonic war. Until, of course, world time signals autumn. The fundamental problem with cyclical time is the temporality of world systems. Sequential world orders cannot be assumed to be independent. Hegemonic wars are fought precisely over the terms of the world order. World orders are forged in the shadow of hegemonic war from the ruins of the previous world order. They are often consciously constructed in light of the lessons of the failure of the previous attempt at world order. In any event, they are informed by what went before even if the old hegemon is eliminated from the roster of great powers. So the reset of world time is quite problematic from a historical point of view.

Ashley Tellis introduced the theory of hard realism in his doctoral dissertation, where we find a very different temporal structure. Tellis starts with the observation that the first-best solution to the threat posed by a great power rival is to eliminate it. Starting with a given roster of great powers, Tellis reasons that repeated struggles between the great powers would eliminate the weakest in each round thereby introducing a temporal structure in great power history that we can call system time. Put simply, the roster of great powers would feature a larger number of great powers earlier in system time compared to later. With each subsequent round the number of great powers would dwindle until it is reduced to a singleton.


Ashley Tellis (1961-)

System time here can thus be operationalized as the polarity of the system over the very long run. Whereas world time is cyclical, curling round like a Riemann surface, system time is linear but finite. It begins with the emergence of an international system and culminates in a unipolar world once all but a single power have been eliminated from the roster of great powers. The temporal pattern suggested by system time not only captures the deep temporal structure of Western history and the contemporary world-system, it is also consistent with the evidence of other international systems across time and space, as Kaufman, Little, and Wohlforth document.

System time and world time are ahistorical but not teleological, since they are derived from the logic of the theory. Both operate simultaneously under the surface of international relations, the first even more slowly than the second. Yet another temporal structure, a third arrow of history, is associated with the hockey-stick, which we may call exponential time. Put simply, the hockey-stick introduces a temporal logic of escalation in world politics. World struggles that take place later do so at levels of destruction and power potential an order of magnitude higher than the previous tournament. Exponential time is the temporal representation of the hockey-stick. It is nonlinear but bounded from above. What happens is that destructive power eventually becomes so great that a world struggle spells the final and irrevocable coda of human history.

World time, system time, and exponential time exist in every instant of world politics. But they are not symmetrically situated. System time continues to unfold even as world time is reset with each hegemonic war. Even as system time increases linearly, exponential time increases exponentially. In the end exponential time catches up with both world time and system time. For when exponential time hits the upper bound, the iron logic of mutually-assured destruction frustrates the historical logic of both world time and system time by eliminating the very possibility of world war. For the dominant power can no longer eliminate the rising power through preventive war for fear of nuclear annihilation so the temporal logic of world time is frustrated. And since no first-rank power can be eliminated in a world struggle, the logic of system-time is frustrated as well.

Should the United States follow world time and launch a cold war against China? Should the US obey system time and eliminate China with a splendid first-strike while it still can? Or should the United States buy into the notion that the iron logic of mutually assured destruction has repealed the laws associated with world time and system time, and seek a modus vivendi with China in the secure knowledge that neither can be eliminated from first-rank? If we are guaranteed to end up in the impasse of strategic stalemate then why not skip the security competition and go straight to détente?


Cognitive Test Scores Measure Net Nutritional Status

At the heart of the racialist imaginary is the notion of the natural hierarchy of the races. Not only are there discrete types of humans, racialism insists, they are differentially endowed. Turn of the century high racialism construed this racial hierarchy in terms of a racial essence. This racial essence was supposed to control men’s character, merit, behavioral propensities, and capacity for refinement and civilization. To be sure, racial essence was thought of as multidimensional. But no educated Westerner at the turn of the century would beg to disagree with the notion that the races could be put into a natural hierarchy.

What explains the hold of high racialism on the turn of the century Western imaginary? Some of it was obviously self-congratulation. But that can’t be the whole story. There were some pretty smart people in the transatlantic world at the turn of the century. Why did they all find high racialism so compelling? Since critical thinkers interested in a question sooner or later find themselves sifting through the scientific literature, part of what needs explaining is the scientific consensus on racialism. Put another way, we should ask why the best-informed of the day bought into high racialism.

Broadly speaking, I think there were three factors at play. First, in the settler colonies and metropoles of the early modern world, migrant populations from far away found themselves living cheek-by-jowl with others. This created a visual reality of discrete variation out of what were in fact smoothly-varying morphologies. What were geographic clines reflecting morphological adaptation to the macroclimate in the Old World appeared to be races in the New World. In effect, early modern population history created a visual reality that begged to be described as a world of discrete races.

Second, and more important, was the weight of the taxonomic understanding of natural history. The hold of the taxonomic paradigm was so strong that it seemed to be the only way to comprehend the bewildering human variation revealed by the collision of the continents. The existence of specific races and their place in the natural hierarchy may be questioned but that racial taxonomy was a useful way to understand human variation was simply taken for granted. Unbeknownst to the best-informed of the day, this was a very strong assumption to make about the world.

Third, and most important, was the sheer weight of the explanandum. What made racial taxonomy so compelling was what it was mobilized to explain: the astonishing scale of global polarization. As Westerners contemplated the human condition at the turn of the century, the dominant fact that cried out for explanation was the highly uneven distribution of wealth and power on earth. It did really look like fate had thrust the responsibility of the world on Anglo-Saxon shoulders; that Europe and its offshoots were vastly more advanced, civilized, and powerful than the rest of the world; that Oriental or Russian armies simply couldn’t put up a fight with a European great power; that six thousand Englishmen could rule over hundreds of millions of Indians without fear of getting their throats cut. The most compelling explanation was the most straightforward one. To the sharpest knives in the turn of the century drawer, what explained the polarization of the world was the natural hierarchy of the races.

It is this that distinguishes racialism from racism. The former is fundamentally an explanation of global polarization; the latter is a politico-ethical stance on the social and global order. In principle, it is possible to be racialist without being racist but not vice-versa. In practice, however, few racialists could sustain politico-ethical neutrality on race relations.

During the nineteenth century, the discourse of Anglo-Saxon self-congratulation morphed from the traditional mode that saw Anglo-Saxons as blessed by Providence to the notion that they were biologically superior to all other races on earth. Driven by settler colonial racialism, the vision of the colorblind empire was definitively shelved by London in favor of a global racial order after the turn of the century. Things came to a head on the South Africa question, where the settlers demanded apartheid. London gave in after a brief struggle. The resolution of the South Africa question in 1906 was a key moment in the articulation of the global color line.

The first real pushback against high racialism came from scholars at Columbia in the 1930s. Franz Boas and his students, most prominently Ruth Benedict, led the charge. They punctured the unchallenged monopoly of high racialism but the larger edifice survived into the Second World War. The discourse of high racialism collided with reality at the hinge of the twentieth century. As Operation Barbarossa began, Western statesmen and intelligence agencies without exception expected the Soviet Union to collapse under the German onslaught in a matter of weeks. If France capitulated in six weeks, how could the Slav be expected to stand up to the Teuton for much longer? That the Slav could defeat the Teuton was practically unthinkable in the high racialist imaginary. Not only did the Soviet Union not collapse, it went on to single-handedly crush what was regarded as the greatest army the world had ever seen. This was because Stalinism proved to be a superior machine civilization to Hitlerism where it mattered—where it has always mattered to the West—on the battlefield. The Slav could indeed defeat the Teuton. The evidence from the battlefield required an unthinkable revision of the natural hierarchy of the races, directly antithetical to the core of the racialist imaginary, ie Germanic racial supremacism.

It would seem that Auschwitz, that great trauma of modernity, more than anything else pushed racialism beyond the pale. If so, it took surprisingly long. It was not until the sixties that racial taxonomy became unacceptable in the scientific discourse. Recall that it was in 1962 that Coon was booed and jeered at the annual meeting of the American Association of Physical Anthropologists, an association that had until quite recently been the real home of American scientific racialism. The anti-systemic turn of the sixties opened the floodgates to radical critiques of the mid-century social order and the attendant conceptual baggage, including a still-pervasive racialism.

It took decades before racialism was pushed beyond the boundaries of acceptable discourse. But by the end of the century a definite discipline came to be exercised in Western public spheres. In the Ivory Tower, a consensus had emerged that races did not reflect biological reality but were rather social constructs with all-too-often violent consequences. Whatever systematic differences that did exist between populations were considered to be trivial and/or irrelevant to understanding the social order. This consensus continues to hold the center even though it is fraying at the margins.

In fact, one can date the rise of neoracialism quite precisely: the publication of Herrnstein and Murray’s The Bell Curve in 1994. Although most of the book examined intelligence test scores exclusively for non-Hispanic White Americans and explored the implications of relentless cognitive sorting on the social order, critics jumped on the single chapter that replicated known results on racial differences in IQ. (Responding to the hullabaloo, the American Psychological Association came out with a factbook on intelligence that was largely consistent with the main empirical claims of the book.) Herrnstein passed away around the time the book came out. But, ever since then, Murray has been hounded by protestors every time he makes a public appearance. At Middlebury College last year, a mob attacked Murray and his interviewer, Professor Allison Stanger, who suffered a concussion after someone grabbed her hair and twisted her neck. I think we must see this aggressive policing of the Overton Window (the boundary of acceptable discourse) as the defining condition of what I call neoracialism. It is above all a counter-discourse. Those espousing these ideas feel themselves to be under siege; as indeed they are.

Neoracialism retains the taxonomic paradigm of high racialism but it is not simply the reemergence of high racialism. For neoracialism is tied to two hegemonic ideas of the present that were nonexistent back when high racialism had the field to itself.

The first of these is the fetishization of IQ. The test score is not simply seen as a predictor of academic performance, for which there is ample evidence. (For the evidence from the international cross-section see Figure 1). It is seen much more expansively as a test of overall merit; as if humans were motor-engines and the tests were measuring horsepower. The fetish is near-universal in Western society; right up there with salary, the size of the house, and financial net worth. It is an impoverished view of man, sidelining arguably more important aspects of the human character: passion, curiosity, compassion, integrity, honesty, fair-mindedness, civility, and so on.


Figure 1. Source: Lynn and Meisenberg (2017).

The second hegemonic idea is the blind acceptance of the reductionist paradigm. Basically, behavior is reduced to biology and biology to genetics. Both are dangerous fallacies. The first reduction is laughable in light of what may be called the first fundamental theorem of paleoanthropology: What defines modern humans is behavioral plasticity, versatility, and dynamism untethered to human biology. In other words, modern humans are modern precisely in as much as their behavior is not predictable by biology.

The reduction of biology to genetics is equally nonsensical in light of what may be called the first fundamental theorem of epigenetics: Phenotypic variation cannot be reduced to genetics, nor even to genetics plus environment. For even after controlling for both, there is substantial biological variation left unexplained. Not only is there substantial phenotypic variation among monozygotic twins (who have identical genomes), even genetically-cloned microbes cultured in identical environments display significant phenotypic variation. The only way to make sense of this is to posit that subtle stochastic factors perturb the expression of the blueprint contained in DNA even under identical environmental conditions. This makes mincemeat out of the already philosophically-tenuous paradigm of reductionism.

So neoracialism is a counter-discourse in contemporary history that is rigidly in the grip of the three fallacies: that racial taxonomy gives us a good handle on human variation, that IQ is the master variable of modern society and the prime metric of social worth, and that DNA is the controlling code of the human lifeworld à la Dawkins. Because the last two are much more broadly shared across Western society, including much of the Left, the critique of neoracialism has been relatively ineffective.

But beyond the rigidities of the contemporary discourse, there is a bigger reason for the rise of neoracialism. Simply put, racialism was marginalized without replacement. The explanatory work that racialism was doing in making sense of the world was left undone. No compelling alternative explanation for global polarization was offered. Instead, under the banner of Modernization, population differences were simply assumed to be temporary and expected to vanish in short order under the onslaught of Progress. Indeed, even discussion of global polarization became vaguely racist and therefore unacceptable in polite company. With the nearly-uniform failure of the mid-century dream of Modernization, the door was thus left ajar for the resurrection of essentialist racial taxonomy to do the same explanatory work it had always performed. It is the absence of a scientific consensus on a broad explanatory frame for human polarization that is the key permissive condition for neoracialism.

A scientific consensus more powerful than neoracialism, based on thermoregulatory imperatives, is emerging that ties systematic morphological variation between contemporary populations to the Pleistocene paleoclimate on the one hand, and contemporary everyday living standards (nutrition, disease burdens, thermal burdens) on the other. Disentangling the two has been my obsession for a while. I finally found what those in the know already knew: basic parameters of the human skeleton are adapted to the paleoclimate.

At the same time as these developments in paleoanthropology and economic history, recent progress in ancient-DNA research has highlighted the importance of population history. I tried to bring the paleoanthropology and population history literatures into conversation by showing how population history explains European skeletal morphology over the past thirty thousand years. My argument is based on known facts about the paleoclimate during the Late Pleistocene and known facts about population history. The paleoclimate is the structure and population history is the dynamic variable. It is the latter that allows us to predict dynamics in Late Pleistocene body size variables. We were of course forced into this explanatory strategy by the brute fact that population history and the paleoclimate are the main explanatory variables available for the Pleistocene.

I do not mean to imply that technology and organization did not causally affect human morphology, eg we have ample evidence of bilateral asymmetry in arm length as an adaptation to the spear-thrower. But all such adaptations are superstructure over the basic structure of the human skeleton, which reflects morphological adaptation to the paleoclimate of the Pleistocene that began 2.6 Ma. In Eurasia in particular, it reflects adaptation to the macroclimate after the dispersal of Anatomically Modern Humans from Africa 130-50 Ka. Because the Late Pleistocene, 130-10 Ka, is so long compared to the length of time since the Secondary Products Revolution 5 Ka, and especially the Secondary Industrial Revolution 0.1 Ka, and despite the possibility that evolution may have accelerated in the historical era, the Late Pleistocene dominates the slowest-moving variables of the human skeleton. Indeed, I have shown that pelvic bone width and femur head diameter reflect adaptation to the paleoclimate of the region where the population spent the Late Pleistocene.

I feel that economic historians have been barking up the wrong tree. The basic problem with almost all narratives of the Great Divergence (as the historians frame it) or the exit from the Malthusian Trap (as the economists would have it) is that the British Industrial Revolution, 1760-1830, does not revolutionize everyday living standards in England. This is easy to demonstrate empirically whether one relies on per capita income, stature, or life expectancy. In general, the economic, anthropometric, and actuarial data is consistent with a very late exit from the Malthusian world; the hockey stick is a story of the 20th century.

The evidence is rather consistent with the hypothesis that the extraordinary polarization of living standards across the globe is a function of the differential spread of the secondary industrial revolution, 1870-1970 (sensu stricto: the generalized application of powered machinery to work on farms, factory floors, construction sites, shipping, and so on; sensu lato: the application of science and technology to the general problem of production and reproduction). So proximately, what needs to be explained is the spread of the secondary industrial revolution. Specifically, the main explanandum is this: Why is there a significant gradient of output per worker (and hence per capita income) along latitude? Why can’t tropical nations simply import the machinery necessary to increase their productivity to within the ballpark of temperate industrial nations and thereby corner the bulk of global production? Despite the wage bonus and the ‘second unbundling’, global production has failed to rebalance to the tropics. Why?

I proposed a simple framework that tied output per worker to the intensity of work performed on the same machine, and the intensity of work to the thermal environment of the farm, factory floor, construction site, dockyard and so on—in accordance with the human thermal balance equation. This was not very original—the claim is consistent with known results in the physiological and ergonomics literature. What I am saying in effect is that the difference is not so much biology, education, or culture. To put it bluntly, educated and disciplined White Anglo-Saxon male workers from the Midwest would not be able to sustain the same intensity of work on the same machine in Bangladesh as in Illinois. Like the Bangladeshis, they would have to take frequent breaks and work less so as not to overheat. This mechanically translates into lower productivity and hence lower per capita income.
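For readers who want the physics spelled out, the human thermal balance equation can be written in its conventional textbook form from the heat-stress physiology literature (the symbols are my choice for this sketch, not a quotation of the framework in the original dispatch):

```latex
% Heat storage rate S (watts) must stay near zero for work to be sustainable.
S \;=\; (M - W) \;-\; \bigl( C + R + E_{\mathrm{sk}} + C_{\mathrm{res}} + E_{\mathrm{res}} \bigr)
```

where M is metabolic heat production, W external mechanical work, C and R convective and radiative exchange with the environment, E_sk evaporative loss from the skin, and C_res and E_res respiratory heat losses. Sustained work requires S ≈ 0; in a hot, humid environment C, R, and E_sk are capped by air temperature and humidity, so the only remaining margin of adjustment is M − W, ie the intensity of work itself.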

I appreciate the increasing attention to thermal burdens in light of global warming. Recently, the Upshot had a fascinating report tying gun violence outdoors but not indoors (!!!) to temperature spikes. Earlier, in an extraordinary study, Harvard’s Goodman tied students’ test scores to the thermal burden on the day of the test. That goes some way towards explaining the latitude gradient in the international cross-section of test scores—an uncomfortable empirical fact well outside the Overton Window that neoracialists insistently point to as empirical “proof” of the relevance of racial taxonomy to understanding the global order. We’ll return to the empirical evidence from the correlates of test scores presently.

Following in the footsteps of Herrnstein and Murray, Richard Lynn published The Global Bell Curve in 2008. It went to the heart of the matter. Here, global polarization is tied precisely to test scores. Some populations are rich and powerful, and others are poor and weak because, we are told, the former are cognitively more endowed than the latter. That’s the master narrative offered here. One finds different versions in other neoracialist accounts. Rushton claimed racial differences in cranial capacity, which we debunked. Wade finds racial taxonomy more persuasive than the geographic clines favored by geneticists. In what he calls his more speculative chapters, Wade does the full double reduction: differences in behavioral patterns are mobilized to explain the world order and DNA is mobilized to explain behavioral patterns. Gene-culture coevolution and other speculations are thrown around to explain global polarization.

The heart of neoracialism isn’t, What’s the controlling variable for human variation per se. The question at the heart of neoracialism is, What’s the controlling variable for human variation that is relevant to the social order, the global order, the manifest and multiple hierarchies of our lifeworld? A presumed innate hierarchy of the races in general ability is doing all the work in neoracialism for it is mobilized to explain all of global polarization in one fell swoop. Neoracialism looks for a master variable that explains the presumed rank ordering of human societies. Whence the fetishization of IQ (thought to be ultimately controlled by DNA, although all efforts to explain test scores by DNA have been frustrated). In the minds of neoracialists and those who are tempted to join them, it is test scores that explain the cross-section of per capita income. A lot is thus at stake in that equation. That’s the context of Lynn’s The Global Bell Curve.

The rigidities of the liberal discourse have meant that a very fruitful way of thinking about systematic variation in the test scores of human populations has been overlooked. We argue that test scores contain information on everyday living standards. Put simply, they are a substitute for per capita income, stature, or life expectancy. They measure net nutritional status, which is a function of nutritional intake and expenditure on thermoregulation, work, and fighting disease. (Net nutritional status is just jargon for the vicious feedback loop between nutrition and disease; they must be considered jointly.) We show this by demonstrating that the best predictors of test scores are the Infant Mortality Rate and animal protein (dairy, eggs and meat) intake. More generally, we show that all metrics of net nutritional status are strong predictors of test scores.

While it may be conceivable that variation in cognitive ability explains variation in per capita income, given the universal availability of modern medicine, the claim that variation in cognitive ability explains variation in the Infant Mortality Rate is really tenuous. Given the empirical correlation we document below, it is much more plausible that tropical disease burdens suppress test scores than vice-versa. In other words, it makes no sense to infer that the racial hierarchy supposedly revealed by test scores explains disease burdens, but it makes ample sense to infer that disease burdens explain test scores. This is the crucial wedge of our intervention.

We begin our empirical analysis by noting the Heliocentric pattern of test scores. Table 1 displays Spearman’s rank correlation coefficients for test scores on the one hand and absolute latitude and Effective Temperatures on the other. Spearman’s coefficient is a distribution-free, robust estimator of the population correlation coefficient (r), and more powerful than Pearson’s coefficient when the data depart from normality. Effective Temperature is computed from the mean temperatures of the warmest and coldest months via the formula in Binford (2001): ET = (18*max − 10*min)/(max − min + 8), where the max and min temperatures are expressed in Celsius. ET is meant to capture the basic thermal parameter of the macroclimate.
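Binford’s formula is simple enough to sketch in a few lines of Python (the variable names `warm` and `cold` are mine; the formula itself is as given above):

```python
# Effective Temperature (ET) per Binford (2001).
# `warm` and `cold` are the mean temperatures (Celsius) of the warmest
# and coldest months; variable names are mine, the formula is Binford's.

def effective_temperature(warm: float, cold: float) -> float:
    return (18 * warm - 10 * cold) / (warm - cold + 8)

# With no seasonality (warm == cold), ET collapses to the year-round mean:
print(effective_temperature(27.0, 27.0))             # 27.0
# A humid tropical station vs. a continental one:
print(round(effective_temperature(27.0, 25.0), 1))   # 23.6
print(round(effective_temperature(20.0, -5.0), 1))   # 12.4
```

Note the property that ET reduces to the constant temperature when there is no seasonality at all, which is what makes it serviceable as a single-number summary of a location’s thermal regime.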

Table 1. Heliocentric polarization in test scores.
Spearman’s rank correlation coefficients.
N=86 IQ test score (measured) IQ test score (estimated) Educational Attainment
Absolute latitude 0.65 0.65 0.63
Effective Temperature -0.64 -0.62 -0.59
Source: Lynn and Meisenberg (2017), Trading Economics (2018), Binford (2001), author’s computations. Estimates in bold are significant at the 1 percent level. 

Note that Effective Temperature closely tracks absolute latitude (r=-0.949, p<0.001). Our estimate of the correlation coefficient between absolute latitude and measured IQ test scores is large and significant (r=0.654, p<0.001), implying a gradient so large that moving 10 degrees away from the equator increases expected test scores by 4 points. Effective Temperature is also a strong correlate of measured IQ (r=-0.639, p<0.001), implying that an increase in Effective Temperature by just 5 degrees reduces expected test scores by 11 points. The fundamental question for psychometry then is, What explains these gradients?

Answering this question requires pinning down the proximate causal structure of test scores. We argue that test scores measure net nutritional status. Table 2 marshals the evidence. We see that all measures of net nutritional status (Infant Mortality Rate, animal protein intake per capita, life expectancy, stature, protein intake per capita, and calorie intake per capita) are strong correlates of test scores. The strongest is Infant Mortality Rate (r=-0.859, p<0.001) which captures the vicious feedback-loop between nutrition and disease burdens. By itself, Infant Mortality Rate explains three-fourths of the variation in measured test scores reported by Lynn and Meisenberg (2017). The results are robust to using estimated test scores or Educational Attainment instead of measured test scores.

Table 2. Pairwise correlates of test scores.
Spearman’s rank correlation coefficients.
IQ test score (measured) IQ test score (estimated) Educational Attainment
Infant Mortality Rate (log) -0.86 -0.85 -0.84
Animal protein intake per capita 0.80 0.76 0.76
Life expectancy 0.76 0.68 0.70
Stature 0.74 0.74 0.73
Per capita income (log) 0.68 0.59 0.74
Protein intake per capita 0.64 0.82 0.63
Calorie intake per capita 0.54 0.67 0.57
Source: Lynn and Meisenberg (2017), World Bank (2014), Trading Economics (2018), FAO (2018), author’s computations. Estimates in bold are significant at the 1 percent level. 
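Since every estimate in Tables 1 and 2 is a Spearman coefficient, it may help to recall that Spearman’s rank correlation is nothing more than Pearson’s r computed on ranks. A minimal self-contained sketch (the five-country numbers below are made up for illustration, not the Lynn-Meisenberg cross-country sample):

```python
# Spearman's rank correlation = Pearson's r on the ranks of the data.
# Data below are hypothetical, purely to demonstrate the estimator.

def ranks(xs):
    """1-based average ranks, handling ties by midranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = midrank
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical: log infant mortality vs. a test score, five countries.
imr = [4.1, 3.2, 2.0, 1.1, 0.7]
iq  = [69, 77, 85, 96, 99]
print(spearman(imr, iq))  # perfectly monotone decreasing -> -1.0
```

In practice one would of course use scipy.stats.spearmanr, which also returns the p-value; the hand-rolled version is only meant to demystify the estimator behind the tables.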

Figure 2. Infant mortality rate (World Bank, 2014) predicts test scores (Lynn and Meisenberg, 2017).

Our estimate for the correlation between animal protein intake per capita and measured test scores is also extremely large (r=0.802, p<0.001). Astonishingly, each additional gram of animal protein intake per capita increases expected test scores by 0.4 points. By itself, animal protein intake explains two-thirds of the international variation in mean test scores. Although not as strong, calorie intake per capita (r=0.541, p<0.001) and protein intake per capita (r=0.649, p<0.001) are also strong correlates of test scores. The pattern suggests that the lower test scores of poor countries reflect lack of access to high-quality foods like eggs, dairy and meat.


Figure 3. Animal protein (FAO, 2018) predicts test scores (Lynn and Meisenberg, 2017).

The main import of the extremely high correlations between test scores on the one hand and Infant Mortality Rate (r=-0.859, p<0.001) and animal protein intake per capita (r=0.802, p<0.001) on the other is clear: Health insults control investment in cognitive ability. Energy and nutrition that could be channeled towards cognitive ability have to be diverted to dealing with health insults arising jointly from malnutrition and disease.

We have checked that stature is much more plastic than pelvic bone width. And we have shown that the divergence in stature is a story of the 20th century, ie it carries information on modern polarization. The strong correlation between test scores and stature (r=0.74, p<0.001) therefore suggests that test scores also contain information on modern polarization. The strength of the correlation between test scores and life expectancy (r=0.76, p<0.001) reinforces this interpretation.


Figure 4. Source: Lynn and Meisenberg (2017), Clio Infra (2018).

What Table 2 shows is that systematic variation in test scores between populations is a function of systematic variation in net nutritional status. The correlations make no sense if neoracialism is approximately correct, but they make ample sense if test scores reflect net nutritional status. If a country has low test scores you can be somewhat confident that it is poor (R^2=44%) but you can be much more confident that it faces malnutrition (R^2=64%) and especially high disease burdens (R^2=74%). This implies that the causal vector points the other way, from polarization to test scores. Far from explaining global polarization as in the high racialist imaginary, test scores are explained by inequalities in everyday living standards. The evidence from psychometry adds to other evidence of global polarization from economics, anthropometry, and demography that continues to demand explanation.

We have suggested that the current radio silence over systematic variation in test scores fosters neoracialism. We must break this silence and talk openly and honestly about such questions lest we leave the interpretation of these patterns to neoracialists. More generally, an effective rebuttal of neoracialism requires a more compelling explanation of global polarization. Given the discursive hegemony of science, I want to persuade progressives that this requires taking science as the point of departure. My wager is that a much more compelling picture is indeed emerging from the science itself that explains global polarization, and more generally, systematic variation in human morphology and performance, not in terms of racial taxonomy but rather in terms of the Heliocentric geometry of our lifeworld that structures thermoregulatory, metabolic, and epidemiological imperatives faced by situated populations.


The Story of Hominin, Part I: pre-Homo

In a contemplative article, Bernard Wood, the doyen of hominin taxonomy, identified three boundary problems in the business. The first is, of course, where to draw the line between ape and man; more precisely, the boundary between “ape-like” hominid and man-like hominin fossils. [Hominid includes the great apes; hominin does not.] It is not simply a matter of lineage, for there are many species (by which we simply mean morphological types or taxa; not the biological definition—the ability to produce fertile offspring—which cannot be determined on the basis of extant evidence) in the fossil record that are neither the ancestors of modern humans nor those of modern chimps. So the question is above all where to place these dead-ends. In practice, the boundary is fuzzy and boils down to the degree to which the species is arboreal (adapted to living in the canopy) or bipedal. Committed bipeds are regarded as closer to modern humans; committed arboreals as closer to apes; with fossils displaying both adaptations somewhere in between.


Source: Wood and Collard (1999).

The second is the boundary of the genus Homo. Drawing this boundary is an equally fraught enterprise for now we are talking about demarcating which species can be regarded as human sensu lato (in a loose sense). Committed bipedalism is not enough; it is reasonable to demand that species in our genus approach the human form if not behavior. Human-like behavior, especially tool manufacture, is of course enough to guarantee you a spot in the genus. But things are complicated by the fact that multiple hominin species coexisted in eastern Africa when the first tools appear around 2.5 Ma (millions of years ago), so that it is impossible in general to attach the first industries to particular species. In practice, paleoanthropologists regard big brains as the ticket to entry, with cranial capacity above 600 cubic centimeters (cc) regarded as the conventional boundary. The defining exception or marginal case is H. habilis (“handy man”, 2.5-1.4 Ma), which has definitely been tied to the first lithic technology but sports an average cranial capacity of only 552 cc. (Although Gamble reports an average of 609 cc. Compare figures 1 and 2.) We will presently return to encephalization in human evolution.


Source: Gamble (2013).

The third is the boundary between fully-human, H. sapiens sensu stricto (in the strict sense), and near-human hominin. This is perhaps the most fraught boundary of the three. It is also uncomfortably close to the question of the origin of the races, that perennial obsession of high racialism. There is a virtual consensus that what distinguishes humans from other hominins is behavior—we are more sophisticated, behaviorally-plastic, and dynamic than those other guys. But this is very far from being free of problems. First, the appearance of anatomically modern humans considerably predates the evidence for behavioral modernity, however defined, so that the origin of our species sensu stricto is thereby shrouded in mystery. Second, some but not all late archaic hominins, in particular Neanderthals, are associated with advanced lithic technology (including Levallois tools) indistinguishable from that of contemporaneous anatomically modern humans (say around 100 Ka).

It would seem that there are two ways of resolving this boundary problem. Either we impose a less stringent criterion for behavioral modernity (say, tool manufacture requiring multiple processes in a specific sequence) and are therefore generous with inclusion. Or we impose a more stringent criterion for behavioral modernity (say art, ornaments, burial, colonization of extreme environments, sea-faring, projectile weapons, and so on; in short, culture, versatility and dynamism) and are thereby stingy with inclusion. But both generous and stingy definitions are deeply problematic.

For if you say advanced tool manufacture (say Acheulean tools c. 1 Ma, in particular, bifacial hand-axes) is a sufficient criterion for inclusion, then archaic hominins (including our ancestors) in the western Old World would be included but not those in eastern Eurasia, for the spread of the Acheulean industry after 1 Ma was confined to west of the infamous but accurate Movius line. The eastern Old World, populated by late archaic hominins, continued to manufacture unsophisticated Oldowan tools, developed c. 2.5 Ma, for hundreds of thousands of years after the Acheulean industry became dominant in the western Old World.


Source: Lycett and Bae (2010).

If on the other hand you say let’s be stingy and restrict H. sapiens sensu stricto to much more recent hominin populations that display the full suite of modern behavior, then that would imply that populations in western Eurasia (Europe and the Near East) and Sahul (Australia, Tasmania, New Guinea) were fully-human by 50-40 Ka, tens of thousands of years before the rest of the world (say 20 Ka). No doubt this is extremely controversial. But the empirical evidence for global polarization in the Late Pleistocene is overwhelming. Basically, apart from some ephemeral early evidence in southern Africa around 90 Ka, the appearance of the full package of behavioral modernity around 40 Ka is confined to western Eurasia, to which the term Upper Paleolithic is properly applicable. To reach Sahul, of course, required sea-faring, so those populations were definitely fully-modern. A different term, Late Stone Age, applies to Africa from 50 Ka on. It is defined in terms of advanced lithic technology and does not sport evidence of the full package of behavioral modernity. I’m walking on coals here … many Africanist prehistorians would be furious. But my goal is to problematize the modern-premodern dichotomy in prehistory.

My general point is that all three dichotomies are necessarily fuzzy. Any schema involving sharp boundaries in hominin taxonomy is guaranteed to be shot through with contradictions and glaring anomalies. In short, it’s a fool’s errand. In what follows we will not take a position on these boundary questions.

The title of the present dispatch is also a nod to another problem in narrative accounts of the human career. Namely, the temptation to take a teleological approach is very strong in this domain. ‘The Story of Man’ presumes that tracing how we got to where we are is a sufficient account of the hominin career. But that ignores the evolutionary dead-ends; more sympathetically, it ignores hominin forms and strategies different from ours that went out of business. The explosion of taxa is a testament to the extraordinary variation in hominin morphology (and therefore life history and survival strategy) that is not only interesting in itself, but also informs our own story. Just as a liberal order after the failure of Communism is quite different from the counterfactual without the Communist experiment, the story of us that emerges from a full consideration of alternative lifeways pursued by sister species is different and richer than a story that emerges from the tenuous assumption of telos in human evolution. In other words, we must tell the story of the others as well as of us because the story of the others informs the interpretation of our own story. What follows is a sketch of a broad-brush history of our tribe, the Hominini.

The story begins with the Planet of the Apes. During the Miocene, 23-5 Ma, the climate was much warmer and wetter. Siberia and Greenland were not glaciated and rainforests covered much of the Old World. Apes emerged in Africa and colonized much of the Old World, presumably jumping from canopy to canopy.


Source: Gamble (2013).

By the time of our last common ancestor with the chimps c. 7-6 Ma, it was still warm but the temperatures had fallen dramatically. Greenland was glaciated, the rainforests had receded, and apes had become confined again to Africa. This is where the first hominins began, very tentatively, to walk. The emergence of committed bipedalism was an excruciatingly slow process. Although we have scant fossil evidence for the period, it is clear that for millions of years, hominins refused to commit to bipedalism. They retained skeletal features like opposable toes that show that they were still arboreal and only occasional bipeds. Very little is known about early pre-australopith hominins c. 7-4 Ma other than that they sported an ape-like morphology. Indeed, the only thing that distinguishes them from apes is that they were occasional bipeds. In fact, some experts suggest that they are too ape-like to be considered hominin. Regardless, one or more of these hominins, most likely from the genus Ardipithecus, evolved into the Australopithecines, which is when things start to get really interesting.

Early Hominin

Source: Lieberman (2013).

Towards the end of the Pliocene, 5.3-2.6 Ma, the climate became much cooler. Rainforests disappeared from eastern Africa and woodland and savannah expanded. A vast number of hominin taxa suddenly explode in the fossil record at this time. Most of them have been placed in the genus Australopithecus, c. 4-1 Ma. The most famous australopith, of course, is Lucy (named after the Beatles’ song Lucy in the Sky with Diamonds), who lived in Ethiopia 3.2 Ma and belonged to the taxon A. afarensis. It is clear from her skeleton (eg the absence of opposable toes) that Lucy was an obligate biped. But the earliest evidence for obligate bipedalism is the Laetoli footprints, made c. 3.6 Ma, which have also been associated with A. afarensis.

There is no consensus on why bipedalism emerged at this time. Some have claimed that bipedalism might have been a postural adaptation, with those able to stand upright being able to gather more fruit. Others have suggested that selection for bipedalism was driven by the energetic efficiency of bipedal locomotion in the expanding savannah and woodland habitats, where sustenance was more sparsely distributed. Still others have emphasized the thermoregulatory advantage of upright walking. It has been suggested that australopiths engaged in midday scavenging when competition from quadruped scavengers (disadvantaged because they expose a much greater surface area to the sun) was absent or less intense. In all cases, the logic leads straight to the question of foraging strategy and therefore diet. We’ll return to this question shortly.

Australopiths, like all early hominins, had small bodies and small brains. Australopith females, for instance, were just 1.1m tall and weighed 28-35kg. Male Australopiths averaged 1.4m in stature and weighed in at 40-50kg. Thus, males were about 40-50 percent heavier than females. From figure 2 we see that the index of sexual dimorphism for Australopiths (1.53) is closer to that of gorillas (1.68) than that of modern humans (1.16). This suggests that male Australopiths fought each other for access to females.

With an average cranial capacity of just 464 cc, their brains were nearly as small as those of modern chimps. Their brains were small not just in absolute volume but also relative to body mass. Indeed, their encephalization quotient comes to just 2.6, less than half that of modern humans, who sport an EQ of around 6-7. Given their brain size, Dunbar’s social brain hypothesis suggests a network size of 67 individuals, less than half that of contemporary human populations (136). They also had a much faster life history, taking about 12 years to reach adulthood.
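The dimorphism and encephalization figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below uses the midpoints of the mass ranges quoted above and Jerison’s classic allometric baseline (expected brain mass = 0.12 × body mass^(2/3), mass in grams). Since the EQ of 2.6 quoted above presumably rests on a different allometric baseline, the numbers come out close but not identical.

```python
# Back-of-the-envelope checks on australopith dimorphism and encephalization.
# Mass and cranial-capacity figures are from the text; the 0.12 * mass^(2/3)
# baseline is Jerison's convention, one of several in the literature.

def dimorphism_index(male_kg: float, female_kg: float) -> float:
    """Ratio of male to female body mass."""
    return male_kg / female_kg

def jerison_eq(brain_cc: float, body_kg: float) -> float:
    """Encephalization quotient: observed brain size over the size expected
    for a generic mammal of the same body mass (treating 1 cc ~ 1 g)."""
    expected_g = 0.12 * (body_kg * 1000) ** (2 / 3)
    return brain_cc / expected_g

# Midpoints of the ranges quoted in the text.
male, female = 45.0, 31.5  # kg
print(round(dimorphism_index(male, female), 2))          # ~1.43
print(round(jerison_eq(464, (male + female) / 2), 2))    # ~3.4 with this baseline
```

The gap between ~1.43 here and the 1.53 in figure 2 reflects the fact that skeletal mass estimates differ from the crude range midpoints used above.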

life history

Source: Lieberman (2013).

While all early hominins were small-bodied and small-brained, they differed markedly in their dietary strategies. Both gracile Australopiths and robust Australopiths (the latter have recently secured their own genus, Paranthropus) ate a wide variety of fruit, insects, leaves, tubers, roots, and occasionally scavenged meat, as attested by their greater molar size (compared to Ardipithecus). But what distinguishes the two is their masticatory (chewing) apparatus.


Source: Evans et al. (2016).

Taxa in the genus Paranthropus in general, and the taxon P. boisei in particular, were what Wood called megadonts. Their powerful masticatory muscles and very large molars allowed them to crush and grind hard foods such as nuts, seeds, roots, and tubers in the back of the jaw. Since the genus Homo emerged from the gracile Australopiths, the megadonts are the classic dead-end. None of their descendants survived. They vanish from the fossil record after 1.3 Ma. So the temptation is rather strong to see the roots of their doom in dietary specialization in hard-to-digest, poor-quality foods. That temptation must be resisted. Microwear evidence from their tooth enamel suggests that their diets were just as varied as those of the gracile Australopiths. The decisive difference in dietary strategy between the two was in fallback foods, ie what they resorted to eating when their preferred food was unavailable. However, it is clear that their strategy did not generate the sort of feedback loop between foraging strategy, gut morphology, and encephalization that emerged in our lineage. That will be the subject of part II, when we examine the career of the genus Homo. Stay tuned.


Source: Aiello and Wheeler (1995).





Population History and European Morphology since the Upper Paleolithic

Christopher Ruff, a paleoanthropologist at the Johns Hopkins University School of Medicine and the director of the Center for Functional Anatomy and Evolution, has been very generous with his time. He has helped me greatly in refining my understanding of human morphology and our population history. Ruff and coworkers have recently published Skeletal Variation and Adaptation in Europeans: Upper Paleolithic to the Twentieth Century, 2018. The study examines a total of 2,179 individual skeletons dating from the Upper Paleolithic onward, beginning 33 thousand years ago (henceforth, Ka). He has been kind enough to share their data with me. What follows is based on my interrogation of this data in light of our population history.

The basics of west Eurasian population history are now well understood. The following account is based on the scheme presented in David Reich’s Who We Are and How We Got Here, 2018. The picture that emerges from ancient-DNA studies is straightforward. Basically, there are three major departures in west Eurasian population history. (Here we focus specifically on Europe.) The first is the arrival of Homo sapiens during the early Upper Paleolithic around 40 Ka into a continent already populated by Neanderthals. H. sapiens had already mixed with Neanderthal populations in the Near East immediately upon their exit from Africa. There was further admixture in Europe.

Estimates of the precise degree of admixture are quite sensitive to assumptions about the neutrality of genomic sequences acquired from the Neanderthals, since even genes acquired in a single admixture event could rapidly get fixed throughout the population if they come under selection; conversely, the prevalence of genomic sequences not under selection contains information on the degree of admixture. Almost all estimates, however, fall into single digits. This is perhaps not because mating was infrequent despite near-continuous contact but rather because of the large disparity in population sizes. The colonizers were dramatically more populous than the natives, so that a very high degree of mixing for the latter is consistent with low rates of mixing for the former. (Similar to interracial marriage rates in the present-day United States.)
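The size-disparity argument is just arithmetic: even if the native population is wholly absorbed, its share of the merged gene pool is bounded by its relative size. A toy illustration (the population sizes are entirely made up):

```python
# Toy illustration of the population-size asymmetry argument: if the native
# (Neanderthal) population is absorbed entirely into a much larger colonizing
# population, the colonizers' genomes still end up only single-digit-percent
# native. The 30:1 ratio below is an arbitrary illustrative value.

def admixture_fraction(natives: float, colonizers: float) -> float:
    """Native share of the merged gene pool under complete absorption."""
    return natives / (natives + colonizers)

# 100% mixing for the natives, but a 30:1 size disparity for the colonizers.
print(f"{admixture_fraction(1, 30):.1%}")  # → 3.2%
```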

By the Last Glacial Maximum, 26 Ka, the Neanderthals had long vanished. It is not clear whether they went extinct or were simply absorbed into H. sapiens populations. Upper Paleolithic populations of Europe were already morphologically adapted to the macroclimate, with more northern populations displaying bigger bodies in accordance with Bergmann’s Rule. This population is basically swamped by a second pulse around 9 Ka, when the Neolithic Revolution generates a major population pulse of farmers in the central Eurasian region who explode out in both easterly and westerly directions. The former would go on to found the Dravidian-Harappan Civilization. In Europe, the hunter-gatherers survived in isolated pockets, especially at northern latitudes. Since the Neolithic farmers came from the Near East, their morphology was adapted to the much warmer macroclimate of the central region. In accordance with Bergmann’s Rule, we expect them to be smaller than the more cold-adapted populations of the Upper Paleolithic. We’ll presently see what the data has to say about this.

A third major population pulse was triggered by the Secondary Products Revolution in the fourth millennium BCE. In the central region, this dramatic transformation of the material possibility frontier gives rise to the very first state society at Uruk. The introduction of these advanced technologies (especially the wagon, as David Anthony has argued) from the core of the Uruk world-system to the periphery, in this case north of the Caucasus, makes the systematic economic exploitation of the sparsely-endowed steppe possible for the first time. This material revolution in the hitherto very sparsely-populated steppe is in turn responsible for the ethnogenesis of the Yamnaya, the speakers of Proto-Indo-European (the mother tongue whose descendants are spoken by half the world’s population today).

Yamnaya pastoralists explode outward almost immediately from their homeland. By 5 Ka, one massive population pulse reaches Europe, another India, and a third the Altai Mountains in Kazakhstan. The Yamnaya are an extremely violent and hierarchical rank society, obsessed with martial glory, competitive feasting, and other male bonding rituals. They conquer the first farmers of Europe and eventually the isolated pockets of hunter-gatherers (in particular, in Scandinavia). These migrations are extremely sex-biased. Yamnaya warrior-pastoralists likely took the women and slaughtered the men in raids and skirmishes as the horizon moved inexorably westward.


Source: David Reich (2018).

The end-result of this population history is that contemporary European populations are a sex-biased admixture of Pleistocene hunter-gatherers, Neolithic farmers, and Yamnaya pastoralists, in reverse order of weight in the population structure. These three populations were morphologically adapted to very different macroclimates during the Late Pleistocene. Specifically, the first can be expected to be adapted to local conditions in Europe that were highly polarized by latitude (southern Europe was never glacial whereas northern Europe witnessed the massive glacial-interglacial whipsaw), the second to the considerably warmer conditions of the Late Pleistocene Near East, and the last, the Yamnaya, to the more permanently glacial conditions of the Russian steppe. We thus expect systematic time-variation in European morphology consistent with this population history. More precisely, we expect the slower-moving morphological parameters (eg pelvic bone width, femur head diameter) to fall after the invasion of the farmers from the central Eurasian region and rise after the invasion of the pastoralists from the Eurasian steppe.

Figures 1 and 2 display the pelvic bone width of the skeletons in the Ruff et al. (2018) dataset. We have resized the points by the number of skeletons in the dataset for a given region and period. We have also merged some periods in the original dataset for simplicity. [Early Upper Paleolithic (33-26 Ka) and Late Upper Paleolithic (22-11 Ka) have been folded into Upper Paleolithic (33-11 Ka); Mesolithic (11-6 Ka) and Neolithic (7-4 Ka) into Neolithic (11-4 Ka); Bronze (4-3 Ka) and Iron/Roman (2.3-1.7 Ka) into Yamnaya (4-1.7 Ka); Early Medieval (c. 600-950) and Late Medieval (c. 1000-1450) into Medieval; and Early modern (c. 1500-1850) and Very recent (c. 1900-2000) into Modern (c. 1500-2000).]
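For concreteness, here is roughly how the period-merging and point-sizing step can be done in pandas. The column names (`region`, `period`, `pelvic_width`) are placeholders of mine, not the actual variable names in the Ruff et al. (2018) dataset, and the measurement values are purely illustrative.

```python
# Sketch of the period-merging and aggregation step described in the text.
# Column names and numbers are illustrative placeholders, not the real dataset.
import pandas as pd

PERIOD_MAP = {
    "Early Upper Paleolithic": "Upper Paleolithic",
    "Late Upper Paleolithic":  "Upper Paleolithic",
    "Mesolithic":     "Neolithic",
    "Neolithic":      "Neolithic",
    "Bronze":         "Yamnaya",
    "Iron/Roman":     "Yamnaya",
    "Early Medieval": "Medieval",
    "Late Medieval":  "Medieval",
    "Early modern":   "Modern",
    "Very recent":    "Modern",
}

df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "period": ["Early Upper Paleolithic", "Late Upper Paleolithic",
               "Bronze", "Iron/Roman"],
    "pelvic_width": [28.1, 27.5, 26.8, 27.2],  # cm, illustrative values
})

df["merged_period"] = df["period"].map(PERIOD_MAP)
summary = (df.groupby(["region", "merged_period"])["pelvic_width"]
             .agg(mean="mean", n="count")  # n is used to size the plotted points
             .reset_index())
print(summary)
```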


Figure 1. Pelvic bone width in Europe, Men. Source: Ruff et al. (2018).

The evidence that emerges is pretty unambiguous. For both men and women there is a significant fall in pelvic bone width during the Neolithic transition, and a substantial rise contemporaneous with the Yamnaya transition. Since there was no major population replacement in the Medieval-Modern passage, the decline in body size cannot be traced to population history. Note the French outlier that attenuates the modern decline for women. Without the outlier, the modern decline in women’s pelvic bone width would be as significant as men’s.


Figure 2. Pelvic bone width in Europe, Women. Source: Ruff et al. (2018).

Similar results hold if we look at femur head diameter, which is also strongly canalized (very slow-moving). Femur head diameter is the main weight-bearing parameter of the human body and as slow to change as pelvic bone width.



The length of the thigh bone (femur) is much more developmentally plastic than either pelvic bone width or femur head diameter. Yet we know that even femur length (and hence stature) exhibits morphological adaptation to the macroclimate. The evidence that emerges from this dataset is consistent with our previous findings.



Finally, for the sake of completeness, we include graphs for stature. These ought to be congruent with the results for femur length since the former is a linear function of the latter.
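Stature in skeletal samples is conventionally estimated from long-bone lengths via published regression equations, which is why the stature graphs should track femur length almost mechanically. A classic example is the Trotter and Gleser (1952) femur equation for males of European ancestry (whether Ruff et al. use this particular equation is an assumption I have not checked):

```python
# Trotter & Gleser (1952): stature (cm) from maximum femur length (cm) for
# males of European ancestry. Other sex/ancestry groups have different
# coefficients, so this is illustrative rather than general.

def stature_from_femur(femur_cm: float) -> float:
    """Estimated stature in cm from maximum femur length."""
    return 2.38 * femur_cm + 61.41

print(round(stature_from_femur(45.0), 1))  # → 168.5
```

Because the mapping is linear, any time-variation in mean femur length passes straight through to estimated stature, which is what the congruence of the two sets of graphs reflects.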



The evidence that emerges is consistent with the idea that population history confounds the interpretation of the time-variation (as opposed to just the cross-section, as we have argued until now) of the morphological parameters of the human body. In order to make valid inferences, population history must be kept in mind.

I constructed an index of body size by adding up the z-scores of femur head diameter and pelvic bone width. It is a less noisy measure of body-size than those considered above. The overall pattern revealed by the Body-Size Index is very compelling. We observe that size falls in the Upper Paleolithic-Neolithic passage and rises with the arrival of the Yamnaya, precisely as predicted by population history. The modern decline does not correspond to any major population movement and therefore cannot be explained by population history.
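The Body-Size Index can be reconstructed in a few lines: standardize each trait across the sample, then sum the z-scores. A minimal sketch with made-up measurements (the variable names are mine):

```python
# Minimal reconstruction of the Body-Size Index described in the text:
# standardize each trait across the sample and sum the z-scores.
# Measurement values are made up for illustration.
import numpy as np

def zscores(x):
    """Standardize an array to zero mean and unit (population) variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

femur_head   = [44.1, 46.3, 45.0, 47.8, 43.5]   # mm
pelvic_width = [27.0, 28.2, 27.5, 28.9, 26.4]   # cm

body_size_index = zscores(femur_head) + zscores(pelvic_width)
# By construction the index averages zero across the sample; individuals
# large on both traits score positive, small on both score negative.
print(np.round(body_size_index, 2))
```

Summing z-scores weights the two traits equally in standard-deviation units, which averages out trait-specific measurement noise, hence the claim above that the index is less noisy than either component alone.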


The decrease in body-size is consistent with other evidence of gracilization during the transition to modernity. Men have not only become smaller, even their faces have become less aggressive (not to speak of manners and behavior). Could it be that changing social norms against the warrior code granted greater reproductive success to variants with traits less associated with aggression? Or did the rewards for body-size decline with better technology for grunt work? We don’t know. There is certainly a case to be made for a general process of gracilization related to modernity.

P.S. On second thoughts, it may not be wise to combine sexes after all. There is good reason to think that women’s pelvic bone width is more plastic than men’s, because maternal mortality can adjust it quite rapidly. The best measure we have is therefore men’s pelvic bone width. Here I graph the mean pelvic bone width of European men without combining periods. The evidence is consistent with the population history noted above. The overall pattern suggests strong gracilization after the Last Glacial Maximum, a further fall in body-size with the arrival of Neolithic farmers, and a dramatic rise with the arrival of Yamnaya pastoralists, followed by slow upward drift until the end of the Middle Ages, strong gracilization in the early modern period, and a very partial restoration after c. 1900. The big question thrown up by the present investigation is of course the dramatic decline in European body-size after the Black Death. But the overall elephant-shaped pattern is very interesting as well.