Causal Inference from Linear Models

For the past few decades, empirical research has shunned all talk of causation. Scholars use their causal intuitions but they only ever talk about correlation. Smoking is “associated with” cancer, being overweight is “correlated with” higher morbidity rates, college education is the strongest “correlate of” Trump’s vote gains over Romney, and so on and so forth. Empirical researchers don’t like to use causal language because they think that causal concepts are not well-defined. It is a hegemonic postulate of modern statistics and econometrics that all falsifiable claims can be stated in the language of modern probability. Any talk of causation is frowned upon because causal claims simply cannot be cast in the language of probability. For instance, there is no way to state in the language of probability that smoking causes cancer, that the tides are caused by the moon or that rain causes the lawn to get wet.

Unfortunately, or rather fortunately, the hegemonic postulate happens to be untrue. Recent developments in causality—a sub-discipline of philosophy—by Judea Pearl and others, have made it possible to talk about causality with mathematical precision and use causal models in practice. We’ll come back to causal inference and show how to do it in practice after a brief digression on theory.

Theories isolate a portion of reality for study. When we say that Nature is intelligible, we mean that it is possible to discover Nature’s mechanisms theoretically (and perhaps empirically). For instance, the tilting of the earth on its axis is the cause of the seasons. It’s why the northern and southern hemispheres have opposite seasons. We don’t know that from perfect correlation of the tilting and the seasons because correlation does not imply causation (and in any case they are not perfectly correlated). We could, of course, be wrong, but we think that this is a ‘good theory’ in the sense that it is parsimonious and hard-to-vary—it is impossible to fiddle with the theory without destroying it. [This argument is due to David Deutsch.] In fact, we find this theory so compelling that we don’t even subject it to empirical falsification.

Yes, it is impossible to derive causal inference from the data with absolute certainty. This is because, without theory, causal inference from data is impossible, and theories on their part can only ever be falsified; never proven. Causal inference from data is only possible if the data analyst is willing to entertain theories. The strongest causal claims a scholar can possibly make take the form: “Researchers who accept the qualitative premises of my theory are compelled by the data to accept the quantitative conclusion that the causal effect of X on Y is such and such.”

We can talk about causality with mathematical precision because, under fairly mild regularity conditions, any consistent set of causal claims can be represented faithfully as causal diagrams which are well-defined mathematical objects. A causal diagram is a directed graph with a node for every variable and directed edges or arrows denoting causal influence from one variable to another, e.g., {X\longrightarrow Y} which says that Y is caused by X where, say, X is smoking and Y is lung cancer.

The closest thing to causal analysis in contemporary social science is structural equation models. In order to illustrate the graphical method for causal inference, we’ll restrict attention to a particularly simple class of structural equation models, that of linear models. The results hold for nonlinear and even nonparametric models. We’ll work only with linear models not only because they are ubiquitous but also for pedagogical reasons. Our goal is to teach rank-and-file researchers how to use the graphical method to draw causal inferences from data. We’ll show when and how structural linear models can be identified. In particular, you’ll learn which variables you should and shouldn’t control for in order to isolate the causal effect of X on Y. For someone with basic undergraduate level training in statistics and probability it should take no more than a day’s work. So bring out your pencil and notebook.

A note on attribution: What follows is largely from Judea Pearl’s work on causal inference. Some of the results may be due to other scholars. There is a lot more to causal inference than what you will encounter below. Again, my goal here is purely pedagogical. I want you, a rank-and-file researcher, to start using this method as soon as you are done with the exercises at the end of this lecture. (Yes, I’m going to assign you homework!)

Consider the simple linear model,

{\large Y := \beta X + \varepsilon }

where {\varepsilon} is a standard normal random variable independent of X. This equation is structural in the sense that Y is a deterministic function of X and {\varepsilon} but neither X nor {\varepsilon} is a function of Y. In other words, we assume that Nature chooses X and {\varepsilon} independently, and Y takes values in obedience to the mathematical law above. This is why we use the asymmetric symbol “:=” instead of the symmetric “=” for structural equations.

We can embed this structural model into the simplest causal graph {X\longrightarrow Y} , where the arrow indicates the causal influence of X on Y . We have suppressed the dependence of Y on the error {\varepsilon}. The full graph reads {X\longrightarrow Y \dashleftarrow\varepsilon}, where the dashed arrow denotes the influence of unobserved variables captured by our error term. The path coefficient associated with the link {X\longrightarrow Y} is {\beta}, the structural parameter of the simple linear model. A structural model is said to be identified if the structural parameters can in principle be estimated from the joint distribution of the observed variables. We will show presently that under our assumptions the model is indeed identified and the path coefficient {\beta} is equal to the slope of the regression equation,


{\large r_{YX} = \rho_{YX}\frac{\sigma_{Y}}{\sigma_{X}} }

where {\rho_{YX}} is the correlation between X and Y and {\sigma_{X}} and {\sigma_{Y}} are the standard deviations of X and Y respectively.  {r_{YX}} can be estimated from sample data with the usual techniques, say, ordinary least squares (OLS).
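To see the identification claim concretely, here is a quick simulation sketch (the value {\beta = 1.5} is purely illustrative): the OLS slope of Y on X and the formula {\rho_{YX}\sigma_{Y}/\sigma_{X}} give the same number, and both recover the structural {\beta}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 1.5

# Structural model: Nature picks X and the error independently,
# then Y takes values in obedience to Y := beta*X + eps.
X = rng.normal(size=n)
eps = rng.normal(size=n)
Y = beta * X + eps

# OLS slope of Y on X: Cov(X, Y) / Var(X)
slope = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# The same number via the correlation formula r = rho * sigma_Y / sigma_X
rho = np.corrcoef(X, Y)[0, 1]
r = rho * Y.std(ddof=1) / X.std(ddof=1)

print(slope, r)  # both are close to beta = 1.5
```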

What allows straightforward identification in the base case is the assumption that X and {\varepsilon} are independent. If X and {\varepsilon} are dependent then the model cannot be identified. Why? Because in this case there is spurious correlation between X and Y that propagates along the “backdoor path” {X\dashleftarrow\varepsilon\dashrightarrow Y}. See Figure 1.


Figure 1. Identification of the simple linear model.

Here’s what we can do if X and {\varepsilon} are dependent. We simply find another observed variable Z that is a causal “parent” of X (i.e., {Z\longrightarrow X} ) but independent of {\varepsilon}. Then we can use it as an instrumental variable to identify the model. This works because there is no backdoor path between Z and Y (so the regression of Y on Z identifies {\alpha\beta} ) or between Z and X (so the regression of X on Z identifies {\alpha}). See Figure 2.


Figure 2. Identification with an instrumental variable.

In that case, {\beta}  is given by the instrumental variable formula,

{\large \beta = \frac{r_{YZ}}{r_{XZ}}. }

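A simulation sketch of the instrumental variable logic (the values {\alpha = 0.8}, {\beta = 1.5}, and the standard normal confounder u are illustrative): the naive regression of Y on X is contaminated by the backdoor path, while the IV ratio recovers {\beta}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta = 100_000, 0.8, 1.5

# Unobserved confounder u makes X and the error of Y dependent,
# so the naive regression of Y on X is biased.
u = rng.normal(size=n)
Z = rng.normal(size=n)                # instrument: parent of X, independent of u
X = alpha * Z + u + rng.normal(size=n)
Y = beta * X + u + rng.normal(size=n)

naive = np.cov(X, Y)[0, 1] / np.var(X)         # biased by the backdoor path
iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]   # instrumental variable formula

print(naive, iv)  # naive is biased upward; iv is close to beta = 1.5
```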
More generally, in order to identify the causal influence of X on Y in a graph G, we need to block all spurious correlation between X and Y. This can be achieved by controlling for the right set of covariates (or controls) Z. We’ll come to that presently. First, some graph terminology.

A directed graph is a set of vertices together with arrows between them (some of which may be bidirected). A path is simply a sequence of connected links, e.g., {i\dashrightarrow m\leftrightarrow j\dashleftarrow k} is a path between i and k. A directed path is one in which all arrows point in the same direction, e.g., {i\longrightarrow j\longrightarrow m\longrightarrow k} is a directed path from i to k. A directed acyclic graph is a directed graph that does not admit closed directed paths. That is, a directed graph is acyclic if there are no directed paths from a node back to itself.

A causal subgraph of the form {i\longrightarrow m\longrightarrow j} is called a chain and corresponds to a mediating or intervening variable m between i and j. A subgraph of the form {i\longleftarrow m\longrightarrow j} is called a fork, and denotes a situation where the variables i and j have a common cause m. A subgraph of the form {i\longrightarrow m\longleftarrow j} is called an inverted fork and corresponds to a common effect. In a chain {i\longrightarrow m\longrightarrow j} or a fork {i\longleftarrow m\longrightarrow j}, i and j are marginally dependent but conditionally independent (where we condition on m). In an inverted fork {i\longrightarrow m\longleftarrow j} on the other hand, i and j are marginally independent but conditionally dependent (once we condition on m). We use family connections as shorthand when talking about directed graphs. In the graph {i\longrightarrow j}, i is the parent and j is the child. The descendants of i are all nodes that can be reached by a directed path starting at i. Similarly, the ancestors of j are all nodes from which j can be reached by directed paths.

Definition (Blocking). A path p is blocked by a set of nodes Z if and only if p contains at least one arrow-emitting node that is in Z or p contains at least one inverted fork that is outside Z and has no descendant in Z. A set of nodes Z is said to block X from Y, written {(X\perp Y |Z)_{G}}, if Z blocks every path from X to Y.

The logic of the definition is that the removal of the set of nodes Z completely stops the flow of information from Y to X. Consider all paths between X and Y . No information passes through an inverted fork {i \longrightarrow m\longleftarrow j} (so long as neither m nor any of its descendants is in Z), so you can ignore such paths. Likewise, no information passes through an arrow-emitting node that is in Z, so paths containing such nodes can also be ignored. The rest of the paths are “live” and we must choose a set of nodes Z whose removal would block the flow of all information between X and Y along these paths. Note that whether Z blocks X from Y in a causal graph G can be decided by visual inspection when the number of covariates is small, say less than a dozen. If the number of covariates is large, as in many machine learning applications, a simple algorithm can do the job.
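Here is a minimal sketch of such an algorithm, under the simplifying assumption that the graph is a small DAG given as a set of directed edges (a bidirected arc can be represented by adding an explicit latent common parent). It enumerates every path between x and y and applies the blocking definition above; for large graphs one would use a more efficient reachability-based method.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by a directed path."""
    out, stack = set(), [node]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in out:
                out.add(b)
                stack.append(b)
    return out

def paths(edges, x, y, path=None):
    """Yield every simple (undirected) path from x to y."""
    path = path or [x]
    u = path[-1]
    if u == y:
        yield path
        return
    nbrs = {b for (a, b) in edges if a == u} | {a for (a, b) in edges if b == u}
    for v in nbrs - set(path):
        yield from paths(edges, x, y, path + [v])

def blocked(edges, path, Z):
    """Apply the blocking definition to one path."""
    for i in range(1, len(path) - 1):
        a, m, b = path[i - 1], path[i], path[i + 1]
        if (a, m) in edges and (b, m) in edges:       # inverted fork -> m <-
            if m not in Z and not (descendants(edges, m) & Z):
                return True   # collider outside Z with no descendant in Z
        elif m in Z:
            return True       # arrow-emitting node in Z
    return False

def d_separated(edges, x, y, Z):
    """Does Z block every path between x and y?"""
    return all(blocked(edges, p, Z) for p in paths(edges, x, y))

# Example: in the fork i <- m -> j, conditioning on m blocks the only path
fork = {('m', 'i'), ('m', 'j')}
print(d_separated(fork, 'i', 'j', set()))   # False
print(d_separated(fork, 'i', 'j', {'m'}))   # True
```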

If Z blocks X from Y in a causal graph G, then X is independent of Y given Z. That is, if Z blocks X from Y then X|Z and Y |Z are independent random variables. We can use this property to figure out precisely which covariates we ought to control for in order to isolate the causal effect of X on Y in a given structural model.

Theorem 1 (Covariate selection criteria for direct effect). Let G be any directed acyclic graph in which {\beta} is the path coefficient of the link {X\longrightarrow Y}, and let {G_{\beta}} be the graph obtained by deleting the link {X\longrightarrow Y}. If there exists a set of variables Z such that no descendant of Y belongs to Z and Z blocks X from Y in {G_{\beta}}, then {\beta} is identifiable and equal to the regression coefficient {r_{YX\cdot Z}}. Conversely, if Z does not satisfy these conditions, then {r_{YX\cdot Z}} is not a consistent estimator of {\beta}.

Theorem 1 says that the direct effect of X on Y can be identified if and only if we have a set of covariates Z that blocks all paths, confounding as well as causal, between X and Y except for the direct path {X\longrightarrow Y}. The path coefficient is then equal to the partial regression coefficient of X in the multivariate regression of Y on X and Z,

{Y =\alpha_1Z_1+\cdots+\alpha_kZ_k+\beta X+\varepsilon.}

The above equation can, of course, be estimated by OLS. Theorem 1 does not say that the model as a whole is identified. In fact, the path coefficients associated with the links {Z_{i}\longrightarrow Y} that the multivariate regression above suggests are not guaranteed to be identified. The regression model would be fully identified if, in addition, {\{(Z_{j})_{j\ne i}, X\}} blocks {Z_{i}} from Y in {G_{i}}, the graph obtained by deleting the link {Z_{i}\longrightarrow Y}, for all {i=1,\dots,k}.
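Theorem 1 can be checked by simulation. In the sketch below (the coefficients are illustrative), Z is a common cause of X and Y, so Z blocks the backdoor path {X\dashleftarrow Z\dashrightarrow Y}: regressing Y on X alone picks up the spurious correlation, while the partial regression coefficient of X given Z recovers {\beta}.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, gamma = 100_000, 1.5, 2.0

# Fork: Z is a common cause of X and Y (Z -> X, Z -> Y), plus X -> Y.
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = beta * X + gamma * Z + rng.normal(size=n)

# Z blocks the backdoor path X <- Z -> Y, so the partial regression
# coefficient of X in the regression of Y on X and Z identifies beta.
A = np.column_stack([X, Z, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

biased = np.cov(X, Y)[0, 1] / np.var(X)  # regression of Y on X alone

print(coef[0], biased)  # coef[0] is close to 1.5; biased is close to 2.5
```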

What if you wanted to know the total effect of X on Y ? That is, the combined effect of X on Y both through the direct channel (i.e., the path coefficient {\beta}) and through indirect channels, e.g., {X\longrightarrow W\longrightarrow Y} ? The following theorem provides the solution.

Theorem 2 (Covariate selection criteria for total effect). Let G be any directed acyclic graph. The total effect of X on Y is identifiable if there exists a set of nodes Z such that no member of Z is a descendant of X and Z blocks X from Y in the subgraph formed by deleting from G all arrows emanating from X. The total effect of X on Y is then given by {r_{YX\cdot Z}}.

Theorem 2 ensures that, after adjustment for Z, the variables X and Y are not associated through confounding paths, which means that the regression coefficient {r_{YX\cdot Z}} is equal to the total effect. Note the difference between the two criteria. For the direct effect, we delete the link {X\longrightarrow Y} and find a set of nodes that blocks all other paths between X and Y . For the total effect, we delete all arrows emanating from X because we do not want to block any indirect causal path of X to Y.
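The difference between the two criteria can be seen in a small simulation (illustrative coefficients; W is a mediator on the indirect path {X\longrightarrow W\longrightarrow Y}): regressing Y on X alone gives the total effect, while controlling for W isolates the direct effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta, delta, theta = 1.0, 0.8, 0.5   # X -> Y, X -> W, W -> Y

X = rng.normal(size=n)
W = delta * X + rng.normal(size=n)            # mediator on the indirect path
Y = beta * X + theta * W + rng.normal(size=n)

# Total effect: there is no confounding here, so regress Y on X alone
# (per Theorem 2, do NOT control for W, a descendant of X).
total = np.cov(X, Y)[0, 1] / np.var(X)        # close to beta + delta*theta = 1.4

# Direct effect: controlling for W blocks the indirect path X -> W -> Y
# (Theorem 1 with Z = {W}).
A = np.column_stack([X, W, np.ones(n)])
direct = np.linalg.lstsq(A, Y, rcond=None)[0][0]   # close to beta = 1.0

print(total, direct)
```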

Theorem 1 is Theorem 5.3.1 and Theorem 2 is Theorem 5.3.2 in the second edition of Judea Pearl’s book, Causality: Models, Reasoning, and Inference, where the proofs may also be found. These theorems are of extraordinary importance for empirical research. Instead of the ad-hoc and informal methods currently used by empirical researchers to choose covariates, they provide mathematically precise criteria for covariate selection. The next few examples show how to use these criteria for a variety of causal graphs.

Figure 3 shows a simple case (top left) {Z\longrightarrow X\longrightarrow Y} where the errors of Z and Y are correlated. We obtain identification by repeated application of Theorem 1. Specifically, Z blocks X from Y in the graph obtained from deleting the link {X\longrightarrow Y} (top right). Thus, {\alpha} is identified. Similarly, Y blocks Z from X in the graph obtained from deleting the link {Z\longrightarrow X} (bottom right). Thus, {\beta} is identified.


Figure 3. Identification when a parent of X is correlated with Y.

Figure 4 shows a case where an unobserved disturbance term influences both X and Y. Here, the presence of the intervening variable Z allows for the identification of all the path coefficients. I’ve written the structural equation on the top right and checked the premises of Theorem 1 at the bottom left. Note that the path coefficient of {U\dashrightarrow X} is known to be 1 in accordance with the structural equation for X. Hence, the total effect of X on Y equals {\alpha\beta+\gamma}.


Figure 4. Model identification with an unobserved common cause.

Figure 5 presents a more complicated case where the direct effect can be identified but not the total effect. The identification of {\delta} is impossible because X and Z are spuriously correlated and there is no instrumental variable or intervening variable available.


Figure 5. A more complicated case where only partial identification is possible.

If you have reached this far, I hope you have acquired a basic grasp of the graphical methods presented in this lecture. You probably feel that you still don’t really know it. This always happens when we learn a new technique or method. The only way to move from “I sorta know what this is about” to “I understand how to do this” is to sit down and work out a few examples. If you do the exercises in the homework below, you will be ready to use this powerful arsenal for live projects. Good luck!


  1. Epidemiologists argued in the early postwar period that smoking causes cancer. Big Tobacco countered that both smoking and cancer are correlated with genotype (unobserved), and hence, the effect of smoking on cancer cannot be identified. Show Big Tobacco’s argument in a directed graph. What happens if we have an intervening variable between smoking and cancer that is not causally related to genotype? Say, the accumulation of tar in lungs? What would the causal diagram look like? Prove that it is then possible to identify the causal effect of smoking on cancer. Provide an expression for the path coefficient between smoking and cancer.
  2. Obtain a thousand simulations each of two independent standard normal random variables X and Y. Set Z=X+Y. Check that X and Y are uncorrelated. Check that X|Z and Y|Z are correlated. Ask yourself if it is a good idea to control for a variable without thinking the causal relations through.
  3. Obtain a thousand simulations each of three independent standard normal random variables {u,\nu,\varepsilon}. Let {X=u+\nu} and {Y=u+\varepsilon}. Create scatter plots to check that X and Y are marginally dependent but conditionally independent (conditional on u). That is, X|u and Y|u are uncorrelated. Project Y on X using OLS. Check that the slope is significant. Then project Y on X and u. Check that the slope coefficient for X is no longer significant. Should you or should you not control for u?
  4. Using the graphical rules of causal inference, show that the causal effect of X on Y can be identified in each of the seven graphs shown in Figure 6.
  5. Using the graphical rules of causal inference, show that the causal effect of X on Y cannot be identified in each of the eight graphs in Figure 7. Provide an intuitive reason for the failure in each case.
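As a warm-up for exercise 2, the simulation can be sketched as follows (only a sketch: the scatter plots, the regressions, and the interpretation are left to you). The partial correlation formula stands in for an explicit conditioning step.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Exercise 2 setup: X and Y are independent, Z = X + Y is their common effect.
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = X + Y

# Marginally, X and Y are uncorrelated...
rxy = np.corrcoef(X, Y)[0, 1]
print(rxy)        # close to 0

# ...but conditioning on the collider Z induces a strong negative
# association: within any slice of Z, knowing X pins down Y.
rxz = np.corrcoef(X, Z)[0, 1]
ryz = np.corrcoef(Y, Z)[0, 1]
partial = (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))
print(partial)    # close to -1
```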

    Figure 6. Graphs where the causal effect of X on Y can be identified.


    Figure 7. Graphs where the causal effect of X on Y cannot be identified.


Regional Polarization and Trump’s Electoral Performance

Tom Edsall suggested that I look at the regional socioeconomic correlates of Trump’s electoral performance. Why that didn’t cross my mind before I know not. But here goes. 

Political polarization in the United States means that by far the best predictor of a major party presidential candidate’s electoral performance is the performance of the previous candidate of the party. This was clearly the case in this election. [All data in this post is at the county level. The socioeconomic data is from GeoFRED while the vote count is from here.]


In what follows, therefore, we will look at the correlates of Trump’s performance relative to Mitt Romney’s in 2012. This is the cleanest way to control for partisan polarization. We’re going to examine the socioeconomic indicators of counties where Trump gained vote share compared to Romney.

Specifically, we will divide the counties into six buckets: Blowout, where Trump’s vote share was more than 5 percent below Romney’s; Major Loss, where Trump’s vote share was between 5 and 2.5 percent below Romney’s; Moderate Loss, where his vote share was between 2.5 percent below and at par with Romney’s; Moderate Gain, where Trump increased the GOP’s share by less than 2.5 percent; Major Gain, where he increased it by between 2.5 and 5 percent; and finally, Landslide, where Trump gained more than 5 percent relative to Romney.

More sophisticated strategies are certainly possible. But this strategy will allow us to visualize the data cleanly.
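For readers who want to replicate the bucketing, it can be sketched with pandas. The column names (`trump_share`, `romney_share`) and the toy numbers below are hypothetical stand-ins for the county-level data, not the actual dataset.

```python
import pandas as pd

# Hypothetical county-level GOP vote shares, in percent.
df = pd.DataFrame({
    "trump_share":  [38.0, 51.2, 47.5, 55.0, 61.3, 44.0],
    "romney_share": [45.0, 52.0, 46.0, 51.5, 54.0, 43.0],
})
df["swing"] = df["trump_share"] - df["romney_share"]

# Six buckets, from more than 5 points below Romney to more than 5 above.
bins = [-float("inf"), -5, -2.5, 0, 2.5, 5, float("inf")]
labels = ["Blowout", "Major Loss", "Moderate Loss",
          "Moderate Gain", "Major Gain", "Landslide"]
df["bucket"] = pd.cut(df["swing"], bins=bins, labels=labels)

print(df[["swing", "bucket"]])
```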

We begin with the number of counties. This chart is no surprise to anyone who watched the results on election night. A lot more of the map was colored red than in 2012. There was a major swing in a large number of counties.


But most such counties are very sparsely populated. The most populous counties actually went for Clinton at higher rates than they had gone for Obama in 2012. These two charts illustrate the GOP’s astonishing geographic advantage.


Let’s move on to socioeconomic variables. The next two charts show the median household income and per capita incomes averaged over all the counties in each of the six buckets. Both paint a consistent picture: Trump did worse than Romney in a typical affluent county, but did better than him in poorer counties. But neither was a strong correlate of Trump’s performance. Median household income and per capita income explain only 13 percent and 10 percent of the variation in Trump’s performance relative to Romney respectively.


The percentage of college graduates on the other hand, is a very strong predictor. It explains 35 percent of the variation in Trump’s relative performance. High school diploma rate is, however, a poor predictor. Still, counties where Trump did worse than Romney typically had higher percentages of people with high school diplomas.


Trump did better than Romney in counties where poverty and unemployment rates are relatively high. Although the gradient is not constant.


Similarly, Trump did well in counties where the proportion of people relying on food stamps is high.


But his performance was uncorrelated with crime rates. On the other hand, it was correlated with the youth idleness rate—the percentage of 16-19 year olds who are neither in school nor working.


Similarly, counties where Trump improved on Romney’s performance had higher percentages of families with children that are single parent households.


Finally, Trump did worse than Romney in counties with positive net migration rates and he did better in counties with negative net migration rates. This is the only dynamic variable we have in the dataset. (The others are snapshots and do not tell how things are changing in the counties.) It is therefore very interesting to find a clean correlation between net migration rates and Trump’s relative performance. The upshot is that Trump did well in places that are hemorrhaging people.


A consistent picture emerges from all these charts. Trump got to the White House by outperforming Mitt Romney in counties that are less educated, have lower incomes and higher poverty rates, where a greater proportion of people rely on food stamps, where many young adults are idle and children are growing up in broken homes. This is the America that is getting left behind. People are quite literally leaving these counties for greener pastures.

We have yet to tackle the why of it all. Why has America become so regionally polarized? Is it global trade? Automation? Skill-biased technological change? The neoliberal policy consensus? The political economy of Washington, DC? A fairly coherent narrative can be constructed along any of these threads. It is much harder to evaluate their relative importance. And even harder to devise meaningful policy solutions.

While we quietly thank our stars that Trump is getting tamed by adult supervision, we cannot go back to ignoring fly-over country. For we now know quite well what happens when we do.






Zones of Poverty and Affluence in America

In Bobos in Paradise, David Brooks popularized the notion of Latte Towns: “upscale liberal communities, often in magnificent natural settings, often university-based, that have become the gestation centers for America’s new upscale culture.” Charles Murray, in Coming Apart, compiles a list of superzips where the affluent and the educated are concentrated:


Superzips in the United States. Source: Data by Charles Murray, compiled by Gavin Rehkemper.

On the other side of the great divide, we know about endemic poverty in Appalachia and, of course, the Deep South. Much of the doomed cohort analyzed by Case and Deaton is concentrated in these poverty belts.

Combined and uneven development has left America regionally polarized. This affects the politics of the nation and the country’s cohesiveness as a society. To better understand the challenges, it is important to map the regional polarization of America.

Before we come to the maps, a basic question needs to be considered. The affluent are concentrated in the superzips and the poor in the poverty belts, but what about the rest? Surely, the bulk of the population lives neither in zones of grinding poverty nor in zones of mass affluence. Are the rest of these zones homogeneous? Or is there internal structure in the middling bulk of America?

In order to answer this question, I looked at county-level socioeconomic data from GeoFRED. I wanted to see if the counties sorted themselves out into natural clusters. It turns out that there are four basic clusters of counties: Affluent, Middle America, Near-Poor, and Poor. These four clusters differ systematically from each other. Moreover, no matter which subset of socioeconomic indicators you use to do the sorting, you obtain very nearly the same clusters.
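The post does not specify the clustering algorithm, so the following is only an illustrative sketch: k-means with k = 4 on standardized indicators, run on synthetic county data with two hypothetical features (median household income in thousands of dollars and percent college graduates).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic counties drawn around four tiers: income ($k), % college grads.
centers = np.array([[34, 12], [43, 16], [53, 23], [73, 37]])
counties = np.vstack([c + rng.normal(scale=[3, 2], size=(500, 2))
                      for c in centers])

# Standardize indicators so no single variable dominates the distance metric.
X = StandardScaler().fit_transform(counties)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Cluster means, sorted by income, recover the four tiers.
means = np.array([counties[labels == k].mean(axis=0) for k in range(4)])
print(means[means[:, 0].argsort()])
```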


The Geography of Class in America

                             Poor   Near-Poor   Middle America   Affluent
College Graduates             12%         16%              23%        37%
Some College                  20%         25%              33%        47%
High School Graduates         75%         83%              89%        91%
Median Household Income    34,302      42,787           52,800     73,170
Per Capita Income          31,107      36,226           45,010     64,218
Unemployment Rate              7%          6%               4%         4%
Single Parent Households      41%         34%              28%        25%
Inequality (ratio)            16%         13%              12%        13%
Poverty Rate                  26%         18%              12%         9%
Subprime Rate                 37%         29%              22%        20%
Youth Idleness Rate           13%         10%               6%         5%
Food Stamps                   27%         17%              10%         7%
Crime Rate (per thousand)      10           8                6          6
Population (millions)        23.0        78.5            134.0       80.7
Population Share (sample)      7%         25%              42%        26%
No. of Counties               582       1,177            1,077        231

Source: GeoFRED, author’s calculations.

Only 231 out of 3,067 counties can be classified as affluent. But they contain 81 million people, or a quarter of the US population. The median household income in these counties is 73,170. In affluent counties, 91 percent of adults have a high school diploma and 37 percent have college degrees. The poverty rate is 9 percent and only 7 percent of residents rely on food stamps. About a quarter of the families with children are single parent households. Only 5 percent of young adults aged 16-19 are neither studying nor working. The crime rate is low and the unemployment rate is below the national average.

Some 582 out of 3,067 counties can be classified as poor. They are home to 23 million people, or 7 percent of the US population. The median household income is 34,302; less than half that of the affluent counties. A quarter of adult residents in these counties lack a high school diploma and only 12 percent have college degrees. More than a quarter of residents fall below the poverty line and 27 percent rely on food stamps for survival. Some 41 percent of families with children are single parent households and 13 percent of young adults are neither studying nor working. The crime rate is high and the unemployment rate is above the national average.

The vast bulk of US counties, 74 percent, are neither affluent nor poor. They contain 212 million people, almost exactly two-thirds of the US population. Of these 2,254 counties, 1,177 are near-poor. They are home to 78 million people, or 25 percent of the population. On almost any socioeconomic indicator, these counties are closer to the poor counties than the affluent ones.

Finally, there are 1,077 moderately affluent counties in Middle America. This is where the middling bulk of the US population—42 percent—lives. They are home to 134 million people, which is more than the population of Japan or Mexico. There is a significant gap in incomes and college graduation rates between moderately affluent and affluent counties. But on other socioeconomic indicators, they are not far apart.

Although affluent counties are sprinkled throughout the country, coastal United States is home to all multi-county clusters of mass affluence. A vast zone of affluence stretches across the northeastern seaboard, from the suburbs of DC all the way up to Vermont.

Eastern zone of affluence

Inside this eastern zone of affluence there are two major clusters. One is centered around New York City. It is the richest, most populous cluster of counties in the United States. The City’s per capita income is nearly a hundred and sixty thousand dollars.

NYC zone of affluence

The second is centered on Washington, DC. The two suburban counties of Fairfax and Prince William are brown because GeoFRED does not have data on them. Both are easily affluent. According to the 2010 census, the median household incomes of Fairfax and Prince William counties were 105,416 and 91,098 respectively.

DC zone of affluence

The Western zone of affluence is centered on San Francisco and comparable in affluence to the DC area. It obeys the same distance decay law that characterizes the eastern zones of affluence: The closer one gets to the leading city the more affluent the area. Note that Marin County has a higher per capita income than San Francisco itself. Both have per capita incomes in six figures—a property shared by only 13 counties in the entire United States.

Western zone of affluence

On to the other side of the ledger. There are some counties in the Western United States with high poverty rates. But these counties are sparsely populated. Because they are geographically large, national maps provide a misleading picture to the naked eye. The exception is the cluster of high poverty rate counties in Arizona and New Mexico. At the center of the cluster of three dark-hued counties that visually dominate the map is Apache County, Arizona. (The narrow strip that runs north-south along the Arizona-New Mexico border.) Only 10 percent of Apache residents have a college degree; 26 percent don’t even have a high school diploma. Some 37 percent of residents are below the poverty line and rely on food stamps. Per capita income in the county is just shy of thirty thousand dollars. Nearly half the families with children are single parent households. An astonishing 55 percent of county residents have a credit score below 660, meaning that they are considered subprime.

Western poverty

Big multi-county clusters of widespread poverty are concentrated in the southeastern United States. There is a vast poverty belt stretching across the Deep South and another big cluster in Appalachia. You can walk a thousand miles from Texas to the eastern seaboard—say from Marion County, TX, to McIntosh County, GA—without setting foot in any county with a poverty rate below 20 percent.

Eastern poverty

Kentucky has its own zone of wrenching poverty centered at Owsley County. In the map it is the one in the northern cluster of dark counties (where the poverty rate is more than 30 percent) that is surrounded on all sides by other dark counties. Here, 38 percent of the residents fall below the poverty line. The median household income is a mere 23,047. Only 11 percent of adults are college graduates and 41 percent lack a high school diploma. An astounding 55 percent of county residents rely on food stamps.

We have only scratched the surface of regional socioeconomic polarization in the United States. I will report again when I have more substantial results.


Theory of Primary State Formation

A ‘primary state’ or ‘pristine state’ is a first-generation state that evolves without contact with any preexisting states. The evolution of secondary states is strongly influenced by existing states. In particular, nonstate societies are always at risk of being conquered by neighboring states; they can emulate established states; and they can borrow techniques and know-how from preexisting states. All secondary state formation thus takes place in the context of preexisting states. In order to understand how states emerged in the first place, it is therefore important to restrict attention to primary states. We are only certain about six cases of primary state formation: Hierakonpolis in Upper Egypt, Uruk in Mesopotamia, Mohenjodaro in the Indus Valley, the Erlitou state in the Yiluo Basin in China, the Zapotec state in Mesoamerica, and the Moche state in the Andes. The earliest ones—in Mesopotamia and Egypt—emerged in the fourth millennium BCE. But before we examine primary state formation, we have to briefly review what came before.


Locations where primary state formation took place.

Fifty thousand years ago, behaviorally modern humans burst forth from Africa into Eurasia. By the end of the Pleistocene, they had eliminated archaic humans who had hitherto occupied the Eurasian landmass; and populated Northern Europe, Siberia, Australia and the Western Hemisphere—regions that had hitherto been devoid of people.[1] At this stage in human social evolution, societies were remarkably similar across the globe. Everywhere, people lived together in small, mobile bands—with no more than a few dozen individuals—of unspecialized hunter-gatherers. All practiced shamanism—abstract religious beliefs would have to wait until the Axial Age. There was no political authority to speak of. Leadership was not inherited but acquired. ‘Big men’ sometimes exercised coercion and leadership—but there was no ‘office’ of the chief that would have to be filled if the big man died or fell out of favor with the community. Not only were there no rulers, there was no class structure. For tens of thousands of years, human society was thoroughly egalitarian. Conflict between neighboring bands took the form of raids; there were no wars of conquest and subjugation.

The Neolithic Revolution witnessed the advent of permanent settlements, farming and animal husbandry. With agrarian wealth came social stratification. Social rank became hereditary. Big men increasingly hailed from the ranks of the elite. However, village communities retained their autonomy for a long time. The decisive breakthrough came with supravillage integration—the establishment of chiefdoms.

A chiefdom is defined as a centralized regional polity where authority is permanently centralized in the ‘office’ of the chief, which exists apart from the man who occupies it and is passed down from one generation to the next. Chiefdoms usually have populations in the thousands. There is a lot of variation among chiefdoms. Simple chiefdoms have just two levels of hierarchy (a small number of villages controlled by a center). Complex chiefdoms have three levels (villages clustered around towns controlled by a city). A paramount chiefdom is an exceptionally powerful chiefdom that has subordinated others.



While both chiefdoms and states feature centralized coercive authority, chiefly authority is non-bureaucratic—all authority rests in the office of the chief. In contrast, states possess internally specialized administrative organization—authority is partially delegated to administrators, tax collectors, police, judges, military commanders and so on.

While all primary states emerged from chiefdoms, it is wrong to think of the chiefdom as a political form that would naturally evolve into the state if left to its own devices. Indeed, only a few ever made the phase transition; the vast majority of chiefdoms did not.

The central question of primary state formation then is: Why, and under what conditions, did some chiefdoms make the transition to statehood?

The distinction between chiefdoms and states is important because chiefdoms cannot be scaled up whereas states can and often did. Why can’t chiefdoms be scaled up? Wright (1977) argued that because authority in a chiefdom is not differentiated, any delegation of authority approaches total delegation; a situation ripe with potential for insubordination, insurrection, or fission. It is in the chief’s vital interest to avoid delegating authority, which means that he has to rule his entire domain from the center. As a consequence, there is an effective spatial limit to the territorial expansion of a chiefdom, determined by the distance the chief, or the chief’s representative, could go from the center to the periphery of the domain and back on the same day.
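Wright’s spatial limit reduces to simple arithmetic: the out-and-back constraint means the radius of the polity can be at most half a day’s travel. A minimal sketch (the travel speed and hours below are illustrative assumptions, not figures from Wright 1977):

```python
def chiefdom_radius_km(travel_speed_kmh: float, travel_hours_per_day: float) -> float:
    """Maximum radius of a chiefdom under Wright's constraint: the chief or his
    representative must reach the periphery and return to the center in one day,
    so 2 * radius <= speed * hours."""
    return travel_speed_kmh * travel_hours_per_day / 2

# Illustrative assumption: walking at 4 km/h for 12 hours of daylight caps the
# polity at a radius of 24 km -- roughly a day's round trip on foot.
print(chiefdom_radius_km(4.0, 12.0))  # 24.0
```

The point of the exercise is how small the number is: without delegation, a chiefdom is bounded by the chief’s own legs (or canoe), which is why expansion beyond this radius forces the administrative innovation Spencer describes.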

The ruler of a state on the other hand, can dispatch subordinates—whose authority has been defined narrowly enough—to locations far from the capital to manage local affairs with little risk of insurrection. The delegation of partial authority thus allows the state to expand its territory well beyond the spatial limits associated with chiefdoms. Moreover, the optimal strategy for a state ruler is to divide and segment authority as much as possible and delegate wholeheartedly so as to minimize the likelihood of insurrection by subordinates.

The question of primary state formation then boils down to this: Given that it was in the vital interest of the chiefs to avoid delegating authority, why were some compelled to do so anyway and under what conditions did they succeed?

Spencer (1998) suggested that if a chief seeks to implement a new strategy of internal administrative specialization, the chances of success will be enhanced if the shift is made quickly and extensively. Spencer (2010) proposed a ‘territorial-expansion model’ whose basic idea is that territorial expansion is an essential, integral part of the process of primary state formation: without expansion beyond the spatial limit of a chiefdom, there is no incentive for the chief to delegate partial authority; and expansion beyond that limit is impossible without such delegation.

[Simultaneous internal specialization and expansion] will help ensure that the new parcels of authority are defined narrowly enough so that no dispatched administrative assistant in the new order enjoys sufficiently broad authority to foment a successful insurrection. From this perspective, we would expect an evolutionary transition from chiefdom to state to be marked by a qualitative shift in administrative principles and associated optimal regulatory strategies, representing a profoundly transformational process of change.

When we apply the territorial-expansion model to the empirical record of primary state formation, we should expect to find a close correspondence in time between the appearance of state institutions and a dramatic expansion of political-economic territory. This expectation, it should be noted, runs counter to the conventional idea that the territorial expansion of state control is a phenomenon that typically occurs well after the initial formation of the state, during what is sometimes called an “imperial” phase of development.

Spencer (2010) marshals impressive archaeological evidence to show that in all six known cases of primary state formation, the emergence of the primary state was concurrent with territorial expansion beyond the home region.

This is a very promising theory. But it raises important questions: Why did most chiefdoms fail to make this phase transition? Why did primary state formation take place only in densely populated regions? The short answer is that fear is a more important driver of primary state formation than greed. It was the struggle for survival with rival chiefdoms that compelled some chiefs to split the atom of chiefly power.

Redmond and Spencer (2012) argue that high levels of inter-polity competition provided the impetus for rulers of paramount chiefdoms to develop the internally specialized administration of the state. They examine two paramount chiefdoms of comparable size and complexity on the threshold of state formation. The two chiefdoms differed markedly in one critical respect: one was relatively isolated while the other was surrounded by rival chiefdoms.

…inter-polity competition was the key factor accounting for Monte Albán’s successful transition from complex chiefdom to [the Zapotec] state, as opposed to Cahokia’s short-lived attempt to cross that threshold. In Oaxaca, the presence of powerful rivals, less than a day’s travel to the south and east, placed a premium on effective administration and military prowess. Monte Albán was able to vanquish some of its rivals in short order, though others managed to resist Monte Albán’s expansionist designs for a considerable time before they too capitulated. To prevail in such a competitive context, Monte Albán had to develop a powerful military as well as an internally specialized administration that was capable of delegating partial authority to subordinate officials who implemented the strategies and policies of the central leadership. The leadership of Cahokia, by contrast, did not have to contend with such daunting rivals. As a consequence, there was relatively less pressure to experiment with the kinds of military and administrative innovations that might have led to the successful transition to statehood in the American Bottom.

Charles Tilly’s dictum regarding the formation of European national states—war made the state and the state made war—is equally valid for pristine states. The Wright-Spencer-Redmond theory of primary state formation explains precisely how war made the state.


[1] Characteristic of modern behavior was figurative art such as cave paintings; ornamentation using pigment and jewelry; the practice of burial; fishing; composite tools such as bows and arrows, darts, harpoons and axes; the use of bone, antler and hide; the invention of rope, the fishhook and the eyed needle; and, of course, blades manufactured from flint. This Great Leap Forward in human culture was likely the result of a single genetic mutation that conferred an innate capacity for complex language and abstract thought.


Wright, Henry T. “Recent Research on the Origin of the State.” Annual Review of Anthropology 6 (1977): 379-397.

Spencer, Charles S. “A mathematical model of primary state formation.” Cultural Dynamics 10.1 (1998): 5-20.

Spencer, Charles S. “Territorial expansion and primary state formation.” Proceedings of the National Academy of Sciences 107.16 (2010): 7119-7126.

Redmond, Elsa M., and Charles S. Spencer. “Chiefdoms at the threshold: The competitive origins of the primary state.” Journal of Anthropological Archaeology 31.1 (2012): 22-37.


Fed Independence, Trump Reflation, and the Primacy of the Dollar

This is part of an ongoing conversation with Adam Tooze.

There is a tension at the heart of US political economy. President Trump wants to reindustrialize America and create jobs for US workers. To that end, he has promised both big tax cuts and a huge investment program. A big fiscal shock is coming.

He has also attacked America’s trade partners for suppressing their currencies. He wants the dollar to weaken against the euro, the yen and the yuan, so that US manufacturers can compete in global product markets.

The problem is that deficit spending at home is expected to accelerate inflation, which would prompt the Fed to hike faster, which in turn would strengthen the dollar. As part of the Trump reflation trade, the dollar has already strengthened in anticipation.


Dollar Index. Source: Bloomberg.

There are only two possible scenarios that could allow the Trump White House to square the circle. The first scenario is one where both the US macroeconomy and the Fed oblige. That is, US inflation could fail to accelerate despite the fiscal shock and the Fed could hold fire waiting to see the whites of inflation’s eyes. The first component wouldn’t be altogether surprising given that US inflation is driven not by domestic slack but by global slack. But given what we know about the Fed’s reaction function, the second—a dovish, patient Fed—is quite unlikely.

The second scenario is one where the Fed’s independence is compromised by Washington. With the resignation of Daniel Tarullo, Trump can now appoint three of the seven governors of the Fed immediately. (Monetary policy is decided not by the Board but by the FOMC, which consists of the seven on the Board and five regional Fed presidents, always including the president of the New York Fed.) Yellen’s term also ends on Feb 3, 2018; at which point Trump could replace her with a lackey. In short, it is not inconceivable to see the Fed revert to control by political masters.

The second scenario is not as likely as it appears either—despite the clear interest of the Trump White House and the opportunity to pack the Fed with Trump appointees. This is because the Senate has to confirm the appointments. While the Senate Republicans are not as crazy about Ayn Rand as those in the House, it is hard to see them lining up behind a policy of packing the Fed with doves. In other words, the balance of power in Congress points in the opposite direction from the White House. If Trump succeeds in this endeavor, it would likely be with the support of Democrats in the Senate. And they would demand their own pound of flesh.

There is another reason to doubt the reflationary scenario. The Fed’s independence—secured by Volcker’s coup in 1979—has served the interests of Wall Street well. Since the Trump administration is packed with Goldmanites, it is difficult to see them supporting an attack on the Fed’s independence. To be more precise, it is not clear whether the big banking firms would pursue their long-term interests (and resist attacks on the Fed’s independence) or their short-term interests (which would be well served by the steep yield curve attending a Trump reflation).

A related issue is that of financial deregulation. It is amply clear that the Trump White House and the Republican Congress are going to unshackle Wall Street. This solves at least one problem while risking another. The former is the global shortage of safe assets. Deregulation of Wall Street banks would allow them to expand balance sheet capacity and intermediate dollars to lend offshore via FX swaps. The latter is the risk to financial stability. As Tooze notes, unshackling dealer balance sheets may unleash a new, unsustainable credit boom.

There is of course an entirely different possibility suggested by McHenry’s letter to Yellen and, more generally, the strength of the Ayn Rand fanatics in Congress. Namely: Congressional hawks could prevail in the battle for political control over the Fed and make it even more hawkish and its reaction function more formulaic by law (by demanding, say, that the Fed justify deviations from the Taylor Rule). That would doom any possibility of a great boom in real activity.
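For readers unfamiliar with it, the Taylor Rule is a simple formula mapping inflation and the output gap to a prescribed policy rate. A minimal sketch using the canonical Taylor (1993) parameterization (the default values are the standard textbook ones, not anything from McHenry’s letter):

```python
def taylor_rule(inflation: float, output_gap: float,
                r_star: float = 2.0, pi_target: float = 2.0) -> float:
    """Canonical Taylor (1993) rule: prescribed nominal policy rate as a
    function of inflation and the output gap, all in percentage points.
    r_star is the equilibrium real rate; pi_target is the inflation target."""
    return inflation + r_star + 0.5 * (inflation - pi_target) + 0.5 * output_gap

# At target inflation (2%) with a closed output gap, the rule prescribes the
# neutral nominal rate of 4%; a formulaic Fed would hike mechanically as
# inflation or the output gap rose.
print(taylor_rule(2.0, 0.0))  # 4.0
```

The hawkish implication is visible in the coefficients: every percentage point of above-target inflation raises the prescribed rate by 1.5 points, which is why a legally binding rule would choke off a reflation.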

Tooze’s original discussion centered not on the political economy of the Federal Reserve per se but the impact of Trump’s economic nationalism on the dollar’s role as the hard currency of choice globally. Tooze mentions three areas of conflict:

 (1) [T]he tension between the dollar’s reserve role and the desire of the Trump administration to boost exports by increasing American “competitiveness” and talking down the dollar; (2) tensions around global bank regulation; (3) nationalist backlash from Trumpites in Congress against the role of the Fed as a global liquidity supplier, by way principally of the swap lines.

As for (1), it is not clear that dollar strength is required to sustain the dollar’s role as the reserve currency. The dollar has been both stronger and weaker than it is now (or is expected to be) without compromising the willingness of other central banks to hold US dollars.


It is even less clear how (2) could work against the dollar’s preeminence. If US banking firms are subjected to less onerous regulations, then the dollar’s share of international funding—already at 75 percent—would tend to increase with the balance sheet capacity of US banking firms.

A nationalist backlash against the Fed’s international activities as envisioned by (3) could potentially backfire on the dollar. In particular, if the Fed is forced to close down its swap lines, the other hard-currency-issuing central banks would look for alternatives, which could result in solutions that undermine the dollar’s position. For instance, they could denominate their settlements in euros—a scenario that would be consistent with a major unraveling of the transatlantic alliance.

The Policy Tensor contends that the real threat to the dollar’s hegemony is the possibility of a global trade war which would usher in a more nationalized and a more regionalized world. A global trade war between the United States and China, not the victory of Marine Le Pen, is the most important political risk to financial markets of 2017.

The dollar’s role is due to a mutually-reinforcing combination of America’s command of the global maritime commons, the liquidity of US dollar funding markets, the depth of US capital markets, the network externalities of currency denomination (for invoicing and payments), the stability and predictability of US institutions, and the sheer weight of the United States in the world economy. It is mighty hard, even for an administration led by Donald Trump, to undermine the dollar’s role in the global monetary, financial and economic system. The closest competitor is the euro. And for the euro to merely bid for the dollar’s role in the world economy would take dramatic changes in the political economy of Europe and an outright breakdown of the Western alliance. We may possibly be headed in that direction but we are nowhere close to that scenario.


Can the Liberal International Order Survive President Donald J. Trump?


Weeks into his first term, President Donald J. Trump has already insulted Prime Minister Turnbull of Australia—a longstanding US ally; humiliated President Nieto—going as far as to suggest that he would send troops to Mexico; issued a thinly-veiled military threat against Iran; banned nationals from seven Muslim-majority countries from setting foot on US soil; and berated the EU and Nato enough to prompt the President of the European Council to issue a public letter warning against the threat posed by the new American president to the stability of Europe. He is, of course, just getting started.

In what follows I will argue that President Trump has turned the United States into a rogue state that poses an existential threat to the liberal international order. We begin by making these terms more precise.

Liberal international order

The liberal international order is the superstructure of global cooperation, integration and rules-based arbitration that sits atop the substructure of US hegemony. The foundation of this substructure is US military primacy. But just as the substructure contains more than the foundation, hegemony is more than primacy.

Whereas primacy is a statement about the asymmetry of global military power, hegemony is a statement about the foreign policy orientation of other powers. Primacy describes the fact that no state can expect to prevail against the dominant state in a war or an extended rivalry, whereas hegemony describes the willingness of other states to follow the lead of the dominant state. In other words, hegemony describes the politics of a near-unipolar world when relations between major powers are largely cooperative.

A liberal international order need not be based on a near-unipolar configuration of hard power. Indeed, the liberal international order of the late-nineteenth century was based on a stable balance of power. But underlying the contemporary liberal international order is a near-unipolar distribution of military strength. The important thing to keep in mind is that the politics of a near-unipolar world looks very different in the absence of hegemony.

US hegemony

US hegemony manifests itself first and foremost in the stability of the transatlantic and transpacific alliances. The Europeans and Japanese follow the US lead not only because they need US protection—if that were the case, these military alliances would’ve disbanded in 1991—but also because US preeminence is congruent with their core interests. The United States acts as a guarantor not only of US capitalism but also of global capitalism—this has been the job of the “excess capacity” of US power since the second world war. The United States has allowed its major power protectorates full access to the US market. And the US has largely refrained from leveraging its hard power advantage to secure its parochial geoeconomic interests. It is these accommodations by the United States that have held the transatlantic and transpacific alliances together even after the capitulation of the Soviet Union.

Japan and Europe are not the only protectorates of the United States. US security guarantees cover most of maritime Asia including South Korea, Taiwan, the Philippines, Singapore, Thailand, Australia and New Zealand. All the oil monarchies of the gulf are likewise US protectorates; as are Israel and Turkey.

The US allows these states to pursue their national interests unless they conflict with vital interests of the United States. In particular, the US allows them to compete against US firms in global markets. In return, US protectorates largely follow the US lead on major politico-military issues.

Maritime primacy allows the United States to secure the world’s sea lanes. The US has generally allowed all states in the international system to have unimpeded commercial access to the maritime commons. Minor powers can, of course, do little about this dependence. But the cooperation of other major powers is contingent on the continued provision of unimpeded access to the maritime commons.

Open Door

The most important feature of the liberal international order is the openness of national markets to foreign penetration. The core of the world economy—US, Europe and Japan—was largely open to trade in the early postwar period. In the course of the late twentieth century, other nations were persuaded—by the US above all—to open up their economies. The process culminated with the accession of China to the World Trade Organization at the turn of the century. Tariffs have never been lower, and more generally, the global trading system has never been more open.

Global trade integration over the past twenty years has gone considerably beyond the opening of national product markets to global competition. What we have witnessed has been described as the ‘second unbundling’ of global production characterized by ‘vertical trade’ in intermediate goods and services. Multinational firms have supply chains which spill over vast cross-border regional networks. More generally, global value chains—including both intra-firm supply chains and arms-length, subcontracting vertical chains—criss-cross borders largely within the three main networks of global production: Factory North America, Factory Europe and Factory Asia.

These vast regional value chains look like hub-and-spoke networks connecting headquarter economies—US, Germany and Japan—to factory economies within a day’s travel distance around them. (Headquarter economies reimport—i.e., offshore processes—a lot more than they reexport while factory economies reexport a lot more than they reimport. Also, headquarter economies have a large number of partners while factory economies are heavily dependent on their nearest advanced technology manufacturing giant.) Among the four giant manufacturers, only the United States and China are important suppliers globally. China’s export pattern is the most globalized of the G4.

Deep integration—the harmonization of commercial rules, standards, taxes, intellectual property rights and policies between states—and global value chains are mutually reinforcing. The main purpose and result of the so-called regional trade agreements such as Nafta has been to enhance deep integration. A roll-back of deep integration would undermine global value chains.

Another pillar of the liberal international order is the highly-integrated global banking and financial system. International finance is critical for the provision of working capital for international trade. The global periphery relies on FDI from the center countries for access to technology and know-how and on core funding markets for hard currency credit. Global banks are the key conduits through which hard currency credit is transmitted internationally. Global banks fund themselves largely in dollar-denominated wholesale funding markets and intermediate these funds to regional banks, who in turn lend to local firms. Global and regional nonfinancial firms also tap capital markets directly. The entire architecture of international finance depends on a high degree of cooperation on unimpeded capital flows and investor rights.

International cooperation

The liberal international order extends to global and regional cooperation on a slew of issues beyond security, economics and finance: law enforcement, aid, health, food safety, science, education, conservation, arms control, fishing, nuclear energy, climate change, civil aviation and so on and so forth.

Much has been gained by institutionalized international cooperation since the second world war. The thing to keep in mind is that it took a great deal of time and effort to persuade countries to sign up for these institutional mechanisms of international cooperation, as anyone who followed the WTO negotiations can recall. Even though most nations had little leverage, many could and did bargain, resist and drag their feet. Most countries may not be truly sovereign, but they do have autonomy—this is especially true of other major powers. The United States did try to bully other states many times and generally failed in the endeavor. Military power proved to be no substitute for persuasion.

Notice that I have not included liberal democracy in pinning down the liberal international order. This is a conscious departure from contemporary usage on my part. I would like to focus on concrete issues rather than ideological agendas or fall prey to liberal triumphalism.

Regular readers may be surprised to see a realist like the Policy Tensor waxing lyrical about the liberal international order. The incongruity is only apparent. The realist claim is not that liberal international orders don’t exist or that they are not beneficial. It is that they cannot override the underlying logic of geopolitical competition. Whereas liberal internationalists believe that liberal international orders can secure great power peace, realists have no such illusions. International orders are epiphenomena—they can and do fall prey to harsher logics underneath; logics that they cannot alter.

The Trump shock

President Donald J. Trump is a hard realist. He sees international relations as fundamentally a zero-sum game. As opposed to hard realist scholars, Trump’s worldview is decentered: He has strong views on the US national interest which he firmly believes that the president should try his mightiest to secure, but he is only dimly aware that there are other states in the international system with their own interests and agendas. Moreover, he believes that in order to secure the US national interest, all he has to do is intimidate everyone else into submission. In other words, he overestimates the efficacy of US hard power and seems to be entirely unaware of the value of speaking softly. In short, Trump is a bully.

Laderman and Simms’ timely Donald Trump: The Making of a Worldview documents the core beliefs of the President. He has consistently believed since the 1980s that the world is “laughing at us”; that the US doesn’t get the “respect” that it deserves; that adversaries and allies alike are “ripping us off”; that the US is “losing” because its leaders are “idiots.” Trump is an unabashed economic nationalist. His main beef against the liberal international order is that China and other economic rivals steal American jobs by outcompeting US firms by unfair means—currency manipulation and “sharp practice” (presumably dumping and export subsidies). If you became president, asked O’Reilly on Fox News in 2011, whose butt would you kick? Trump answered:

I would say China “number one.”

… We have all of the power. All of the chips are on our side. The trillion, it’s actually $1.1 trillion that they have, forget it, that’s peanuts compared to the overall economy. Now, what I would say very strongly, you don’t start behaving, 25 percent tax on every item you sell in this country. Twenty-five percent right now. By the way, based on what you are doing, it should be 41 percent….

It will put China out of business. We have all the cards and chips. If that ever happened, they would have a depression the likes of which you have never seen. They cannot play the game. We can. 

Laderman and Simms show that this is not a theme Trump picked up recently. He has repeatedly emphasized it since the 1980s. This is a bedrock belief of the man. There is no reason to believe that in his moment of vindication he would suddenly change his core beliefs. The upshot is that Trump is likely to fire the first shot in a trade war against China.

A potential trade war between the world’s two largest and most globally-integrated economies is the most significant threat to the liberal international order posed by President Trump. Not only because of China’s systemic importance—it is practically everyone’s largest trade partner—but also because of the potential for geopolitical, and indeed military, confrontation. In extant military power the world may be near-unipolar, but in potential warmaking capabilities the world is already near-bipolar.

Trump’s bullying is particularly unsuitable for dealing with Chinese leaders who cannot allow domestic audiences to watch China’s national honor trampled on by the US President. In other words, China is near-certain to respond aggressively both to Trump’s bullying and to US policies that harm its national economic interest.

More generally, all statesmen face a ‘double security dilemma.’ In order to maintain themselves in power at home, they must keep domestic audiences in mind when dealing with the statesmen of other nations. Put another way, statesmen face domestic political constraints on their ability to conduct diplomacy and strike deals with foreign powers—something that Trump seems entirely unaware of. Indeed, the leaders of Britain, Australia and Mexico have already had to bear a backlash at home for their interactions with Trump.

Trump has declared his intention to slap a 20 per cent tariff on imports from Mexico. Because the bulk of Mexican exports to the United States are essentially reexports, this will hurt a substantial number of US manufacturing firms, especially in the auto industry. Deep integration in Factory North America is dead in the water.

An important point to note is that tariffs are much more disruptive for global value chains than for trade in final goods because the former often involve crossing multiple borders and even the same border multiple times. Beyond the disruption induced by the snapping of value chains, the protectionist measures would make US manufacturers less competitive in tradable product markets. The net effect on American jobs could possibly be negative. What is certain is that US consumers would end up paying higher prices for tradable goods.
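The point about multiple border crossings can be made with simple arithmetic. A sketch under the illustrative assumption that an ad valorem tariff is levied on the full value of the good at every crossing (real tariff incidence on value chains is messier, but the compounding logic is the same):

```python
def tariff_multiplier(tariff_rate: float, crossings: int) -> float:
    """Cumulative cost multiplier when an ad valorem tariff is applied at each
    border crossing. Illustrative assumption: the full value of the good is
    taxed every time it crosses."""
    return (1 + tariff_rate) ** crossings

# A 20% tariff on a final good that crosses the border once raises its cost
# by 20%; the same tariff on an intermediate good that crosses three times
# (e.g., auto parts shuttling between US and Mexican plants) compounds to a
# roughly 73% markup.
print(tariff_multiplier(0.20, 1))  # 1.2
print(tariff_multiplier(0.20, 3))
```

This is why a tariff wall is far more destructive to Factory North America than the headline rate suggests: the burden scales with the number of crossings, not just the rate.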

Most macroeconomists understand that global imbalances are the result of differential rates of savings and investment among nations. In other words, the US current account deficit (which is mathematically equal to the capital account surplus) is the result of Asian and European underconsumption not the natural or artificial competitiveness of their firms against their US rivals. According to the proponents of the savings glut hypothesis, the main culprit for the US current account deficit is not Mexico but Germany. I used to subscribe to the savings glut hypothesis but since then I have come to realize its limitations. An alternate explanation of the US current account deficit is the shortage of safe assets hypothesis, which also explains the asset mix on both sides of the US balance sheet. US external liabilities are mostly low yield safe instruments such as Tbills while US external assets are high-yield risk assets such as FDI and equity. This means that the United States provides insurance to the rest of the world while earning greater returns on its external assets than it pays on its liabilities. Does that sound like “losing”?
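The accounting identity invoked above follows directly from the national accounts. Starting from the income identity and the definition of national saving:

```latex
% National income identity and national saving:
Y = C + I + G + (X - M), \qquad S = Y - C - G
% Subtracting I from both sides of the saving definition:
S - I = X - M = \mathrm{CA}
% Balance of payments: the current account and the capital (financial)
% account sum to zero, so a current account deficit (CA < 0) is mirrored
% by a capital account surplus (KA > 0):
\mathrm{CA} + \mathrm{KA} = 0
```

Hence a country that invests more than it saves must, as a matter of arithmetic, run a current account deficit financed by foreign capital inflows; this is why the deficit reflects savings-investment imbalances rather than the “competitiveness” of anyone’s firms.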

It is entirely conceivable that Trump will try to bully the Europeans on their trade surplus as well. That’s not going to go down particularly well in Berlin. The Europeans have as many “cards and chips” as the United States when it comes to trade. The European market is, after all, a bigger prize than the US market. Contentious transatlantic trade relations will also interact with the Brexit negotiations. So one way for the EU to get back at the Anglo-Americans is for it to undermine the City of London’s position as the dominant financial center in Europe. More generally, it is not difficult to see the Europeans push the euro as the hard currency of choice for global investors—especially in light of Trump’s threat to the independence of the Fed.

Such measures are especially likely if Trump keeps encouraging the breakup of the European Union. Vocal support by the US President of eurosceptics such as France’s Le Pen will further undermine the transatlantic alliance. In fact, a breakup of the Western alliance is no longer beyond the realm of possibility. The EU could very well respond to the breakdown of the transatlantic security alliance by finally pulling together and emerging as a unitary security actor—which would automatically make it a peer of the United States. Before we get to that point, however, transatlantic relations will have to deteriorate significantly.

More immediately, if Trump tries to intimidate the Europeans into reneging on the nuclear deal with Iran, the Europeans will not simply submit. Meaningful sanctions against Iran can be ruled out without European participation. If the United States goes to war against Iran—which is not hard to envision in light of the sharp worsening of relations between the two republics—the Europeans are likely to be loudly opposed to the adventure. In turn, that may very well prompt the Trump White House to take a harsher line against the EU. There are very many ways in which things can go south for the Western alliance.

An even bigger threat to the liberal international order is a potential breakup of the eurozone. The most direct route to such a scenario is the rise to power of Marine Le Pen in France. The breakup of the eurozone would create significant instability and increase Russian influence on the continent. It may very well herald the return of history to Europe and with it an immediate end of the liberal international order.

The core of my argument is that even in a near-unipolar world, there are other powers who have their own interests and agendas. Trump’s unrestrained pursuit of his geoeconomic agenda directly undermines the implicit agreement between the United States and its major power allies whereby the other powers accept US preeminence and follow the US lead in exchange for accommodation of their core interests and US restraint on using its preponderance of military power as leverage against them. The unrestrained pursuit of economic nationalism by the United States will therefore undermine US hegemony. Dominance without hegemony in turn poses an existential risk to the Open Door and international cooperation, that is, to the liberal international order.

It is possible to imagine others taking up the baton of the Open Door and international cooperation. Indeed, in an astonishing speech at the World Economic Forum, President Xi Jinping effectively suggested that China is ready to take up the baton of the open global economy. One cannot rule out a joint EU-China bid to sustain the Open Door while the United States goes through this unprecedented bout of insanity—a more proactive version of hunkering down, so to speak. However, this rosy scenario is mighty hard to entertain for long, since such a bid would be a direct challenge to US preeminence. President Trump is likely to respond by lashing out in unpredictable ways.

A near-unipolar world where the dominant power is a rogue state is a highly unpredictable world. In such a world, all bets are off.


How I Learned to Stop Worrying and Love the Bomb

The President-Elect promised on Twitter to “greatly strengthen and expand” America’s nuclear capability, generating predictable opprobrium from the usual suspects. The arena of nuclear weapons is especially prone to myths and fallacies. Even the greatest of minds, from Noam Chomsky on down, have deeply misunderstood the role that nuclear weapons have played since their discovery. In this essay I will try to dispel some of these myths and argue that nuclear weapons are in fact weapons of peace and a force for stability in the modern world.

The first myth is that the number of nuclear warheads is a meaningful metric of nuclear capabilities. So you have this infographic from the grey lady which suggests that US and Russian nuclear capabilities have been vastly reduced since the height of the Cold War. Nothing could be further from the truth.


Nuclear warfighting capabilities depend first and foremost on the range and accuracy of delivery platforms (ICBMs, long-range cruise missiles, long-range bombers, field artillery and so on) and the intelligence, surveillance and reconnaissance (ISR) capabilities required for detection and targeting of the adversary’s nuclear forces. For the principal goal of nuclear warfighting is to disarm the adversary, not to attack his population centers. This is because once the adversary is disarmed, attacking his cities is unnecessary to obtain political obedience, while destroying all his cities but failing to destroy his nuclear forces guarantees nuclear annihilation.

Imagine the United States circa 2016 with its 7,000 nukes squaring off against the United States circa 1966 with its 32,000 warheads. It would not be a fair fight. US-2016’s reconnaissance-strike complex would allow it to disarm US-1966 in a splendid first-strike well before the latter could mobilize its bombers and ICBMs. US-2016 would not, of course, consider initiating a first-strike except under the gravest of circumstances. This is because US-2016 could never be absolutely certain of destroying all of US-1966’s nuclear forces. In fact, that is the point of US-1966’s vast nuclear arsenal. The thirty-two thousand warheads are not meant to be used; they are meant to enhance the survivability of the deterrent against a surprise overwhelming counterforce strike by the adversary.

That also brings us to a much more important question: Is mutual strategic nuclear deterrence stable? The question is more precise than it may seem at first sight. We are not talking about situations where a nuclear-armed state seeks to deter a conventional attack by a non-nuclear weapons state (which would not be mutual). We are not talking about the tactical use of field nuclear weapons in a limited nuclear war (which would not be strategic); nor are we talking about the question of extended deterrence where one seeks to deter the adversary from attacking a protectorate (also not strategic).

There is indeed only one scenario under which mutual strategic nuclear deterrence can be said to fail: An all-out thermonuclear war. In other words, if strategic nuclear deterrence is extraordinarily stable then the existential threat posed by nuclear weapons is minor; if it is not, then advanced human civilization has so far survived the discovery of nuclear weapons by sheer dumb luck. The stability of strategic nuclear deterrence is therefore the most important question of them all.

The question is more interesting than it looks at first sight. Strategic nuclear deterrence is extraordinarily stable when both parties have a secure second-strike capability. As long as neither party can expect to destroy the adversary’s deterrent with near-certainty by launching a surprise first-strike, the stability of strategic nuclear deterrence is not in doubt. The bar for a second-strike capability is extremely low. There need only be an iota of doubt that a splendid first-strike would eliminate all of the inferior party’s nuclear forces for the superior party to be deterred from ever attempting one.

Only with highly asymmetric capabilities can one begin to conceive of scenarios where deterrence is not stable. This would happen in a crisis scenario where the weaker party faces a “use it or lose it” situation in the face of the adversary’s overwhelming counterforce capabilities. For instance, in an armed confrontation over Taiwan, China may fear that the United States is about to launch a (perhaps purely conventional) disarming first-strike on its command, control and launch capabilities (as the US war strategy called AirSea Battle calls for). Although even in this most extreme of scenarios, it is hard to see why China would commit suicide for fear of death.

The most interesting implication of these observations is that nuclear arms races, in fact, enhance the stability of strategic nuclear deterrence. Conversely, increasing asymmetry in nuclear warfighting capabilities is destabilizing. The idea that nuclear arms races between great powers increase the risk of thermonuclear war is as wrongheaded as it is pervasive. The increasing asymmetry in the nuclear capabilities between the unipole and the lesser great powers is a more serious cause of concern. Lieber and Press have argued that the United States has sought, and more controversially still, attained nuclear primacy: The US can destroy Russia’s entire nuclear arsenal with near-certainty. (Although Russia has since modernized its nuclear forces.)

This brings us to the final delusion concerning nuclear weapons: Global Zero. Even if all nuclear stockpiles could be eliminated in their entirety, that would not remove the existential threat posed by nuclear weapons since states would retain nuclear weapons know-how and the silos would get refilled in the event of a major confrontation. In fact, it would surely undermine strategic nuclear deterrence since great powers would face a strong incentive to be the first to reacquire a nuclear arsenal and in fact threaten its use to blackmail their adversaries into capitulation. Global Zero is an extremely unstable configuration—the most likely path to a nuclear holocaust. It is hard to think of a more counterproductive idea.

I am not saying that the risks of nuclear accidents, inadvertent nuclear use, “broken arrow” incidents, or jihadi-general scenarios are not a cause for worry. They are indeed. I’m all for tighter control over nuclear weapons. But these questions of safety and control tend to overshadow the overwhelming benefit of nuclear weapons. By effectively ruling out all-out war between the most powerful states, nuclear weapons have ushered in an unprecedented and open-ended era of great power peace. Only those ignorant of the extraordinary toll of hegemonic war can see nuclear weapons as anything other than an overwhelming force for stability in the modern world.