Causal Inference from Linear Models

For the past few decades, empirical research has shunned all talk of causation. Scholars use their causal intuitions but they only ever talk about correlation. Smoking is “associated to” cancer, being overweight is “correlated with” higher morbidity rates, college education is the strongest “correlate of” Trump’s vote gains over Romney, and so on and so forth. Empirical researchers don’t like to use causal language because they think that causal concepts are not well-defined. It is a hegemonic postulate of modern statistics and econometrics that all falsifiable claims can be stated in the language of modern probability. Any talk of causation is frowned upon because causal claims simply cannot be cast in the language of probability. For instance, there is no way to state in the language of probability that smoking causes cancer, that the tides are caused by the moon or that rain causes the lawn to get wet.

Unfortunately, or rather fortunately, the hegemonic postulate happens to be untrue. Recent developments in causality—a sub-discipline of philosophy—by Judea Pearl and others, have made it possible to talk about causality with mathematical precision and use causal models in practice. We’ll come back to causal inference and show how to do it in practice after a brief digression on theory.

Theories isolate a portion of reality for study. When we say that Nature is intelligible, we mean that it is possible to discover Nature’s mechanisms theoretically (and perhaps empirically). For instance, the tilting of the earth on its axis is the cause of the seasons. It’s why the northern and southern hemispheres have opposite seasons. We don’t know that from perfect correlation of the tilting and the seasons because correlation does not imply causation (and in any case they are not perfectly correlated). We could, of course, be wrong, but we think that this is a ‘good theory’ in the sense that it is parsimonious and hard-to-vary—it is impossible to fiddle with the theory without destroying it. [This argument is due to David Deutsch.] In fact, we find this theory so compelling that we don’t even subject it to empirical falsification.

Yes, it is impossible to derive causal inference from the data with absolute certainty. This is because, without theory, causal inference from data is impossible, and theories on their part can only ever be falsified; never proven. Causal inference from data is only possible if the data analyst is willing to entertain theories. The strongest causal claims a scholar can possibly make take the form: “Researchers who accept the qualitative premises of my theory are compelled by the data to accept the quantitative conclusion that the causal effect of X on Y is such and such.”

We can talk about causality with mathematical precision because, under fairly mild regularity conditions, any consistent set of causal claims can be represented faithfully as causal diagrams which are well-defined mathematical objects. A causal diagram is a directed graph with a node for every variable and directed edges or arrows denoting causal influence from one variable to another, e.g., {X\longrightarrow Y} which says that Y is caused by X where, say, X is smoking and Y is lung cancer.

The closest thing to causal analysis in contemporary social science is the structural equation model. In order to illustrate the graphical method for causal inference, we’ll restrict attention to a particularly simple class of structural equation models, that of linear models. The results hold for nonlinear and even nonparametric models. We’ll work only with linear models not only because they are ubiquitous but also for pedagogical reasons. Our goal is to teach rank-and-file researchers how to use the graphical method to draw causal inferences from data. We’ll show when and how structural linear models can be identified. In particular, you’ll learn which variables you should and shouldn’t control for in order to isolate the causal effect of X on Y. For someone with basic undergraduate level training in statistics and probability it should take no more than a day’s work. So bring out your pencil and notebook.

A note on attribution: What follows is largely from Judea Pearl’s work on causal inference. Some of the results may be due to other scholars. There is a lot more to causal inference than what you will encounter below. Again, my goal here is purely pedagogical. I want you, a rank-and-file researcher, to start using this method as soon as you are done with the exercises at the end of this lecture. (Yes, I’m going to assign you homework!)

Consider the simple linear model,

{\large Y := \beta X + \varepsilon }

where {\varepsilon} is a standard normal random variable independent of X. This equation is structural in the sense that Y is a deterministic function of X and {\varepsilon} but neither X nor {\varepsilon} is a function of Y. In other words, we assume that Nature chooses X and {\varepsilon} independently, and Y takes values in obedience to the mathematical law above. This is why we use the asymmetric symbol “:=” instead of the symmetric “=” for structural equations.

We can embed this structural model into the simplest causal graph {X\longrightarrow Y}, where the arrow indicates the causal influence of X on Y. We have suppressed the dependence of Y on the error {\varepsilon}. The full graph reads {X\longrightarrow Y \dashleftarrow\varepsilon}, where the dashed arrow denotes the influence of unobserved variables captured by our error term. The path coefficient associated with the link {X\longrightarrow Y} is {\beta}, the structural parameter of the simple linear model. A structural model is said to be identified if the structural parameters can in principle be estimated from the joint distribution of the observed variables. We will show presently that under our assumptions the model is indeed identified and the path coefficient {\beta} is equal to the slope of the regression equation,


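{\large \beta = r_{YX} = \rho_{YX}\frac{\sigma_{Y}}{\sigma_{X}}, }

where {\rho_{YX}} is the correlation between X and Y, and {\sigma_{X}} and {\sigma_{Y}} are the standard deviations of X and Y respectively. To see why, note that because X and {\varepsilon} are independent, the covariance of X and Y equals {\beta\sigma_{X}^{2}}, and the regression slope is that covariance divided by {\sigma_{X}^{2}}. The regression coefficient {r_{YX}} can be estimated from sample data with the usual techniques, say, ordinary least squares (OLS).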
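If you want to see the base case at work, here is a minimal simulation sketch. The value of beta, the sample size and the spread of X are arbitrary choices of mine, and the helper functions are hypothetical names written for this illustration. It draws X and the error independently, builds Y from the structural equation, and compares the OLS slope with the correlation-based expression.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(y, x):
    """Slope of the least-squares regression of y on x."""
    return cov(x, y) / cov(x, x)

random.seed(0)
beta = 2.0      # assumed "true" path coefficient, chosen for illustration
n = 20_000

X = [random.gauss(0, 1.5) for _ in range(n)]    # Nature chooses X ...
eps = [random.gauss(0, 1) for _ in range(n)]    # ... and the error, independently
Y = [beta * x + e for x, e in zip(X, eps)]      # Y := beta * X + eps

r_yx = ols_slope(Y, X)
sd_x, sd_y = cov(X, X) ** 0.5, cov(Y, Y) ** 0.5
rho_yx = cov(X, Y) / (sd_x * sd_y)
slope_from_rho = rho_yx * sd_y / sd_x           # rho * sigma_Y / sigma_X

print("OLS slope:", round(r_yx, 3))
print("rho * sd_y / sd_x:", round(slope_from_rho, 3))
```

The two printed numbers agree by construction, and both should land close to the beta you fed in, up to sampling noise.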

What allows straightforward identification in the base case is the assumption that X and {\varepsilon} are independent. If X and {\varepsilon} are dependent then the model cannot be identified. Why? Because in this case there is spurious correlation between X and Y that propagates along the “backdoor path” {X\dashleftarrow\varepsilon\dashrightarrow Y}. See Figure 1.


Figure 1. Identification of the simple linear model.

Here’s what we can do if X and {\varepsilon} are dependent. We simply find another observed variable Z that is a causal “parent” of X (i.e., {Z\longrightarrow X}) but independent of {\varepsilon}. Then we can use it as an instrumental variable to identify the model. This works because there is no backdoor path between Z and Y, so the regression of Y on Z identifies the product {\alpha\beta}, and none between Z and X, so the regression of X on Z identifies {\alpha}, the path coefficient of the link {Z\longrightarrow X}. See Figure 2.


Figure 2. Identification with an instrumental variable.

In that case, {\beta} is given by the instrumental variable formula,

{\large \beta = \frac{r_{YZ}}{r_{XZ}}. }
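A quick simulation sketch makes the instrumental variable logic concrete: the ratio of the slope of Y on Z to the slope of X on Z recovers {\beta}, while the naive slope of Y on X does not. Everything numerical here is my assumption: the coefficients, the sample size, and the unobserved variable U that I use to make X and the error dependent.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(y, x):
    """Slope of the least-squares regression of y on x."""
    return cov(x, y) / cov(x, x)

random.seed(1)
alpha, beta = 0.8, 1.5   # assumed path coefficients for Z -> X and X -> Y
n = 50_000

Z = [random.gauss(0, 1) for _ in range(n)]   # instrument, independent of the error
U = [random.gauss(0, 1) for _ in range(n)]   # unobserved common cause of X and Y
X = [alpha * z + u + random.gauss(0, 1) for z, u in zip(Z, U)]
Y = [beta * x + u + random.gauss(0, 1) for x, u in zip(X, U)]

naive = ols_slope(Y, X)                      # contaminated by the backdoor path through U
iv = ols_slope(Y, Z) / ols_slope(X, Z)       # r_YZ / r_XZ

print("naive slope:", round(naive, 3))
print("IV estimate:", round(iv, 3))
```

The naive slope overshoots because U pushes X and Y in the same direction; the IV ratio should sit near the assumed beta.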


More generally, in order to identify the causal influence of X on Y in a graph G, we need to block all spurious correlation between X and Y. This can be achieved by controlling for the right set of covariates (or controls) Z. We’ll come to that presently. First, some graph terminology.

A directed graph is a set of vertices together with arrows between them (some of which may be bidirected). A path is simply a sequence of connected links, e.g., {i\dashrightarrow m\leftrightarrow j\dashleftarrow k} is a path between i and k. A directed path is one where every arrow points from one node to the next, e.g., {i\longrightarrow j\longrightarrow m\longrightarrow k} is a directed path from i to k. A directed acyclic graph is a directed graph that does not admit closed directed paths. That is, a directed graph is acyclic if there are no directed paths from a node back to itself.

A causal subgraph of the form {i\longrightarrow m\longrightarrow j} is called a chain and corresponds to a mediating or intervening variable m between i and j. A subgraph of the form {i\longleftarrow m\longrightarrow j} is called a fork, and denotes a situation where the variables i and j have a common cause m. A subgraph of the form {i\longrightarrow m\longleftarrow j} is called an inverted fork and corresponds to a common effect. In a chain {i\longrightarrow m\longrightarrow j} or a fork {i\longleftarrow m\longrightarrow j}, i and j are marginally dependent but conditionally independent (where we condition on m). In an inverted fork {i\longrightarrow m\longleftarrow j} on the other hand, i and j are marginally independent but conditionally dependent (once we condition on m). We use family connections to talk in short hand about directed graphs. In the graph {i\longrightarrow j}, i is the parent and j is the child. The descendants of i are all nodes that can be reached by a directed path starting at i. Similarly, the predecessors of j are all nodes from which j can be reached by directed paths.
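The chain and the inverted fork can be checked numerically. The sketch below is a rough illustration under my own assumptions (unit coefficients, standard normal noise, and partial correlation as a stand-in for conditional dependence, which is adequate for linear Gaussian models). The helper names are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

def residuals(y, x):
    """Residuals from the least-squares regression of y on x."""
    b = cov(x, y) / cov(x, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for yi, xi in zip(y, x)]

def partial_corr(a, b, given):
    """Correlation of a and b after removing the linear effect of `given`."""
    return corr(residuals(a, given), residuals(b, given))

random.seed(2)
n = 20_000
noise = lambda: random.gauss(0, 1)

# Chain: i -> m -> j
i_c = [noise() for _ in range(n)]
m_c = [i + noise() for i in i_c]
j_c = [m + noise() for m in m_c]

# Inverted fork: i -> m <- j
i_f = [noise() for _ in range(n)]
j_f = [noise() for _ in range(n)]
m_f = [i + j + noise() for i, j in zip(i_f, j_f)]

chain_marginal = corr(i_c, j_c)
chain_given_m = partial_corr(i_c, j_c, m_c)
fork_marginal = corr(i_f, j_f)
fork_given_m = partial_corr(i_f, j_f, m_f)

print("chain:         marginal", round(chain_marginal, 3), "| given m", round(chain_given_m, 3))
print("inverted fork: marginal", round(fork_marginal, 3), "| given m", round(fork_given_m, 3))
```

In the chain the dependence vanishes once you condition on m; in the inverted fork conditioning on m creates a dependence that was not there before.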

Definition (Blocking). A path p is blocked by a set of nodes Z if and only if p contains at least one arrow-emitting node that is in Z or p contains at least one inverted fork that is outside Z and has no descendant in Z. A set of nodes Z is said to block X from Y, written {(X\perp Y |Z)_{G}}, if Z blocks every path from X to Y.

The logic of the definition is that the removal of the set of nodes Z completely stops the flow of information from Y to X. Consider all paths between X and Y. No information passes through an inverted fork {i \longrightarrow m\longleftarrow j} as long as neither m nor any of its descendants is in Z, so you can ignore the paths that contain such inverted forks. Likewise, no information passes through a path without an arrow-emitting node so those can also be ignored. The rest of the paths are “live” and we must choose a set of nodes Z whose removal would block the flow of all information between X and Y along these paths. Note that whether Z blocks X from Y in a causal graph G can be decided by visual inspection when the number of covariates is small, say fewer than a dozen. If the number of covariates is large, as in many machine learning applications, a simple algorithm can do the job.
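Here is one way such an algorithm could look. This is a brute-force sketch of my own, not an optimized routine: it enumerates every path between X and Y and applies the blocking definition to each. It assumes a directed acyclic graph with no bidirected edges, stored as a dictionary from each node to its children. The function names and the example graph are hypothetical.

```python
def descendants(graph, node):
    """All nodes reachable from `node` by a directed path."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(graph, start, end):
    """Yield every path from start to end, ignoring arrow direction."""
    neighbours = {}
    for parent, children in graph.items():
        for child in children:
            neighbours.setdefault(parent, set()).add(child)
            neighbours.setdefault(child, set()).add(parent)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])

def path_blocked(graph, path, given):
    """Apply the blocking definition to a single path."""
    for a, m, b in zip(path, path[1:], path[2:]):
        is_inverted_fork = m in graph.get(a, ()) and m in graph.get(b, ())
        if is_inverted_fork:
            # Blocks unless m or one of its descendants is in the conditioning set.
            if m not in given and not (descendants(graph, m) & given):
                return True
        elif m in given:
            # m emits an arrow along the path and is conditioned on.
            return True
    return False

def blocks(graph, x, y, given):
    """True if `given` blocks every path between x and y."""
    given = set(given)
    return all(path_blocked(graph, p, given) for p in simple_paths(graph, x, y))

# Hypothetical graph: fork X <- Z -> Y, inverted fork X -> C <- Y, and C -> D.
g = {"Z": ["X", "Y"], "X": ["C"], "Y": ["C"], "C": ["D"]}

print(blocks(g, "X", "Y", {"Z"}))
print(blocks(g, "X", "Y", set()))
print(blocks(g, "X", "Y", {"Z", "D"}))
```

Conditioning on Z alone blocks both paths. The empty set leaves the fork open, and adding D reopens the inverted fork because D is a descendant of C. Path enumeration is exponential in the worst case, so for large graphs you would want a reachability-based method instead.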

If Z blocks X from Y in a causal graph G, then X is independent of Y given Z. That is, if Z blocks X from Y then X|Z and Y |Z are independent random variables. We can use this property to figure out precisely which covariates we ought to control for in order to isolate the causal effect of X on Y in a given structural model.

Theorem 1 (Covariate selection criteria for direct effect). Let G be any directed acyclic graph in which {\beta} is the path coefficient of the link {X\longrightarrow Y}, and let {G_{\beta}} be the graph obtained by deleting the link {X\longrightarrow Y}. If there exists a set of variables Z such that no descendant of Y belongs to Z and Z blocks X from Y in {G_{\beta}}, then {\beta} is identifiable and equal to the regression coefficient {r_{YX\cdot Z}}. Conversely, if Z does not satisfy these conditions, then {r_{YX\cdot Z}} is not a consistent estimand of {\beta}.

Theorem 1 says that the direct effect of X on Y can be identified if and only if we have a set of covariates Z that blocks all paths, confounding as well as causal, between X and Y except for the direct path {X\longrightarrow Y}. The path coefficient is then equal to the partial regression coefficient of X in the multivariate regression of Y on X and Z,

{Y =\alpha_1Z_1+\cdots+\alpha_kZ_k+\beta X+\varepsilon.}

The above equation can, of course, be estimated by OLS. Theorem 1 does not say that the model as a whole is identified. In fact, the path coefficients associated with the links {Z_{i}\longrightarrow Y} that the multivariate regression above suggests are not guaranteed to be identified. The regression model would be fully identified if Y is also independent of {Z_{i}} given {\{(Z_{j})_{j\ne i}, X\}} in {G_{i}}, the graph obtained by deleting the link {Z_{i}\longrightarrow Y}, for all {i=1,\dots,k}.

What if you wanted to know the total effect of X on Y ? That is, the combined effect of X on Y both through the direct channel (i.e., the path coefficient {\beta}) and through indirect channels, e.g., {X\longrightarrow W\longrightarrow Y} ? The following theorem provides the solution.

Theorem 2 (Covariate selection criteria for total effect). Let G be any directed acyclic graph. The total effect of X on Y is identifiable if there exists a set of nodes Z such that no member of Z is a descendant of X and Z blocks X from Y in the subgraph formed by deleting from G all arrows emanating from X. The total effect of X on Y is then given by {r_{YX\cdot Z}}.

Theorem 2 ensures that, after adjustment for Z, the variables X and Y are not associated through confounding paths, which means that the regression coefficient {r_{YX\cdot Z}} is equal to the total effect. Note the difference between the two criteria. For the direct effect, we delete the link {X\longrightarrow Y} and find a set of nodes that blocks all other paths between X and Y . For the total effect, we delete all arrows emanating from X because we do not want to block any indirect causal path of X to Y.
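The difference between the two criteria shows up clearly in a simulation. In the sketch below I assume a graph with a direct link {X\longrightarrow Y} and one indirect channel {X\longrightarrow W\longrightarrow Y}, with no confounders; the coefficients a, b and beta are arbitrary values of mine. For the direct effect, W blocks X from Y once the direct link is deleted, so we control for W. For the total effect, the empty set already satisfies the criterion, so we control for nothing.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(p, q):
    mp, mq = mean(p), mean(q)
    return sum((x - mp) * (y - mq) for x, y in zip(p, q)) / (len(p) - 1)

def slope(y, x):
    return cov(x, y) / cov(x, x)

def residuals(y, x):
    b = slope(y, x)
    a0 = mean(y) - b * mean(x)
    return [yi - a0 - b * xi for yi, xi in zip(y, x)]

def partial_slope(y, x, z):
    """Coefficient on x when y is regressed on x and one control z,
    computed by residualizing both on z (Frisch-Waugh-Lovell)."""
    return slope(residuals(y, z), residuals(x, z))

random.seed(3)
a, b, beta = 0.5, 2.0, 1.0   # assumed coefficients: X -> W, W -> Y, X -> Y
n = 20_000

X = [random.gauss(0, 1) for _ in range(n)]
W = [a * x + random.gauss(0, 1) for x in X]
Y = [beta * x + b * w + random.gauss(0, 1) for x, w in zip(X, W)]

total_effect = slope(Y, X)               # r_YX, no controls
direct_effect = partial_slope(Y, X, W)   # r_YX.W, controlling for the mediator

print("total  (expect about beta + a*b):", round(total_effect, 3))
print("direct (expect about beta):      ", round(direct_effect, 3))
```

Controlling for the mediator W is right if you want the direct effect and wrong if you want the total effect, which is exactly the distinction the two theorems draw.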

Theorem 1 is Theorem 5.3.1 and Theorem 2 is Theorem 5.3.2 in the second edition of Judea Pearl’s book, Causality: Models, Reasoning, and Inference, where the proofs may also be found. These theorems are of extraordinary importance for empirical research. Instead of the ad-hoc and informal methods currently used by empirical researchers to choose covariates, they provide mathematically precise criteria for covariate selection. The next few examples show how to use these criteria for a variety of causal graphs.

Figure 3 shows a simple case (top left) {Z\longrightarrow X\longrightarrow Y} where the errors of Z and Y are correlated. We obtain identification by repeated application of Theorem 1. Specifically, Z blocks X from Y in the graph obtained from deleting the link {X\longrightarrow Y} (top right). Thus, {\alpha} is identified. Similarly, Y blocks Z from X in the graph obtained from deleting the link {Z\longrightarrow X} (bottom right). Thus, {\beta} is identified.


Figure 3. Identification when a parent of X is correlated with Y.

Figure 4 shows a case where an unobserved disturbance term influences both X and Y. Here, the presence of the intervening variable Z allows for the identification of all the path coefficients. I’ve written the structural equation on the top right and checked the premises of Theorem 1 at the bottom left. Note that the path coefficient of {U\dashrightarrow X} is known to be 1 in accordance with the structural equation for X. Hence, the total effect of X on Y equals {\alpha\beta+\gamma}.


Figure 4. Model identification with an unobserved common cause.

Figure 5 presents a more complicated case where the direct effect can be identified but not the total effect. The identification of {\delta} is impossible because X and Z are spuriously correlated and there is no instrumental variable or intervening variable available.


Figure 5. A more complicated case where only partial identification is possible.

If you have reached this far, I hope you have acquired a basic grasp of the graphical methods presented in this lecture. You probably feel that you still don’t really know it. This always happens when we learn a new technique or method. The only way to move from “I sorta know what this is about” to “I understand how to do this” is to sit down and work out a few examples. If you do the exercises in the homework below, you will be ready to use this powerful arsenal for live projects. Good luck!


  1. Epidemiologists argued in the early postwar period that smoking causes cancer. Big Tobacco countered that both smoking and cancer are correlated with genotype (unobserved), and hence, the effect of smoking on cancer cannot be identified. Show Big Tobacco’s argument in a directed graph. What happens if we have an intervening variable between smoking and cancer that is not causally related to genotype? Say, the accumulation of tar in lungs? What would the causal diagram look like? Prove that it is then possible to identify the causal effect of smoking on cancer. Provide an expression for the path coefficient between smoking and cancer.
  2. Obtain a thousand simulations each of two independent standard normal random variables X and Y. Set Z=X+Y. Check that X and Y are uncorrelated. Check that X|Z and Y|Z are correlated. Ask yourself if it is a good idea to control for a variable without thinking the causal relations through.
  3. Obtain a thousand simulations each of three independent standard normal random variables {u,\nu,\varepsilon}. Let {X=u+\nu} and {Y=u+\varepsilon}. Create scatter plots to check that X and Y are marginally dependent but conditionally independent (conditional on u). That is, X|u and Y|u are uncorrelated. Project Y on X using OLS. Check that the slope is significant. Then project Y on X and u. Check that the slope coefficient for X is no longer significant. Should you or should you not control for u?
  4. Using the graphical rules of causal inference, show that the causal effect of X on Y can be identified in each of the seven graphs shown in Figure 6.
  5. Using the graphical rules of causal inference, show that the causal effect of X on Y cannot be identified in each of the eight graphs in Figure 7. Provide an intuitive reason for the failure in each case.

    Figure 6. Graphs where the causal effect of X on Y can be identified.


    Figure 7. Graphs where the causal effect of X on Y cannot be identified.


Regional Polarization and Trump’s Electoral Performance

Tom Edsall suggested that I look at the regional socioeconomic correlates of Trump’s electoral performance. Why that didn’t cross my mind before I know not. But here goes. 

Political polarization in the United States means that the overwhelmingly best predictor of a major party presidential candidate’s electoral performance is the performance of the previous candidate of the party. This was clearly the case in this election. [All data in this post is at the county level. The socioeconomic data is from GeoFRED while the vote count is from here.]


In what follows, therefore, we will look at the correlates of Trump’s performance relative to Mitt Romney’s in 2012. This is the cleanest way to control for partisan polarization. We’re going to examine the socioeconomic indicators of counties where Trump gained vote share compared to Romney.

Specifically, we will divide the counties into six buckets: Blowout, where Trump’s vote share was more than 5 percent below Romney’s; Major Loss, where Trump’s vote share was between 5 and 2.5 percent below Romney’s; Moderate Loss, where his vote share was between 2.5 percent below and at par with Romney’s; Moderate Gain, where Trump increased the GOP’s share by less than 2.5 percent; Major Gain, where he increased it by between 2.5 and 5 percent; and finally, Land Slide, where Trump gained more than 5 percent relative to Romney.
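For concreteness, the bucketing rule can be written as a small function. This is a sketch under two assumptions of mine: I read “percent” as percentage points of vote share, and I pick a side for counties that fall exactly on a boundary, since the description above leaves that open. The function name is hypothetical.

```python
def bucket(swing):
    """Map Trump's vote share minus Romney's (in percentage points) to a bucket.

    Boundary handling is an assumption: a swing of exactly 0 counts as a
    Moderate Gain and a swing of exactly 5 counts as a Major Gain.
    """
    if swing < -5:
        return "Blowout"
    if swing < -2.5:
        return "Major Loss"
    if swing < 0:
        return "Moderate Loss"
    if swing < 2.5:
        return "Moderate Gain"
    if swing <= 5:
        return "Major Gain"
    return "Land Slide"

# Made-up swings, one for each bucket.
for s in (-7.0, -3.0, -1.0, 1.0, 4.0, 9.0):
    print(s, bucket(s))
```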

More sophisticated strategies are certainly possible. But this strategy will allow us to visualize the data cleanly.

We begin with the number of counties. This chart is no surprise to anyone who watched the results on election night. A lot more of the map was colored red than in 2012. There was a major swing in a large number of counties.


But most such counties are very sparsely populated. The most populous counties actually went for Clinton at higher rates than they had gone for Obama in 2012. These two charts illustrate the GOP’s astonishing geographic advantage.


Let’s move on to socioeconomic variables. The next two charts show the median household income and per capita incomes averaged over all the counties in each of the six buckets. Both paint a consistent picture: Trump did worse than Romney in a typical affluent county, but did better than him in poorer counties. But neither was a strong correlate of Trump’s performance. Median household income and per capita income explain only 13 percent and 10 percent of the variation in Trump’s performance relative to Romney respectively.


The percentage of college graduates on the other hand, is a very strong predictor. It explains 35 percent of the variation in Trump’s relative performance. High school diploma rate is, however, a poor predictor. Still, counties where Trump did worse than Romney typically had higher percentages of people with high school diplomas.


Trump did better than Romney in counties where poverty and unemployment rates are relatively high, although the gradient is not constant.


Similarly, Trump did well in counties where the proportion of people relying on food stamps is high.


But his performance was uncorrelated with crime rates. On the other hand, it was correlated with the youth idleness rate, that is, the percentage of 16-19 year olds who are neither studying nor working.


Similarly, counties where Trump improved on Romney’s performance had higher percentages of families with children that are single parent households.


Finally, Trump did worse than Romney in counties with positive net migration rates and he did better in counties with negative net migration rates. This is the only dynamic variable we have in the dataset. (The others are snapshots and do not tell how things are changing in the counties.) It is therefore very interesting to find a clean correlation between net migration rates and Trump’s relative performance. The upshot is that Trump did well in places that are hemorrhaging people.


A consistent picture emerges from all these charts. Trump got to the White House by outperforming Mitt Romney in counties that are less educated, have lower incomes and higher poverty rates, where a greater proportion of people rely on food stamps, where many young adults are idle and children are growing up in broken homes. This is the America that is getting left behind. People are quite literally leaving these counties for greener pastures.

We have yet to tackle the why of it all. Why has America become so regionally polarized? Is it global trade? Automation? Skill-biased technological change? The neoliberal policy consensus? The political economy of Washington, DC? A fairly coherent narrative can be constructed along any of these threads. It is much harder to evaluate their relative importance. And even harder to devise meaningful policy solutions.

While we quietly thank our stars that Trump is getting tamed by adult supervision, we cannot go back to ignoring fly-over country. For we now know quite well what happens when we do.






Zones of Poverty and Affluence in America

In BoBos in Paradise, David Brooks popularized the notion of Latte Towns: “upscale liberal communities, often in magnificent natural settings, often university-based, that have become the gestation centers for America’s new upscale culture.” Charles Murray, in Coming Apart, compiles a list of superzips where the affluent and the educated are concentrated:


Superzips in the United States. Source: Data by Charles Murray, compiled by Gavin Rehkemper.

On the other side of the great divide, we know about endemic poverty in Appalachia and, of course, the Deep South. Much of the doomed cohort analyzed by Case and Deaton is concentrated in these poverty belts.

Combined and uneven development has left America regionally polarized. This affects the politics of the nation and the country’s cohesiveness as a society. To better understand the challenges, it is important to map the regional polarization of America.

Before we come to the maps, a basic question needs to be considered. The affluent are concentrated in the superzips and the poor in the poverty belts, but what about the rest? Surely, the bulk of the population lives neither in zones of grinding poverty nor in zones of mass affluence. Are the rest of these zones homogeneous? Or is there internal structure in the middling bulk of America?

In order to answer this question, I looked at county-level socioeconomic data from GeoFRED. I wanted to see if the counties sorted themselves out into natural clusters. It turns out that there are four basic clusters of counties: Affluent, Middle America, Near-Poor, and Poor. These four clusters differ systematically from each other. Moreover, no matter which subset of socioeconomic indicators you use to do the sorting, you obtain very nearly the same clusters.


The Geography of Class in America

| Indicator | Poor | Near-Poor | Middle America | Affluent |
|---|---|---|---|---|
| College Graduates | 12% | 16% | 23% | 37% |
| Some College | 20% | 25% | 33% | 47% |
| High School Graduates | 75% | 83% | 89% | 91% |
| Median Household Income | 34,302 | 42,787 | 52,800 | 73,170 |
| Per Capita Income | 31,107 | 36,226 | 45,010 | 64,218 |
| Unemployment Rate | 7% | 6% | 4% | 4% |
| Single Parent Households | 41% | 34% | 28% | 25% |
| Inequality (ratio) | 16% | 13% | 12% | 13% |
| Poverty Rate | 26% | 18% | 12% | 9% |
| SubPrime Rate | 37% | 29% | 22% | 20% |
| Youth Idleness Rate | 13% | 10% | 6% | 5% |
| Food Stamps | 27% | 17% | 10% | 7% |
| Crime Rate (per thousand) | 10 | 8 | 6 | 6 |
| Population (millions) | 23.0 | 78.5 | 134.0 | 80.7 |
| Population share (sample) | 7% | 25% | 42% | 26% |
| No. of counties | 582 | 1,177 | 1,077 | 231 |

Source: GeoFRED, author’s calculations.

Only 231 out of 3,067 counties can be classified as affluent. But they contain 81 million people, or a quarter of the US population. The median household income in these counties is 73,170. In affluent counties, 91 percent of adults have a high school diploma and 37 percent have college degrees. The poverty rate is 9 percent and only 7 percent of residents rely on food stamps. About a quarter of the families with children are single parent households. Only 5 percent of young adults aged 16-19 are neither studying nor working. The crime rate is low and the unemployment rate is below the national average.

Some 582 out of 3,067 counties can be classified as poor. They are home to 23 million people, or 7 percent of the US population. The median household income is 34,302; less than half that of the affluent counties. A quarter of adult residents in these counties lack a high school diploma and only 12 percent have college degrees. More than a quarter of residents fall below the poverty line and 27 percent rely on food stamps for survival. Some 41 percent of families with children are single parent households and 13 percent of young adults are neither studying nor working. The crime rate is high and the unemployment rate is above the national average.

The vast bulk of US counties, 73 percent, are neither affluent nor poor. They contain 212 million people, almost exactly two-thirds of the US population. Of these 2,254 counties, 1,177 are near-poor. They are home to 78 million people, or 25 percent of the population. On almost any socioeconomic indicator, these counties are closer to the poor counties than the affluent ones.

Finally, there are 1,077 moderately affluent counties in Middle America. This is where the middling bulk of the US population—42 percent—lives. They are home to 134 million people, which is more than the population of Japan or Mexico. There is a significant gap in incomes and college graduation rates between moderately affluent and affluent counties. But on other socioeconomic indicators, they are not far apart.
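The county and population shares quoted in the last few paragraphs follow directly from the bottom rows of the table. A short sketch of the arithmetic, using only the county counts and populations reported there (the variable names are mine):

```python
# County counts and populations (millions) as reported in the table.
counties = {"Poor": 582, "Near-Poor": 1177, "Middle America": 1077, "Affluent": 231}
population = {"Poor": 23.0, "Near-Poor": 78.5, "Middle America": 134.0, "Affluent": 80.7}

total_counties = sum(counties.values())
total_population = sum(population.values())

county_share = {k: 100 * v / total_counties for k, v in counties.items()}
population_share = {k: 100 * v / total_population for k, v in population.items()}

# The "middling bulk": near-poor plus Middle America.
middle_counties = counties["Near-Poor"] + counties["Middle America"]
middle_population = population["Near-Poor"] + population["Middle America"]

print("total counties:", total_counties)
for k in counties:
    print(f"{k}: {county_share[k]:.1f}% of counties, {population_share[k]:.1f}% of population")
print(f"neither affluent nor poor: {middle_counties} counties "
      f"({100 * middle_counties / total_counties:.1f}%), "
      f"{middle_population:.1f} million people "
      f"({100 * middle_population / total_population:.1f}%)")
```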

Although affluent counties are sprinkled throughout the country, coastal United States is home to all multi-county clusters of mass affluence. A vast zone of affluence stretches across the northeastern seaboard, from the suburbs of DC all the way up to Vermont.

Eastern zone of affluence

Inside this eastern zone of affluence there are two major clusters. One is centered around New York City. It is the richest, most populous cluster of counties in the United States. The City’s per capita income is nearly a hundred and sixty thousand dollars.

NYC zone of affluence

The second is centered on Washington, DC. The two suburban counties of Fairfax and Prince William are brown because GeoFRED does not have data on them. Both are easily affluent. According to the 2010 census, the median household incomes of Fairfax and Prince William counties were 105,416 and 91,098 respectively.

DC zone of affluence

The Western zone of affluence is centered on San Francisco and comparable in affluence to the DC area. It obeys the same distance decay law that characterizes the eastern zones of affluence: The closer one gets to the leading city the more affluent the area. Note that Marin County has a higher per capita income than San Francisco itself. Both have per capita incomes in six figures—a property shared by only 13 counties in the entire United States.

Western zone of affluence

On to the other side of the ledger. There are some counties in the Western United States with high poverty rates. But these counties are sparsely populated. Because they are geographically large, national maps provide a misleading picture to the naked eye. The exception is the cluster of high poverty rate counties in Arizona and New Mexico. At the center of the cluster of three dark-hued counties that visually dominate the map is Apache County, Arizona. (The narrow strip that runs north-south along the Arizona-New Mexico border.) Only 10 percent of Apache residents have a college degree; 26 percent don’t even have a high school diploma. Some 37 percent of residents are below the poverty line and rely on food stamps. Per capita income in the county is just shy of thirty thousand dollars. Nearly half the families with children are single parent households. An astonishing 55 percent of county residents have a credit score below 660, meaning that they are considered subprime.

Western poverty

Big multi-county clusters of widespread poverty are concentrated in the southeastern United States. There is a vast poverty belt stretching across the Deep South and another big cluster in Appalachia. You can walk a thousand miles from Texas to the eastern seaboard—say from Marion County, TX, to McIntosh County, GA—without stepping foot in any county with a poverty rate below 20 percent.

Eastern poverty

Kentucky has its own zone of wrenching poverty centered at Owsley County. In the map it is the one in the northern cluster of dark counties (where the poverty rate is more than 30 percent) that is surrounded on all sides by other dark counties. Here, 38 percent of the residents fall below the poverty line. The median household income is a mere 23,047. Only 11 percent of adults are college graduates and 41 percent lack a high school diploma. An astounding 55 percent of county residents rely on food stamps.

We have only scratched the surface of regional socioeconomic polarization in the United States. I will report again when I have more substantial results.

World Affairs

An Irresistible Opportunity


A cruise missile fired off from a US Navy ship to strike an airbase in Syria.

Despite the consensus in the agenda-setting media, we do not yet know whether the Assad regime was behind the chemical weapons attack in the rebel-held town of Khan Sheikhoun of Idlib province on Tuesday. There is ample evidence that nerve agents, probably sarin, caused the death of dozens and injured hundreds. It’s also clear that the Syrian Air Force bombed the town at the same time. What we don’t yet know is whether the chemical agents released were part of the payload dropped by the regime’s bombers as Western powers have alleged, or whether the bombs struck a rebel weapons depot containing chemical weapons as Russia has claimed. It’s not out of the realm of possibility that Assad would blatantly test the new administration in this manner. What is clear is that it was not in Assad’s interest to be caught red-handed just as the White House was signaling that it was not interested in getting rid of him.

The White House went for the strike because the chemical attack and the media’s reaction to it made it irresistible. After all, what was there to lose? Trump could distinguish himself from the previous occupant of the White House and provide succor both to much of his ‘America: Fuck Yeah!’ support base as well as the liberal hawks who occupy the center of Washington foreign policymaking. Indeed, in elite foreign policy circles, the chemical attack was very much seen as an opportunity that doesn’t come often and must be seized; what the President of the Council on Foreign Relations called “a rare second chance.”

As the story gained traction in the media, it became apparent to the administration that a symbolic action like a barrage of cruise missile strikes would be a big propaganda win for President Trump. Before the attack and even for a short while afterwards, the line from the White House was that ousting Assad was the last thing on the agenda. Whether that’s still the case is unclear. Secretary of State Rex Tillerson said in the aftermath of the strike that “steps are underway” to get rid of Assad. So is the administration seeking to oust Assad? No one knows; possibly not even the President himself.

The fundamental challenge of any strategy to oust Assad remains unchanged. There is no viable replacement for Assad. The rebellion is composed largely of Salafi jihadist outfits like al Nusra (since rebranded) and Ahrar al Sham. The United States could try to impose a moderate warlord as the leader of a post-Assad Syria. But that is unlikely to carry weight in either rebel-held towns or regime-friendly cities. The only way a foreign power can impose a new regime in Syria is to occupy the country. United States armed forces could certainly pull that off, despite the Russian presence, but, politically, it would be extremely challenging for the administration to sell a large-scale pacification campaign both to the foreign policy elite and to its support base.

The about-turn in Syria is part of the taming of Donald Trump by the establishment. The most important news of the week on the US foreign policy front wasn’t about Syria. Rather, it was the ouster of Steve Bannon, along with the reinstatement of the Joint Chiefs chairman and intelligence director, and the addition of the energy secretary, CIA director and UN ambassador, to the National Security Council’s principals committee. That event marks a decisive break from the amateur hour of the early Trump administration. The Bannon-Miller-Sessions wing of the administration seems to have lost a second round (after the Flynn affair) in their battle for the soul of the Trump White House on national security, to the centrist, a.k.a. liberal hegemonist, Tillerson-Mattis-McMaster wing.

Two other developments have been overshadowed by the strikes. The first is President Xi’s visit to Mar-a-Lago. That’s possibly the most important relationship of the Trump administration. If Trump thinks that shooting off some cruise missiles at a defenseless country is going to impress Xi, he is going to be surprised. Xi has more cards to play than any other nation facing the United States on the world stage. He would strongly prefer a United States bogged down in Syria. [I’ll write a full-length post on the balance of power in the Western Pacific soon.]

The second development that was overshadowed was not unrelated to the strikes. It was the reorientation of US Middle East policy in favor of the Sunni Arab autocrats. The strikes themselves are bound to have warmed the heart of Mohammad bin Salman, Saudi Arabia’s aggressive, young, de-facto leader. They also cap a remarkable couple of weeks in which Tillerson lifted the ban on fighter jet sales to Bahrain, a Saudi dependency; promised precision weapons to Saudi Arabia for its terror campaign in Yemen; and embraced Sisi, Egypt’s strongman. These developments are, of course, entirely congruent with the new hardline policy on Syria. The Policy Tensor had imagined that Nick Burns would run Clinton’s foreign policy when she became president. Now it seems that Tillerson is implementing Burns’ agenda for him.

What is really striking is that, despite expectations to the contrary, the United States is stumbling into a major confrontation with Russia. Whether this is the result of Trump’s financial ties to the oil monarchies or the lure of an easy win on the home front is not entirely clear. What is clear is that Trump is entangling the United States in a secondary theatre even as he meets with the real challenge to US primacy. Xi could not help but be pleased with the developments of the last few days.


Mirror Mirror on the Wall: Asset Prices and Wall Street

Before I became a geometer and after I studied economics, I worked as a pricing actuary for a reinsurance firm. Insurance companies aggressively market their products and in the process accumulate more risk than they can stomach. They offload this risk to heavily-capitalized reinsurance firms whose entire business is to bear such tail risks and for which they get compensated in the form of ceded premium. The job of the pricing actuary is easy: Compute the expected loss and add on a risk premium for the value-at-risk. Value-at-risk is the largest loss you would have to bear, say, once in a hundred years. Reinsurance pricing is relatively straightforward because the underlying shocks are exogenous and independent of each other. Because you are insuring only against acts of God, the probabilities are relatively stable. It’s all quite tame.
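The pricing recipe described above (expected loss plus a loading for value-at-risk) can be sketched numerically. This is only a toy illustration under stated assumptions: the loss model (at most one independent event per year, with exponentially distributed severity), the parameter values, and the 5 percent loading on the gap between value-at-risk and expected loss are all hypothetical and are not taken from any actual reinsurance contract.

```python
import random


def simulate_annual_losses(n_years, event_prob, mean_severity, seed=0):
    """Simulate total annual losses under a hypothetical toy model:
    each year sees at most one event, independently with probability
    event_prob, whose size is exponentially distributed."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        if rng.random() < event_prob:
            losses.append(rng.expovariate(1.0 / mean_severity))
        else:
            losses.append(0.0)
    return losses


def value_at_risk(losses, return_period=100):
    """Empirical loss that is exceeded roughly once per return_period years."""
    ordered = sorted(losses)
    index = int(len(ordered) * (1.0 - 1.0 / return_period))
    return ordered[min(index, len(ordered) - 1)]


def price(losses, risk_loading=0.05, return_period=100):
    """Expected loss plus a (hypothetical) loading on the tail beyond it."""
    expected = sum(losses) / len(losses)
    tail = value_at_risk(losses, return_period)
    return expected + risk_loading * (tail - expected)


losses = simulate_annual_losses(n_years=100_000, event_prob=0.1, mean_severity=50.0)
expected_loss = sum(losses) / len(losses)
var_100 = value_at_risk(losses)
premium = price(losses)
print(f"expected loss {expected_loss:.2f}, 1-in-100 VaR {var_100:.2f}, premium {premium:.2f}")
```

Because the simulated events are independent and their probabilities are fixed, the estimates settle down quickly as the number of simulated years grows, which is the sense in which this kind of pricing is tame.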

Contrast that with the untamed gyrations of the market. In sharp contrast to the reinsurance industry, shocks to asset prices are endogenous and highly correlated. It is dramatically harder to price risky assets than bundles of insurance policies. Not coincidentally, it is also much more interesting.

For about a year now, my professional research has focused on asset pricing and macrofinance. I’ve written about financial cycles before. In this post, I’ll summarize my findings on asset pricing for the layperson. All the technical details can be found in my recent paper. I’m a strong believer in the notion that unless you can explain your ideas in plain English, either you don’t understand them yourself or you are peddling snake oil. So in what follows, I’ll try to explain in a clear and straightforward manner precisely what I have figured out.

My intellectual wanderings have convinced me that every single discipline is organized around a single powerful idea—a master key that unlocks the field. The master key that makes asset prices intelligible is systematic risk.

Modern finance began when the focus moved away from stocks to portfolios. The fundamental insight of modern finance is that investors are not compensated for holding idiosyncratic risk; they are compensated for holding only systematic risk. Idiosyncratic risk is the risk that a particular asset will lose value. Such risks are easily diversifiable. Simply by holding a portfolio with a large enough number of assets, an investor can reduce the threat posed by any particular asset to her balance sheet virtually down to zero. If there were any compensation for holding idiosyncratic risk, it would be immediately bid away by diversified investors for whom the risk is as good as nonexistent.
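A small calculation makes the diversification argument concrete. The sketch below assumes an equal-weighted portfolio in which every asset has the same exposure to a common market shock plus an independent idiosyncratic shock; the volatility figures (30 percent idiosyncratic, 15 percent market) are hypothetical. Under these assumptions the idiosyncratic variance of the portfolio shrinks as 1/N, while the common component does not shrink at all.

```python
import math


def portfolio_volatility(n_assets, idio_vol, market_vol, beta=1.0):
    """Volatility of an equal-weighted portfolio of n_assets.

    Assumes each asset's return is beta * market shock + an independent
    idiosyncratic shock with volatility idio_vol (a simplifying assumption).
    """
    idiosyncratic_var = idio_vol ** 2 / n_assets   # diversifies away as N grows
    systematic_var = (beta * market_vol) ** 2      # unaffected by N
    return math.sqrt(systematic_var + idiosyncratic_var)


for n in (1, 10, 100, 1000):
    vol = portfolio_volatility(n, idio_vol=0.30, market_vol=0.15)
    print(f"{n:5d} assets: portfolio volatility {vol:.4f}")
```

As the number of assets grows, the portfolio's volatility approaches the market volatility from above: what remains is the part that no amount of diversification removes.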

The defining feature of systematic risk is that it is hard to diversify away. For instance, if the market as a whole were to decline, you would feel the pain no matter how diversified a portfolio of stocks you hold. The Capital Asset Pricing Model says that that’s all there is to it: The only systematic risk is market risk. Things are not so simple, of course. The Capital Asset Pricing Model provides a rather poor explanation of asset prices.

More generally, an asset pricing model tells you what constitutes systematic risk. It is quite literally a list of risk factors. The sensitivity of a portfolio’s returns to a risk factor is called the portfolio’s factor beta. The expected return on a portfolio (in excess of the risk-free rate) is then simply the sum of the betas multiplied by the risk premiums on the factors. Your portfolio’s factor beta is your exposure to that risk for which your compensation is the risk premium on that factor. You earn exactly the risk premium on a factor if your portfolio’s beta for that factor is 1 and all other factor betas of your portfolio are 0.
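The pricing relation in this paragraph, expected excess return equals the sum of factor betas times factor risk premiums, is simple enough to write down directly. In the sketch below the factor names and the premium values are hypothetical placeholders chosen for illustration, not estimates from any model.

```python
def expected_excess_return(betas, premiums):
    """Sum of beta_k * premium_k over the portfolio's factors.

    betas and premiums are dicts keyed by factor name. Every factor the
    portfolio loads on must have a premium.
    """
    missing = set(betas) - set(premiums)
    if missing:
        raise KeyError(f"no risk premium given for factors: {sorted(missing)}")
    return sum(betas[name] * premiums[name] for name in betas)


# Hypothetical annual risk premiums, for illustration only.
premiums = {"market": 0.06, "size": 0.03, "value": 0.04}

# A portfolio with beta 1 on one factor and 0 on the rest earns exactly
# that factor's premium.
pure_market = {"market": 1.0, "size": 0.0, "value": 0.0}

# A portfolio with mixed exposures earns a blend of the premiums.
tilted = {"market": 0.8, "size": 0.5, "value": -0.2}

print(expected_excess_return(pure_market, premiums))
print(expected_excess_return(tilted, premiums))
```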

The workhorse asset pricing model is that of Kenneth French and Eugene Fama from the 1990s. They have two risk factors besides market risk. The first is the difference between returns on stocks with low market capitalization and stocks with high market capitalization. That’s the size factor. The second is the difference between returns on stocks with high relative value and stocks with low relative value, where relative value is given by the ratio of the book value of the firm (what they show on their accounts) to the market value of the firm. That’s the value factor.

This 3-factor model does well in explaining stock prices, as does Carhart’s 4-factor model, also from the 1990s. Carhart added a fourth factor, momentum, to the Fama-French 3-factor model. He builds on the observation that stocks that perform well in a given month also do well in the following month. The momentum factor is simply the difference in the return on stocks with high prior returns and stocks with low prior returns.
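Size, value and momentum are all built the same way: sort stocks on a characteristic and take the difference in returns between the two ends of the sort. The sketch below is a deliberately simplified version of that idea. It uses equal-weighted averages of the top and bottom few stocks, whereas the published factors use more elaborate sorts and weighting; the function name and the ten-stock data set are hypothetical.

```python
def long_short_return(stocks, n_per_leg, long_high=True):
    """Return spread between the extremes of a characteristic sort.

    stocks: list of (characteristic, period_return) pairs.
    n_per_leg: number of stocks in each of the high and low groups.
    long_high: if True, return high-minus-low; otherwise low-minus-high.
    """
    if n_per_leg < 1 or 2 * n_per_leg > len(stocks):
        raise ValueError("n_per_leg must be between 1 and half the number of stocks")
    ranked = sorted(stocks, key=lambda pair: pair[0])
    low_leg = ranked[:n_per_leg]
    high_leg = ranked[-n_per_leg:]

    def mean_return(group):
        return sum(ret for _, ret in group) / len(group)

    spread = mean_return(high_leg) - mean_return(low_leg)
    return spread if long_high else -spread


# Hypothetical data: (market capitalization, return over the period).
by_size = [(1, 0.05), (2, 0.04), (3, 0.03), (4, 0.02), (5, 0.02),
           (6, 0.02), (7, 0.02), (8, 0.01), (9, 0.01), (10, 0.01)]

# Size factor: small stocks minus big stocks, so the long leg is the low end.
size_factor = long_short_return(by_size, n_per_leg=3, long_high=False)
print(f"size factor return: {size_factor:.4f}")
```

The value and momentum factors follow from the same function by sorting on book-to-market or on prior return and keeping `long_high=True`.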

These two workhorse models have been so successful that they have percolated down from academic journals to personal finance. If you have a bit of money in the bank or in your 401K, you have probably talked to an investment advisor. (The usual advice is to be aggressive if you have a long investment horizon, and play safe otherwise.) They often talk about high beta stocks (by which they mean high market beta), size stocks, value stocks, and momentum stocks. That’s all irrelevant. What matters are your portfolio’s factor betas, not the factor betas of the stocks! You should think of your portfolio not as a collection of stocks but as a bundle of factors.

The big problem with size, value, and momentum, is that it is not at all clear why they sport positive risk premiums. In other words, we do not have a theory to explain the empirical performance of these risk factors. They are, in fact, anomalies begging for explanation.

In recent years, a powerful new theory of asset prices has emerged from the wreckage of the financial crisis. It is this theory that attracted me away from my research on the geometry of black holes.

At the heart of the theory are giant Wall Street banks, referred to in the jargon as broker-dealers. These big banking firms are some of the largest financial institutions in the world. JPMorgan, for instance, has $2.5 trillion in assets.

As the financial crisis gathered pace in the fall of 2007, Tobias Adrian at the New York Fed (now at the IMF) and Hyun Song Shin at Princeton University (now at the Bank for International Settlements) started paying attention to broker-dealer leverage. What they found was striking.

Leverage is naturally countercyclical. When asset prices rise, equity rises faster than assets since liabilities are usually more or less fixed. Leverage therefore falls when assets are booming. Conversely, leverage rises when asset prices fall. This holds in the aggregate for households, non-financial companies, commercial banks, and pretty much everyone else, except broker-dealers. Dealer leverage is procyclical. This is because dealers aggressively manage their balance sheets. When perceived risk is low, they increase their leverage and expand their balance sheets. When perceived risk is high, they deleverage and shrink.
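The balance sheet arithmetic behind this paragraph is worth spelling out. The sketch below uses a hypothetical balance sheet (assets of 100 funded by debt of 90) and defines leverage as assets divided by equity. The first function shows the passive case, where debt stays fixed; the second shows how large the balance sheet must become if an investor instead actively targets a chosen leverage ratio, which is one simple way to represent a dealer's behavior and is an assumption of this illustration.

```python
def leverage(assets, debt):
    """Leverage defined as total assets divided by equity."""
    equity = assets - debt
    if equity <= 0:
        raise ValueError("equity must be positive")
    return assets / equity


def passive_leverage_after_shock(assets, debt, price_change):
    """Leverage after asset values change while debt stays fixed."""
    return leverage(assets * (1.0 + price_change), debt)


def assets_for_target_leverage(equity, target_leverage):
    """Balance sheet size needed to hit a chosen leverage ratio."""
    return equity * target_leverage


# Hypothetical starting point: assets 100, debt 90, equity 10.
start = leverage(100.0, 90.0)
after_gain = passive_leverage_after_shock(100.0, 90.0, 0.01)
after_loss = passive_leverage_after_shock(100.0, 90.0, -0.01)
print(f"start {start:.2f}, after +1% {after_gain:.2f}, after -1% {after_loss:.2f}")

# After the 1% gain, equity is 11. An investor who wants leverage back at
# the starting ratio must expand the balance sheet by borrowing to buy assets.
equity_after_gain = 100.0 * 1.01 - 90.0
print(f"assets needed to restore leverage: {assets_for_target_leverage(equity_after_gain, start):.2f}")
```

A passive investor sees leverage fall when prices rise and rise when prices fall. An investor who pushes leverage back up (or beyond) after a gain buys into a rising market and sells into a falling one, which is what makes balance sheet management procyclical.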

In the years since that first breakthrough, the balance sheets of broker-dealers have been tied to the great mortgage credit boom, the shadow banking system, the transmission channel of monetary policy, the global transmission of US monetary policy, cross-border transmission of credit conditions, the yield curve and the business cycle (or more properly the business-financial cycle), and of course, asset prices.

This is quite simply the most profound revision of our picture of the global monetary, financial and economic system in decades. More on that another day. Let’s stick to the topic at hand.

What is absolutely clear is that an intermediary risk factor belongs in the pricing kernel (the vector of systematic risks). There is no disagreement that such a factor must be based on broker-dealer balance sheets (as opposed to the much broader set of financial intermediaries).

The big disagreement is over precisely which measure is the right risk factor. There are three competing groups of academics here. The first is the original group around Tobias Adrian, who argue that leverage is the right factor, that the risk posed to investors’ portfolios is that dealers could deleverage and therefore drive down asset prices. The second group, based around Zhiguo He at the University of Chicago, argue that the capital ratio (the reciprocal of leverage) of the holding companies that own broker-dealer firms is the right factor. This is because dealers can access internal capital markets inside their parent firms, and therefore don’t have to shed assets in bad times as long as they can ask their parents for money.

Both of these models are based on the observation that dealers are the marginal investors in asset markets. In effect, they replace the representative average investor who had hitherto played the starring role in asset pricing theory with broker-dealers. Basically, times are good when the marginal investor has high risk appetite (the marginal value of her wealth is low) and they are bad when she has low risk appetite (the marginal value of her wealth is high). Assets that do well in bad times ought to offer lower compensation to the investor than assets that do badly. The marginal value of her wealth therefore belongs in the pricing kernel.

The third group is a circle of one centered around yours truly. I argue that except for the interdealer markets (which are important as funding markets but not as markets for risky assets), both non-dealer risk arbitrageurs (basically all other big fish in the market) and dealers are simultaneously marginal investors. For the business of broker-dealers is to make markets. That is, dealers quote a two-sided market and absorb the resulting order flow on their own books. Importantly, dealers provide leverage to risk arbitrageurs by letting them trade on margin. Balance sheet capacity is the risk-bearing capacity of the dealers with system-wide implications. It goes up with both dealer equity and dealer leverage. When balance sheet capacity is plentiful, risk arbitrageurs can easily take risky leveraged positions to bid away excess returns. Conversely, when balance sheet capacity is scarce, risk arbitrageurs cannot obtain all the leverage they want and therefore find it harder to bid away excess returns.

What this implies is that even if dealers were not marginal investors, their balance sheet capacity, but not their leverage, would still belong in the pricing kernel. And if dealer leverage is tamed, as it has been by financial repression since the crisis, fluctuations in balance sheet capacity would still whipsaw asset markets. Balance sheet capacity is like the weather; it affects everyone. Of course, what matters is not the absolute size but the relative size of balance sheet capacity. I therefore define my intermediary risk factor to be the ratio of the total assets of the broker-dealer sector to the total assets of the household sector.

The first thing I show, of course, is that my intermediary risk factor is priced in the cross-section of expected stock excess returns. That is to say: Stocks with high intermediary factor betas have higher expected excess returns than stocks with low intermediary betas. Remarkably, a 2-factor model with my intermediary factor and market as risk factors explains half the cross-sectional variation in expected excess returns and sports a mean absolute pricing error of only 0.3 percent. The 4-factor Carhart model with market, size, value and momentum as risk factors, can explain a greater portion of the cross-sectional variation but it has a much higher mean absolute pricing error of 1.9 percent. (The mean absolute pricing error is a much more important measure than the percentage of variation explained.) In fact, I have shown that no benchmark multifactor model is competitive with my parsimonious intermediary model.


What I do next is to extract the time-variation of the premiums on the risk factors using a dynamic pricing model. First, behold the intermediary risk premium (see chart). What I love about this chart is the sheer intelligibility of the fluctuations. You can literally see the financial booms of the late-1990s and the mid-2000s when the premium gets extraordinarily compressed. The intermediary premium contains macroeconomic information: It predicts US recessions (the dark bands) and is manifestly correlated with the business-financial cycle. Indeed, I show in the paper that it is both contemporaneously correlated with, and predicts 1 quarter ahead, US GDP growth.


There is clearly an important cyclical component in the intermediary risk premium. I isolate it using a bandpass filter that assigns fluctuations to the frequency at which they appear. The visuals are compelling. The lows of the cyclical component of the intermediary premium line up nearly perfectly with US recessions.


None of the benchmark premiums share these properties. In fact, their confidence intervals almost always straddle the X axis, meaning that they are not even statistically distinguishable from zero.


Here’s the money shot. The intermediary premium dwarfs the premiums on the benchmark factors. It appears to be at least thrice as great in amplitude as the benchmark premiums.


Lastly, I show that a portfolio that tracks my intermediary risk factor has dramatically higher returns than benchmark factor portfolios. Over the past fifty years, the market portfolio has returned 6% above the risk-free rate. Size and value portfolios have done worse. The momentum portfolio has done better. It has returned 8% above the risk-free rate. Meanwhile, the intermediary portfolio has returned 14% above the risk-free rate. Yet, its volatility is lower than either the market portfolio or the momentum portfolio! The Sharpe ratio (the ratio of a portfolio’s mean excess return to its volatility) of the intermediary portfolio is in a class of its own. It is twice as high as that of the momentum portfolio, thrice as high as that of market and value, and almost four times as high as that of the size portfolio. If there was ever going to be a compelling reason for investment professionals to start paying attention to balance sheet capacity, this is it.

| | Market | Size | Value | Mom | Intermediary factor |
|---|---|---|---|---|---|
| Mean excess return (annual) | 6.5% | 3.2% | 4.4% | 8.4% | 14.0% |
| Mean excess return (qtrly) | 1.6% | 0.8% | 1.1% | 2.0% | 3.3% |
| Volatility (qtrly) | 8.4% | 5.6% | 5.7% | 7.6% | 6.0% |
| Sharpe ratio | 18.8% | 14.4% | 19.2% | 27.0% | 56.0% |
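The Sharpe ratios in the table can be approximately reproduced from its own quarterly rows, since the Sharpe ratio is the mean excess return divided by volatility. The sketch below uses the rounded figures printed in the table, so the results differ slightly from the reported Sharpe ratios, which were presumably computed from unrounded data. The variable names are illustrative.

```python
# (quarterly mean excess return %, quarterly volatility %) as printed in the table.
quarterly = {
    "Market": (1.6, 8.4),
    "Size": (0.8, 5.6),
    "Value": (1.1, 5.7),
    "Mom": (2.0, 7.6),
    "Intermediary": (3.3, 6.0),
}


def sharpe_ratio(mean_excess_return, volatility):
    """Mean excess return per unit of volatility."""
    return mean_excess_return / volatility


sharpe = {name: sharpe_ratio(mean, vol) for name, (mean, vol) in quarterly.items()}

# How many times larger is the intermediary portfolio's Sharpe ratio?
multiples = {
    name: sharpe["Intermediary"] / value
    for name, value in sharpe.items()
    if name != "Intermediary"
}

for name, value in sharpe.items():
    print(f"{name:13s} Sharpe ratio {value:.1%}")
for name, multiple in multiples.items():
    print(f"Intermediary vs {name}: {multiple:.1f}x")
```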

The implications of my work for macrofinance and investment strategies are interesting. But what is really interesting is what this tells us about the nature of the modern financial and economic system.

You are welcome to read and comment on my research paper here


Theory of Primary State Formation

A ‘primary state’ or ‘pristine state’ is a first-generation state that evolves without contact with any preexisting states. The evolution of secondary states is strongly influenced by existing states. In particular, nonstate societies are always at risk of being conquered by neighboring states; they can emulate established states; and they can borrow techniques and know-how from preexisting states. All secondary state formation thus takes place in the context of preexisting states. In order to understand how states emerged in the first place, it is therefore important to restrict attention to primary states. We are only certain about six cases of primary state formation: Hierakonpolis in Upper Egypt, Uruk in Mesopotamia, Mohenjodaro in the Indus Valley, the Erlitou state in the Yiluo Basin in China, the Zapotec state in Mesoamerica, and the Moche state in the Andes. The earliest ones—in Mesopotamia and Egypt—emerged in the fourth millennium BCE. But before we examine primary state formation, we have to briefly review what came before.


Locations where primary state formation took place.

Fifty thousand years ago, behaviorally modern humans burst forth from Africa into Eurasia. By the end of the Pleistocene, they had eliminated archaic humans who had hitherto occupied the Eurasian landmass; and populated Northern Europe, Siberia, Australia and the Western Hemisphere—regions that had hitherto been devoid of people.[1] At this stage in human social evolution, societies were remarkably similar across the globe. Everywhere, people lived together in small, mobile bands—with no more than a few dozen individuals—of unspecialized hunter-gatherers. All practiced shamanism—abstract religious beliefs would have to wait until the Axial Age. There was no political authority to speak of. Leadership was not inherited but acquired. ‘Big men’ sometimes exercised coercion and leadership—but there was no ‘office’ of the chief that would have to be filled if the big man died or fell out of favor with the community. Not only were there no rulers, there was no class structure. For tens of thousands of years, human society was thoroughly egalitarian. Conflict between neighboring bands took the form of raids; there were no wars of conquest and subjugation.

The Neolithic Revolution witnessed the advent of permanent settlements, farming and animal husbandry. With agrarian wealth came social stratification. Social rank became hereditary. Big men increasingly hailed from the ranks of the elite. However, village communities retained their autonomy for a long time. The decisive breakthrough came with supravillage integration—the establishment of chiefdoms.

A chiefdom is defined as a centralized regional polity where authority is permanently centralized in the ‘office’ of the chief, which exists apart from the man who occupies it and is passed down from one generation to the next. Chiefdoms usually have populations in the thousands. There is a lot of variation among chiefdoms. Simple chiefdoms have just two levels of hierarchy (a small number of villages controlled by a center). Complex chiefdoms have three levels (villages clustered around towns controlled by a city). A paramount chiefdom is an exceptionally powerful chiefdom that has subordinated others.


A paramount chiefdom is an exceptionally powerful chiefdom that has subordinated others.

While both chiefdoms and states feature centralized coercive authority, chiefly authority is non-bureaucratic—all authority rests in the office of the chief. In contrast, states possess internally specialized administrative organization—authority is partially delegated to administrators, tax collectors, police, judges, military commanders and so on.

While all primary states emerged from chiefdoms, it is wrong to think of the chiefdom as a political form that would naturally evolve into the state if left to its own devices. Indeed, only a few ever made the phase transition; the vast majority of chiefdoms did not.

The central question of primary state formation then is: Why, and under what conditions, did some chiefdoms make the transition to statehood?

The distinction between chiefdoms and states is important because chiefdoms cannot be scaled up whereas states can be, and often were. Why can’t chiefdoms be scaled up? Wright (1977) argued that because authority in a chiefdom is not differentiated, any delegation of authority approaches total delegation; a situation ripe with potential for insubordination, insurrection, or fission. It is in the chief’s vital interest to avoid delegating authority, which means that he has to rule his entire domain from the center. As a consequence, there is an effective spatial limit to the territorial expansion of a chiefdom determined by the distance the chief, or the chief’s representative, could go from the center to the periphery of the domain and back on the same day.

The ruler of a state on the other hand, can dispatch subordinates—whose authority has been defined narrowly enough—to locations far from the capital to manage local affairs with little risk of insurrection. The delegation of partial authority thus allows the state to expand its territory well beyond the spatial limits associated with chiefdoms. Moreover, the optimal strategy for a state ruler is to divide and segment authority as much as possible and delegate wholeheartedly so as to minimize the likelihood of insurrection by subordinates.

The question of primary state formation then boils down to this: Given that it was in the vital interest of the chiefs to avoid delegating authority, why were some compelled to do so anyway and under what conditions did they succeed?

Spencer (1987) suggested that if a chief seeks to implement a new strategy of internal administrative specialization, the chances of success will be enhanced if the shift is made quickly and extensively. Spencer (2010) proposed a ‘territorial-expansion model’ whose basic idea is that territorial expansion is an essential, integral part of the process of primary state formation: Without territorial expansion beyond the spatial limit of a chiefdom there is no incentive for the chief to delegate partial authority and expansion beyond the spatial limit of a chiefdom is impossible without such delegation.

[Simultaneous internal specialization and expansion] will help ensure that the new parcels of authority are defined narrowly enough so that no dispatched administrative assistant in the new order enjoys sufficiently broad authority to foment a successful insurrection. From this perspective, we would expect an evolutionary transition from chiefdom to state to be marked by a qualitative shift in administrative principles and associated optimal regulatory strategies, representing a profoundly transformational process of change.

When we apply the territorial-expansion model to the empirical record of primary state formation, we should expect to find a close correspondence in time between the appearance of state institutions and a dramatic expansion of political-economic territory. This expectation, it should be noted, runs counter to the conventional idea that the territorial expansion of state control is a phenomenon that typically occurs well after the initial formation of the state, during what is sometimes called an “imperial” phase of development.

Spencer (2010) marshals impressive archaeological evidence to show that in all six known cases of primary state formation, the emergence of the primary state was concurrent with territorial expansion beyond the home region.

This is a very promising theory. But it raises important questions: Why did most chiefdoms fail to make this phase transition? Why did primary state formation take place only in densely populated regions? The short answer is that fear is a more important driver of primary state formation than greed. It was the struggle for survival with rival chiefdoms that compelled some chiefs to split the atom of chiefly power.

Redmond and Spencer (2012) argue that high levels of inter-polity competition provided the impetus to rulers of paramount chiefdoms to develop the internally specialized administration of the state. They examine two paramount chiefdoms on the threshold of state formation of comparable size and complexity. The two chiefdoms differed markedly in one critical aspect of their inception: One was relatively isolated while the other was surrounded by rival chiefdoms.

…inter-polity competition was the key factor accounting for Monte Albán’s successful transition from complex chiefdom to [the Zapotec] state, as opposed to Cahokia’s short-lived attempt to cross that threshold. In Oaxaca, the presence of powerful rivals, less than a day’s travel to the south and east, placed a premium on effective administration and military prowess. Monte Albán was able to vanquish some of its rivals in short order, though others managed to resist Monte Albán’s expansionist designs for a considerable time before they too capitulated. To prevail in such a competitive context, Monte Albán had to develop a powerful military as well as an internally specialized administration that was capable of delegating partial authority to subordinate officials who implemented the strategies and policies of the central leadership. The leadership of Cahokia, by contrast, did not have to contend with such daunting rivals. As a consequence, there was relatively less pressure to experiment with the kinds of military and administrative innovations that might have led to the successful transition to statehood in the American Bottom.

Charles Tilly’s dictum regarding the formation of European national states—war made the state and the state made war—is equally valid for pristine states. The Wright-Spencer-Redmond theory of primary state formation explains precisely how war made the state.


[1] Characteristic of modern behavior was figurative art such as cave paintings; ornamentation using pigment and jewelry; the practice of burial; fishing; composite tools such as bows and arrows, darts, harpoons and axes; the use of bone, antler and hide; the invention of rope, fish hook and the eyed-needle; and, of course, blades manufactured from flint. This Great Leap Forward in human culture was likely the result of a single genetic mutation that conferred an innate capacity for complex language and abstract thought.


Wright, Henry T. “Recent Research on the Origin of the State.” Annual Review of Anthropology 6 (1977): 379-397.

Spencer, Charles S. “A mathematical model of primary state formation.” Cultural Dynamics 10.1 (1998): 5-20.

Spencer, Charles S. “Territorial expansion and primary state formation.” Proceedings of the National Academy of Sciences 107.16 (2010): 7119-7126.

Redmond, Elsa M., and Charles S. Spencer. “Chiefdoms at the threshold: The competitive origins of the primary state.” Journal of Anthropological Archaeology 31.1 (2012): 22-37.


Fed Independence, Trump Reflation, and the Primacy of the Dollar

This is part of an ongoing conversation with Adam Tooze.

There is a tension at the heart of US political economy. President Trump wants to reindustrialize America and create jobs for US workers. To that end, he has promised both big tax cuts and a huge investment program. A big fiscal shock is coming.

He has also attacked America’s trade partners for suppressing their currencies. He wants the dollar to weaken against the euro, the yen and the yuan, so that US manufacturers can compete in global product markets.

The problem is that deficit spending at home is expected to accelerate inflation, which would prompt the Fed to hike faster, which in turn would strengthen the dollar. As part of the Trump reflation trade, the dollar has already strengthened in anticipation.


Dollar Index. Source: Bloomberg.

There are only two possible scenarios that could allow the Trump White House to square the circle. The first scenario is one where both the US macroeconomy and the Fed oblige. That is, US inflation could fail to accelerate despite the fiscal shock and the Fed could hold fire waiting to see the whites of inflation’s eyes. The first component wouldn’t be altogether surprising given that US inflation is driven not by domestic slack but by global slack. But given what we know about the Fed’s reaction function, the second—a dovish, patient Fed—is quite unlikely.

The second scenario is one where the Fed’s independence is compromised by Washington. With the resignation of Daniel Tarullo, Trump can now appoint three of the seven governors of the Fed immediately. (Monetary policy is decided not by the Board but by the FOMC, which consists of the seven on the Board and five regional Fed presidents, always including the president of the New York Fed.) Yellen’s term also ends on Feb 3, 2018, at which point Trump could replace her with a lackey. In short, it is not inconceivable to see the Fed revert to control by political masters.

The second scenario is not as likely as it appears either, despite the clear interest of the Trump White House and the opportunity to pack the Fed with Trump appointees. This is because the Senate has to confirm the appointments. While the Senate Republicans are not as crazy about Ayn Rand as those in the House, it is hard to see them falling behind a policy of packing the Fed with doves. In other words, the balance of power in Congress points in the opposite direction from the White House. If Trump succeeds in this endeavor, it would likely be with the support of Democrats in the Senate. And they would demand their own pound of flesh.

There is another reason to doubt the reflationary scenario. The Fed’s independence, secured by Volcker’s coup in 1979, has served the interests of Wall Street well. Since the Trump administration is packed with Goldmanites, it is difficult to see them supporting an attack on the Fed’s independence. To be more precise, it is not clear whether the big banking firms would pursue their long-term interests (and resist attacks on the Fed’s independence) or their short-term interests (which would be well served by the steep yield curve attending a Trump reflation).

A related issue is that of financial deregulation. It is amply clear that the Trump White House and the Republican Congress are going to unshackle Wall Street. This solves at least one problem while risking another. The former is the global shortage of safe assets. Deregulation of Wall Street banks would allow them to expand balance sheet capacity and intermediate dollars to lend offshore via FX swaps. The latter is the risk to financial stability. As Tooze notes, unshackling dealer balance sheets may unleash a new, unsustainable credit boom.

There is, of course, an entirely different possibility suggested by McHenry’s letter to Yellen and, more generally, the strength of the Ayn Rand fanatics in Congress. Namely: Congressional hawks could prevail in the battle for political control over the Fed and make it even more hawkish and its reaction function more formulaic by law (by demanding, say, that the Fed justify deviations from the Taylor Rule). That would doom any possibility of a great boom in real activity.

Tooze’s original discussion centered not on the political economy of the Federal Reserve per se but on the impact of Trump’s economic nationalism on the dollar’s role as the hard currency of choice globally. Tooze mentions three areas of conflict:

 (1) [T]he tension between the dollar’s reserve role and the desire of the Trump administration to boost exports by increasing American “competitiveness” and talking down the dollar; (2) tensions around global bank regulation; (3) nationalist backlash from Trumpites in Congress against the role of the Fed as a global liquidity supplier, by way principally of the swap lines.

As for (1), it is not clear that dollar strength is required to sustain the dollar’s role as the reserve currency. The dollar has been both stronger and weaker than it is now (or is expected to be) without compromising the willingness of other central banks to hold US dollars.


It is even less clear how (2) could work against the dollar’s preeminence. If US banking firms are subjected to less onerous regulations then the dollar’s share of international funding—already at 75 percent—would tend to increase with the balance sheet capacity of US banking firms.

A nationalist backlash against the Fed’s international activities as envisioned by (3) could potentially backfire on the dollar. In particular, if the Fed is forced to close down its swap lines, the other hard-currency-issuing central banks would look for alternatives, which could result in solutions that undermine the dollar’s position. For instance, they could denominate their settlements in euros, a scenario that would be consistent with a major unraveling of the transatlantic alliance.

The Policy Tensor contends that the real threat to the dollar’s hegemony is the possibility of a global trade war which would usher in a more nationalized and a more regionalized world. A global trade war between the United States and China, not the victory of Marine Le Pen, is the most important political risk to financial markets of 2017.

The dollar’s role is due to a mutually-reinforcing combination of America’s command of the global maritime commons, the liquidity of US dollar funding markets, the depth of US capital markets, the network externalities of currency denomination (for invoicing and payments), the stability and predictability of US institutions, and the sheer weight of the United States in the world economy. It is mighty hard, even for an administration led by Donald Trump, to undermine the dollar’s role in the global monetary, financial and economic system. The closest competitor is the euro. And for the euro to merely bid for the dollar’s role in the world economy would take dramatic changes in the political economy of Europe and an outright breakdown of the Western alliance. We may possibly be headed in that direction but we are nowhere close to that scenario.