Yellen Let the Cat Out of the Bag


The Fed has consistently failed to deliver on its inflation target.

In the Q&A following the FOMC press conference on June 14, 2017, Yellen spelled out why the FOMC expects inflation to “return” to target. It was a remarkably honest admission.

Yellen: [The neutral rate] is hard to pin down; especially given the fact that the so-called Phillips curve appears to be quite flat—that means that inflation doesn’t respond very much or very quickly to movements in unemployment. Nevertheless, that relationship, I believe, remains at work.

Yellen is wrong. As I have argued previously, the ‘second unbundling’ has transformed the inflation process in the core of the world economy such that global slack, not domestic slack, drives inflation. More recently, Auer, Borio and Filardo (2017) have shown that the intensity of participation in global value chains explains the time-variation and the international cross-sectional variation in the strength of that relationship. They have thus tied the mutation of the inflation process directly to Baldwin’s ‘second unbundling’. How long do we have to wait before the FOMC catches up with the BIS?


How the affluent get $500 billion in tax giveaways each year

After I finished writing this post, I found these very useful tables here. They help triangulate the class structure and are more helpful at the beginning of my piece than at the end. 



BIG GAINS IN THE NEOLIBERAL ERA have largely been cornered by the wealthy. We may even have been underestimating the true wealth of the ultrarich by a factor of 2 because the truly wealthy hide nearly half their wealth in offshore secrecy jurisdictions. I have hitherto emphasized paying attention to the very top of the distribution—the 0.1 percent—because the concentration of material resources in the hands of oligarchs is a significant threat to democracy. Indeed, ultraconservative billionaires are behind the insurgency that used to be called the GOP.

But that is not the end of the story. The main driver of regional polarization is the two-tier polarization of the US economy into a high-productivity tradable sector, which accounts for just 2 percent of new jobs but a third of new value added, and a low-productivity non-tradable sector. Affluent, college-educated workers have cornered virtually all the wage gains of the past generation thanks to their near-monopoly on jobs in the former. The rest have seen their incomes stagnate and their share of the national pie shrink.

Figure 1 displays the share of pretax income of the top 1 percent of income earners. We see that vertical income polarization has returned to levels last achieved in the roaring twenties. But affluence in America goes deeper. Not everyone in the 99 percent has lost ground in the neoliberal era. Those in the top 20 percent—roughly speaking, families earning six-figure incomes—have done relatively well. Figure 2 shows the share of the next 19 percent of income earners. We see that their share of the national pie has increased slowly but steadily over the past two generations.


Rational public finance would see the government lean against the inequity of the market. Through progressive taxation, tax credits, subsidies, and spending targeted at less fortunate families and regions, fiscal policy ought to be used to ensure a more equitable distribution of the national economic pie. Congress can’t stop talking about making life easier for hard-working Americans. In reality, as we shall see, tax giveaways largely benefit the affluent.


Figure 3. US non-discretionary federal budget breakdown.

Figure 3 presents a top-level breakdown of the non-discretionary federal budget. Medicare, Social Security, and the military together consume two-thirds of the US budget. But the largest single component, accounting for a third of the budget, is “tax expenditures”—technical jargon for tax credits, subsidies, and other giveaways that amount to fiscal spending in all but name.

In the current fiscal year, tax expenditures account for nearly $1.5 trillion. The biggest of these giveaways is the exclusion of employer contributions for medical insurance premiums and medical care, which will cost the public purse $2.7 trillion over 2016-25, according to the US Treasury. Preferential treatment of unearned capital gains (which are taxed at 15 percent instead of the 35 percent charged on earned income) will cost a cool $1 trillion over the same period. Exclusion of imputed rental income will cost $1.2 trillion, and the mortgage interest deduction will cost $950 billion. (All these numbers are from here.)

Who benefits from these giveaways? Figure 4 shows the distribution of the beneficiaries. The affluent, the top 20 percent of income earners, get 51 percent of all tax expenditures. The rest is split regressively among the lower classes.


Figure 4. Shares of tax expenditures by income quintile.

If we drill down further, we find that some of these giveaways are much less regressive than others. For instance, one-half of the $66 billion earned income tax credit goes to the lowest quintile, and 95 percent of the $59 billion child tax credit goes to the bottom 80 percent. The employer-sponsored health insurance exclusion is somewhere in the middle: two-thirds of this supermassive $258 billion giveaway goes to the bottom 80 percent. The pension contribution exclusion, the capital gains preference, the deduction for state and local taxes, and the mortgage interest deduction are much more regressive. Some 94 percent of the $83 billion capital gains giveaway ends up in the top quintile, as does two-thirds of the $140 billion pension contribution exclusion.


Figure 5. Distribution of selected tax breaks.

Figures 5 and 6 drill down into the most regressive giveaways. We have not included the $17 billion carried interest giveaway to ultrarich hedge fund managers and other such long-running scandals. But the picture that emerges is not pretty. The top quintile gets the vast bulk of the giveaways for capital gains, state and local taxes, mortgage interest, charitable contributions, and the capital gains exemption at death. All in all, the top quintile cornered $446 billion of the $873 billion given away in 2013, according to the Congressional Budget Office.
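A quick arithmetic check on the CBO dollar figures quoted above (amounts in billions, exactly as cited in the text):

```python
# Back-of-the-envelope check of the top-quintile share of 2013 tax
# expenditures implied by the CBO dollar amounts quoted in the text.
top_quintile_take = 446   # billions cornered by the top 20 percent
total_giveaways = 873     # billions given away in total

share = 100 * top_quintile_take / total_giveaways
print(f"Top-quintile share: {share:.1f}%")  # 51.1%, consistent with Figure 4
```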


Figure 6. Percentage shares of quintiles for selected tax giveaways.

Beyond the regressive distributional impact, these giveaways distort incentives and harm the economy in various ways. For instance, Weicher notes that the mortgage interest deduction (MID),

…encourages taxpayers to pay for homes with debt rather than with cash or financial assets, causes wasteful and unproductive misallocation of physical and financial capital, and distributes benefits disproportionately to upper income households. Furthermore, the MID results in less economic productivity, reduced labor mobility and greater unemployment, depressed real wages, and a lower standard of living. The MID is so damaging to the economy that nearly every economist believes that “the most sure-fire way to improve the competitiveness of the American economy is to repeal the mortgage interest deduction.”

A truly progressive politics will have to take on not just the rich but the affluent as well.


Why did the United States invade Iraq?


Bush’s decision to depose Saddam has always perplexed the Policy Tensor. I have previously argued that US policy with respect to Iraq after 1990 was inconsistent with foreign policy realism; that, during the 1990s, US foreign policy was guided by the rogue states doctrine that served as the justification for the forward-deployment of US forces around the globe (and defense spending high enough to allow for garrisoning the planet after the threat from the Soviet Union vanished into thin air) by inflating the threat posed by confrontation states; that Saddam became a poster child of the rogues’ gallery that the foreign policy elite in Washington said they were determined to contain; and that the US policy consensus on the threat posed by the person of Saddam Hussein meant that Saddam was most at risk from a revisionist policy innovation in Washington.

So when George Bush went about “searching for monsters to destroy,” Saddam was the most tempting target. The consensus in policy circles against Saddam explains why the US invaded Iraq and not Cuba, North Korea, Iran or Libya; or any other confrontation state that could, more or less convincingly, be framed as a “rogue state”, “outlaw state”, “backlash state”, or “Weapon State” (as Krauthammer put it in Foreign Affairs). In short, Bush was following the path of least resistance when he chose to overthrow President Hussein.

But why did Bush want to depose Saddam in the first place? The US veto on potential rivals’ access to Gulf energy was already secured by the United States’ impregnable maritime power in the region. That is, with or without a friendly regime in Baghdad, the US could deny any challenger access to Gulf energy simply by using its overwhelming maritime power.

Moreover, almost any conceivable US national interest could’ve been more easily secured by bringing Saddam in from the cold. If Bush wanted his friends in the oil industry to benefit from access to Iraqi oil, Saddam could easily have been brought in from the cold on that very condition. If Bush wanted to ramp up Iraqi oil production to lower oil prices (and perhaps undermine Saudi Arabia’s position as the swing producer and its hold on OPEC), the easiest way to do that would’ve been to allow Western oil firms to invest in Iraqi production capacity. If Bush wanted an Iraqi regime that was a geopolitical ally of the United States and Israel, even that was within the realm of possibility with Saddam at the helm in Baghdad.

Suppose that for whatever reason it was impossible for the United States to work with Saddam. Then, any conceivable US interest would’ve been better served by replacing the regime led by Saddam Hussein with a more compliant military junta. For a democratic regime in Baghdad was ethno-demographically guaranteed to fall within the orbit of Iran.

In fact, what I found after combing through the archives of the 1990s was that US-Iraq relations had been personalized to an extraordinary degree and that there was an overwhelming consensus in the foreign policy establishment that the ideal scenario would be a military coup by a more accommodative general. The idea was that if Saddam were deposed by a more accommodative general, we would get the best of both worlds. An iron-fisted junta would provide stability in the sense that Iraq would serve as a bulwark against Iran and keep a lid on both ethnic nationalism (Shia, Sunni and Kurdish) and salafi jihadism. And a more accommodative leadership in Iraq would remove Iraq from the ranks of the confrontation states and thereby enhance the security, power and influence of the United States and its regional allies.

These considerations explain why, after kicking Saddam’s army out of Kuwait, Bush’s dad left Saddam in power and watched from the sidelines as Saddam crushed the Iraqi intifada. Bush Senior later explained the decision in the book he coauthored with Scowcroft:

While we hoped that a popular revolt or coup would topple Saddam, neither the United States nor the countries of the region wished to see the breakup of the Iraqi state. We were concerned about the long-term balance of power at the head of the Gulf. Breaking up the Iraqi state would pose its own destabilizing problems.

The core of the Bush revolution in foreign policy was the decision to break with this policy consensus. Specifically, Bush Jr’s policy innovation was to overthrow Saddam Hussein without replacing him with a more accommodative military junta. What possible US interest could be served by that policy? What did principals in the Bush administration hope to accomplish? What was their grand-strategy? I think I finally have an answer.

My interpretation builds on the findings and arguments of a large number of scholars. For the sake of conciseness, I’ll focus exclusively on the excellent anthology edited by Jane Cramer and Trevor Thrall, Why Did the United States Invade Iraq? In what follows, I’ll summarize their findings before presenting my interpretation. All quotes that follow are from this book unless otherwise specified.

Cramer and Thrall argue that the core foreign policy principals in the Bush administration were President Bush, Vice President Dick Cheney and Defense Secretary Donald Rumsfeld. It’s plausible to imagine that, in the heightened threat environment after 9/11, they came under the sway of the neocons and their well-known strategy of regime change in Iraq. But that story is inconsistent with the facts.

The record indicates they did not even make a decision after 9/11; they apparently had already made up their minds so they did not need to deliberate or debate. Instead they discussed war preparations and strategies for convincing the public and Congress, with no planning for how to make democracy take shape in Iraq.

Cheney played an extraordinary role in the administration. In particular, he handpicked almost all the neocon hawks who led the drumbeat to war:

Cheney helped appoint thirteen of the eighteen members of the Project for the New American Century.… Cheney lobbied strongly for one open advocate of regime change—Donald Rumsfeld—who was appointed to be Secretary of Defense. And then, Cheney and Rumsfeld together appointed perhaps the most famous advocate for overthrowing Saddam Hussein in order to create a democracy in Iraq, Paul Wolfowitz, as Undersecretary of Defense. Cheney created a powerful dual position for Scooter Libby …John Bolton as special assistant Undersecretary of State for Arms Control and International Security; David Wurmser as Bolton’s chief assistant; Robert Zoellick as US Trade Representative; and Zalmay Khalilzad as head of the Pentagon transition team…. [Cheney appointed] Elliot Abrams, Douglas Feith, Richard Perle and Abram Schulsky.

But Cramer and Thrall argue, quite convincingly in my opinion, that “the neoconservatives and the Israel lobby were ‘used’ to publicly sell the invasion, while the plans and priorities of the neoconservatives were sidelined during the war by the top Bush leaders.”

The State Department and the oil industry were becoming increasingly alarmed about the neoconservatives’ oil plan and Chalabi’s open advocacy for it. In the eyes of the mainstream oil industry, an aggressive oil grab by the United States might lead to a destabilization of the oil market and a delegitimizing of the Iraq invasion. This was argued in an independent report put out on January 23, 2003 by the Council on Foreign Relations and the Baker Institute entitled Guiding Principles for U.S. Post-Conflict Policy in Iraq. The report cautioned against taking direct control of Iraqi oil, saying, “A heavy American hand will only convince them (the Iraqis), and the rest of the world, that the operation in Iraq was carried out for imperialist rather than disarmament reasons. It is in American interests to discourage such misperceptions….”

The State Department plan triumphed over the neoconservatives’ plan, and this helps demonstrate that Cheney, Rumsfeld and Bush did not allow the neoconservatives and the Israel lobby to dominate US foreign policy even from the inception of the invasion. In fact, Bush appointed Phillip Carroll, the former chief of Shell Oil, to oversee the Iraqi oil business. Carroll executed much of the oil industry’s preferred plans for Iraqi oil. Revealingly, when L. Paul Bremer, the head of the Coalition Provisional Authority, ordered the de-Ba’athification of all government ministries in Iraq, Carroll refused to comply with Bremer’s order because removing the Ba’athist oil technocrats would have hindered the Iraqi oil business. In the end, the Baker plan (aligned with US oil industry interests) was implemented in its entirety. The US official policy was to use Production Sharing Agreements (PSAs) that legally left the ownership of the oil in Iraqi government hands while attempting to ensure new long-term multinational oil corporation profits.

On de-Ba’athification, on Chalabi, on Iraqi participation in OPEC, on the privatization of Iraqi oil, on bombing Iran and Syria, on threatening Saudi Arabia or giving the Saudis access to advanced weaponry, the administration went counter to the neoconservatives’ proposals and policy desiderata. In fact, the “neoconservatives realized that they had been used to sell the war publicly but were marginalized when it came to the creation of Middle East policy. In 2006 prominent neoconservatives broke with the administration and resoundingly attacked Bush’s policies.”

So, if the neoconservative vision of an expanding zone of democratic peace was not the motivation for the invasion, what was? “Cheney, Rumsfeld and Bush,” Cramer and Thrall argue, “were US primacists and not realists.”

Cheney authorized Paul Wolfowitz to manage a group project to write up a new Defense Planning Guidance (DPG) drafted by various authors throughout the Pentagon in full consultation with the Chairman of the Joint Chiefs of Staff, Colin Powell (Burr 2008). The DPG was leaked to the New York Times on March 7, 1992 (Tyler 1992). The radical plan caused a political firestorm as it called for US military primacy over every strategic region on the planet.

The draft DPG leaked in 1992 was widely perceived as a radical neoconservative document that was not endorsed by the high officials in the George H. W. Bush administration. Dick Cheney sought to distance himself from the document publicly while heartily endorsing it privately. Pentagon spokesman Pete Williams claimed that Cheney and Wolfowitz had not read it. Numerous other Pentagon officials stepped forward to say that the report represented the views of one man: Paul Wolfowitz. The campaign to scapegoat Wolfowitz for the unpopular plan was successful and the press dubbed the DPG as the “Wolfowitz Doctrine.” However, recently released classified documents show that the document was based on Powell’s “base force” plan and was drafted with the full consultation of Cheney and many other high Pentagon officials (Burr 2008). In the days after the leak, Wolfowitz and others worried that the plan would be dropped altogether. But in spite of the controversy, Cheney was very happy with the document, telling Zalmay Khalilzad, one of the main authors, “You have discovered a new rationale for our role in the world.”

Cramer and Thrall conclude:

We think a gradual consensus is forming among scholars of the war that Cheney, and to a lesser degree Rumsfeld, were the primary individuals whom Bush trusted. These three leaders together shared the desire to forcefully remove Saddam Hussein, they made the decision, and they made the key appointments of the talented advisers who crafted the arguments to sell the war to the American people. We have shown that President Bush was a zealous participant in the decision to invade, but he was likely not a primary architect to the extent the much more seasoned Cheney and Rumsfeld were. We find that the recently released documents proving intentional intelligence manipulation (especially from the British Iraq Inquiry, see Chapter 9), combined with the long career paths of Cheney and Rumsfeld and the actions of these top leaders before and after 9/11, belie the perception that the administration was swept up by events and acted out of misguided notions of imminent threats, Iraqi connections to Al Qaeda, or crusading idealism. The United States did not emotionally stumble into war because of 9/11. On the contrary, the top leaders took a calculated risk to achieve their goals of US primacy, including proving the effectiveness of the revolution in military affairs, and strengthening the power of the president.

The Policy Tensor agrees with the characterization of principals in the Bush administration as primacists. The problem is that invading Iraq does not follow from the grand-strategy of primacy. The primacists’ argument is straightforward and indeed compelling. The idea is that it was in the US interest to prolong unipolarity as long as possible, and that required an active policy to prevent the reemergence of a peer competitor. As the authors of the Defense Planning Guidance put it in 1992,

Our first objective is to prevent the re-emergence of a new rival, either on the territory of the former Soviet Union or elsewhere, that poses a threat on the order of that posed formerly by the Soviet Union. This is a dominant consideration underlying the new regional defense strategy and requires that we endeavor to prevent any hostile power from dominating a region whose resources would, under consolidated control, be sufficient to generate global power. These regions include Western Europe, East Asia, the territory of the former Soviet Union, and Southwest Asia.

Separately, in combination, or even in an alliance with a near-peer, the so-called rogue states were never (and never would be) in a position to pose “a threat on the order of that posed formerly by the Soviet Union.” The combined GDP of the “rogue states”—Iraq, Iran, Libya, North Korea, and Cuba—never exceeded that of California, Texas, or New York. Even if Saddam conquered the Arabian peninsula and consolidated control over its oil resources, he would be in no position to “generate global power.” In any case, the unipole could quite easily deter an Iraqi invasion of the Arabian peninsula.

Even a nuclear-armed Iraq would be in no position to impose its will on US protectorates in the region, much less on the United States itself. Those who argue that a nuclear-armed Iraq or Iran cannot be deterred simply don’t understand the logic of nuclear deterrence. If Saddam had successfully acquired a nuclear deterrent, the United States would not have been able to invade and occupy Iraq. But the Iraqi deterrent would have been useless for the purposes of aggression, conquest, or regional domination. Had he retaken Kuwait, the United States would still have been able to kick him out simply because he would’ve been in no position to threaten the use of nuclear weapons against US forces, for then he would be making the incredible threat of suicide to hold on to his conquests. Put more formally, extended deterrence is hard enough for the unipole; it is well-nigh impossible for a regional power like Iraq under Saddam.

If the United States under Bush had acted in accordance with the grand-strategy of primacy, she would have cared little about minor confrontation states and much more about actual potential rivals. In particular, the United States would have tried hard to thwart the emergence of a peer in the two extremities of Eurasia. A more aggressive strategy to maintain primacy would see the United States not only preventing the consolidation of either of these two regions under a single power, but also undermining the growth rate of the only power that has the potential to become a peer of the United States without conquering a strategically important region. That is, if the Bush administration had followed the grand-strategy of primacy, it would’ve blocked China’s admission into the WTO, and more generally, prevented China’s emergence as the workshop of the world. That would’ve prolonged US primacy with much more certainty than the destruction of the entire rogues’ gallery.

So what was the grand-strategy that made the decision to invade intelligible?

Jonathan Cook has argued for a much more radical proposal in Israel and the Clash of Civilizations. He argues that it was in the Israeli interest to have its regional rivals disappear from the ranks of the confrontation states and be broken up into statelets that would not pose any significant threats to Israel’s regional primacy; and that the Israelis managed to convince principals in the Bush administration of the merits of their revisionist agenda for the region:

I propose a different model for understanding the [Bush] Administration’s wilful pursuit of catastrophic goals in the Middle East, one that incorporates many of the assumptions of both the Chomsky and Walt-Mearsheimer positions. I argue that Israel persuaded the US neocons that their respective goals (Israeli regional dominance and US control of oil) were related and compatible ends. As we shall see, Israel’s military establishment started developing an ambitious vision of Israel as a small empire in the Middle East more than two decades ago. It then sought a sponsor in Washington to help it realise its vision, and found one in the neocons. (p. 91.)

Yinon’s argument that Israel should encourage discord and feuding within states – destabilising them and encouraging them to break up into smaller units – was more compelling [than Sharon’s status-quo, state-centric vision of Israeli regional primacy]. Tribal and sectarian groups could be turned once again into rivals, competing for limited resources and too busy fighting each other to mount effective challenges to Israeli or US power. Also, Israeli alliances with non-Arab and non-Muslim groups such as Christians, Kurds and the Druze could be cultivated without the limitations imposed on joint activity by existing state structures. In this scenario, the US and Israel could manipulate groups by awarding favours – arms, training, oil remittances – to those who were prepared to cooperate while conversely weakening those who resisted. (p. 118.)

Israel and the neocons knew from the outset that invading Iraq and overthrowing its dictator would unleash sectarian violence on an unprecedented scale – and that they wanted this outcome. In a policy paper in late 1996, shortly after the publication of A Clean Break, the key neocon architects of the occupation of Iraq – David Wurmser, Richard Perle and Douglas Feith – predicted the chaos that would follow an invasion. ‘The residual unity of the [Iraqi] nation is an illusion projected by the extreme repression of the state’, they advised. After Saddam Hussein’s fall, Iraq would ‘be ripped apart by the politics of warlords, tribes, clans, sects, and key families.’ (p.133.)

I think Cook is mistaken about the importance of Israeli influence, but he is onto something. Even if Israel managed to persuade principals in the Bush administration, there is no evidence to suggest that the Israel lobby, or even the neocons more generally (the lines between the two are blurred), had decisive influence over the Bush administration’s Middle East policy. (My position here is congruent with Cramer and Thrall’s.) But what is clear is the frame of reference in which smashing Israel’s rivals would be in the US interest.

More precisely, I think principals in the Bush administration figured that Israel was nearly guaranteed to be a strong ally of the United States in a difficult region. After the reorientation of Egypt (mid-1970s) and the Islamic revolution (1979), three regional poles prevented total US-Israeli domination of the Middle East: Iran, Iraq and Syria. Smashing these confrontation states would guarantee Israel’s regional primacy and therefore, I think principals in the Bush administration reckoned, further the US interest in more easily dominating the region in a permanent alliance with its junior geopolitical ally. In other words, the grand-strategy of the Bush administration was to remove, by threats or by the use of force, Israel’s regional rivals in the Middle East.

They hoped to overthrow, or cow into submission, the regimes of Iraq, Iran and Syria, and thereby establish unchallenged US-Israeli supremacy in the Middle East. What I am saying is that the United States’ grand-strategy was based on an ill-informed regional variant of offensive realism—one whose logic was conditional on a permanent alliance with a regional power—as opposed to the global and unconditional variant of offensive realism assumed by the grand-strategy of primacy (as put forward, say, by Mearsheimer).

It is clear that regional primacy was in the Israeli interest. It’s a bit of a stretch to argue that it was also in the US interest. The problem is that, military primacy or not, Israel simply does not have that much influence in the region. Because it is a pariah in the Middle East, few actors try to seek its patronage (the Kurds are the main exception); most look to Iran, Saudi Arabia, or global powers. It is nearly impossible for Israel to play the role formerly played by Iran under the Shah or Egypt under Nasser. The United States has no choice but to work with other regional powers (Egypt, Turkey, Syria, Saudi Arabia and Iran) to manage regional problems. Moreover, from the perspective of a global power trying to minimize the costs of ensuring stability in a multipolar region of strategic significance, a balance of power is considerably more attractive than the precarious primacy of a pariah; perhaps even one guaranteed to be a permanent ally.

But the fundamental flaw of the grand-strategy pursued by the Bush administration was not the conflation of US and Israeli interest. (It can be argued, after all, that since Israel was basically guaranteed to be a permanent ally, Israeli regional primacy was squarely in the US interest.) No, the fundamental flaw of the revisionist strategy was the outright dismissal of the costs of the ensuing instability. No matter how far the prewar consensus was from foreign policy realism, at least the unbounded costs of regional instability were understood. When Bush broke with the consensus and smashed the Iraqi state, he clearly did not appreciate just how bad things could get.

Breaking up confrontation states into ethnic statelets and zones of weakness may sound like a splendid idea to half-baked geopolitical analysts. But instability and weakness are a source of insecurity, not power, as both the United States and Israel have since discovered.

To wrap up: The grand-strategy pursued by the United States when it invaded Iraq was to smash the regional poles that acted as confrontation states in the Middle East, whose removal from the equation promised unchallenged US-Israeli supremacy in this strategically-relevant region. Principals in the Bush administration simply did not appreciate the unbounded costs of the regional instability that would ensue.


Balance Sheet Capacity and the Price of Crude

I’ve written before about the macrofinancial importance of broker-dealers (a.k.a. Wall Street banks). I emphasized the key role played by dealers in the so-called shadow banking system and have shown that fluctuations in balance sheet capacity explain the cross-section of stock excess returns. I have also argued for a monetary-financial explanation of the commodities rout. In this post, I will show that fluctuations in dealer balance sheet capacity also explain fluctuations in the price of crude.

The evidence can be read off Figure 1. Recessions are shown as dark bands. The top-left plot shows the real price of crude for reference. The spikes in the 1970s correspond to the oil price shocks in 1973 and 1979. Note the price collapse in 1986 and the price shock that attended the Iraqi occupation of Kuwait (the spike in the 1990 recession). Note also the extraordinary run-up in the price of crude during the 2000s boom and the return of China-driven triple digit prices after the great recession. Finally, note the dramatic oil price collapse in 2014 due to the US fracking revolution. We know that much of the fluctuation in the oil price was a result of geopolitical, supply-side and exogenous demand-side factors. My claim is that much of the rest is driven by the excess elasticity of the financial intermediary sector.


Figure 1. Source: Haver Analytics, author’s calculations.

Specifically, I show that fluctuations in the balance sheet capacity of US securities broker-dealers predict fluctuations in the oil price. We define balance sheet capacity as the log of the ratio of aggregate financial assets of broker-dealers to the aggregate financial assets of US households. We stochastically detrend the quarterly series by subtracting the trailing 4-quarter moving average from the original series. The plot on the top-right displays the stochastically detrended balance sheet capacity. We will show that it predicts 1-quarter ahead excess returns on crude.
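The construction of the capacity shock is simple enough to sketch. Here is a minimal illustration in Python; the function name and synthetic inputs are mine, and I assume the trailing moving average includes the current quarter:

```python
import numpy as np

def capacity_shocks(dealer_assets, household_assets, window=4):
    """Stochastically detrended balance sheet capacity.

    Capacity is the log ratio of aggregate dealer financial assets to
    aggregate household financial assets; the shock is capacity minus
    its trailing `window`-quarter moving average.
    """
    capacity = np.log(np.asarray(dealer_assets) / np.asarray(household_assets))
    # Trailing moving average; the first window-1 quarters are dropped.
    trailing_ma = np.convolve(capacity, np.ones(window) / window, mode="valid")
    return capacity[window - 1:] - trailing_ma
```

Whether the trailing average includes the current quarter is a convention; excluding it simply shifts the window back one quarter.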

We run 30-quarter rolling regressions of the form,

{R^{crude}_{t+1}=\alpha+\beta\times capacity_{t}+\varepsilon_{t+1}}, \qquad (1)

where {R^{crude}_{t+1}} is the return on Brent in quarter {t+1} in excess of the risk-free rate and {capacity_{t}} is the shock to balance sheet capacity in quarter {t}. Care must be taken in interpreting rolling regressions: instead of the two parameters suggested by equation (1), we are in effect estimating 183 sets of parameters, one for each 30-quarter window.
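The rolling estimation can be sketched as follows, using simulated data in place of the Brent excess return and capacity shock series (all variable names are placeholders):

```python
import numpy as np
import pandas as pd

# Simulated stand-ins: capacity_t predicts R_{t+1} with a negative slope.
rng = np.random.default_rng(1)
n = 213
capacity = pd.Series(rng.normal(size=n))
excess_ret = pd.Series(rng.normal(size=n)) - 0.5 * capacity.shift(1).fillna(0.0)

window = 30
results = []
for t in range(window, n):
    x = capacity.iloc[t - window:t].to_numpy()            # capacity_s over the window
    y = excess_ret.iloc[t - window + 1:t + 1].to_numpy()  # R_{s+1} over the window
    beta, alpha = np.polyfit(x, y, 1)                     # slope and intercept
    r2 = np.corrcoef(x, y)[0, 1] ** 2                     # % of variation explained
    results.append((beta, r2))

betas = [b for b, _ in results]
```

With 213 quarters and a 30-quarter window this yields 183 regressions, matching the count in the text.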

The plot on the bottom right displays the percentage of variation explained in each predictive regression. We see that balance sheet capacity became a significant predictor of the price of crude in the mid-1980s. Its predictive power diminished in the mid-1990s, before reaching new heights in the 2000s. The period 1999-2007 was the heyday of financially-driven fluctuations in the price of crude. That relationship collapsed in the second quarter of 2007. During the financial crisis and the period of postcrisis financial repression, the relationship disappeared entirely. It only recovered at the very end of our sample in 2016.

The bottom-left plot in Figure 1 displays a signed measure of the influence of balance sheet capacity on the price of crude. We display the product of the slope coefficient in equation (1) with one minus its p-value. This measure kills three birds with one stone. We can (a) keep track of the sign of the slope coefficients (to see whether or not it reverses direction too much), (b) get an additional handle on the time-variation of the strength of the predictive relationship, and (c) control the noise by attenuating the slope coefficients in inverse proportion to their statistical significance. Note that we have reversed the direction of the Y axis in the plot on the bottom-left.
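For a single window, the signed measure can be computed like this (a sketch; `x` and `y` are simulated stand-ins for the capacity shock and next-quarter excess return):

```python
import numpy as np
from scipy import stats

# One 30-quarter window of simulated data with a negative true slope.
rng = np.random.default_rng(2)
x = rng.normal(size=30)              # capacity shocks
y = -0.5 * x + rng.normal(size=30)   # next-quarter excess returns

res = stats.linregress(x, y)

# Slope attenuated in inverse proportion to statistical significance.
signed_influence = res.slope * (1 - res.pvalue)
```

A highly significant slope passes through nearly untouched; an insignificant one is shrunk toward zero.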

The slope and significance metric tells a story that is very similar to the one told by the percentage of variation explained. Moreover, we can see that the relationship is economically large and negative. The interpretation is that positive shocks to balance sheet capacity compress the risk premium embedded in the price of crude. When balance sheet capacity is plentiful, risk arbitrageurs (speculators who make risky bets) bid away expected excess returns. Conversely, when balance sheet capacity is scarce, risk arbitrageurs are constrained in the amount of leverage they can obtain from their dealers and are therefore compelled to leave expected excess returns on the table.

The main result above—that dealer balance sheet growth predicts returns on crude oil—was originally obtained by Erkko Etula for his doctoral dissertation at Harvard. 


Silicon Valley’s Visions of Absolute Power


Omnipotence is in front of us, almost within our reach…

Yuval Noah Harari

The word “disrupt” only appears thrice in Yuval Noah Harari’s Homo Deus: A Brief History of Tomorrow. That fact cannot save the book from being thrown into the Silicon Valley Kool-Aid wastebasket.

Harari is an entertaining writer. There are plenty of anecdotes that stoke the imagination. There is the one about vampire bats loaning blood to each other. Then there’s the memorable quip from Woody Allen: Asked if he hoped to live forever through the silver screen, Allen replied, “I don’t want to achieve immortality through my work. I want to achieve it by not dying.” The book is littered with such clever yarns interspersed with sweeping, evidence-free claims. Many begin with “to the best of our knowledge” or some version thereof. Like this zinger: “To the best of our knowledge, cats are able to imagine only things that actually exist in the world, like mice.” Umm, no, we don’t know that. Such fraudulent claims about scientific knowledge plague the book and undermine the author’s credibility. And they just don’t stop coming.

“To the best of our scientific understanding, the universe is a blind and purposeless process, full of sound and fury but signifying nothing.” How does one even pose this question scientifically?

“To the best of our knowledge” behaviorally modern humans’ decisive advantage over others was that they could exercise “flexible cooperation with countless number of strangers.” Unfortunately for the theory, modern humans eliminated their competitors well before any large-scale organization. During the Great Leap Forward—what’s technically called the Upper Paleolithic Revolution when we spread across the globe and eliminated all competition—mankind lived in small bands. There was virtually no “cooperation with countless strangers.” We prevailed everywhere and against every foe because we had language, which allowed for unprecedented coordination within small bands. Harari seems completely unaware of the role of language in the ascent of modern humans. He claims that as people “spread into different lands and climates they lost touch with one another…” Umm, how exactly were modern humans in touch with each other across the vast expanse of Africa?

“To the best of our scientific understanding, determinism and randomness have divided the entire cake between them, leaving not a crumb for ‘freedom’…. Free will exists only in the imaginary stories we humans have invented.” Here, Harari takes one of the hardest open problems and pretends that science has an answer. The truth is much more sobering. Not only is there no scientific consensus on the matter of free will and consciousness, it would be disturbing if there were, since we have failed to develop the conceptual framework to attack the problem in the first place.

“According to the theory of evolution, all the choices animals make – whether of residence, food or mates – reflect their genetic code.… [I]f an animal freely chooses what to eat and with whom to mate, then natural selection is left with nothing to work with.” Nonsense. The theory of evolution, whether in the original or in its modern formulations, is entirely compatible with free will. Natural selection operates statistically and inter-generationally over populations, not on specific individuals. It leaves ample room for free will.

There are eleven chapters in the book. All the sweeping generalizations and hand-waving of the first ten chapters are merely a prelude to the final chapter. Here, Harari goes on the hard sell.

Dataism considers living organisms to be mere “biochemical algorithms” and “promises to provide the scientific holy grail that has eluded us for centuries: a single overarching theory that unifies all scientific disciplines….”

“You may not agree with the idea that organisms are algorithms” but “you should know that this is current scientific dogma…”

“Science is converging on an all-encompassing dogma, which says that organisms are algorithms, and life is data processing.”

“…capitalism won the Cold War because distributed data processing works better than centralized data processing, at least in periods of accelerating technological changes.”

“When Columbus first hooked up the Eurasian net to the American net, only a few bits of data could cross the ocean each year…”

“Intelligence is decoupling from consciousness” and “non-conscious but highly intelligent algorithms may soon know us better than we know ourselves.”

No, the current scientific dogma isn’t that organisms are algorithms. Nor is science converging on an all-encompassing dogma that says that life is data processing. Lack of incentives for innovation in the Warsaw Pact played a greater role in the outcome of the Cold War than the information-gathering deficiencies of centralized planning. When Columbus first “hooked up the Eurasian net to the American net,” much more than a few bits of data crossed the ocean. For instance, the epidemiological unification of the two worlds annihilated much of the New World population in short order.

There are more fundamental issues with Dataism, or more accurately, Data Supremacism. First, data is simply not enough. Without theory, it is impossible to make inferences from data, big or small. Think of the turkey. All year long, the turkey thinks that the human will feed it and take care of it. Indeed, every day the evidence keeps piling up that humans want to protect the turkey. Then comes Thanksgiving.

Second, the data itself is not independent of reference frames. This is manifest in modern physics; in particular, in both relativity and quantum physics. What we observe critically depends on our choice of reference frame. For instance, if Alice and Bob measure a spatially-separated (more precisely, spacelike separated) pair of entangled particles, their observations may or may not be correlated depending on the axes onto which they project the quantum state. This is not an issue of decoherence. It is in principle impossible to extract information stored in a qubit without knowledge of the right reference frame. To go a step further, Kent (1999) has shown that observers can mask their communication from an eavesdropper (called Eve, obviously) if she doesn’t share their reference frame. Even more damningly, reference frames are a form of unspeakable information—information that, unlike other classical information, cannot be encoded into bits to be stored on media and transmitted on data links.
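The point about reference-frame dependence can be made concrete with a standard textbook computation (a sketch unrelated to Kent's protocol): the correlation between Alice's and Bob's measurements on a singlet pair depends entirely on the axes they choose.

```python
import numpy as np

# Pauli matrices and spin measurement along an arbitrary unit axis.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def spin(axis):
    ax, ay, az = axis
    return ax * sx + ay * sy + az * sz

# Singlet state (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def correlation(a, b):
    # E(a, b) = <psi| (sigma.a tensor sigma.b) |psi>, which equals -a.b.
    return float(np.real(psi.conj() @ np.kron(spin(a), spin(b)) @ psi))

aligned = correlation([0, 0, 1], [0, 0, 1])     # same axis: perfectly anticorrelated
orthogonal = correlation([0, 0, 1], [1, 0, 0])  # perpendicular axes: uncorrelated
```

Along a shared axis the outcomes are perfectly anticorrelated; along perpendicular axes they are uncorrelated. Without agreeing on a frame, the observers see no signal at all.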

Third and most importantly, we do not have the luxury of assuming that an open problem will be solved at all, much less that it will be solved by a particular approach within a specific time-frame. This is a major source of radical uncertainty that is never going to go away. Think about cancer research. Big data and powerful new data science tools make the researchers’ jobs easier. But they cannot guarantee their success.

The main contribution of my doctoral thesis was solving the problem of reference frame alignment for observers trying to communicate in the vicinity of a black hole. The problem has no general solution. I exploited the locally-measurable symmetries of the spacetime to solve the problem. Observers located in the vicinity of a black hole can use my solution to communicate. If they don’t know my solution or don’t want to use it, they need to discover another solution that works. They cannot communicate otherwise. This is just one of countless examples where data plays at best a secondary role in solving concrete problems.

Empirical data is clearly very important for solving scientific, technical, economic, social, and psychological problems. But data is never enough. Much more is needed. Specifically, solving an open problem often requires a reformulation of the problem. That is, it often requires an entirely new theory. We don’t know yet whether AIs will ever be able to make the leap from calculator to theoretician. We cannot simply assume that they will be able to do so. They may run into insurmountable problems for which no solution may ever be found. However, if and when they do make the leap, there is no reason why humans should not be able to comprehend an AI’s theories. More powerful theories turn out to be simpler after all. And if and when that happens, the Policy Tensor for one would welcome our AI overlords.

Harari makes a big fuss about algorithms knowing you better than yourself. “Liberalism will collapse the day the system knows me better than I know myself.” Well, my weighing machine “knows” my weight better than I do. What difference does it make if an AI could tell me I really and truly have a 92 percent chance of having a successful marriage with Mary and only 42 percent with Jane? Assuming that the AI knows me better than I do, why would I treat it any differently from my BMI calculator, which insists that I am testing the upper bound of normality? After all, I already accept that the BMI calculator is a better judge of my fitness than my subjective impression; the AI would simply be a better judge of my love life.

Artificial Intelligence without consciousness is just a really fancy weighing machine. And data science is just a fancy version of simple linear regression. Why would Liberalism collapse if Silicon Valley delivers on its promises on AI? Won’t we double-down on the right to choose precisely because we can calibrate our choices much better?

If AI gain consciousness on the other hand, all bets are off. Whether as an existential threat or as a beneficial disruption, the arrival of the first Super AI will be an inflection point in human history. The arrival of advanced aliens poses similar risks to human civilization.

If you are interested in the potential of AI, you’re better off reading Nick Bostrom’s Superintelligence: Paths, Dangers, Strategies. If you are curious about scientific progress and our technological future in deep time as well as the primacy of theory, you should read David Deutsch’s The Beginning of Infinity: Explanations That Transform the World. If you are more interested in the unification of the sciences, look no further than Peter Watson’s Convergence: The Idea at the Heart of Science. (Although I do recommend Watson’s The Modern Mind, The German Genius, and The Great Divide more and in that order.) Finally, for the limits of scientific and technical advance, see John D. Barrow’s Impossibility: The Limits of Science and the Science of Limits.

Silicon Valley’s Kool-Aid encompasses long-term visions of both techno-utopias and techno-dystopias. The unifying fantasy is that, in the long run, technological advance will endow man and/or AI with absolute power. In the utopias, men become gods and mankind conquers the galaxy; and in much more ambitious versions, the entire universe itself. (It would be orders of magnitude harder to reach other galaxies than other stars.) In the more common dystopias, man won’t be able to compete with AI, or the elite will but the commoners won’t (this is Harari’s version). In either case, the Valley’s Kool-Aid is that technology will revolutionize human life and endow some—depending on the narrative: Silicon Valley, tech firms, AIs, the rich, all humans, or AI and humans—with god-like powers. Needless to say, this technology will come out of Silicon Valley.

In reality, a small oligopoly of what Farhad Manjoo calls the Frightful Five (Facebook, Google, Apple, Microsoft and Amazon) has cornered unprecedented market power, and stashed its oligopolistic supernormal profits overseas, just to rub it in your face. Apple alone has an untaxed $216 billion parked offshore. Far from obeying the motto “data wants to be free,” these oligopolistic firms hoard your data and sell it to the highest bidder. The dream of tech start-ups is no longer a unicorn IPO. Rather, it is a buyout by one of the oligopolists. If you are a truly successful firm in the Valley, you have either benefited from network externalities (like the Frightful Five, which are all platforms with natural economies of scale), or you have managed to shed costs onto the shoulders of people who would otherwise have been your employees or customers (like Airbnb, Uber and so on). Silicon Valley is, in fact, more neoliberal than Wall Street. While the Street has managed to shed risks and costs to the state, the Valley has managed to shed risks and costs to employees and customers. That’s basically the Valley’s business model.

Alongside its hoard of financial resources, the Valley has also cornered an impressive amount of goodwill in the popular consciousness. Who does not admire Google and Apple? This goodwill is the result of the industry’s actual accomplishments; some of them genuine, some thrust upon them by fate. In the popular imaginary, the Valley is the source of innovation and dynamism; to be celebrated not decried. Yet, the concentration of power in the industry has started to worry the best informed. If mass technological unemployment does come to pass, the Valley should not be surprised to find itself a pariah and a target of virulent populism, in the manner of Wall Street in 2009.


Causal Inference from Linear Models

For the past few decades, empirical research has shunned all talk of causation. Scholars use their causal intuitions but they only ever talk about correlation. Smoking is “associated with” cancer, being overweight is “correlated with” higher morbidity rates, college education is the strongest “correlate of” Trump’s vote gains over Romney, and so on and so forth. Empirical researchers don’t like to use causal language because they think that causal concepts are not well-defined. It is a hegemonic postulate of modern statistics and econometrics that all falsifiable claims can be stated in the language of modern probability. Any talk of causation is frowned upon because causal claims simply cannot be cast in the language of probability. For instance, there is no way to state in the language of probability that smoking causes cancer, that the tides are caused by the moon or that rain causes the lawn to get wet.

Unfortunately, or rather fortunately, the hegemonic postulate happens to be untrue. Recent developments in causality—a sub-discipline of philosophy—by Judea Pearl and others, have made it possible to talk about causality with mathematical precision and use causal models in practice. We’ll come back to causal inference and show how to do it in practice after a brief digression on theory.

Theories isolate a portion of reality for study. When we say that Nature is intelligible, we mean that it is possible to discover Nature’s mechanisms theoretically (and perhaps empirically). For instance, the tilting of the earth on its axis is the cause of the seasons. It’s why the northern and southern hemispheres have opposite seasons. We don’t know that from perfect correlation of the tilting and the seasons because correlation does not imply causation (and in any case they are not perfectly correlated). We could, of course, be wrong, but we think that this is a ‘good theory’ in the sense that it is parsimonious and hard-to-vary—it is impossible to fiddle with the theory without destroying it. [This argument is due to David Deutsch.] In fact, we find this theory so compelling that we don’t even subject it to empirical falsification.

Yes, it is impossible to derive causal inference from the data with absolute certainty. This is because, without theory, causal inference from data is impossible, and theories on their part can only ever be falsified; never proven. Causal inference from data is only possible if the data analyst is willing to entertain theories. The strongest causal claims a scholar can possibly make take the form: “Researchers who accept the qualitative premises of my theory are compelled by the data to accept the quantitative conclusion that the causal effect of X on Y is such and such.”

We can talk about causality with mathematical precision because, under fairly mild regularity conditions, any consistent set of causal claims can be represented faithfully as causal diagrams which are well-defined mathematical objects. A causal diagram is a directed graph with a node for every variable and directed edges or arrows denoting causal influence from one variable to another, e.g., {X\longrightarrow Y} which says that Y is caused by X where, say, X is smoking and Y is lung cancer.

The closest thing to causal analysis in contemporary social science is structural equation modeling. In order to illustrate the graphical method for causal inference, we’ll restrict attention to a particularly simple class of structural equation models: linear models. The results hold for nonlinear and even nonparametric models. We work with linear models not only because they are ubiquitous but also for pedagogical reasons. Our goal is to teach rank-and-file researchers how to use the graphical method to draw causal inferences from data. We’ll show when and how structural linear models can be identified. In particular, you’ll learn which variables you should and shouldn’t control for in order to isolate the causal effect of X on Y. For someone with basic undergraduate-level training in statistics and probability, it should take no more than a day’s work. So bring out your pencil and notebook.

A note on attribution: What follows is largely from Judea Pearl’s work on causal inference. Some of the results may be due to other scholars. There is a lot more to causal inference than what you will encounter below. Again, my goal here is purely pedagogical. I want you, a rank-and-file researcher, to start using this method as soon as you are done with the exercises at the end of this lecture. (Yes, I’m going to assign you homework!)

Consider the simple linear model,

{\large Y := \beta X + \varepsilon }

where {\varepsilon} is a standard normal random variable independent of X. This equation is structural in the sense that Y is a deterministic function of X and {\varepsilon} but neither X nor {\varepsilon} is a function of Y. In other words, we assume that Nature chooses X and {\varepsilon} independently, and Y takes values in obedience to the mathematical law above. This is why we use the asymmetric symbol “:=” instead of the symmetric “=” for structural equations.
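A quick simulation (a sketch, with an arbitrarily chosen β) confirms that when X and ε are independent, the regression slope recovers the structural parameter:

```python
import numpy as np

# Simulate the structural equation Y := beta*X + eps, X independent of eps.
rng = np.random.default_rng(3)
beta = 1.5
x = rng.normal(size=100_000)
eps = rng.normal(size=100_000)   # standard normal, independent of X
y = beta * x + eps

# The regression slope r_YX = Cov(X, Y) / Var(X) recovers beta.
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Equivalently, r_YX = rho_YX * sigma_Y / sigma_X.
slope_alt = np.corrcoef(x, y)[0, 1] * y.std() / x.std()
```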

We can embed this structural model into the simplest causal graph {X\longrightarrow Y} , where the arrow indicates the causal influence of X on Y . We have suppressed the dependence of Y on the error {\varepsilon}. The full graph reads {X\longrightarrow Y \dashleftarrow\varepsilon}, where the dashed arrow denotes the influence of unobserved variables captured by our error term. The path coefficient associated with the link {X\longrightarrow Y} is {\beta}, the structural parameter of the simple linear model. A structural model is said to be identified if the structural parameters can in principle be estimated from the joint distribution of the observed variables. We will show presently that under our assumptions the model is indeed identified and the path coefficient {\beta} is equal to the slope of the regression equation,
{r_{YX} = \rho_{YX}\frac{\sigma_{Y}}{\sigma_{X}},}


where {\rho_{YX}} is the correlation between X and Y and {\sigma_{X}} and {\sigma_{Y}} are the standard deviations of X and Y respectively.  {r_{YX}} can be estimated from sample data with the usual techniques, say, ordinary least squares (OLS).

What allows straightforward identification in the base case is the assumption that X and {\varepsilon} are independent. If X and {\varepsilon} are dependent then the model cannot be identified. Why? Because in this case there is spurious correlation between X and Y that propagates along the “backdoor path” {X\dashleftarrow\varepsilon\dashrightarrow Y}. See Figure 1.


Figure 1. Identification of the simple linear model.

Here’s what we can do if X and {\varepsilon} are dependent. We simply find another observed variable that is a causal “parent” of X (i.e., {Z\longrightarrow X} ) but independent of {\varepsilon}. Then we can use it as an instrumental variable to identify the model. This is because there is no backdoor path between Y and Z (which identifies {\alpha\beta} ) and X and Z (which identifies {\alpha}). See Figure 2.


Figure 2. Identification with an instrumental variable.

In that case, {\beta}  is given by the instrumental variable formula,
{\beta = \frac{r_{YZ}}{r_{XZ}}.}


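A simulation sketch of the instrumental-variable case (the confounder u and the coefficient values are made up for illustration):

```python
import numpy as np

# Z -> X -> Y with an unobserved confounder u driving both X and Y,
# so X and the error of Y are dependent; Z is independent of u.
rng = np.random.default_rng(4)
n = 200_000
z = rng.normal(size=n)
u = rng.normal(size=n)                   # unobserved confounder
alpha, beta = 0.8, 1.5
x = alpha * z + u + rng.normal(size=n)
y = beta * x + u + rng.normal(size=n)

# Naive regression of Y on X is biased by the backdoor path through u.
naive = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# IV estimate: Cov(Z, Y) / Cov(Z, X) recovers beta.
iv = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1]
```

The naive slope is biased upward because the confounder pushes X and Y in the same direction; the IV ratio strips out everything that does not flow through Z.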
More generally, in order to identify the causal influence of X on Y in a graph G, we need to block all spurious correlation between X and Y. This can be achieved by controlling for the right set of covariates (or controls) Z. We’ll come to that presently. First, some graph terminology.

A directed graph is a set of vertices together with arrows between them (some of which may be bidirected). A path is simply a sequence of connected links, e.g., {i\dashrightarrow m\leftrightarrow j\dashleftarrow k} is a path between i and k. A directed path is one along which all arrows point in the same direction, e.g., {i\longrightarrow j\leftrightarrow m\longrightarrow k} is a directed path from i to k. A directed acyclic graph is a directed graph that does not admit closed directed paths. That is, a directed graph is acyclic if there are no directed paths from a node back to itself.

A causal subgraph of the form {i\longrightarrow m\longrightarrow j} is called a chain and corresponds to a mediating or intervening variable m between i and j. A subgraph of the form {i\longleftarrow m\longrightarrow j} is called a fork, and denotes a situation where the variables i and j have a common cause m. A subgraph of the form {i\longrightarrow m\longleftarrow j} is called an inverted fork and corresponds to a common effect. In a chain {i\longrightarrow m\longrightarrow j} or a fork {i\longleftarrow m\longrightarrow j}, i and j are marginally dependent but conditionally independent (where we condition on m). In an inverted fork {i\longrightarrow m\longleftarrow j} on the other hand, i and j are marginally independent but conditionally dependent (once we condition on m). We use family connections as shorthand when talking about directed graphs. In the graph {i\longrightarrow j}, i is the parent and j is the child. The descendants of i are all nodes that can be reached by a directed path starting at i. Similarly, the ancestors of j are all nodes from which j can be reached by directed paths.

Definition (Blocking). A path p is blocked by a set of nodes Z if and only if p contains at least one arrow-emitting node that is in Z or p contains at least one inverted fork that is outside Z and has no descendant in Z. A set of nodes Z is said to block X from Y, written {(X\perp Y |Z)_{G}}, if Z blocks every path from X to Y.

The logic of the definition is that the removal of the set of nodes Z completely stops the flow of information from Y to X. Consider all paths between X and Y . No information passes through an inverted fork {i \longrightarrow m\longleftarrow j} so you can ignore the paths that contain inverted forks. Likewise, no information passes through a path without an arrow-emitting node so those can also be ignored. The rest of the paths are “live” and we must choose a set of nodes Z whose removal would block the flow of all information between X and Y along these paths. Note that whether Z blocks X from Y in a causal graph G can be decided by visual inspection when the number of covariates is small, say less than a dozen. If the number of covariates is large, as in many machine learning applications, a simple algorithm can do the job.
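For concreteness, here is a minimal path-based implementation of the blocking check (a pedagogical sketch, not an efficient algorithm; graphs are plain dicts mapping each node to its children):

```python
def descendants(graph, node):
    """All nodes reachable from `node` by a directed path."""
    out, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

def all_paths(graph, x, y):
    """Simple undirected paths from x to y, as lists of (node, dir, node) steps."""
    edges = [(a, b) for a, kids in graph.items() for b in kids]
    def neighbors(n):
        for a, b in edges:
            if a == n:
                yield b, "->"   # arrow points away from n
            if b == n:
                yield a, "<-"   # arrow points toward n
    paths = []
    def walk(node, visited, path):
        if node == y:
            paths.append(list(path))
            return
        for nxt, d in neighbors(node):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, path + [(node, d, nxt)])
    walk(x, {x}, [])
    return paths

def blocked(graph, path, Z):
    """A path is blocked iff some non-collider on it is in Z, or some
    inverted fork on it is outside Z and has no descendant in Z."""
    for i in range(1, len(path)):
        m = path[i][0]  # interior node between step i-1 and step i
        collider = path[i - 1][1] == "->" and path[i][1] == "<-"
        if collider:
            if m not in Z and not (descendants(graph, m) & set(Z)):
                return True
        elif m in Z:
            return True
    return False

def d_separated(graph, x, y, Z):
    """True iff Z blocks every path between x and y."""
    return all(blocked(graph, p, set(Z)) for p in all_paths(graph, x, y))
```

On a chain or fork, conditioning on the middle node separates the endpoints; on an inverted fork, conditioning on the middle node (or any of its descendants) connects them.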

If Z blocks X from Y in a causal graph G, then X is independent of Y given Z. That is, if Z blocks X from Y then X|Z and Y |Z are independent random variables. We can use this property to figure out precisely which covariates we ought to control for in order to isolate the causal effect of X on Y in a given structural model.

Theorem 1 (Covariate selection criteria for direct effect). Let G be any directed acyclic graph in which {\beta} is the path coefficient of the link {X\longrightarrow Y}, and let {G_{\beta}} be the graph obtained by deleting the link {X\longrightarrow Y}. If there exists a set of variables Z such that no descendant of Y belongs to Z and Z blocks X from Y in {G_{\beta}}, then {\beta} is identifiable and equal to the regression coefficient {r_{YX\cdot Z}}. Conversely, if Z does not satisfy these conditions, then {r_{YX\cdot Z}} is not a consistent estimand of {\beta}.

Theorem 1 says that the direct effect of X on Y can be identified if and only if we have a set of covariates Z that blocks all paths, confounding as well as causal, between X and Y except for the direct path {X\longrightarrow Y}. The path coefficient is then equal to the partial regression coefficient of X in the multivariate regression of Y on X and Z,

{Y =\alpha_1Z_1+\cdots+\alpha_kZ_k+\beta X+\varepsilon.}

The above equation can, of course, be estimated by OLS. Theorem 1 does not say that the model as a whole is identified. In fact, the path coefficients associated with the links {Z_{i}\longrightarrow Y} suggested by the multivariate regression above are not guaranteed to be identified. The regression model would be fully identified if Y were also independent of {Z_{i}} given {\{(Z_{j})_{j\ne i}, X\}} in the graph {G_{i}} obtained by deleting the link {Z_{i}\longrightarrow Y}, for all {i=1,\dots,k}.

What if you wanted to know the total effect of X on Y ? That is, the combined effect of X on Y both through the direct channel (i.e., the path coefficient {\beta}) and through indirect channels, e.g., {X\longrightarrow W\longrightarrow Y} ? The following theorem provides the solution.

Theorem 2 (Covariate selection criteria for total effect). Let G be any directed acyclic graph. The total effect of X on Y is identifiable if there exists a set of nodes Z such that no member of Z is a descendant of X and Z blocks X from Y in the subgraph formed by deleting from G all arrows emanating from X. The total effect of X on Y is then given by {r_{YX\cdot Z}}.

Theorem 2 ensures that, after adjustment for Z, the variables X and Y are not associated through confounding paths, which means that the regression coefficient {r_{YX\cdot Z}} is equal to the total effect. Note the difference between the two criteria. For the direct effect, we delete the link {X\longrightarrow Y} and find a set of nodes that blocks all other paths between X and Y . For the total effect, we delete all arrows emanating from X because we do not want to block any indirect causal path of X to Y.

Theorem 1 is Theorem 5.3.1 and Theorem 2 is Theorem 5.3.2 in the second edition of Judea Pearl’s book, Causality: Models, Reasoning, and Inference, where the proofs may also be found. These theorems are of extraordinary importance for empirical research. Instead of the ad-hoc and informal methods currently used by empirical researchers to choose covariates, they provide mathematically precise criteria for covariate selection. The next few examples show how to use these criteria for a variety of causal graphs.

Figure 3 shows a simple case (top left) {Z\longrightarrow X\longrightarrow Y} where the errors of Z and Y are correlated. We obtain identification by repeated application of Theorem 1. Specifically, Z blocks X from Y in the graph obtained by deleting the link {X\longrightarrow Y} (top right), so {\beta} is identified by controlling for Z. Similarly, in the graph obtained by deleting the link {Z\longrightarrow X} (bottom right), every path from Z to X passes through the inverted fork at Y, so the empty set already blocks Z from X, and {\alpha} is identified by the simple regression of X on Z.
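A simulation sketch of this setup (the shared shock u and the coefficient values are made up): controlling for Z recovers β, while the simple regression of Y on X is biased by the backdoor path through the correlated errors.

```python
import numpy as np

# Z -> X -> Y, with the disturbances of Z and Y sharing a common shock u.
rng = np.random.default_rng(5)
n = 200_000
u = rng.normal(size=n)
z = u + rng.normal(size=n)
alpha, beta = 0.7, 1.2
x = alpha * z + rng.normal(size=n)
y = beta * x + u + rng.normal(size=n)

# Simple regression of Y on X: biased via the path X <- Z <-- u --> Y.
naive = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Partial regression of Y on X controlling for Z identifies beta.
design = np.column_stack([x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = coef[0]
```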


Figure 3. Identification when a parent of X is correlated with Y.

Figure 4 shows a case where an unobserved disturbance term influences both X and Y. Here, the presence of the intervening variable Z allows for the identification of all the path coefficients. I’ve written the structural equation on the top right and checked the premises of Theorem 1 at the bottom left. Note that the path coefficient of {U\dashrightarrow X} is known to be 1 in accordance with the structural equation for X. Hence, the total effect of X on Y equals {\alpha\beta+\gamma}.


Figure 4. Model identification with an unobserved common cause.

Figure 5 presents a more complicated case where the direct effect can be identified but not the total effect. The identification of {\delta} is impossible because X and Z are spuriously correlated and there is no instrumental variable or intervening variable available.


Figure 5. A more complicated case where only partial identification is possible.

If you have reached this far, I hope you have acquired a basic grasp of the graphical methods presented in this lecture. You probably feel that you still don’t really know it. This always happens when we learn a new technique or method. The only way to move from “I sorta know what this is about” to “I understand how to do this” is to sit down and work out a few examples. If you do the exercises in the homework below, you will be ready to use this powerful arsenal for live projects. Good luck!


  1. Epidemiologists argued in the early postwar period that smoking causes cancer. Big Tobacco countered that both smoking and cancer are correlated with genotype (unobserved), and hence, the effect of smoking on cancer cannot be identified. Show Big Tobacco’s argument in a directed graph. What happens if we have an intervening variable between smoking and cancer that is not causally related to genotype? Say, the accumulation of tar in lungs? What would the causal diagram look like? Prove that it is then possible to identify the causal effect of smoking on cancer. Provide an expression for the path coefficient between smoking and cancer.
  2. Obtain a thousand simulations each of two independent standard normal random variables X and Y. Set Z=X+Y. Check that X and Y are uncorrelated. Check that X|Z and Y|Z are correlated. Ask yourself if it is a good idea to control for a variable without thinking the causal relations through.
  3. Obtain a thousand simulations each of three independent standard normal random variables {u,\nu,\varepsilon}. Let {X=u+\nu} and {Y=u+\varepsilon}. Create scatter plots to check that X and Y are marginally dependent but conditionally independent (conditional on u). That is, X|u and Y|u are uncorrelated. Project Y on X using OLS. Check that the slope is significant. Then project Y on X and u. Check that the slope coefficient for X is no longer significant. Should you or should you not control for u?
  4. Using the graphical rules of causal inference, show that the causal effect of X on Y can be identified in each of the seven graphs shown in Figure 6.
  5. Using the graphical rules of causal inference, show that the causal effect of X on Y cannot be identified in each of the eight graphs in Figure 7. Provide an intuitive reason for the failure in each case.

    Figure 6. Graphs where the causal effect of X on Y can be identified.


    Figure 7. Graphs where the causal effect of X on Y cannot be identified.
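Exercises 2 and 3 can be checked with a short simulation. Here is a minimal sketch using only the standard library; "controlling" for a variable is done crudely by restricting to a narrow band of its values, and the band widths are my choices:

```python
import random

random.seed(0)
N = 10_000

def corr(a, b):
    """Pearson correlation coefficient, computed by hand."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Exercise 2: X and Y independent; Z = X + Y is a common effect (collider).
X = [random.gauss(0, 1) for _ in range(N)]
Y = [random.gauss(0, 1) for _ in range(N)]
Z = [x + y for x, y in zip(X, Y)]

corr_xy = corr(X, Y)  # near zero: X and Y are independent

# Conditioning on the collider induces a strong negative correlation.
band = [(x, y) for x, y, z in zip(X, Y, Z) if abs(z) < 0.2]
corr_xy_given_z = corr([x for x, _ in band], [y for _, y in band])

# Exercise 3: a common cause u makes X and Y marginally correlated
# but conditionally independent.
u = [random.gauss(0, 1) for _ in range(N)]
v = [random.gauss(0, 1) for _ in range(N)]
e = [random.gauss(0, 1) for _ in range(N)]
X2 = [ui + vi for ui, vi in zip(u, v)]
Y2 = [ui + ei for ui, ei in zip(u, e)]

corr_x2y2 = corr(X2, Y2)  # near 0.5: driven entirely by the shared cause u

# Conditioning on the common cause kills the correlation.
band2 = [(x, y) for x, y, ui in zip(X2, Y2, u) if abs(ui) < 0.1]
corr_x2y2_given_u = corr([x for x, _ in band2], [y for _, y in band2])
```

The two exercises make the same point from opposite directions: controlling for a collider manufactures a spurious correlation, while controlling for a common cause removes a real one. Whether to control for a variable depends on the causal diagram, not on the data alone.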

    P.S. I just discovered that there is a book on this very topic, Stanley A. Mulaik’s Linear Causal Modeling with Structural Equations (2009).


Regional Polarization and Trump’s Electoral Performance

Tom Edsall suggested that I look at the regional socioeconomic correlates of Trump’s electoral performance. Why that didn’t cross my mind before I know not. But here goes. 

Political polarization in the United States means that by far the best predictor of a major party presidential candidate’s electoral performance is the performance of the party’s previous candidate. This was clearly the case in this election. [All data in this post is at the county level. The socioeconomic data is from GeoFRED while the vote count is from here.]


In what follows, therefore, we will look at the correlates of Trump’s performance relative to Mitt Romney’s in 2012. This is the cleanest way to control for partisan polarization. We’re going to examine the socioeconomic indicators of counties where Trump gained vote share compared to Romney.

Specifically, we will divide the counties into six buckets: Blowout, where Trump’s vote share was more than 5 percent below Romney’s; Major Loss, where Trump’s vote share was between 5 and 2.5 percent below Romney’s; Moderate Loss, where his vote share was between 2.5 percent below and at par with Romney’s; Moderate Gain, where Trump increased the GOP’s share by less than 2.5 percent; Major Gain, where he increased it by between 2.5 and 5 percent; and finally, Landslide, where Trump gained more than 5 percent relative to Romney.

More sophisticated strategies are certainly possible. But this strategy will allow us to visualize the data cleanly.
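As a sketch, the bucketing can be expressed in a few lines of Python. The swing values below are made up for illustration; the real inputs would be each county’s Trump vote share minus its Romney vote share, in percentage points:

```python
from collections import Counter

# Hypothetical swings: Trump's county vote share minus Romney's, in
# percentage points. Real data would come from county-level returns.
swings = [-7.2, -3.1, -0.4, 1.8, 3.9, 6.5, 0.0, -2.5]

def bucket(swing):
    """Assign a county to one of the six buckets by Trump's swing vs. Romney."""
    if swing < -5:
        return "Blowout"
    elif swing < -2.5:
        return "Major Loss"
    elif swing < 0:
        return "Moderate Loss"
    elif swing < 2.5:
        return "Moderate Gain"
    elif swing < 5:
        return "Major Gain"
    else:
        return "Landslide"

counts = Counter(bucket(s) for s in swings)
```

Averaging any socioeconomic indicator within each bucket then produces the charts that follow.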

We begin with the number of counties. This chart is no surprise to anyone who watched the results on election night. A lot more of the map was colored red than in 2012. There was a major swing in a large number of counties.


But most such counties are very sparsely populated. The most populous counties actually went for Clinton at higher rates than they had gone for Obama in 2012. These two charts illustrate the GOP’s astonishing geographic advantage.


Let’s move on to socioeconomic variables. The next two charts show the median household income and per capita incomes averaged over all the counties in each of the six buckets. Both paint a consistent picture: Trump did worse than Romney in the typical affluent county, but better in poorer counties. Neither, however, was a strong correlate of Trump’s performance. Median household income and per capita income explain only 13 percent and 10 percent, respectively, of the variation in Trump’s performance relative to Romney.
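For a single predictor, "percent of variation explained" is just the squared Pearson correlation. A minimal illustration with made-up numbers, not the actual county data:

```python
# R^2 of a bivariate regression equals the squared Pearson correlation.
# Toy numbers for illustration only -- not the actual county data.
incomes = [41, 48, 52, 57, 63, 70, 75, 88]              # hypothetical incomes ($000s)
swings = [4.1, 3.0, 2.2, 1.5, 0.1, -0.8, -2.0, -3.5]    # hypothetical Trump swings (pp)

n = len(incomes)
mx = sum(incomes) / n
my = sum(swings) / n
cov = sum((x - mx) * (y - my) for x, y in zip(incomes, swings))
vx = sum((x - mx) ** 2 for x in incomes)
vy = sum((y - my) ** 2 for y in swings)

r = cov / (vx * vy) ** 0.5  # Pearson correlation (negative here: richer, less swing)
r_squared = r * r           # share of variance in the swing "explained" by income
```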


The percentage of college graduates, on the other hand, is a very strong predictor. It explains 35 percent of the variation in Trump’s relative performance. The high school diploma rate is, however, a poor predictor. Still, counties where Trump did worse than Romney typically had higher percentages of people with high school diplomas.


Trump did better than Romney in counties where poverty and unemployment rates are relatively high, although the gradient is not constant.


Similarly, Trump did well in counties where the proportion of people relying on food stamps is high.


But his performance was uncorrelated with crime rates. On the other hand, it was correlated with the youth idleness rate, the percentage of 16-19 year olds who are neither in school nor working.


Similarly, counties where Trump improved on Romney’s performance had higher percentages of families with children headed by a single parent.


Finally, Trump did worse than Romney in counties with positive net migration rates and he did better in counties with negative net migration rates. This is the only dynamic variable we have in the dataset. (The others are snapshots and do not tell how things are changing in the counties.) It is therefore very interesting to find a clean correlation between net migration rates and Trump’s relative performance. The upshot is that Trump did well in places that are hemorrhaging people.


A consistent picture emerges from all these charts. Trump got to the White House by outperforming Mitt Romney in counties that are less educated, have lower incomes and higher poverty rates, where a greater proportion of people rely on food stamps, where many young adults are idle, and where children are growing up in broken homes. This is the America that is getting left behind. People are quite literally leaving these counties for greener pastures.

We have yet to tackle the why of it all. Why has America become so regionally polarized? Is it global trade? Automation? Skill-biased technological change? The neoliberal policy consensus? The political economy of Washington, DC? A fairly coherent narrative can be constructed along any of these threads. It is much harder to evaluate their relative importance. And even harder to devise meaningful policy solutions.

While we quietly thank our stars that Trump is getting tamed by adult supervision, we cannot go back to ignoring flyover country. For we now know quite well what happens when we do.