Blacks constitute 12.3 percent of the US population but account for 22.4 percent of fatalities in police shootings. This is usually taken as prima facie evidence of a racial bias in police shootings. The perception is reinforced by highly publicized incidents of police brutality against blacks. The case, however, is not quite so obvious. It is well-understood that what accounts for the higher rate of black fatalities in police violence is higher encounter rates with the police rather than higher rates of being shot conditional on encounter. That is to say blacks are much more likely than whites to have encounters with law enforcement, but conditional on having had an encounter with the police, they are no more likely to be shot than whites. So the main question is why blacks face a higher incidence of encounters with law enforcement. Blacks’ higher rate of encounters with the police could be due to overpolicing — perhaps the result of institutionalized racial bias in police practices. Or it could be due to higher rates of violent crime by black offenders — perhaps the result of greater economic deprivation and/or childhood lead exposure. It is of some interest to rule out the second of these hypotheses. We show how this can be done.
In order to construct our panel dataset, we obtain data on fatal police shootings from the Washington Post. We obtain microdata on homicides from the FBI’s Uniform Crime Reporting database. We aggregate homicides by state and offender race and police shooting fatalities by state and victim race. We also obtain race-specific population size from Social Explorer/Census Bureau. All three are race-specific — this is a panel dataset not of the cross-section of states over time, but of race and state. The interracial variation orthogonal to the interstate variation allows us to estimate race fixed-effects which would be otherwise impossible from the cross-section of state means of police shooting fatalities.
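The aggregation step can be sketched as follows. The records and field layout are hypothetical placeholders, not the actual format of the Washington Post or FBI files; the point is only that each incident is counted into a (state, race) cell.

```python
from collections import Counter

# Hypothetical incident-level records: one (state, race) pair per fatal shooting.
shootings = [
    ("IL", "Black"), ("IL", "White"), ("IL", "Black"),
    ("TX", "Hispanic"), ("TX", "White"), ("TX", "OU"),
]

def aggregate(records):
    """Count incidents in each (state, race) cell."""
    return Counter(records)

fatal_by_cell = aggregate(shootings)
for (state, race), n in sorted(fatal_by_cell.items()):
    print(state, race, n)
```

A `Counter` returns 0 for a cell with no incidents, which matters here: a state-race combination with no fatalities is a genuine zero count, not a missing observation.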
Table 1 displays the share of population, homicides, and police shooting fatalities in the United States in 2017-2018. We can see that blacks account for 12.3 percent of the population, 22.4 percent of fatal police shootings, and 30.9 percent of homicide offenders. This is the nub of the problem we are trying to untangle. Note that the taxon OU includes other, unknown and unreported ethnicities. The taxon’s share of police shooting fatalities, and especially of homicides, is too large for us to throw it out or ignore. In order to make inferences from the data, we will have to pay careful attention to the significance of the dummy for this taxon. The fact that so many agencies do not report the race of the offender or victim makes any conclusion premature without digging further.
Table 1. Share of population, fatal police shooting victims, and homicide offenders.

| | White | Black | Hispanic | OU |
|---|---|---|---|---|
| Population | 60.9% | 12.3% | 17.8% | 9.1% |
| Fatal police shooting victims | 45.7% | 22.4% | 17.7% | 14.2% |
| Homicide offenders | 24.7% | 30.9% | 15.4% | 29.0% |

Source: Census Bureau, Washington Post, FBI, author’s computations. Data is for the United States in 2017-2018.
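To make the tension in these shares concrete, the short sketch below divides each group's share of fatal police shootings by its share of population and by its share of homicide offenders. The numbers are the ones in the table; the variable and function names are ours.

```python
# Shares in percent, United States, 2017-2018.
population = {"White": 60.9, "Black": 12.3, "Hispanic": 17.8, "OU": 9.1}
shootings = {"White": 45.7, "Black": 22.4, "Hispanic": 17.7, "OU": 14.2}
homicides = {"White": 24.7, "Black": 30.9, "Hispanic": 15.4, "OU": 29.0}

def ratio(numer, denom):
    """Ratio of two share tables, group by group."""
    return {group: numer[group] / denom[group] for group in numer}

vs_population = ratio(shootings, population)
vs_homicides = ratio(shootings, homicides)

for group in population:
    print(f"{group:9s} {vs_population[group]:.2f} {vs_homicides[group]:.2f}")
```

Measured against population, blacks are overrepresented among police shooting fatalities (ratio above 1). Measured against homicide offenders, they are underrepresented (ratio below 1). The two raw benchmarks point in opposite directions, which is why both are entered as controls in a single model.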
We first note that, in the pooled race-blind data, we find a significant and equally strong relationship between police shooting fatalities on the one hand and population (Spearman’s rank correlation: r = 0.77, P < 0.0001) and homicides (r = 0.77, P < 0.0001) on the other. This is as it should be since both variables measure exposure to police shootings. We control for both.
We create a dataset with race-specific population size by state, homicides by offender race and state, police shooting fatalities by victim race and state, along with race dummies. We have 200 observations by state and race. We find a strong association between race-specific police shooting fatalities on the one hand and race-specific population (r = 0.79, P < 0.0001) and race-specific homicides (r = 0.69, P < 0.0001) on the other.
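For readers unfamiliar with Spearman's rank correlation, here is a minimal plain-Python sketch: rank both variables (ties share the average rank) and take the Pearson correlation of the ranks. The six data points are made up for illustration and are not drawn from the dataset.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical state-race cells (illustrative only).
homicides = [400, 150, 12, 80, 33, 250]
fatalities = [21, 14, 1, 9, 2, 11]
rho = spearman(homicides, fatalities)
print(round(rho, 3))
```

Because it works on ranks, the statistic is insensitive to the heavy right skew of count data, which makes it a reasonable first look before fitting a model.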
In order to ascertain racial bias in fatal police shootings, we fit generalized linear models (GLM) with the number of fatalities in police shootings as our response. Race-specific population size and number of homicidal offenders are measures of exposure to police shootings: the greater the population at risk and the greater the intensity of violent crime, the more police shootings we should expect. Therefore, race-specific population size and number of homicides by offender race can serve as our controls. The estimates of interest are the coefficients of the race dummies. If the second hypothesis is right, the race dummies should be insignificant or bear the wrong sign. As mentioned before, the “race” OU is essentially missing data that has to be interpreted with great care.
Even greater care must be exercised in modeling. The usual linear or log-log models estimated by ordinary least squares (OLS) or maximum likelihood (ML) assume that the errors are Gaussian. This is quite unlikely to hold in the case of rare incidents like fatalities in police shootings. Indeed, when we fit linear or log linear models with either OLS or ML, we find that, using the Anderson-Darling Test, we can reject the null that the residuals are Gaussian with high confidence (P < 0.0001) in all cases. Moreover, all such models display considerable instability. It is more appropriate to assume that the error terms obey the Poisson distribution. But even that turns out to be wrong — so even mixed-effects models assuming Poisson distributions are inappropriate. In fact, as the next figure shows, our response, police shooting fatalities, obeys the negative binomial distribution. Both the Poisson and the Gaussian provide awful fits, while the negative binomial fits the data very well.
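A quick way to see why the Poisson fails is to compare the sample mean and variance. A Poisson variable has variance equal to its mean, whereas the negative binomial (in its common NB2 parameterization) has variance mu + alpha * mu^2. The counts below are hypothetical and serve only to illustrate the check; the moment estimate of alpha is a rough diagnostic, not the estimator used for the fitted model.

```python
from statistics import mean, variance

# Hypothetical state-race fatality counts (illustrative only).
counts = [0, 0, 1, 1, 2, 3, 3, 5, 8, 13, 21, 40]

m = mean(counts)
v = variance(counts)          # sample variance

# Poisson implies v / m close to 1; values well above 1 signal overdispersion.
dispersion_ratio = v / m

# Method-of-moments estimate of the NB2 dispersion: v = m + alpha * m^2.
alpha_mom = (v - m) / m ** 2

print(f"mean={m:.2f} variance={v:.2f} ratio={dispersion_ratio:.2f} alpha={alpha_mom:.2f}")
```

A ratio far above 1 means the counts are more spread out than any Poisson with that mean could be, which is the pattern the negative binomial is designed to absorb.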
The fact that fatalities obey the negative binomial distribution greatly simplifies our modeling problem. We model police shooting fatalities via a negative binomial regression. Since MATLAB does not have the functionality to fit a negative binomial GLM, we use Python instead. We use the IRLS algorithm to fit our model. A primer on negative binomial regression in Python can be found here. Explicitly, our regression model is given by the following equation:
for state-race observations, and where FATAL is state-race fatalities in police shootings, POP is state-race population, HOM is state-race homicides, and RACE is a race dummy. Table 1 displays our estimates.
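Assuming the usual log link for a negative binomial GLM (the link function and the dispersion parameterization are not stated in the text, so both are assumptions here), the model just described can be written as:

```latex
\begin{aligned}
\mathrm{FATAL}_i &\sim \mathrm{NegBin}(\mu_i, \alpha), \\
\log \mu_i &= \beta_0 + \beta_1\,\mathrm{POP}_i + \beta_2\,\mathrm{HOM}_i
              + \sum_{r} \gamma_r\,\mathrm{RACE}_{r,i},
\end{aligned}
```

where i indexes state-race observations, r ranges over the included race dummies, and the symbols beta, gamma and alpha are our notation rather than the author's.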
Table 1. Negative Binomial Model of Fatal Police Shootings.

| | Slope | Std Error | z-score | P |
|---|---|---|---|---|
| Intercept | **1.51** | 0.114 | 13.3 | 0.000 |
| Population | **1.12** | 0.102 | 11.0 | 0.000 |
| Homicide | **0.26** | 0.094 | 2.8 | 0.005 |
| Black | **0.40** | 0.155 | 2.6 | 0.010 |
| Hispanic | -0.17 | 0.137 | -1.2 | 0.224 |
| OU | 0.19 | 0.171 | 1.1 | 0.269 |

Source: FBI, Washington Post, Census Bureau, author’s computations. Negative Binomial GLM fitted with IRLS in Python. Response is race-specific police shooting fatalities by state. N=200 state-race combinations. OU is a dummy for unknown, other, and unreported ethnicities. Non-Boolean variables have been standardized to have zero mean and unit variance. Estimates in bold are significant at the 5 percent level.
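For readers curious about what IRLS does, here is a minimal plain-Python sketch for a negative binomial GLM. It is not the author's code: it assumes a log link, holds the dispersion alpha fixed at 1.0 rather than estimating it, and runs on a tiny made-up dataset with an intercept and a single group dummy.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_nb_irls(X, y, alpha=1.0, iters=50):
    """IRLS for a log-link negative binomial GLM with fixed dispersion alpha.

    X is a list of rows whose first entry is the constant 1 (the intercept).
    """
    n, p = len(X), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)
    for _ in range(iters):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working weights: (dmu/deta)^2 / Var(y) = mu^2 / (mu + alpha * mu^2).
        w = [m / (1.0 + alpha * m) for m in mu]
        # Working response: eta + (y - mu) / (dmu/deta), with dmu/deta = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        XtWz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(XtWX, XtWz)   # weighted least squares step
    return beta

# Hypothetical toy data: intercept plus one dummy (0 = reference group).
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [2, 4, 6, 5, 9, 10]
b0, b1 = fit_nb_irls(X, y)
print(math.exp(b0), math.exp(b1))
```

With only an intercept and a dummy, the fitted means reproduce the two group means, so the exponentiated dummy coefficient is simply the ratio of the group means. In practice one would use a tested library routine rather than a hand-rolled loop like this.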
We find that both exposure controls are significant, although race-specific population has a stronger effect on race-specific fatalities than race-specific homicides. We see that the OU dummy is insignificantly different from zero. This considerably eases our interpretive burden. Note that we have omitted the race dummy for whites and included an intercept term: thus, race dummies capture deviations from the hazard rate for being fatally shot by the police faced by whites. We see that the Hispanic dummy is insignificant, suggesting that police shooting fatalities are not biased against Hispanic subjects. Meanwhile, the Black dummy is large and significant. This suggests a race-specific bias against black subjects, largely black men, in police shooting fatalities.
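If the model uses a log link (an assumption; the link is not stated), exponentiating a dummy coefficient gives the ratio of the expected fatality count for that group to the expected count for whites, holding the standardized controls fixed. A small sketch using the coefficients from the table:

```python
import math

# Race-dummy coefficients from the full model; white is the omitted category.
coefs = {"Black": 0.40, "Hispanic": -0.17, "OU": 0.19}

# Under a log link, exp(coefficient) is a rate ratio relative to whites.
rate_ratios = {group: math.exp(b) for group, b in coefs.items()}

for group, rr in rate_ratios.items():
    print(f"{group:9s} rate ratio vs. white: {rr:.2f}")
```

Only the Black ratio comes from a coefficient that is statistically distinguishable from zero; the Hispanic and OU ratios should be read with their wide standard errors in mind.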
How large is this bias? In order to pin that down, we drop the other dummies and refit the model. Table 2 displays our selected model.
Table 2. Selected Negative Binomial Model of Fatal Police Shootings.

| | Slope | Std Error | z-score | P |
|---|---|---|---|---|
| Intercept | **1.54** | 0.059 | 26.4 | 0.000 |
| Black | **0.35** | 0.110 | 3.1 | 0.002 |
| Homicide | **0.32** | 0.077 | 4.2 | 0.000 |
| Population | **1.06** | 0.071 | 14.9 | 0.000 |

Source: FBI, Washington Post, Census Bureau, author’s computations. Negative Binomial GLM fitted with IRLS in Python. Response is race-specific police shooting fatalities by state. N=200 state-race combinations. Non-Boolean variables have been standardized to have zero mean and unit variance. Estimates in bold are significant at the 5 percent level.
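The z-score and P columns follow from the slope and its standard error. The sketch below recomputes them for the Black dummy using a two-sided normal (Wald) test. Because the printed slope and standard error are rounded, the results can differ slightly from the table in the last digit.

```python
import math

def wald(coef, se):
    """Return the z-score and two-sided normal p-value for a coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p

z, p = wald(0.35, 0.110)
print(f"z = {z:.2f}, p = {p:.4f}")
```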
Our selected model explains 68 percent of the variation in police shooting fatalities by state and race (r = 0.82, P < 0.0001). Dropping the other dummies slightly reduces the effect size of the Black dummy: from 0.40 to 0.35. However, the standard error of our estimate is smaller, thus giving us a tighter confidence bound. The 95 percent confidence interval for the coefficient of the Black dummy is (0.173, 0.474). We feed these estimates into the fitted model to obtain an estimate of excess black deaths at the hands of the police.
We estimate that, in 2017-2018, 127 more black subjects were killed in police shootings relative to the counterfactual where black subjects face the same risk of being shot and killed by police officers as white subjects, after controlling for race-specific population size and race-specific homicides. We estimate a 95 percent confidence bound of 42-231 excess deaths among blacks in 2017-2018, relative to the no-bias counterfactual. In other words, we estimate that there is a less than 5 percent chance that there were fewer than 42 or more than 231 excess deaths among blacks relative to the no-bias counterfactual.
These are very large numbers. For comparison, note that 415 black subjects (of whom 398 were men) were shot and killed by police officers in 2017-2018. Thus, our central estimate says that excess deaths account for about 30 percent of all black fatalities in police shootings (127 of 415). Our 95 percent confidence interval in the same terms is 10-56 percent. Again, these are very large numbers.
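The percentages quoted here can be reproduced by dividing the excess-death estimates by the 415 observed deaths. The sketch also computes a second, different quantity: the increase relative to the implied counterfactual (observed minus excess). The variable names are ours.

```python
observed = 415
excess = {"low": 42, "central": 127, "high": 231}

# Excess deaths as a share of the observed total.
share_of_observed = {k: e / observed for k, e in excess.items()}

# Excess deaths relative to the implied no-bias count (observed - excess).
over_counterfactual = {k: e / (observed - e) for k, e in excess.items()}

for k in excess:
    print(f"{k:8s} {share_of_observed[k]:.1%} of observed, "
          f"{over_counterfactual[k]:.1%} above counterfactual")
```

The 10-56 percent range corresponds to the first measure. The second measure is always larger, since it divides the same excess by a smaller base.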
We do not have race-specific data for violent crimes. However, the aggregate number of homicides by state is highly correlated with the number of violent crimes (r = 0.96, P < 0.0001). It is therefore enough to control for race-specific homicides. We have thus been able to rule out the second hypothesis: Racial bias in fatal police shootings is not largely due to the higher crime rates among blacks. We have shown that even after controlling for race-specific rates of violent crime, there exists a very strong racial bias in fatal police shootings against African-Americans.
Replication data and code for our negative binomial regression model can be found at GitHub.
Postscript. Commenting on the original post, Luther Rivera suggests that there may be a serious class bias in police shootings, and that this class bias may account for the racial bias against African-Americans in fatal police shootings, because African-Americans are over-represented in the lower classes that are disproportionately at the receiving end of police violence. If the Lutheran Hypothesis is correct, then once we control for a race-specific measure of class, the coefficient of the Black dummy should vanish.
Our original selected model in python output:
The coefficient of the Black dummy is, as before, equal to 0.35 and highly significant (P = 0.002). Now see what happens when we add race-specific median income to the regression:
The coefficient of median income bears the intuitive sign: higher income predicts fewer police shootings after controlling for violent crime and population at risk. It is significant at the 5 percent level, although not at the 1 percent level (P = 0.045), so the slope is neither very large nor strongly significant. It is, however, large enough to wipe out the coefficient of the Black dummy, which now falls into insignificance (P = 0.171). We thus cannot rule out the Lutheran Hypothesis that racial bias is an artifact of class bias in fatal police shootings. In other words, African-Americans are shot at higher rates by American police officers even after accounting for race-specific crime rates, but they are not shot at higher rates than whites once we account for the class bias in police violence. Once we control for race-specific population, homicides and median income, there is only one racial bias that survives, in favor of Hispanics:
Many thanks to Luther for nudging me in the right direction. The results documented in this postscript do not overturn the results documented in the original post. Instead, they deepen our understanding: race emerges as a mediating variable between class disparities in police violence and the burden of overpolicing. It is not simply black men who are being overpoliced. If the analysis presented here is correct, so are poorer whites. Greater economic deprivation turns out to be the governing risk factor not for violent crime, as I had hypothesized at the beginning of the original post, but for police bias as expected by Luther.
OK, here’s my question, which I’d love for you to address in non-regression-speak: when I look through the Excel spreadsheet you linked to for the raw data, I look at columns 2 and 3 and I see that police killings of African Americans are either roughly proportionate to, or lower in proportion to, homicides by African Americans (ignoring the OU category). E.g., in Illinois, column 2 shows 166 white homicides and 405 black homicides, and column 3 shows 13 police shootings of whites and 20 police shootings of blacks. Wouldn’t a simple causal explanation be that the police encounter only a subset of the white and black populations, a violent subset for which homicides are a rough proxy? In encounters with that subset, police are somewhat more likely to kill white than black suspects (based on the IL figures, which look similar to the other big states in your data).
I’m struggling to see how your complex data analysis leads from that raw data to your very different causal conclusion (police racism). Causally, (not using regression-speak) how does your model differ from the simple one I’ve offered above? Also I’m not sure what you’ve done with the OU category or what assumptions you’ve made about its likely racial distribution.
Happy to explain. The whole point of this exercise is to ask: Is the black-white gap in fatal police shootings explained by the black-white gap in violent crime? While the aggregate pattern (Table 1) suggests that it is more than sufficient, a deeper investigation leads to the opposite conclusion. Forget about blacks and whites for a second and ask: what is the exact relationship between fatal police shootings and homicides? That is the regression model. The subtlety involved is that fatality numbers obey the negative binomial distribution, so we have to model them as such. Once we have pinned the model down explicitly, we can ask whether blacks and whites differ in terms of the relationship between police shootings and homicides _in the cross-section of US states_. I.e., given the relationship between homicides and fatal police shootings in the cross-section of US states, do more blacks get fatally shot by the police even after controlling for violent crime? That’s what the race dummies capture. If it were the case that police shootings are a function of homicides (and population at risk) alone, i.e., if they were unbiased, then the coefficient of the Black dummy should vanish. As I explained, we need to pay careful attention to the coefficient of OU precisely because we don’t know what’s in it: it could potentially flip the relationship in either direction. Thankfully, it turns out not to be a problem, since the coefficient of the OU dummy vanishes. Thus, we can conclude that the black-white gap in fatal police shootings is not entirely explained by the black-white gap in violent crime. Moreover, now that we have an explicit model relating homicidal violence by civilian offenders and the police, we can quantify the bias with some precision.
Hi Policy Tensor,
Wanted to first say that I’m a big fan of the analyses and thoughts you present on the blog.
One specific thing I have taken from your previous work is the importance of class in the US. African-Americans also seem to be disproportionately represented in the working class/poorer communities. Doesn’t your analysis leave open the door to the possibility that excess policing (encounters with the police) has a class element which disproportionately affects African-Americans?
Would be awesome to see if you could somehow control for income/wealth of racial communities as well.
Hi Luther:
Thanks for the vote of confidence. You are absolutely right to go a step further and ask whether African-American overrepresentation in the lower classes can account for the racial bias in fatal police shootings. You are also right to suspect that it would. When I control for race-specific median household income, the Black dummy vanishes. This suggests that racial bias is mostly due to a class bias in police shooting fatalities. I’ll update the post.
Thanks for paying careful attention.