Political Analysis Advance Access first published online on February 10, 2008
This version published online on February 14, 2008
Political Analysis, doi:10.1093/pan/mpm039
Model Specification in Instrumental-Variables Regression
Department of Political Science, Yale University, PO Box 208301, New Haven, CT 06520
e-mail: thad.dunning{at}yale.edu
In many applications of instrumental-variables regression, researchers seek to defend the plausibility of a key assumption: the instrumental variable is independent of the error term in a linear regression model. Although fulfilling this exogeneity criterion is necessary for a valid application of the instrumental-variables approach, it is not sufficient. In the regression context, the identification of causal effects depends not just on the exogeneity of the instrument but also on the validity of the underlying model. In this article, I focus on one feature of such models: the assumption that variation in the endogenous regressor that is related to the instrumental variable has the same effect as variation that is unrelated to the instrument. In many applications, this assumption may be quite strong, but relaxing it can limit our ability to estimate parameters of interest. After discussing two substantive examples, I develop analytic results (simulations are reported elsewhere). I also present a specification test that may be useful for determining the relevance of these issues in a given application.
1. Introduction
Social scientists often construct instrumental variables for use in regression analysis. The well-known idea is as follows. Consider the regression equation
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \tag{1}$$
where α is an intercept, β is a regression coefficient, and εi is an unobserved, mean-zero error term. Here, Yi, Xi, and εi are random variables. The parameters α
and β will be estimated from the data. Unlike the classical regression model, Xi may be dependent on the error term, that is, endogenous. The ordinary least-squares estimator will therefore be biased. Under additional assumptions, however, instrumental variables least squares (IVLS) regression provides a way to obtain consistent parameter estimates. To use IVLS, we must find an instrumental variable, namely, a random variable Zi that is statistically independent of the error term in equation (1). Moreover, Xi and Zi must be reasonably well correlated. The latter condition can be checked (Bound, Jaeger, and Baker 1995); the former assumption cannot.1 (Below, these ideas are generalized to apply to p treatments and q instruments.) In applications, it is common to devote significant attention to defending the assumption of exogeneity. The broad point I make in this article is the following. It is not merely the exogeneity of the instrument that allows for estimation of the effect of treatment. The inference also depends on a causal model that can be expressed in a regression equation like (1). Without the regression equation, there is no error term, no exogeneity, and no causal inference by IVLS. Exogeneity, given the model, is therefore necessary but not sufficient for the instrumental variables approach. The specification of the underlying causal model is at issue as well.
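A small simulation can make the contrast between ordinary least squares and IVLS concrete. The sketch below is illustrative only: the data-generating process, the coefficient values, and the function name `simulate_ols_vs_iv` are assumptions introduced here, not taken from any application discussed in this article. It draws an instrument that is independent of the error term, builds an endogenous regressor, and compares the two slope estimates under equation (1).

```python
import numpy as np

def simulate_ols_vs_iv(n=200_000, beta=2.0, seed=0):
    """Simulate Y = alpha + beta*X + eps with X endogenous and Z a valid instrument.

    All numerical values are hypothetical choices made for illustration.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # instrument, independent of the error term
    u = rng.normal(size=n)                 # unobserved factor shared by X and the error
    eps = u + rng.normal(size=n)           # mean-zero error term
    x = 0.8 * z + u + rng.normal(size=n)   # X depends on Z and on u, so X is endogenous
    y = 1.0 + beta * x + eps

    # Least-squares slope: Cov(X, Y) / Var(X)
    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Instrumental-variables slope: Cov(Z, Y) / Cov(Z, X)
    iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    return ols, iv

if __name__ == "__main__":
    ols, iv = simulate_ols_vs_iv()
    print(f"OLS slope: {ols:.3f}  IV slope: {iv:.3f}  (true beta = 2.0)")
```

Under these assumed values, the least-squares slope should sit above the true β because the regressor and the error share a common component, whereas the ratio of covariances should be close to β. Nothing in the simulation speaks to whether equation (1) is the right model; it only illustrates consistency given the model.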
Although this general point has been raised by others,2 I draw attention here to a particular, critical assumption: variation in the endogenous regressor related to the instrumental variable must have the same causal effect as variation unrelated to the instrument. In equation (1), for example, a single regression coefficient β applies to endogenous as well as exogenous components of Xi. In many applications, this assumption of "homogenous partial effects" may be quite strong, but relaxing it can limit our ability to estimate parameters of interest.
For instance, let Xi be a measure of income and Yi be a measure of political attitudes, such as opinions about taxation. In the example discussed in Section 2, the population of subjects is limited to participants in a prize lottery. The overall income of subject i then consists of Xi = X1i + X2i, where X1i is ordinary income and X2i measures lottery winnings. Overall income Xi is likely to be endogenous, because factors associated with family background influence both ordinary income and political attitudes. However, lottery winnings are correlated with overall income and are also plausibly exogenous. As discussed below, lottery winnings can be used to instrument for overall income Xi.3
However, this approach requires the true data-generating process to be
$$Y_i = \alpha + \beta (X_{1i} + X_{2i}) + \varepsilon_i. \tag{2}$$
An alternative model is
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \tag{3}$$
with β1 ≠ β2. According to equation (3), there are heterogenous causal effects across components of Xi, that is, heterogenous partial effects. If the true model is equation (3), assuming equation (2) will produce estimates that are misleading.4 The model must be specified before IVLS or another technique can be used to estimate it. The assumption of homogenous partial effects is therefore a general issue, whether or not Xi is endogenous. Applications of IVLS tend to bring the importance of this assumption to the fore, however. When analysts exploit natural experiments or other research designs to construct an instrumental variable Zi, variation in Xi related to Zi may not have the same causal effect as variation unrelated to Zi.5 Unfortunately, it is often the desire to estimate the effect of variation unrelated to the instrument that motivates us to use IVLS in the first place. Otherwise, we could simply regress Yi on Zi.
The issue arises in many settings. For instance, in a regression of civil conflict on economic growth, using data from sub-Saharan African countries, economic growth may be endogenous. Annual changes in rainfall may be used as an instrumental variable for economic growth. Yet as discussed in Section 3, different sources of economic growth, such as growth of agricultural or industrial productivity, may have different effects on the probability of civil war in Africa, and rainfall changes may be associated with the growth of agricultural but not industrial productivity. Economic growth, individual income, and other variables of interest to social scientists tend to be summary measures of many component inputs. These inputs may have different effects on the dependent variable, and instrumental variables will be related to some of these inputs but not to others.
The point is not that there is a general failure in IVLS applications. The assumption of homogenous partial effects may be innocuous in some settings, misleading in others. The examples discussed in this article include some of the strongest recent papers in the literature, in which innovative research designs supply good instruments. Yet the examples also remind us that in the regression context, the identification of causal effects using IVLS depends not just on the exogeneity of the instrument in relation to the model we posit but also on the validity of the underlying model itself.6 This is easily forgotten if we are focusing only on arguments about exogeneity.
Whether the assumption of homogenous partial effects is plausible in any given application is mostly a matter for a priori reasoning; supplementary evidence may help. At the end of this article, I present a statistical specification test that might be of some use. The specification test requires at least one additional instrument, however, and therefore may be of limited practical utility. The main goal of the article is thus to underscore the importance of the assumption of homogenous partial effects and to encourage its discussion in applications. Specification of the model should be defended with the same energy used to defend exogeneity.
This discussion extends without difficulty to p treatment variables and q instruments. For instance, the matrix version of equation (1) is
$$Y = X\beta + \varepsilon, \tag{4}$$
where Y is an n × 1 column vector, X is an n × p matrix, β is a p × 1 vector of parameters, and ε is an n × 1 column vector. Here, n is the number of units, and p is the number of right-hand side variables (including the intercept if there is one). We can think of the rows of equation (4) as i.i.d. realizations of the data-generating process implied by equation (2) or (3), for units i = 1, ..., n.7 To use IVLS, we must find an n × q matrix of instrumental variables Z, with n > q ≥ p, such that (1) Z'Z and Z'X have full rank and (2) Z is independent of the unobserved error term, that is, exogenous (Greene 2003: 74–80; Freedman 2005: 175). Exogenous columns of X may be included in Z. The IVLS estimator can be written as
$$\hat{\beta}_{IVLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y, \tag{5}$$
where X̂ = Z(Z'Z)⁻¹Z'X.8
Note that X̂ is the projection of X onto Z and is (nearly) exogenous.9 On the other hand, X also has a projection orthogonal to Z, which is e ≡ X – X̂. Rewriting X = e + X̂ and substituting into equation (4), we have
$$Y = (e + \hat{X})\beta + \varepsilon. \tag{6}$$
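The matrix form of the estimator, with the projection Z(Z'Z)⁻¹Z'X in place of X, translates almost line for line into code. The following is a minimal sketch rather than a production routine: the function names `ivls` and `demo` and the simulated data are assumptions made for illustration. It also checks numerically that the residual piece e = X – X̂ is orthogonal to the instruments.

```python
import numpy as np

def ivls(X, Z, y):
    """Two-stage form of the estimator: (Xhat'Xhat)^{-1} Xhat'y with Xhat = Z (Z'Z)^{-1} Z'X."""
    # First stage: project each column of X onto the column space of Z.
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # Second stage: least squares of y on the projected regressors.
    beta_hat = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    return beta_hat, X_hat

def demo(n=50_000, seed=1):
    """Hypothetical data: one endogenous regressor, one instrument, plus an intercept."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    u = rng.normal(size=n)                      # shared unobserved factor
    x = z + u + rng.normal(size=n)              # endogenous regressor
    y = 0.5 + 1.5 * x + u + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])        # intercept and regressor
    Z = np.column_stack([np.ones(n), z])        # the intercept is exogenous, so it stays in Z
    beta_hat, X_hat = ivls(X, Z, y)
    e = X - X_hat                               # part of X orthogonal to Z
    return beta_hat, float(np.abs(Z.T @ e).max())

if __name__ == "__main__":
    beta_hat, max_cross = demo()
    print("IVLS estimates:", beta_hat)
    print("largest |Z'e| entry:", max_cross)
```

The single coefficient vector returned by `ivls` is applied to X̂ alone, yet under the model it is interpreted as the effect of all of X, including e. That interpretation is supplied by the model, not by the arithmetic.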
The focus of this article differs from a related literature on instrumental variables regression. In other papers, often formulated in the context of the Neyman-Holland-Rubin potential outcomes model, individuals or other units are assumed to have distinct responses to treatment; instruments may influence participation in treatment for only a subset of the units. Under suitable assumptions, instrumental variables can identify what Imbens and Angrist (1994) call "local average treatment effects," that is, average treatment effects for the subset of units whose participation in treatment is influenced by the instruments.10
In this article, I ignore heterogeneity of treatment effects across individuals or units: in the regression models discussed here, coefficients are common to all units. I instead investigate the consequences of heterogeneity across pieces of treatment variables—that is, causal heterogeneity across portions of X. I show that using IVLS to identify the effect of an endogenous regressor, such as individual income or economic growth, depends on specifying a regression model in which all of the inputs or component parts of this regressor have the same effect on the dependent variable. I call this the assumption of homogenous partial effects.
2. Political Attitudes and Lottery Winnings
Doherty, Green, and Gerber (2005, 2006) are interested in assessing the relationship between income and political attitudes.11 They surveyed 342 people who had won a lottery in an unidentified Eastern state between 1983 and 2000 and asked a variety of questions about attitudes toward estate taxes, government redistribution, and social and economic policies more generally. Given the number and kinds of lottery tickets that individuals buy, the level of lottery winnings is randomly assigned among lottery players.12 Abstracting from sample nonresponse and other issues that might threaten the validity of the inferences,13 the authors can exploit the lottery to make compelling claims about the causal impact of winnings on political beliefs. It turns out that winning large amounts in a lottery has an effect on some relatively narrow political attitudes (for example, those who win more in the lottery favor the estate tax less), but lottery winnings have relatively little impact on broader political attitudes, for instance, toward the proper role of government in the economy writ large.
However, a question of greater interest concerns the political effects of overall income, not lottery winnings per se. Does the strong research design allow us to generalize from the effect of lottery winnings to the effect of overall income? It does not, without making further assumptions. As Doherty, Green, and Gerber (2005: 8–10, 2006: 446–7) carefully point out, the effect on political attitudes of "windfall" lottery winnings may be very different from other kinds of income—for example, income earned through work, interest on wealth inherited from a rich parent, and so on.
These kinds of concerns may also limit our ability to use IVLS to estimate the causal effect of overall income on political attitudes. Let Ai be a measure of the political attitudes of subject i.14 Consider the regression equation
$$A_i = \beta I_i + \varepsilon_i, \tag{7}$$
where Ii is the overall income of subject i and εi is a random variable, independently and identically distributed (i.i.d.) across respondents with E(εi) = 0. For ease of exposition, the variables Ai and Ii are normalized to have zero mean and covariates are not included.15 The goal is to estimate the regression coefficient β, which measures the impact of overall income on political attitudes; by assumption, β is the same for all respondents.16 Equation (7) is the standard linear regression setup, except for one catch: the error term is not independent of income, because unobserved (unmeasured) variables may be associated with both overall income and political attitudes. For instance, rich parents may teach their children how to play the stock market and also influence their attitudes toward government intervention. Peer-group networks may influence both economic success and political values. Ideology may itself shape economic returns, perhaps through the channel of beliefs about the returns to hard work. Even if some of these variables could be measured and controlled, clearly there are many unobserved variables that could conceivably confound inferences about the causal impact of overall income on political attitudes.
Given the model in equation (7), however, the innovative research design supplies an excellent instrument—namely, a variable that is both correlated with the overall income of person i and is independent of the error term in equation (7).17 This variable is the level of lottery winnings of respondent i. The next equation is an accounting identity:
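$$I_i \equiv O_i + W_i, \tag{8}$$
where Wi is the lottery winnings of respondent i and Oi is ordinary income.18 Lottery winnings are correlated with overall income,19
$$\mathrm{Cov}(W_i, I_i) \neq 0, \tag{9}$$
and are independent of the error term,
$$W_i \perp\!\!\!\perp \varepsilon_i, \tag{10}$$
where A ⫫ B means "A is independent of B."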
Viewed in the context of equation (7), equations (9) and (10) give the conditions for a valid instrument. The IVLS estimator is
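$$\hat{\beta}_{IVLS} = \frac{\widehat{\mathrm{Cov}}(W_i, A_i)}{\widehat{\mathrm{Cov}}(W_i, I_i)}, \tag{11}$$
where the covariances are computed from the sample data.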
Note, however, that our ability to generalize from the effect of one treatment—lottery winnings—to the effect of another treatment—total income—is ensured only by the model in equation (7). We can use equation (8) to rewrite equation (7) as
$$A_i = \beta (O_i + W_i) + \varepsilon_i. \tag{12}$$
An alternative model to consider is
$$A_i = \beta_1 O_i + \beta_2 W_i + \varepsilon_i, \tag{13}$$
with β1 ≠ β2. Here, the variable Wi is plausibly independent of the error term among lottery winners, due to the randomization provided by the natural experiment. However, Oi remains endogenous, perhaps because factors such as education or parental attitudes influence both ordinary income and political attitudes. We could again resort to the instrumental variables approach, but since we need as many instruments as there are regressors in (13), we will need some new instrument in addition to Wi. Suppose the data were generated according to equation (13) and we erroneously assume equation (12). As I show analytically in Section 4, if we use IVLS to estimate equation (12) using Wi as an instrument for Ii, IVLS estimates β2 rather than β1.20 Given that the coefficient of Oi is of interest, this may substantially limit the utility of instrumental variables. After all, if we only cared about β2, we could simply regress Ai on Wi. The point is not that there is a general flaw in the IVLS approach. The point is that model specification matters; for IVLS to estimate the parameter of interest, the data must be generated according to equation (12), not equation (13).
3. Civil War and Rainfall
Miguel, Satyanath, and Sergenti (2004) study the effects of economic growth on the likelihood of civil conflict in Africa. According to the influential models of Collier and Hoeffler (1998, 2001), economic factors influence the incidence of civil war because of the important role they play in rebel recruitment (see also Weinstein 2007). Miguel, Satyanath, and Sergenti (2004: 727) summarize the approach as follows: "Collier and Hoeffler stress the gap between the returns from taking up arms relative to those from conventional economic activities, such as farming, as the causal mechanism linking low income to the incidence of civil war."21 According to Collier and Hoeffler, the economic incentives of potential rebels outweigh other factors, such as social injustice, in explaining the incidence of rebellion. In their well-known formulation, it is greed, not grievance, that mainly explains variation in the occurrence of civil wars.
However, there is an important problem for purposes of testing such theories about the influence of economic conditions on civil conflict. As Miguel, Satyanath, and Sergenti (2004: 726) point out, "the existing literature does not adequately address the endogeneity of economic variables to civil war and thus does not convincingly establish a causal relationship. In addition to endogeneity, omitted variables—for example, government institutional quality—may drive both economic outcomes and conflict, producing misleading cross-country estimates." Civil conflict may influence economic conditions, and there may be confounding too.
Miguel, Satyanath, and Sergenti (2004) posit that the probability of civil conflict in a given country and year is given by
$$\Pr(C_{it} = 1 \mid G_{it}, \varepsilon_{it}) = \alpha + \beta G_{it} + \varepsilon_{it}, \tag{14}$$
where Cit is a binary variable indicating civil conflict in country i and year t, Git is the economic growth rate, α is an intercept, β is a regression coefficient, and εit is a mean-zero random variable.22 According to the model, if we intervene to increase the economic growth rate in country i and year t by one unit, the probability of conflict in that country-year is expected to increase by β units (or to decrease, if β is negative). The problem is that Git and εit are not independent. The proposed solution is instrumental-variables regression. Annual changes in rainfall provide the instrument for economic growth. In sub-Saharan Africa, as the authors demonstrate, there is a positive correlation between percentage change in rainfall over the previous year and economic growth, so the change in rainfall passes one key requirement for a potential instrument. The other key requirement is that rainfall changes are independent of the error term.23 This is essentially untestable, but Miguel, Satyanath, and Sergenti probe its plausibility at length, and the idea seems very sensible.24 The IVLS estimates presented by Miguel, Satyanath, and Sergenti suggest a strong negative relationship between economic growth and civil conflict.25 This appears to be compelling evidence of a causal relationship, and Miguel, Satyanath, and Sergenti also have a plausible mechanism to explain the effect, namely, the impact of drought on the recruitment of rebel soldiers.
Yet have Miguel, Satyanath, and Sergenti estimated the effect of economic growth on conflict? Making this assertion depends on how growth produces conflict. In particular, it depends on positing a model in which economic growth has a constant effect on civil conflict—constant, that is, across the components of growth. Notice, for instance, that equation (14) is agnostic about the sector of the economy experiencing growth. According to the equation, if we want to influence the probability of conflict, we can consider different interventions to boost growth: for example, we might target foreign aid with an eye to increasing industrial productivity or we might subsidize farming inputs in order to boost agricultural productivity.
Suppose instead that growth in agriculture and growth in industry—which both influence overall economic growth—have different effects on conflict, as in the following model:
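$$\Pr(C_{it} = 1 \mid I_{it}, A_{it}, \varepsilon_{it}) = \alpha + \beta_1 I_{it} + \beta_2 A_{it} + \varepsilon_{it}, \tag{15}$$
where Iit and Ait are the growth rates of industry and agriculture, respectively, in country i and year t.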
If the true data-generating process is equation (14), but economic growth is endogenous, instrumental-variables regression delivers the goods. On the other hand, if the data-generating process is equation (15), another approach may be needed. If β2 is the coefficient of theoretical interest, we might use rainfall changes to instrument for agricultural growth in equation (15). However, industrial growth and agricultural growth may both be dependent on the error term in equation (15), in which case a different instrument for industrial growth would be required.28
The point for present purposes is not to try to specify the correct model for this substantive context. The objective is to point out that what IVLS estimates depends on the assumed model and not just on the exogeneity of the instrument in relation to the model. There are important policy implications, of course: if growth reduces conflict no matter what the source, we might counsel more foreign aid for the urban industrial sector, whereas if only agricultural productivity matters, the policy recommendations would be quite different. Discussing and defending the specification of the model, and not just the plausibility of exogeneity, is therefore a crucial part of IVLS applications.
4. What Does IVLS Estimate when the Model Is Wrong?
If the data-generating process involves heterogenous partial effects and we erroneously assume homogenous effects, what does IVLS estimate? In this section, I analyze a case akin to the example in Section 2, where an endogenous regressor breaks down into the sum of independent exogenous and endogenous pieces. I show that in this case, IVLS asymptotically estimates the impact of the exogenous portion of treatment, not the endogenous piece or a mixture of endogenous and exogenous pieces.
For each observation i, the true data-generating process is
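$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \tag{16}$$
with β1 ≠ β2. The subjects are i.i.d., and E(εi) = E(X1i) = E(X2i) = 0. Equation (16) is identical to equation (13) in Section 2, with X1i equal to ordinary income and X2i equal to lottery winnings. Here, X1i is endogenous and X2i is exogenous. In symbols,
$$X_{1i} \not\perp\!\!\!\perp \varepsilon_i \tag{17}$$
and
$$X_{2i} \perp\!\!\!\perp \varepsilon_i, \tag{18}$$
that is, X2i and εi are independent. Also, X1i ⫫ X2i.29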
Suppose we erroneously assume that data were generated according to
$$Y_i = \beta X_{Ti} + \varepsilon_i, \tag{19}$$
where XTi ≡ X1i + X2i (with "T" for "total"). Equation (19) is the usual regression model, with one exception: XTi is endogenous, because X1i and εi are dependent. However, by construction we have a valid instrument, since X2i is correlated with the endogenous regressor but also independent of the error term.
The instrumental variables estimator is
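$$\hat{\beta}_{IVLS} = \frac{\widehat{\mathrm{Cov}}(X_{2i}, Y_i)}{\widehat{\mathrm{Cov}}(X_{2i}, X_{Ti})}. \tag{20}$$
Substituting equation (16) for Yi and XTi = X1i + X2i gives
$$\hat{\beta}_{IVLS} = \frac{\beta_1 \widehat{\mathrm{Cov}}(X_{2i}, X_{1i}) + \beta_2 \widehat{\mathrm{Var}}(X_{2i}) + \widehat{\mathrm{Cov}}(X_{2i}, \varepsilon_i)}{\widehat{\mathrm{Cov}}(X_{2i}, X_{1i}) + \widehat{\mathrm{Var}}(X_{2i})}. \tag{21}$$
Because X2i is independent of both X1i and εi, the sample covariances of X2i with X1i and with εi converge to zero as n grows, so that
$$\hat{\beta}_{IVLS} \rightarrow \beta_2 \tag{22}$$
in probability. IVLS thus asymptotically estimates the coefficient of the exogenous portion of treatment.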
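The case analyzed in this section can be checked numerically. The sketch below is an illustration under assumed values, not the simulations reported online: the coefficients, distributions, and function names (`iv_ratio`, `simulate_independent_pieces`) are hypothetical. Data are generated from equation (16) with independent pieces, the analyst wrongly fits equation (19), and X2i serves as the instrument for the total.

```python
import numpy as np

def iv_ratio(z, x, y):
    """Ratio of sample covariances, Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

def simulate_independent_pieces(n=200_000, beta1=1.0, beta2=0.2, seed=2):
    """True model has two coefficients; the fitted model pools them into one."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                  # unobserved factor behind the endogeneity
    x1 = u + rng.normal(size=n)             # endogenous piece (for example, ordinary income)
    x2 = rng.normal(size=n)                 # exogenous piece (for example, lottery winnings)
    eps = u + rng.normal(size=n)            # error term, dependent on x1 but not on x2
    y = beta1 * x1 + beta2 * x2 + eps       # data generated as in equation (16)
    x_total = x1 + x2                       # regressor used in the pooled equation (19)
    return iv_ratio(x2, x_total, y)

if __name__ == "__main__":
    est = simulate_independent_pieces()
    print(f"IVLS estimate: {est:.3f}  (beta1 = 1.0, beta2 = 0.2)")
```

With independent pieces, the estimate should land near the assumed β2 and far from β1, which is the pattern the analytic argument predicts.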
In other cases, the situation may be somewhat more complicated. For instance, when Cov(X1, X2) ≠ 0, the IVLS estimate of β in equation (19) will converge to a mixture of β1 and β2, the weights being w = Cov(X2i, X1i)/[Cov(X2i, X1i) + Var(X2i)] on β1 and 1 – w on β2.32 In simulations reported online, I investigate what IVLS estimates under a range of other assumptions about the true data-generating process.33
In short, if the true data-generating process involves different coefficients for different components of the treatment variable Xi, and we assume that these components have the same coefficients, IVLS may estimate some data-dependent mixture of the structural parameters, which may not be the quantity of interest. For a more general discussion, see Angrist, Imbens, and Rubin (1996). The analytic results in this section therefore underscore the key role played by model specification: exogeneity of the instruments, given the model, is necessary but not sufficient for valid application of IVLS.
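The mixture weights for the correlated case can also be illustrated by simulation. As before, this is a sketch under assumed values: the parameter `c`, which controls how strongly X1i loads on X2i, and the function name `simulate_correlated_pieces` are hypothetical. X2i remains independent of the error term, so it is still a valid instrument for the pooled model; only its correlation with X1i changes.

```python
import numpy as np

def simulate_correlated_pieces(n=200_000, beta1=1.0, beta2=0.2, c=0.5, seed=3):
    """Return the IVLS estimate and the mixture w*beta1 + (1 - w)*beta2 from sample moments."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                      # unobserved factor behind the endogeneity
    x2 = rng.normal(size=n)                     # exogenous piece, independent of the error
    x1 = c * x2 + u + rng.normal(size=n)        # endogenous piece, now correlated with x2
    eps = u + rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + eps           # heterogenous partial effects

    # IVLS for the pooled model, with x2 as the instrument for x1 + x2.
    est = np.cov(x2, y)[0, 1] / np.cov(x2, x1 + x2)[0, 1]

    # Weight on beta1: Cov(X2, X1) / [Cov(X2, X1) + Var(X2)].
    c12 = np.cov(x2, x1)[0, 1]
    w = c12 / (c12 + np.var(x2, ddof=1))
    return est, w * beta1 + (1.0 - w) * beta2

if __name__ == "__main__":
    est, mixture = simulate_correlated_pieces()
    print(f"IVLS estimate: {est:.3f}  weighted mixture: {mixture:.3f}")
```

With c = 0.5 and unit variance for X2i, the population weight on β1 is 0.5/1.5 = 1/3, so the estimate should fall between the two coefficients, closer to β2. Which mixture results depends on the covariance structure of the data rather than on anything of theoretical interest.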
5. A Model Specification Test
The discussion above suggests a natural specification test, which requires the availability of an additional instrument, Z1i, such that
[Equations (23)–(25) are not recoverable from this copy.]
The test uses the estimated variance–covariance matrix for the coefficient estimates:
[Equations (26) and (27) are not recoverable from this copy.]
This adaptation of a standard test compares a pooling estimator to a splitting estimator; it could be viewed as a Hausman test, in which an additional instrument is needed to test the pooling restriction because X1 is endogenous. In simulations, the specification test is able to detect model specification failures with a high degree of accuracy. Of course, like most specification tests, this one is robust only against a limited class of alternatives: we stipulate that the data are generated according to equation (16), and the alternatives are that β1 = β2 or β1 ≠ β2. Moreover, since the test requires the availability of an additional instrument, it may only be useful in certain classes of applications.34
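One way a test of this kind could be set up is sketched below. It is an assumption-laden illustration and does not attempt to reproduce equations (23)–(27): it estimates the split model (16) by two-stage least squares, using a hypothetical additional instrument `z1` for X1i and letting X2i instrument itself, and then forms a Wald-type statistic for β1 = β2 from a variance–covariance estimate that assumes homoskedastic errors. The function names and simulated data are likewise hypothetical.

```python
import numpy as np

def pooling_test(y, x1, x2, z1):
    """Statistic for H0: beta1 == beta2 in y = beta1*x1 + beta2*x2 + eps (zero-mean variables)."""
    X = np.column_stack([x1, x2])
    Z = np.column_stack([z1, x2])               # x2 is exogenous, so it instruments itself
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    resid = y - X @ b                           # residuals are computed with X, not X_hat
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    V = sigma2 * np.linalg.inv(X_hat.T @ X_hat) # homoskedastic variance-covariance estimate
    se_diff = np.sqrt(V[0, 0] + V[1, 1] - 2.0 * V[0, 1])
    return b, (b[0] - b[1]) / se_diff

def simulate(beta1, beta2, n=20_000, seed=4):
    """Hypothetical data in which z1 is a valid instrument for the endogenous piece x1."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                      # unobserved factor behind the endogeneity
    z1 = rng.normal(size=n)                     # additional instrument, independent of the error
    x1 = z1 + u + rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + u + rng.normal(size=n)
    return pooling_test(y, x1, x2, z1)

if __name__ == "__main__":
    for b1, b2 in [(1.0, 0.2), (0.5, 0.5)]:
        b, stat = simulate(b1, b2)
        print(f"true ({b1}, {b2}): estimates {b.round(3)}, statistic {stat:.2f}")
```

In large samples the statistic can be compared with a standard normal reference distribution. A large absolute value is evidence against pooling, although a small value does not establish that the pooled model is correct.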
6. Conclusion
Social scientists often construct instrumental variables for use in regression analysis. A valid instrumental variable Zi must be correlated with an endogenous regressor Xi, and it must itself be exogenous, that is, independent of the error term in the underlying regression model. The first assumption can be checked from the data. The second assumption is generally the more difficult to satisfy, and it is essentially untestable. In applications, analysts often seek to use natural experiments or other research designs to generate plausible instruments (Rosenzweig and Wolpin 2000; Angrist and Krueger 2001; Dunning 2007).
However, it is not enough to have a valid instrument. The regression model linking Yi to Xi must also be valid. Although this may seem obvious, in this article I have drawn attention to a too-infrequently remarked feature of the canonical IVLS regression model: the assumption of homogenous causal effects across portions of the endogenous regressor Xi, that is, the assumption of homogenous partial effects.
Violations of this assumption can limit the ability of the instrumental variables approach to recover causal parameters. For example, in order to use lottery income to estimate the effect of overall income on political attitudes, we must assume that the effects of lottery income and ordinary income are the same. To use rainfall changes to estimate the effect of economic growth on civil conflict, we must assume that growth in the agricultural sector has the same effect as growth in the industrial sector. In short, we need to assume that variation in the endogenous regressor that is related to the instrumental variable has the same effect as variation that is unrelated to the instrument. In many applications, this assumption may be quite strong, and it should be defended with the same energy used to defend exogeneity.
If the assumption of homogenous partial effects is wrong, then IVLS estimates can be quite misleading. When heterogeneity takes the simple form discussed in the example on lottery winnings—that is, the endogenous regressor is a sum of independent exogenous and endogenous portions—instrumental-variables regression simply estimates the coefficient of the exogenous portion of treatment. In more complicated settings, IVLS may estimate a mixture of the true coefficients, but it will not necessarily estimate a mixture of theoretical interest. Thus, if the model is incorrectly specified, exogeneity may not be of much help. The point here is not that a different estimation strategy would be better than IVLS. What is at issue is the specification of the model.
Ultimately, the question of model specification is a theoretical and not a technical one. Whether it is proper to specify constant coefficients across exogenous and endogenous portions of a treatment variable, in examples like those discussed in this article, is a matter for theoretical consideration to be decided on theoretical grounds. Supplemental evidence may also provide insight into the appropriateness of the assumption of homogenous partial effects. The issues discussed in this article are not unique to applications of IVLS—indeed, similar issues may arise even if there is no endogeneity—yet special issues are raised with IVLS because we often hope to use the technique to recover the causal impact of endogenous portions of treatment.
What about the potential problem of infinite regress? In the lottery example, for instance, it might well be that different kinds of ordinary income have different impacts on political attitudes; in the Africa example, different sources of agricultural productivity growth could have different effects on conflict. To test many permutations, given the endogeneity of the variables, we would need many instruments, and these are not usually available. This is exactly the point. Deciding when it is appropriate to assume homogenous partial effects is a crucial theoretical issue. That issue tends to be given short shrift in typical applications of the instrumental variables approach, where the focus is on exogeneity.
The point here is not to encourage data analysis or regression diagnostics (although more data analysis might well be a good idea). Rather, in any particular application, a priori and theoretical reasoning as well as supplementary evidence should be used to justify the specification of the underlying regression model. In some settings, the assumption of homogenous partial effects may be innocuous; in other settings, it will be wrong, and IVLS will deliver misleading estimates. Exploiting a natural experiment that randomly assigns units to various levels of Zi may not be enough to recover the causal impact of Xi, if the regression model that is being estimated is itself incorrect.
Notes
Author's note: I am grateful to David Freedman, Don Green, Nicholas Sambanis, Ken Scheve, and the anonymous reviewers, whose suggestions greatly improved this article. Bear Braumoeller, David Collier, and Jason Seawright made valuable comments on an earlier, related paper. Simulations are available on the Political Analysis Web site.
1 Standard overidentification tests using multiple instrumental variables, for instance, assume that at least one instrument is exogenous (Greene 2003: 413–5).
2 See Heckman and Robb (1986); Imbens and Angrist (1994); Angrist, Imbens, and Rubin (1996); Rosenzweig and Wolpin (2000); Freedman (2006); Heckman, Urzua, and Vytlacil (2006).
5 A discussion of natural experiments can be found in Angrist and Krueger (2001) or Rosenzweig and Wolpin (2000); see also Dunning (2005, 2007).
6 Inferring causation from regression may demand a "response schedule" (Heckman 2000; Freedman 2005: 85–95). A response schedule says how one variable would respond were we to intervene and manipulate other variables; it is a theory of how the data were generated.
7 In many applications, we may only require that εi is i.i.d. across units.
8 Equation (5) is the usual way of writing the two-stage least-squares estimator, β̂IISLS. See Freedman (2005: 178–9) for a proof that β̂IISLS = β̂IVLS.
9 X̂ is not quite exogenous, because it is computed from X. This is the source of small-sample bias in the IVLS estimator; as the number of observations grows, the bias goes asymptotically to zero.
10 See also Heckman and Robb (1986); Angrist, Imbens, and Rubin (1996); Heckman, Urzua, and Vytlacil (2006). Rosenzweig and Wolpin (2000) and Freedman (2006) also show that what IVLS estimates depends on the underlying behavioral models that are posited. There is a large literature that discusses other aspects of IVLS (see Hanushek and Jackson 1977: 234–9, 244–5; Kennedy 1985: 115; Bartels 1991; Bound, Jaeger, and Baker 1995).
11 Portions of the material in this section are based on Dunning (2005, 2007).
12 Lottery winners are paid a large range of dollar amounts. In Doherty, Green, and Gerber's sample, the minimum total prize was $47,581, whereas the maximum was $15.1 million, both awarded in annual installments.
13 See Doherty, Green, and Gerber (2005, 2006) for further details.
14 For instance, Ai might be a measure of the extent to which respondents favor the estate tax or a measure of opinions about the appropriate size of government.
15 Doherty, Green, and Gerber (2005, 2006) present a similar linear regression model, though they report estimates of ordered probit models. Their equation (1) includes various covariates, including a vector of variables to control for the kind of lottery tickets bought.
16 Notice that according to equation (7), subject i's response depends on the values of i's right-hand side variables; values for other subjects are irrelevant. The analog in Rubin's formulation of the Neyman model is the stable unit treatment value assumption (Neyman 1923; Rubin 1974, 1978, 1980; Dabrowska and Speed 1990; see also Cox 1958; Holland 1986).
17 Doherty, Green, and Gerber (2005) use instrumental variables.
18 That is, Oi is shorthand for the income of subject i, net of lottery winnings; this could include earned income from wages as well as rents, royalties, and so forth.
19 This assumes (eminently plausibly) that Cov(Oi, Wi) ≠ –Var(Wi).
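The condition in footnote 19 can be unpacked in one line. This is a sketch under an assumption not stated in the footnote itself: that the endogenous regressor is total income, $O_i + W_i$, and that the instrument is lottery winnings, $W_i$. The instrument is then correlated with the regressor exactly when the covariance below is nonzero.

```latex
\begin{align*}
\operatorname{Cov}(W_i,\, O_i + W_i)
  &= \operatorname{Cov}(O_i, W_i) + \operatorname{Var}(W_i), \\
\operatorname{Cov}(W_i,\, O_i + W_i) \neq 0
  &\quad\Longleftrightarrow\quad
  \operatorname{Cov}(O_i, W_i) \neq -\operatorname{Var}(W_i).
\end{align*}
```

If winnings are randomized so that Cov(Oi, Wi) = 0, the condition reduces to Var(Wi) > 0.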
20 This depends on the independence of Oi and Wi, which is due here to the randomization of units to levels of lottery winnings. If the true model is (13) but Oi and Wi are correlated, IVLS will estimate a mixture of β1 and β2; see Section 4.
21 Fearon and Laitin (2003), in an alternative though possibly complementary approach, emphasize the importance of state capacity and roughness of terrain in explaining the outbreak and duration of civil war.
22 Equation (14) resembles the main equation found in Miguel, Satyanath, and Sergenti (2004: 737), although I use Git in place of Miguel, Satyanath, and Sergenti's notation for economic growth, and I ignore control variables as well as lagged growth values for ease of presentation. The specification in Miguel, Satyanath, and Sergenti is Cit = γGit + X'itβ + εit, so the dichotomous variable Cit is assumed to be a linear combination of continuous right-hand side covariates and a continuous error term. The authors clearly have in mind a linear probability model, so in the text I write equation (14) instead.
23 An exclusion restriction is necessary in this context: Z cannot appear in equation (14). This would be violated if rainfall had a direct effect on warfare, above and beyond its influence on the economy.
24 Exogeneity of the instrument is not the issue here; for purposes of this discussion, I will assume that the change in annual rainfall is exogenous.
25 "A five-percentage-point drop in annual economic growth increases the likelihood of a civil conflict ... in the following year by over 12 percentage points—which amounts to an increase of more than one-half in the likelihood of civil war" (Miguel, Satyanath, and Sergenti 2004: 727). A civil conflict is coded as occurring if there are more than 25 (alternatively, 1000) battle deaths in a given country in a given year.
26 The use of the same notation for coefficients as in Section 2 is merely for convenience; for instance, there is no claim here that overall economic growth is an additive function of growth in the industrial and agricultural sectors.
27 Kocher (2007), for example, emphasizes the rural basis of contemporary civil wars.
28 For instance, conflict may depress agricultural growth and harm urban productivity as well.
29 This is as in the example on lottery winnings: subjects are randomized to levels of X2i.
30 Equation (20) is valid because X1 and X2 have been normalized to have a mean of zero.
31 This depends on the independence of X1i and X2i in this example: see equation (21).
32 In the formula for w, Var and Cov operate on random variables, and w could be negative.
33 See the Political Analysis Web site. Also posted at http://pantheon.yale.edu/~td244/research.html.
34 For instance, I do not attempt to key the test to data from the examples discussed above because I do not see an available additional instrument.
| References |
|---|
Angrist Joshua D., Krueger Alan B. Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives (2001) 15:69–85.
Angrist Joshua D., Imbens Guido W., Rubin Donald B. Identification of causal effects using instrumental variables. Journal of the American Statistical Association (1996) 91:444–55.
Bartels Larry M. Instrumental and quasi-instrumental variables. American Journal of Political Science (1991) 35:777–800.
Bound John, Jaeger David, Baker Regina. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association (1995) 90:443–50.
Collier Paul, Hoeffler Anke. On economic causes of civil war. Oxford Economic Papers (1998) 50:563–73.
———. Greed and grievance in civil war (2001) Washington, DC: World Bank: Policy Research Paper no. 2355.
Cox David R. Planning of experiments (1958) New York: John Wiley & Sons.
Dabrowska D. M., Speed T. P. On the application of probability theory to agricultural experiments: Essay on principles. Statistical Science (1990) 5:465–80 (with discussion). English translation of Jerzy Neyman (1923), Sur les applications de la théorie des probabilités aux experiences agricoles: Essai des principes. Roczniki Nauk Rolniczych 10:1–51, in Polish.
Doherty Daniel, Green Donald, Gerber Alan. Personal income and attitudes toward redistribution: A study of lottery winners. (2005) Field Experiment Initiative, Institution for Social and Policy Studies, Yale University. http://www.yale.edu/isps/publications/field.html (accessed January 8, 2008).
———. Personal income and attitudes toward redistribution: A study of lottery winners. Political Psychology (2006) 27:441–58.
Dunning Thad. Strengthening causal inference: Practical and statistical perspectives on natural experiments. Presented at the annual meetings of the American Political Science Association, Washington, DC, August 31 to September 5 (2005).
———. Improving causal inference: Strengths and limitations of natural experiments. Political Research Quarterly (2007) http://intl-prq.sagepub.com/pap.dtl (accessed October 3, 2007).
Fearon James, Laitin David. Ethnicity, insurgency, and civil war. American Political Science Review (2003) 97:75–90.
Freedman David. Statistical models: Theory and practice. (2005) Cambridge: Cambridge University Press.
———. Statistical models for causation: What inferential leverage do they provide? Evaluation Review (2006) 30:691–713.
Greene William H. Econometric analysis (2003) 5th ed. Upper Saddle River, NJ: Prentice Hall.
Hanushek Eric A., Jackson John E. Statistical methods for social scientists (1977) San Diego, CA: Academic Press, Harcourt Brace & Company.
Heckman James J. Causal parameters and policy analysis in economics: A twentieth century retrospective. Quarterly Journal of Economics (2000) 115:45–97.
Heckman James J., Robb R. Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In: Wainer Howard, ed. Drawing inferences from self-selected samples (1986) New York: Springer-Verlag. 63–107.
Heckman James J., Urzua Sergio, Vytlacil Edward. Understanding instrumental variables in models with essential heterogeneity. Review of Economics and Statistics (2006) 88:389–432.
Holland Paul W. Statistics and causal inference. Journal of the American Statistical Association (1986) 81:945–70 (with discussion).
Imbens Guido W., Angrist Joshua D. Identification and estimation of local average treatment effects. Econometrica (1994) 62:467–75.
Kennedy Peter. A guide to econometrics. (1985) 2nd ed. Cambridge, MA: MIT Press.
Kocher Matthew Adam. Insurgency, state capacity, and the rural basis of civil war (2007) Centro de Investigación y Docencia Económicas (CIDE): Paper prepared for presentation at the Program on Order, Conflict, and Violence, Yale University. October 26, 2007.
Miguel Edward, Satyanath Shanker, Sergenti Ernest. Economic shocks and civil conflict: An instrumental variables approach. Journal of Political Economy (2004) 112:725–53.
Neyman Jerzy. Sur les applications de la théorie des probabilités aux experiences agricoles: Essai des principes. Roczniki Nauk Rolniczych (1923) 10:1–51, in Polish. English translation by D. M. Dabrowska and T. P. Speed (1990), Statistical Science 5:465–80 (with discussion).
Rosenzweig Mark R., Wolpin Kenneth I. Natural "natural experiments" in economics. Journal of Economic Literature (2000) 38:827–74.
Rubin Donald. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology (1974) 66:688–701.
———. Bayesian inference for causal effects: The role of randomization. The Annals of Statistics (1978) 6(1):34–58.
———. Comment on randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association (1980) 75:591–3.
Weinstein Jeremy M. Inside rebellion: The politics of insurgent violence. (2007) New York: Cambridge University Press.