Political Analysis Advance Access originally published online on July 20, 2005
Political Analysis 2005 13(4):430-446; doi:10.1093/pan/mpi024
Empirical Strategies for Various Manifestations of Multilevel Data
Department of Political Science, University of Michigan, Ann Arbor, MI
e-mail: franzese{at}umich.edu
Equivalent separate-subsample (two-step) and pooled-sample (one-step) strategies exist for any multilevel-modeling task, but their relative practicality and efficacy depend on dataset dimensions and properties and researchers' goals. Separate-subsample strategies have difficulties incorporating cross-subsample information, often crucial in time-series cross-section or panel contexts (subsamples small and/or cross-subsample information great) but less relevant in pools of independently random surveys (subsamples large; cross-sample information small). Separate-subsample estimation also complicates retrieval of macro-level-effect estimates, although they remain obtainable and may not be substantively central. Pooled-sample estimation, conversely, struggles with stochastic specifications that differ across levels (e.g., stochastic linear interactions in binary dependent-variable models). Moreover, pooled-sample estimation that models coefficient variation in a theoretically reduced manner rather than allowing each subsample coefficient vector to differ arbitrarily can suffer misspecification ills insofar as this reduced specification is lacking. Often, though, these ills are limited to inefficiencies and standard-error inaccuracies that familiar efficient (e.g., feasible generalized least squares) or consistent-standard-error estimation strategies can satisfactorily redress.
1. Introduction
Multilevel data are lower, micro-level data nested within higher, macro-level units. Political science examples include survey respondents nested within countries or states, elections nested within countries, time periods nested within nations or nation dyads, and many more. The levels may exceed two, such as survey respondents within elections within countries, voters within districts within countries, time periods within directed-dyads within dyads, etc. Level 1 or the micro-level is the lowest level or smallest unit of analysis; higher levels are Level 2, Level 3, etc., or macro-level(s). Common multilevel datasets in political science include cross-context surveys, which contributions to this volume analyze; panel (survey) data, containing repeated surveys of the same individuals; time-series cross-section (TSCS) datasets common in comparative/international politics and political economy, which typically nest time periods within countries; and datasets commonly used in international relations, which often nest time periods within dyads or directed dyads.1
In considering how to analyze such data empirically effectively for various goals, one must first recognize that typical dataset dimensions (i.e., numbers of micro- and macro-level observations) vary across dataset types and substantive contexts, as do plausible variance-covariance structures for variables, errors, and outcomes (i.e., systematic, stochastic, and total components). Practical and effective empirical-modeling strategies vary accordingly with these dimensions and properties and with researchers' goals and questions. Strategies that make great sense with large numbers of micro-level observations that vary independently across smaller numbers of contexts, as is common when pooling independently random surveys across countries, would not necessarily be as sensible when pooling observations related over time and across contexts, as is typical in international relations and political economy applications. There, with moderate numbers of both micro- and macro-level units and/or with observations related across contexts, alternative strategies become necessary or more practical or effective. In all cases, analysts will want to keep their methods as simple, powerful, and accurate as possible and to keep their own research questions and goals always central to their methodological choices, but what this implies practically may differ across research contexts.
In principle, two-step, separate-subsample estimation can achieve anything achievable in one-step, pooled-sample estimation and vice versa, but which strategy will prove more practical or effective depends on dataset dimensions and properties and on substantive contexts and goals. In general, two-step strategies have difficulties incorporating cross-subsample information, such as when variance-covariance or coefficient parameters may be equal, proportional, or otherwise related or when micro-level outcomes are interdependent, across contexts. Such information often exists and is sometimes substantively central in comparative and international politics and political economy; furthermore, such information is often indispensable there regardless of its substantive centrality given the practical limitations set by typical dataset dimensions. On the other hand, cross-subsample information is usually absent or small, and anyway less essential, in pools of large, independently random surveys, although ignoring some kinds of cross-sample dependence can induce biases.
Two-step estimation also tends to obscure and complicate, although certainly not to debar, the retrieval of estimates of the effects of macro-level variables as opposed to those of micro-level variables or of micro-level variables as conditioned by macro-level ones. The separate-sample, micro-level equations produce the micro-level effects directly, and how that effect depends on macro-level factors emerges directly from a second, macro-level estimation. To obtain the macro-level effect on the outcome, however, would require system-of-equations estimation in that second stage. Micro-behavioral researchers may be less interested in these macro-level effects, per se, but institutionalists and political economists would often find them equally central to their interests.
Pooled-sample estimation, conversely, produces all three effect estimates directly and renders leveraging of cross-sample information simple. However, maintaining efficiency, accurate standard-error estimation, and, in some cases, even unbiasedness and consistency can require additional care. One-step strategies become progressively more complicated as stochastic complexities like unit-specific covariances and stochastic interactive effects arise, and they have particular difficulties with stochastic-model specifications that differ across levels. For example, micro-level effects on binary outcomes being linear-interactively conditioned stochastically by macro-level factors would require nesting linear-normal likelihoods within binomial likelihoods: feasible, but not so simple. Moreover, pooled-sample estimation that models cross-subsample coefficient variation rather than allowing each subsample coefficient vector to differ arbitrarily can suffer various misspecification ills insofar as the theoretically reduced model of context conditionality is lacking. In many cases, though, these ills will be limited to inefficiency and standard-error inaccuracy, which can be satisfactorily redressed by familiar efficient (e.g., feasible generalized least squares, or FGLS) or consistent-standard-error estimation strategies.
Elaboration and discussion of these points unfold as follows. Section 2 introduces a generic multilevel model and the three generic substantive research questions that researchers use such models to evaluate empirically: effects of micro-level and macro-level characteristics and how each depends on the other. As argued more fully there, if "good" estimation (unbiased, consistent, efficient, plus simple and presentationally effective) of these effects and their variance-covariance (standard errors)2 is the goal, then one wants separate-subsample estimates per se only to satisfy intrinsic interest in unique models for each subsample, for sensitivity analysis of whether restrictions implied by pooled and reduced models hold across subsamples, or, relatedly, if pooled-sample coefficient or standard-error estimation would lack "good" properties. Therefore, Section 3 begins by discussing the sample and theoretical/substantive conditions under which the simplest possible estimator (full pooling with common parameters across samples) would suffer no ills, and proceeds to complicate the true stochastic and systematic environment from there. The concluding Section 4 contrasts typical panel and TSCS data in comparative and international politics and political economy with that of cross-context compilations of surveys in comparative micro-behavioral research based on the preceding discussion and existing simulation studies of alternative estimation strategies.
2. A Generic Multilevel Model and Typical Multilevel Research Questions
Multilevel or hierarchical models are any yij = f(Xij, εij), i.e., any model in which outcomes, explanatory factors, and/or stochastic components occur at nested, micro and macro levels, i and j,3 but, of course, they become more interesting when some arguments vary only at macro levels, Zj, and others vary at micro levels, Xij, and especially when interactions occur across levels, XijZj, and/or when the stochastic properties of εij pattern by level. Follow Bowers and Drake (2005) to consider this general expression of a hierarchical linear model (HLM), with one zj and xij:

$$y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \varepsilon_{ij} \tag{1}$$

$$\beta_{0j} = \gamma_{00} + \gamma_{01}z_{j} + u_{0j} \tag{2}$$

$$\beta_{1j} = \gamma_{10} + \gamma_{11}z_{j} + u_{1j} \tag{3}$$

$$y_{ij} = (\gamma_{00} + \gamma_{01}z_{j} + u_{0j}) + (\gamma_{10} + \gamma_{11}z_{j} + u_{1j})x_{ij} + \varepsilon_{ij} \tag{4}$$

$$y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + \gamma_{01}z_{j} + \gamma_{11}x_{ij}z_{j} + (u_{1j}x_{ij} + u_{0j} + \varepsilon_{ij}) \tag{5}$$

Equation (1) gives a bivariate linear-regression model of outcome, yij, as linear-additive function of explanator, xij, and additively separable stochastic component, εij. Equation (2) adds complications that another explanator, zj, which varies only across and not within macro level, j, also affects yij and that it does so with some error, u0j, which also varies only across macro levels, j. At this point, we have a (trivariate) random-effects model with two explanators, xij and zj, and a compound error term, u0j + εij, with u0j being the macro-unit-specific random effect. Equation (3) adds further complications that xij and zj interact in determining the outcome, yij, implying that the effect of xij depends on zj and, vice versa, that the effect of zj depends on xij, and that these conditioning effects, too, occur with macro-unit-specific error, u1j. At this point, as seen in Eq. (4), we have a generic HLM, which also happens to be a special, limited type of random-effects and random-coefficients model (as explained below). As seen in Eq. (5), however, the generic HLM is also quite similar to the familiar linear-interactive model (Franzese and Kam 2005; Brambor et al. 2005), with explanators xij, zj, and xijzj, except that the HLM possesses a compound error term, u1jxij + u0j + εij, which complicates matters.
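As an illustration only, the following sketch simulates one dataset from the generic model and checks numerically that the level-by-level form and the single-equation linear-interactive form coincide. It assumes the model is y = b0j + b1j·x + e with b0j = g00 + g01·z + u0j and b1j = g10 + g11·z + u1j; the function name `simulate_hlm`, the parameter values, the normal distributions, and the sample sizes are all hypothetical choices made for the example, not taken from the article.

```python
import numpy as np

def simulate_hlm(n_macro=30, n_micro=50, g00=1.0, g10=0.5, g01=-0.3, g11=0.2,
                 sd_u0=0.5, sd_u1=0.3, sd_e=1.0, seed=0):
    """Draw one dataset from a two-level model (all values are illustrative)."""
    rng = np.random.default_rng(seed)
    j = np.repeat(np.arange(n_macro), n_micro)   # macro-unit index of each row
    z = rng.normal(size=n_macro)                 # macro-level explanator z_j
    x = rng.normal(size=n_macro * n_micro)       # micro-level explanator x_ij
    u0 = rng.normal(scale=sd_u0, size=n_macro)   # macro-unit intercept shock u_0j
    u1 = rng.normal(scale=sd_u1, size=n_macro)   # macro-unit slope shock u_1j
    e = rng.normal(scale=sd_e, size=n_macro * n_micro)
    b0 = g00 + g01 * z + u0                      # macro-level intercept equation
    b1 = g10 + g11 * z + u1                      # macro-level slope equation
    y = b0[j] + b1[j] * x + e                    # micro-level equation
    return dict(y=y, x=x, z=z[j], j=j, u0=u0[j], u1=u1[j], e=e)

d = simulate_hlm()

# Linear-interactive form with the compound error u1j*xij + u0j + eij,
# using the same (hypothetical) parameter values as the defaults above.
y_eq5 = (1.0 + 0.5 * d["x"] - 0.3 * d["z"] + 0.2 * d["x"] * d["z"]
         + d["u1"] * d["x"] + d["u0"] + d["e"])

print(np.allclose(d["y"], y_eq5))
```

The two constructions agree because substituting the macro-level equations into the micro-level one is purely algebraic; what differs across estimation strategies is only how the compound error is handled.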
This similarity of HLM to simple linear-interaction models suggests our central question: Under what conditions will simple linear-interactive regression models reflecting theoretical propositions about micro, macro, and micro-macro-interactive effects suffice, and under what conditions will multiple stages of estimation or complicated HLM or other elaborate econometric strategies be preferable? As already outlined, the answer seems to depend on sample dimensions and properties, the availability and importance of cross-sample information, and the researcher's goals.
For concreteness, consider a comparative political economy and a comparative micro-behavioral example in which the outcomes, yij, are, respectively, fiscal policy stance (budget balance) at time i in country j and a feeling-thermometer score for a center-right party of respondent i in country j. The micro-level explanators, xij, could be, e.g., government partisanship at time i and the income level of respondent i in country j. The macro-level factors, zj, could be, respectively, the district magnitude of the (assumed time-invariant) electoral system and income inequality in j. A researcher hypothesizes, in case one, that policy makers elected in larger-district-magnitude systems weigh public goods and broad redistribution more heavily relative to narrowly targeted distribution (i.e., pork) than do those elected in smaller district-magnitude systems. Fiscal activism as gauged by budget balances will reflect redistributive and public good (Keynesian macro-management) efforts more than distributive ones, so larger district magnitudes, zj, should increase deficits (reduce yij). Left governments, too, xij, might have this effect, and especially when elected in systems of larger district magnitude; thus the interactive term xijzj also enters. A case two researcher expects poorer respondents, xij, to feel cooler toward center-right parties, especially as income is more unequally distributed in their country (thus the interactive term xijzj), and all respondents in more-unequal countries might feel less warmly toward center-right parties ceteris paribus compared to those in more equal countries (thus zj).
Generically, then, researchers seek to estimate three kinds of effects in multilevel models: effects on yij (deficits, thermometer) of micro-level factors (government partisanship or respondent income), xij, i.e., ∂yij/∂xij, which may vary across macro-level contexts, j, depending on macro-level factors (district magnitude or inequality), zj; the effects of macro-level factors (magnitude or inequality), zj, on yij, i.e., ∂yij/∂zj, which may vary across micro-level units, ij, depending on micro-level factors (partisanship or income), xij; and if and how the effects of these micro- and macro-level factors depend on the other variable, i.e., ∂²yij/∂xij∂zj.4 Mathematically in model (5), these effects (in expectation) are, respectively:

$$\frac{\partial y_{ij}}{\partial x_{ij}} = \gamma_{10} + \gamma_{11}z_{j} + u_{1j} \tag{6}$$

$$\frac{\partial y_{ij}}{\partial z_{j}} = \gamma_{01} + \gamma_{11}x_{ij} \tag{7}$$

$$\frac{\partial^{2} y_{ij}}{\partial x_{ij}\,\partial z_{j}} = \gamma_{11} \tag{8}$$

Equations (6)-(8) underscore a certain asymmetry in the random coefficients of the generic HLM to which we alluded above. Whereas micro-level effects, Eq. (6), vary stochastically across macro-level units, macro-level effects, Eq. (7), vary nonstochastically across micro-level units, and the interactive effect, Eq. (8), is constant. These features reflect an assumption that macro-, contextual-, or country-level error components arise in the constant and in the coefficient on xij, i.e., in Eqs. (2) and (3), but that such error components do not arise elsewhere in the model. Each observation i in unit j experiences the same realization of the macro-unit-specific errors, one additively, u0j, and one, u1j, in its coefficient on xij, β1j. A fuller random-effects and coefficients model would decompose γ01 further into a linear-additive function of xij and an error term, u2ij, and perhaps add a fourth error component, u3ij, to the interaction parameter, γ11, as well. We will follow standard HLM practice, but perhaps the reader can infer the implications for the more general random-coefficients model by analogy to the discussion of the particular HLM form considered here (i.e., u0j in β0j and u1j in β1j only).
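A small numerical check can make the three effects concrete. The sketch below assumes the single-equation model is y = g00 + g10·x + g01·z + g11·x·z + u1·x + u0 + e, so that the micro-level effect is g10 + g11·z + u1, the macro-level effect is g01 + g11·x, and the cross-effect is g11. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def y_linear_interactive(x, z, g00, g10, g01, g11, u0=0.0, u1=0.0, e=0.0):
    """Outcome under the assumed linear-interactive form with compound error."""
    return g00 + g10 * x + g01 * z + g11 * x * z + u1 * x + u0 + e

def micro_effect(z, g10, g11, u1=0.0):
    """Effect of x on y: depends on z and on the macro-unit slope shock u1."""
    return g10 + g11 * z + u1

def macro_effect(x, g01, g11):
    """Effect of z on y: depends on x, with no stochastic part."""
    return g01 + g11 * x

g = dict(g00=1.0, g10=0.5, g01=-0.3, g11=0.2)   # hypothetical parameter values
x0, z0, u1, h = 2.0, 1.5, 0.1, 1e-6

# Central finite differences of the outcome function at (x0, z0).
fd_x = (y_linear_interactive(x0 + h, z0, u1=u1, **g)
        - y_linear_interactive(x0 - h, z0, u1=u1, **g)) / (2 * h)
fd_z = (y_linear_interactive(x0, z0 + h, u1=u1, **g)
        - y_linear_interactive(x0, z0 - h, u1=u1, **g)) / (2 * h)

print(fd_x, micro_effect(z0, g["g10"], g["g11"], u1))
print(fd_z, macro_effect(x0, g["g01"], g["g11"]))
```

The asymmetry discussed in the text shows up directly: `micro_effect` takes the shock `u1` as an argument, while `macro_effect` does not.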
Notice, finally, that insofar as researchers aim to estimate Eqs. (6)-(8), the effects of xij and of zj on yij, and how these depend on each other (e.g., how partisanship and districting, or inequality and income, interact to explain fiscal policy, or party affinity), they are actually uninterested in β0j and β1j per se. That is, for these interests, unique estimates of intercepts, β0j, and effects of x on y, β1j, in each macro-unit j, are not the goal; the goal is to estimate the systematic or explicable aspects of effects of x and z and how each depends on each other, i.e., γ01, γ10, and γ11 (and perhaps the conditional mean, γ00, too). Put differently, we do not seek a specific model for each j, but a model of the outcome, yij: not separate models of French, German, Japanese, and every other j's politics/political economy, not unique models of x-y relations for each election j, but a model of comparative politics/political economy or of electoral politics. If β0j and β1j vary across contexts j, comparative researchers seek to model, understand, and explain this interesting phenomenon by variations in contextual factors, zj.
Interest in estimating β0j and β1j directly, therefore, arises only (a) to satisfy any intrinsic interest in unique models for each subsample, (b) for sensitivity analysis of whether restrictions implied by a pooled and reduced model seem to hold across subsamples, or (c), relatedly, if one-step reduced-form coefficient or standard-error estimation lacks good properties. Apart from inherent curiosity in specific models for each j, researchers will want to estimate j context-unique models only if going directly to theoretically reduced models might mislead. That is, variation in effects across contexts is to be explained, not merely described, so what remains arbitrarily (i.e., inexplicably) variant across j matters only insofar as giving it insufficient attention would harm coefficient or variance-covariance estimates of the theoretically interesting model. Accordingly, the next section considers estimation strategies for multilevel data from the perspective of asking under what conditions might one wish to estimate anything other than Eq. (5) directly in one simple step: pooled linear-interactive ordinary least squares (OLS).
3. Estimation Strategies
3.1 Fully Pooled, Context-Unconditional OLS
The first, and simplest, possible strategy would be to estimate Eq. (1), i.e., to regress yij on xij, by fully pooled OLS:

$$y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + \varepsilon_{ij}, \qquad \hat{\gamma}_{OLS} = (X'X)^{-1}X'y \tag{9}$$

This estimator is optimal if and only if γ01 = u0j = γ11 = u1j = 0 and V(εij) = σ². That is, fully pooled OLS with just the micro-level regressor is optimal if and only if the effect of x on y is constant and nonstochastic across contexts (i.e., γ01 = γ11 = 0, so Zj does not matter, and u0j = u1j = 0) and outcomes are homoskedastic across and within contexts (i.e., constant variance and no correlation: V(ε) = σ²I).
Obviously, this is the least interesting, and usually quite implausible, case. Notice, however, that if Zj truly does not matter (γ01 = γ11 = 0), then even if u0j ≠ 0 or u1j ≠ 0, i.e., even if macro-unit specific error components (random effects/coefficients) exist or if the stochastic component is nonspherical (heteroskedastic/correlated errors), this starkest fully pooled OLS would nonetheless produce unbiased and consistent coefficient estimates provided E(u0j, u1j, εij | X) = 0 (i.e., the usual Gauss-Markov requirement that regressors and residuals not covary):

$$E(\hat{\gamma}_{OLS} \mid X) = \gamma + (X'X)^{-1}X'\,E(u_{1j}x_{ij} + u_{0j} + \varepsilon_{ij} \mid X) = \gamma \tag{10}$$
OLS is inefficient and OLS standard errors are incorrect, however, because OLS ignores the nonconstant error variance and correlation that the macro-unit-specific error components induce:6

$$\begin{aligned} V(\hat{\gamma}_{OLS}) &= (X'X)^{-1}X'\,[V(y)]\,X(X'X)^{-1} \\ &= (X'X)^{-1}X'\,[V(\gamma_{00} + \gamma_{10}x_{ij} + u_{1j}x_{ij} + u_{0j} + \varepsilon_{ij})]\,X(X'X)^{-1} \\ &= (X'X)^{-1}X'\,[V(u_{1j}x_{ij}) + V(u_{0j}) + V(\varepsilon_{ij})]\,X(X'X)^{-1} \end{aligned} \tag{11}$$

Even if the micro-level stochastic component, εij, is spherical (constant variance and uncorrelated), the term in square brackets (i.e., variance of yij) will not reduce to σ²I because macro-unit-specific error components, u0j and u1j, differ across j and because u1j multiplies a variable, xij, and so varies. Thus Eq. (11) does not reduce to the OLS standard-error formula, σ²(X'X)⁻¹. To enhance efficiency and obtain accurate standard errors by FGLS, however, one need only estimate the induced heteroskedasticity. As seen from the term in square brackets, one could do this simply by regressing squared estimated residuals on macro-unit indicators and those indicators times x², plus whatever patterns one expects in εij.7 As Beck and Katz (1995, 1996) famously showed, however, whether FGLS enhances efficiency and improves standard-error estimation truly, and not merely apparently, depends on how many parameters (relative to observations) one must estimate in this step, which degrees-of-freedom consumption FGLS will ignore in its next step.8
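The auxiliary-regression idea can be sketched as follows: fit pooled OLS, regress the squared residuals on macro-unit indicators and those indicators times x², and reweight by the inverse fitted variances. This is a minimal sketch under stated assumptions, not a definitive FGLS implementation: the data-generating values are hypothetical, the floor applied to fitted variances is an ad hoc device added here to keep weights positive, and the weighting addresses only heteroskedasticity, not within-unit correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
J, n_j = 40, 80                                   # hypothetical dimensions
j = np.repeat(np.arange(J), n_j)
x = rng.normal(size=J * n_j)
u0 = rng.normal(scale=0.3, size=J)[j]             # macro-unit intercept shocks
u1 = rng.normal(scale=0.3, size=J)[j]             # macro-unit slope shocks
y = 1.0 + 0.5 * x + u1 * x + u0 + rng.normal(size=J * n_j)

# Step 1: pooled OLS and squared residuals.
X = np.column_stack([np.ones_like(x), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b_ols) ** 2

# Step 2: auxiliary regression of e^2 on unit indicators and indicators * x^2.
D = (j[:, None] == np.arange(J)).astype(float)    # n x J indicator matrix
A = np.hstack([D, D * (x ** 2)[:, None]])         # n x 2J auxiliary design
var_hat = A @ np.linalg.lstsq(A, e2, rcond=None)[0]
var_hat = np.maximum(var_hat, 0.05 * e2.mean())   # ad hoc floor (assumption)

# Step 3: weighted least squares with weights 1 / fitted variance.
w = 1.0 / np.sqrt(var_hat)
b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print(b_ols, b_fgls)
```

Note that the auxiliary step here estimates 2J parameters, which is exactly the degrees-of-freedom consumption that the Beck and Katz caution concerns.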
Rendering OLS standard-error estimates consistent ("robust") to the induced heteroskedasticity is even simpler. As Eq. (11) shows, as always, OLS standard errors are inaccurate only insofar as the expression between the (X'X)⁻¹ terms differs from (X'X) times a constant (σ²). That is, they are unbiased (and consistent) if the term in square brackets, the nonsphericity of the error-variance, is unrelated on average (and in the limit) to the x's, x²'s, and x cross products contained in the X' and X pre- and post-multiplying that term. As the first term in square brackets reveals, random effects do generally imply biased and inconsistent OLS standard errors because the induced nonsphericity is related to x. The macro-level-specific error components plausible in multilevel data, being random effects, have this implication, and also a nonconstant variance by unit, or clustering, as seen in that first term and the second term. If the second term correlates with the x's, x²'s, or x cross products, then it adds further bias to OLS standard-error estimates; otherwise it induces "only" further inefficiency.
To render the standard-error estimates consistent, then, one must use a formula that retains the X'[·]X expression in a form capturing the clustering pattern and/or heteroskedasticity. For example, for simple random effects/coefficients without clustering (i.e., where the error components in Eqs. (2) and (3) are not macro-unit-specific and shared across micro units within cluster but specific to each observation), White's familiar heteroskedasticity-consistent standard errors will suffice:
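$$\hat{V}_{HC}(\hat{\gamma}) = (X'X)^{-1}X'\,[\mathrm{diag}(e_{ij}^{2})]\,X(X'X)^{-1} \tag{12}$$

For the clustering heteroskedasticity induced by the error structure expected in multilevel data, a consistent standard-error estimate must account for the common error components within macro units:

$$\hat{V}_{CL}(\hat{\gamma}) = (X'X)^{-1}\Big[\sum_{j} X_{j}'e_{j}e_{j}'X_{j}\Big](X'X)^{-1} \tag{13}$$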
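The two sandwich estimators can be sketched in a few lines. The code assumes the textbook forms, (X'X)⁻¹X'diag(e²)X(X'X)⁻¹ for the heteroskedasticity-consistent case and (X'X)⁻¹[Σj X'j ej e'j Xj](X'X)⁻¹ for the cluster-consistent case, and omits any small-sample correction; the function names and the simulated data are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, residuals, and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    return b, y - X @ b, XtX_inv

def white_vcov(X, e, XtX_inv):
    """Heteroskedasticity-consistent sandwich: middle term is X' diag(e^2) X."""
    meat = (X * e[:, None] ** 2).T @ X
    return XtX_inv @ meat @ XtX_inv

def cluster_vcov(X, e, XtX_inv, groups):
    """Cluster-consistent sandwich: middle term sums (X_j' e_j)(X_j' e_j)' over j."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        s = X[groups == g].T @ e[groups == g]     # k-vector X_j' e_j
        meat += np.outer(s, s)
    return XtX_inv @ meat @ XtX_inv

# Hypothetical data with macro-unit-specific intercept and slope shocks.
rng = np.random.default_rng(5)
J, n_j = 30, 40
groups = np.repeat(np.arange(J), n_j)
x = rng.normal(size=J * n_j)
y = (1.0 + 0.5 * x + rng.normal(scale=0.5, size=J)[groups]
     + rng.normal(scale=0.3, size=J)[groups] * x + rng.normal(size=J * n_j))
X = np.column_stack([np.ones_like(x), x])

b, e, XtX_inv = ols(X, y)
se_white = np.sqrt(np.diag(white_vcov(X, e, XtX_inv)))
se_cluster = np.sqrt(np.diag(cluster_vcov(X, e, XtX_inv, groups)))
print(se_white, se_cluster)
```

When every observation is its own cluster, the cluster formula collapses to the heteroskedasticity-consistent one, which is a convenient sanity check on any implementation.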
3.2 Fully-Pooled, Linear-Interactive OLS
Of course, if Zj matters, i.e., if γ01 ≠ 0 or γ11 ≠ 0 such that outcomes exhibit systematic and/or stochastic variation across contexts, which, after all, is what interested us in multilevel models ab initio, then estimating context-conditional Eq. (5) (with k = 4) by context-unconditional Eq. (9) (with k = 2) suffers omitted-variable bias. Researchers interested in context conditionality, which necessarily includes multilevel modelers, must include zj and/or xijzj.
Consider, then, estimating the full model of Eq. (5) by OLS:
$$y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + \gamma_{01}z_{j} + \gamma_{11}x_{ij}z_{j} + \varepsilon^{*}_{ij}, \qquad \varepsilon^{*}_{ij} = u_{1j}x_{ij} + u_{0j} + \varepsilon_{ij} \tag{14}$$

This is optimal if E(ε*ij | X) = 0 and V(ε*ij) = σ². That is, as usual, OLS requires a spherical stochastic component, here the compound ε*ij, for efficiency and accurate standard-error estimation, and it requires this residual to be uncorrelated with regressors for unbiasedness and consistency. Again, provided macro-unit-specific components, u0j and u1j, which can now represent the portion of cross-contextual variation not or insufficiently modeled theoretically by zj and xijzj, are uncorrelated with the regressors,13 OLS coefficient estimates remain unbiased and consistent, although inefficient, but its reported standard errors are inaccurate. Again, the inefficiency and standard-error inaccuracy arise even if u0j and u1j are spherical:
$$\begin{aligned} V(\hat{\gamma}_{OLS}) &= (X'X)^{-1}X'\,[V(y)]\,X(X'X)^{-1} \\ &= (X'X)^{-1}X'\,[V(\gamma_{00} + \gamma_{10}x_{ij} + \gamma_{01}z_{j} + \gamma_{11}x_{ij}z_{j} + u_{1j}x_{ij} + u_{0j} + \varepsilon_{ij})]\,X(X'X)^{-1} \\ &= (X'X)^{-1}X'\,[V(u_{1j}x_{ij}) + V(u_{0j}) + V(\varepsilon_{ij})]\,X(X'X)^{-1} \end{aligned} \tag{15}$$

The γ01zj and γ11xijzj terms are the only differences from Eq. (11). Being nonstochastic, they vanish from the last line, so the upshot is identical. Again, if desired, one enhances efficiency via FGLS by the same mechanics as before, and heteroskedasticity-consistent Eq. (12) or cluster-consistent Eq. (13) will render standard errors consistent to, respectively, pure random-effects/coefficients or clustered stochastic components, with the same small-sample concerns as before. Therefore, if researchers aim to estimate the effects of micro- and macro-level factors and their interactions (e.g., interactive effects of partisanship and magnitude on fiscal policy or of income and inequality on party affinity), little argument has yet arisen against estimating pooled OLS models specified to reflect those interactive propositions. OLS offers unbiased and consistent, although inefficient, coefficient estimates. If sample degrees of freedom are favorable, FGLS could enhance efficiency and is straightforward to implement. OLS, or FGLS that incompletely models the induced nonsphericity, yields biased and inconsistent standard errors, but appropriate robust estimators easily render these heteroskedasticity or cluster consistent. Under what conditions, then, would one consider alternative separate-subsample or HLM estimation strategies?
3.3 Separate-Subsample vs. Dummy-Interaction Estimation
Jusko and Shively (in this issue) offer several arguments that may favor a two-step strategy. First, intrinsic interest in distinct models for each macro-level context dictates that researchers actually do want estimates of β0j and β1j per se as well as merely en route to estimates of micro-macro interactive effects, γ11 (and possibly macro-level coefficients, γ01, also). Regressing yij on xij separately in each of the j subsamples will produce such estimates, but so too would regressing all yij on the complete set of j macro-level indicators and interactions of each of those with xij. Indeed, as is well known, either procedure, separate-subsample or call it dummy-interaction, produces mathematically identical estimates of β0j and β1j. Thus they share the same bias, consistency, and efficiency properties and so would serve equally well (for their part: standard errors differ as seen below) in any subsequent analysis relating them to contextual factors, zj. Likewise, either procedure equally easily accommodates macro-unit-specific regressors such as, say, respondent ethnic-group indicators for ethnicities that do not exist in all j.14 In terms of these β0j and β1j coefficient estimates alone,15 then, whether to dummy-interact and pool or estimate in separate subsamples is wholly irrelevant or purely a matter of practical implementation ease.16
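The claimed identity between separate-subsample and dummy-interaction coefficient estimates is easy to verify numerically. The sketch below uses hypothetical simulated data with four macro units; it fits one regression per unit, then one pooled regression on unit indicators and indicator-by-x interactions, and compares the two sets of intercepts and slopes.

```python
import numpy as np

rng = np.random.default_rng(1)
J, n_j = 4, 25                                    # hypothetical dimensions
j = np.repeat(np.arange(J), n_j)
x = rng.normal(size=J * n_j)
# Arbitrary unit-specific intercepts and slopes plus micro-level noise.
y = rng.normal(size=J)[j] + rng.normal(size=J)[j] * x + rng.normal(size=J * n_j)

# Separate-subsample OLS: one (intercept, slope) pair per macro unit.
separate = np.array([
    np.linalg.lstsq(np.column_stack([np.ones(n_j), x[j == g]]),
                    y[j == g], rcond=None)[0]
    for g in range(J)
])

# Dummy-interaction OLS: all unit indicators plus each indicator times x.
D = (j[:, None] == np.arange(J)).astype(float)    # n x J indicator matrix
X_pooled = np.hstack([D, D * x[:, None]])         # n x 2J design, no common constant
b = np.linalg.lstsq(X_pooled, y, rcond=None)[0]
pooled = np.column_stack([b[:J], b[J:]])          # row g holds unit g's pair

print(np.allclose(separate, pooled))
```

The equality holds because the pooled design matrix is block diagonal by unit once rows are grouped, so the normal equations decouple into the J separate-subsample problems.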
Standard errors, however, will differ by these procedures. First, notice from Eq. (5) that either option, by allowing β0j and β1j to vary arbitrarily across j, assures that the macro-unit-specific shocks will be identically zero: u0j = u1j = 0. Accordingly, the sole stochastic term in either case is εij. To this, pooled OLS applies V̂(β̂) = s²(X'X)⁻¹ with s² = e'e/(n − k) across the full sample, whereas separate-subsample OLS applies V̂(β̂j) = sj²(X'jXj)⁻¹ with sj² = e'jej/(nj − kj) separately in each j. Given the block diagonality of (X'X) in the dummy-interaction model, the jth block of (X'X)⁻¹ in V̂(β̂) exactly equals the (X'jXj)⁻¹ in V̂(β̂j), so only the s² vs. sj² differ in the standard-error estimates. The former assumes constant variance and no correlation across all j, but the latter only within each j, leaving unspecified any cross-subsample heteroskedasticity or correlation. Because β̂ and X and y are identical in either procedure, the eij are also identical. Given that, and with n = Σj nj and k = Σj kj, s² is the degrees-of-freedom-weighted average across j of sj² (i.e., s² = Σj (nj − kj)sj²/(n − k)). If the residuals are truly homoskedastic across contexts j, then the s² from pooling and so V̂(β̂) is efficiently constant and sj² and V̂(β̂j) inefficiently vary across contexts. Recapturing this efficiency by constraining sj² = s² in separate-subsample estimation would be extremely difficult, likely requiring some recursive estimation strategy. Unequal variance across j is perhaps more plausible, though. If so, then sj² and V̂(β̂j) from the unit-by-unit OLS are correctly variant, and s² and V̂(β̂) from pooling incorrectly constant, i.e., biased and inconsistent (although right on average across j). Once again, redressing this bias/inconsistency requires only simple application of robust standard errors (plain heteroskedasticity-consistent Eq. [12] should suffice here) and/or FGLS (j indicators suffice for regressors in the auxiliary regression).
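The relation between the pooled residual variance and the subsample residual variances can be checked directly. In this sketch the pooled s² from the dummy-interaction regression is compared with the average of the separate-subsample sj², weighted by each subsample's degrees of freedom (nj − kj). Subsample sizes, error scales, and coefficient values are hypothetical; errors are deliberately heteroskedastic across units so that the sj² differ visibly.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = np.array([20, 35, 50])                    # hypothetical subsample sizes n_j
J = len(sizes)
j = np.repeat(np.arange(J), sizes)
x = rng.normal(size=sizes.sum())
# Error standard deviation 1, 2, 3 in units 0, 1, 2 respectively.
y = 1.0 + 0.5 * x + rng.normal(scale=1.0 + j, size=sizes.sum())

def resid_variance(X, y):
    """Residual variance estimate e'e / (n - k) from an OLS fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e / (len(y) - X.shape[1])

# Separate-subsample variances s_j^2 (k_j = 2: intercept and slope).
s2_j = np.array([
    resid_variance(np.column_stack([np.ones(sizes[g]), x[j == g]]), y[j == g])
    for g in range(J)
])

# Pooled dummy-interaction variance s^2 (k = 2J).
D = (j[:, None] == np.arange(J)).astype(float)
s2_pooled = resid_variance(np.hstack([D, D * x[:, None]]), y)

# Degrees-of-freedom-weighted average of the s_j^2.
df_j = sizes - 2
s2_weighted = (df_j * s2_j).sum() / df_j.sum()

print(s2_j, s2_pooled, s2_weighted)
```

With unequal error variances, the single pooled s² sits between the smallest and largest sj², which is the sense in which pooled standard errors are "right on average" but wrong for each unit.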
Finally, V(ε) may have nonzero off-block-diagonal elements, meaning residuals exhibit some cross-subsample correlation. In our substantive examples, time periods of inexplicably large/small deficits in some countries may correlate with deficits elsewhere, or inexplicably warm/cold feelings toward right parties in some countries may correlate with feelings toward others elsewhere. If so, either separate-subsample or pooled OLS will produce inefficient coefficient estimates and biased, inconsistent, and inefficient standard-error estimates. Intuitively: some cross-subsample information exists that either OLS procedure ignores. Again, attempting to incorporate such cross-subsample information (correlation) to enhance efficiency or adjust standard errors would require some difficult recursive-iteration strategy in true separate-subsample estimation, although a related strategy like seemingly unrelated regression (SUR) might suffice. In pooled samples, incorporating correlation information to enhance efficiency and improve standard-error estimation is just another application of FGLS (e.g., Parks procedure if degrees of freedom suffice, or some more-limited parameterization thereof if not) and/or consistent standard errors. Beck and Katz's (1995, 1996) panel-corrected standard errors (PCSE) would suffice to render standard errors consistent to one common form of cross-subsample correlation (contemporaneous correlation in a TSCS context).17
Thus, for estimating β0j and β1j per se and their standard errors given some intrinsic interest in such context-unique models of politics, little distinguishes pooled dummy-interaction from separate-subsample strategies. Just the assumptions about V(ε) might be distinct. (In models with stochastic components fully determined by mean parameters, like binary or count models, not even this differs. There, likelihood maximization yielding identical β̂0j and β̂1j by either strategy implies that it also produces identical standard-error estimates.18) Pooled OLS does impose likely implausible variance restrictions, but simple FGLS or robust standard-error estimators will adequately redress this and, indeed, can be specified to reproduce separate-subsample assumptions, and so estimates, exactly. Cross-subsample information, on the other hand, is inherently difficult for two-step strategies to accommodate, whereas it is simpler (some standard FGLS or robust procedure will usually apply) in pooled-sample estimation. Therefore, absent cross-subsample information, two-step strategies might be at best marginally easier, not requiring FGLS weighting or robust standard errors, whereas with such information, pooled estimation is unambiguously much easier. However, for purposes of intrinsic interest in obtaining j unique models, the choice is mostly (in models where means and variances are codetermined, like binary or count: purely) one of personal taste or software facilities.
For purposes of estimating the effects on the outcome, yij, of micro- and macro-level factors, xij and zj, and their interaction, xijzj, however, both separate-subsample and pooled-dummy-interaction estimations are mere preliminaries. The researcher seeks estimates of Eqs. (6)-(8), i.e., the effects on outcomes like fiscal policies or party affinity of micro-level factors like partisanship or income, ∂yij/∂xij, of macro-level factors like district magnitude or inequality, ∂yij/∂zj, and if and how effects of these micro- and macro-level factors depend on the other variable, ∂²yij/∂xij∂zj. Neither of these strategies yields direct estimates of any of the parameters of interest (γ00, γ01, γ10, and γ11), though. These are obtained instead in second-stage estimations wherein the β̂0j and β̂1j from the first stage are regressed on zj as shown in the Jusko and Shively (2005) and Lewis and Linzer (2005) contributions to this issue and applied in most others. For these more theoretical purposes, except for the minor differences discussed above, full-dummy-interaction and separate-subsample estimation strategies are identical; they each provide estimates, equally good ones usually, of β0j and β1j (β̂0j and β̂1j) to a second stage estimating the actually desired γ00, γ01, γ10, and γ11.
3.4 One-Step vs. Two-Step Estimation
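What are the trade-offs, then, between the two-step strategy,

$$\begin{aligned} y_{ij} &= \beta_{0j} + \beta_{1j}x_{ij} + \varepsilon_{ij} \quad \text{(estimated separately in each } j\text{)} \\ \hat{\beta}_{0j} &= \gamma_{00} + \gamma_{01}z_{j} + u_{0j} + (\hat{\beta}_{0j} - \beta_{0j}) \\ \hat{\beta}_{1j} &= \gamma_{10} + \gamma_{11}z_{j} + u_{1j} + (\hat{\beta}_{1j} - \beta_{1j}) \end{aligned} \tag{16}$$

and one-step, pooled estimation of Eq. (14)? Either requires εij uncorrelated with regressors and V(ε) spherical. We have already shown pooled estimation of Eq. (14) unbiased and consistent for γ if E(u0j,u1j|xij,zj) = 0, but that inefficiency and incorrect standard errors arise even if V(u0j) and V(u1j) are spherical. We also saw simple FGLS and consistent-standard-error strategies for redressing these shortcomings. Jusko and Shively (2005) show, conversely, that two-step estimation of Eq. (16) can produce unbiased and consistent coefficient estimates and standard errors without efficiency costs relative to pooled estimation under broad circumstances, the most important of which is the absence of cross-subsample information as mentioned above and elaborated below.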
One aspect of the trades between the one- and two-step approaches is that, of the three effects of interest to multilevel modelers, only the micro-level effects and the dependence of the micro- and macro-level effects on each other emerge directly from a single-equation estimation in the second of the two steps. Those two parameter estimates emerge by regressing the first-stage estimates of β1j on (a constant and) zj, weighted by the estimated variance of those estimates (Jusko and Shively 2005; Lewis and Linzer 2005). To obtain macro-level effects, however, one must estimate the last two expressions in Eq. (16). Furthermore, because the first-stage estimates of β0j and β1j necessarily correlate, one must regress both on (constants and) zj in a simultaneous system of equations, weighted by the estimated variance-covariance of those two coefficients. The one-step estimation, contrarily, produces all three parameters of interest (plus the conditional mean) directly in its one stage.
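The single-equation second stage described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the procedure of Jusko and Shively (2005) or Lewis and Linzer (2005): the function names (`ols_line`, `two_step`, `simulate`), the parameter values, and the use of plain inverse-sampling-variance weights (which ignore the variance of u1j) are all assumptions made for the example.

```python
import random

def ols_line(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def two_step(groups, z):
    """Step 1: separate regression per macro unit j.
    Step 2: regress first-stage slopes on z_j, weighting each slope
    by the inverse of its estimated sampling variance (an assumption)."""
    slopes, weights = [], []
    for x, y in groups:
        a, b = ols_line(x, y)
        n = len(x)
        mx = sum(x) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        var_b = rss / (n - 2) / sxx  # estimated variance of the slope
        slopes.append(b)
        weights.append(1.0 / var_b)
    return ols_line(z, slopes, weights)  # (gamma10_hat, gamma11_hat)

def simulate(J=40, n=200, g00=0.0, g01=1.0, g10=1.0, g11=0.5, seed=1):
    """Hypothetical data: J macro units with n micro observations each."""
    rng = random.Random(seed)
    z = [0.1 * j for j in range(J)]
    groups = []
    for zj in z:
        b0 = g00 + g01 * zj + rng.gauss(0, 0.1)  # intercept with u0j
        b1 = g10 + g11 * zj + rng.gauss(0, 0.1)  # slope with u1j
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [b0 + b1 * xi + rng.gauss(0, 1) for xi in x]
        groups.append((x, y))
    return groups, z

groups, z = simulate()
g10_hat, g11_hat = two_step(groups, z)
print(round(g10_hat, 2), round(g11_hat, 2))
```

Recovering the macro-level effect would additionally require the intercept equation, estimated jointly with the slope equation because the two first-stage estimates correlate; that system step is deliberately left out of this sketch.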
Another aspect of the trades, however, is that, if the contextual factor(s), zj, leave some macro-unit-specific variation unaccounted for, the one-step estimator must rely on the orthogonality of those unaccounted u0j and u1j not only for the unbiasedness and consistency of the γ estimates but also for the unbiasedness of coefficients on other explanators. The two-step estimator also relies for the unbiasedness and consistency of its γ estimates on this orthogonality; without that, the estimates of β0j and β1j from the first step will contain stochastic components related to zj, which will bias the second-stage estimates of γ exactly as in the one-step estimation. However, because the arbitrary cross-context variation in β0j and β1j will absorb any macro-unit-specific variation, including that unexplained by zj, the first step of the two-step estimator produces estimates of other coefficients in that first stage that will not be biased by the failure of zj to account for all cross-contextual variation.19
A third aspect of the trade-offs is the possibly relatively heavier reliance of one-step estimators on large samples for accurate standard-error estimation (and efficiency) when zj only partially models cross-contextual variation. FGLS, e.g., relies on estimating few variance-covariance parameters relative to degrees of freedom for its optimum efficiency and standard-error-estimation properties. Consistent variance-covariance estimators rely on asymptotics by definition. Pure heteroskedasticity-consistent robust standard errors, Eq. (12), such as those appropriate in nonclustered random-effects conditions or in the full-dummy-interaction case, seem to function reasonably well even in fairly small samples. Their performance, as crudely gauged by suggested small-sample corrections to the base estimator, should be a decreasing function of N/(N − k), such that even N = 55, k = 5 would yield only about 10% overconfidence. Existing simulations support this intuition.
Clustered-heteroskedasticity-consistent standard errors, however, have asymptotics in both i and j dimensions, their performance by such gauges being a decreasing function of [J/(J − 1)][N/(N − k)]. As this would suggest, J = 50, nj = 100 seems comfortable in a linear regression model, judging by Franzese and Kam's (2005) simulations. Eduardo Leoni (2005) explored clustered standard errors for logit, however, finding hypothesis-test rejection rates about 2.5 times too high (.13 for .05) for J = 24 and coverage rates suggesting 17% overconfidence. Small-sample adjustments for maximum likelihood are [J/(J − 1)], which suggests only about 5% overconfidence, so clustered-standard-error strategies may lean even more heavily on large J in contexts beyond linear regression. However, Leoni's simulations also showed that a robust-cluster estimator applying small-sample corrections performed remarkably well. Coverage rates for that adjusted clustered-standard-error estimator were 9%, 5%, 1%, and 3% too large in samples of J = 10, 15, 20, and 40 and just 1% and 2% too small in samples of J = 30 and 500. We need more simulations to understand the small-sample properties of the consistent-standard-error estimates often needed for single-stage estimation of multilevel models, and of which small-sample corrections work best under what conditions, but these results are very encouraging. Two-stage estimators, of course, also rely on large samples for small standard errors, but standard-error accuracy may lean less heavily on large macro-unit samples, depending on how few parameters the FGLS (or consistent-standard-error) estimation(s) in their second steps require.20
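The crude gauges used in the two preceding paragraphs are simple ratios, and a short sketch makes the arithmetic easy to rerun for other sample dimensions. The function names are hypothetical, and k = 5 in the clustered example is an assumption, since the text gives only J = 50 and nj = 100 for that case.

```python
def hc_factor(N, k):
    """Degrees-of-freedom inflation N/(N - k) for the
    heteroskedasticity-consistent estimator."""
    return N / (N - k)

def cluster_factor(J, N, k):
    """Clustered analogue [J/(J - 1)][N/(N - k)]."""
    return (J / (J - 1)) * (N / (N - k))

def ml_cluster_factor(J):
    """Maximum-likelihood analogue J/(J - 1)."""
    return J / (J - 1)

def pct(factor):
    """Express a correction factor as percent inflation over 1."""
    return round(100 * (factor - 1), 1)

# N = 55, k = 5: the text's "about 10%" case.
print(pct(hc_factor(55, 5)))
# J = 24 clusters under maximum likelihood: the text's "about 5%" case.
print(pct(ml_cluster_factor(24)))
# J = 50, nj = 100 (so N = 5000), with k = 5 assumed for illustration.
print(pct(cluster_factor(50, 50 * 100, 5)))
```

These factors only gauge how far an uncorrected estimator might understate uncertainty; they say nothing about which correction performs best, which is the question the simulations cited above address.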
Penultimately, and most favorably for two-step approaches, are the implications of incomplete contextual-variation modeling by zj in cases in which stochastic and systematic components are not additively separable. If, for example, the multilevel model is of the probit form of Eq. (17), the coefficient estimates from the two-step strategy's first-step probits, which serve as dependent variables in the second step, will be (asymptotically) normally distributed, and so ideal for linear regression, just by virtue of having emerged from the first-stage maximum-likelihood estimation.21
Finally, and least favorably for two-step estimation, cross-subsample information, as indicated already, is usually difficult for two-step estimators to incorporate but simple for one-step estimators. If, for example, we know that some coefficients are equal across macro-level subsamples or equal across micro-level units, or if we wish to constrain them to be so to enhance efficiency believing it "not far from true," we can do so in one-step estimation simply by including that variable in the model and not interacting it with xij or zj, respectively. In separate-subsample estimation, allowing some macro-level factors to have equal effect on all micro units ij is almost as easy; one simply includes these zj terms only in the β0j and not the β1j second-stage regression. Constraining some micro-level factors to have equal, proportional, or otherwise related coefficients across contexts j, though, is far harder. As with cross-context residual variance or correlation discussed above, two-step estimation requires some iterated strategy to accommodate such cross-sample information. Furthermore, some sorts of cross-sample information, such as systematic interdependence across j in the outcomes yij, which our cross-national fiscal-policy example would likely exhibit, will not only render separate-subsample standard-error estimates biased and coefficients inefficient but also will bias coefficient estimates. As Franzese and Hays (2005) show regarding interdependence in economic policy making, for example, estimating, say, French or Ohioan fiscal-policy models ignoring the feedback from, say, German and Michiganian fiscal policy will produce biased models of French or Ohioan fiscal policy.22
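To make the one-step constraint concrete, the sketch below pools all subsamples and regresses the outcome on a constant, x, z, their product, and a control w that enters without any interaction, so its coefficient is held equal across contexts. The data are simulated; the names (`ols`, `simulate_pooled`, `w`) and parameter values are hypothetical. Only coefficients are computed: with u0j and u1j present, plain OLS standard errors would be inaccurate, and the clustered or FGLS strategies discussed earlier would be needed.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate_pooled(J=40, n=200, seed=2):
    """Hypothetical pooled sample: J contexts, n observations each."""
    rng = random.Random(seed)
    g00, g01, g10, g11, d = 0.0, 1.0, 1.0, 0.5, -0.7
    X, y = [], []
    for j in range(J):
        zj = 0.1 * j
        u0, u1 = rng.gauss(0, 0.1), rng.gauss(0, 0.1)  # context shocks
        for _ in range(n):
            x, w = rng.gauss(0, 1), rng.gauss(0, 1)
            # w has no interaction term: one coefficient for all contexts.
            X.append([1.0, x, zj, x * zj, w])
            y.append(g00 + g01 * zj + g10 * x + g11 * x * zj + d * w
                     + u0 + u1 * x + rng.gauss(0, 1))
    return X, y

X, y = simulate_pooled()
b_const, b_x, b_z, b_xz, b_w = ols(X, y)
print([round(b, 2) for b in (b_const, b_x, b_z, b_xz, b_w)])
```

Imposing the same equality in separate-subsample estimation would require iterating between the subsample regressions and a pooled estimate of the shared coefficient, which is the asymmetry the paragraph above describes.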
To summarize the comparison of separate- and pooled-sample estimation strategies, then:
- (a) In principle, what one can do in separate subsamples in two steps one can also do in one step with interactions (etc.), and vice versa, but some things are easier one way or the other.
- (b) Separate-subsample and full dummy interactions produce identical coefficient estimates and standard errors that, at most, rectifiably differ; with nonseparable error components, the two are exactly identical. Either can handle context-specific regressors equally easily.
- (c) Separate-subsample coefficient estimates are at least inefficient (biased also in some cases) and standard errors inaccurate relative to appropriate pooled estimators if cross-subsample information exists: homoskedasticity (equal error variance across j), correlation (V({εij}) not block diagonal), or coefficients that relate across j. Leveraging such cross-equation information is usually difficult in two-step estimation.
- (d) Two-step estimation of macro-level effects requires second-stage systems of equations.
- (e) Two-step estimation yields coefficient estimates for micro-level variables that are more robust to misspecification of macro-level effects than one-step linear-interaction models.
- (f) One-step linear-interaction models may rely more heavily on large samples, especially in macro-unit dimensions, for standard-error accuracy and efficiency enhancement.
- (g) One-step linear-interaction models struggle to incorporate the stochastic interactions that are plausible in multilevel models into outcomes whose systematic and stochastic components are nonseparable. Two-step estimation actually facilitates this.
| 4. Comparative Time-Series Cross Sections, Comparative Panels, and Comparative Surveys |
|---|
|
|
|---|
In conclusion, consider, in light of the preceding discussion, the typical properties of panel and TSCS data in comparative and international politics and political economy and those of panel and cross-context survey compilations in comparative microbehavioral research. The key issues, as noted at the outset and seen throughout the article, are the dimensions, the likely systematic and stochastic properties, and the research questions in the sample and substantive area.
In comparative micro-behavioral research, such as that exemplified throughout this issue, datasets are commonly large (hundreds to thousands of observations), independently randomized surveys of individuals pooled across a few countries (10 to 20 or, more rarely, 30) or subnational political units (perhaps 50). Being so large within each macro level, i.e., having such large I (plus independent i), efficiency at micro levels is unlikely to be of much concern. Moreover, because the surveys are being independently randomized, one expects quite limited, if any, cross-sample information. At most, individual opinions (e.g., party affinities) in France, in their aggregate, might affect individual opinions in Germany. In principle, any estimator that ignores such cross-subsample information, if it exists, will suffer bias and inconsistency, as well as inefficiency (Franzese and Hays 2005). In this context, however, such information is unlikely to be sizeable23 if it exists at all. Moreover, this (small-to-zero) dependence of each respondent in one j on the aggregate of respondents in each of the other j would be absorbed in the macro-level-specific constants, which, while problematic for second-stage estimates, should not therefore hamper first-stage estimates. Furthermore, micro-behavioral research likely stresses micro-level effects and their context-conditioning much more than macro-level effects. Finally, outcomes are far more often qualitative than linear-continuous. Thus, referring back to our summary comparison of separate-sample versus pooled-interactive strategies, we see that, along every consideration, typical conditions in cross-context pooled and independently randomized survey analysis in comparative micro-behavioral research are ideal for two-step strategies (with the first step being equally well served by full-dummy-interaction or separate-subsample estimation).24 Insofar as Leoni's simulations apply, pooled-interaction with clustered-heteroskedasticity strategies could also work in these conditions if small-sample adjustments are applied.25
In typical survey panel data, in which large numbers (again, hundreds to thousands) of the same respondents are observed small numbers (usually fewer than ten, very rarely perhaps twenty or thirty) of times, conversely, all the considerations weigh oppositely. The very small number of micro-level observations will often render separate subsample estimation literally impossible (negative degrees of freedom), and, even where possible, capitalizing upon cross-subsample information will be indispensable to reasonable efficiency. Cross-subsample (here, cross-respondent) information is hopefully sizeable because panel-data analysts must lean on it heavily. Notice, however, that here cross-subsample information is, substantively, precisely the same cross-respondent information that is all that exists at the micro-level in randomized survey data. That is, assuming an effect is constant across subsamples in the panel-survey-data case is the same as estimating that one effect at all in nonpanel survey data. Furthermore, the very large number of macro-level units implies that consistent standard-error strategies (or FGLS in linear regression) will work quite well.26
Finally, in the TSCS data typical of comparative and international politics and political economy, observations are usually of political (national or subnational) units at the macro level, over time at the micro level. Both micro and macro levels tend to have intermediate numbers of observations. In comparative/international political economy, developed or developing country samples typically have 15–35 macro-level units; global samples might triple or quadruple this; micro-level units (time, usually years) typically vary from 10 to 40. International relations contexts might have global samples of countries, dyads, or directed dyads and widely varying years (from 10 to over 100); sometimes macro levels are confined to relevant or great-power cases, leaving J as few or fewer countries or (directed) dyads in the 15–30 range or lower. Therefore, leveraging cross-subsample information in such contexts, as with panel surveys, is at least extremely useful, usually crucial, and occasionally indispensable. As a matter of substance as well as necessity, moreover, observations are highly likely to be related across macro-level contexts. Indeed, in many political economy and international relations contexts (e.g., globalization and strategic relations), cross-unit interdependence is substantively central. Finally, macro-level effects are usually at least as central as micro-level effects and their context conditioning in these research areas. In such conditions, one-step estimation strategies seem the better option. They may face some challenges in accurate standard-error estimation, but these seem surmountable with small-sample-adjusted consistent-estimator or FGLS strategies, especially since dependent variables are more often linear continuous. Unbiased coefficient estimation seems on stronger ground for the same reason. Two-step procedures, on the other hand, are unlikely to prove effective or practical (or even, in some cases, possible).27
| Notes |
|---|
|
|
|---|
Author's note: Gratitude to the contributors to this issue for helpful discussion of some of the issues addressed here and, especially, to the editors of this issue for extremely kind and constructive comments on this manuscript.
1 Concrete examples of each include, respectively, Comparative Study of Electoral Systems, National Election Studies, and World Values Surveys; NES panel studies; IMF, World Bank, or OECD datasets; Comparative Manifestos, which pools parties' election manifestos by election across several democracies; and political-institutional datasets like Polity or Freedom House; and the Correlates of War, Militarized Interstate Disputes datasets.
2 The bias, consistency, and efficiency of standard errors refer to their properties relative to the true standard deviation of parameter estimates across repeated samples under the model assumptions.
3 More generally, the number of levels can exceed two, but we will consider only two to keep discussion simple.
4 Note that z modifies the effect of x on y identically to how x modifies the effect of z on y; these statements are logically (and so mathematically) identical. Likewise, x and z each has only additive effect on y if the interaction does not exist.
5 More precisely, the following discussion applies to models with additively separable stochastic components, such as linear-regression models, but not necessarily models with nonseparable stochastic components, such as logit or probit.
6 The step from the second to the third line of Eq. (11) assumes that the error components are uncorrelated. If correlated, this and the next expression would include (two times) the additional covariance terms. Furthermore, the assumption that all error components are uncorrelated with the regressors must be maintained here as in all other estimators considered.
7 This auxiliary regression should include xij also if the macro-unit-specific shocks are correlated, and it should include all xij involved in stochastic interactions in the case of multiple such interactions.
8 Here, extra parameters in FGLS are few (see preceding text and note 7), so the issue is likely small.
9 Davidson and MacKinnon (1993, p. 554) strongly suggest a finite-sample correction of either rescaling the estimated squared residuals in Eq. (12) by their variance or multiplying Eq. (12) by N/(N − k), which inflates estimates by a factor reflecting the number of regressors as a percentage of degrees of freedom. Accumulating simulation work favors their suggestion.
10 In Stata, obtaining these estimates is as simple as typing ", robust" or ", r" at the end of a line.
11 As with pure heteroskedasticity (see note 9), a finite-sample (degrees-of-freedom) correction, [nc/(nc − 1)][(N − 1)/(N − k)], is suggested. This inflates standard errors as there, but now multiplicatively further, by a declining function of J. Again, simulations strongly support using such adjustments.
12 As before, obtaining these estimates in Stata is simple; one types ", cluster(J-indicator-name)" at the end of a line.
13 The final stochastic component, εij, must likewise not correlate with regressors, but we will assume so henceforth. Note also that lack of correlation of u0j and u1j with regressors is assumed in all the estimation strategies discussed here.
14 For example, to include a Basque indicator in Spain only, simply enter it only in that regression by the two-step procedure and include the Basque indicator, which is strictly zero outside Spain, in the whole sample for the pooled procedure.
15 Whether in one-step pools or two-step separate subsamples, what controls to apply and, more generally, ensuring that β0j and β1j estimate the same substantive/theoretical quantities across all j, is paramount. However, it is equally so in any strategy, and no strategy has any means of ensuring this theoretical issue empirically, so this issue cannot serve to evaluate alternative estimation strategies.
16 For example, some software facilitates indicator and indicator-interaction generation; others facilitate sample restrictions on repetitions of the same estimation command.
17 In Stata, one uses the xtpcse command to obtain these.
18 Intuitively: the same coefficient estimates emerge from the same (parts of) likelihood functions being maximized at the same points. Since standard errors are curvatures of those likelihood functions at those points, they are also identical.
19 This is essentially the standard argument for fixed effects, sharing its strengths and weaknesses.
20 One-step strategies only possibly lean more, and two-step may lean less, heavily on large samples, but this has not been demonstrated. Since, in principle, equivalent one- and two-step strategies exist for any empirical task and data properties, I rather suspect that, in fact, any differences are illusory.
21 Again, they only "could be" ugly and unpleasant because these are tastes and, anyway, have not been shown, and, again, I suspect otherwise (see note 20). Because one- and two-step strategies are essentially alternative orderings of the same tasks, I suspect these likelihoods will, in fact, be well behaved and easily searchable by the same logic that proves ML estimates asymptotically normal.
22 The bias here is textbook omitted-variable bias: German policy (inter alia) causes French policy and likely correlates with other causes of French policy; therefore, estimating French policy models ignoring German ones is biased. Correctly estimating such interdependence models is greatly challenging in its own right, but the point here is simply that ignoring this sort of cross-sample information creates greater problems than mere inefficiency and fixable standard-error inaccuracies.
23 Each element of each off-diagonal block in the overall variance-covariance matrix would reflect the covariance of one (random) respondent in the row block with one (random) respondent of the column block and so would be some very small number (the same very small number for each element in that block).
24 Duch and Stevenson (2005) is a partial exception. Surveys across elections within countries are three levels. Cross-election-within-country information (intermediate level) is likely much greater, recommending extra efforts in this direction.
25 Shrinkage estimators like HLM, conversely, are unlikely to shrink much from the within estimates given very large I and much smaller J, so they would tend to serve little practical purpose here (Beck and Katz 2005; see the appendix to this article on the Political Analysis Web site).
26 Here, shrinkage estimators likely shrink separate-subsample estimates almost fully to the between estimator, since J is large and I so small; again, shrinkage estimators may serve little practical purpose except where the I (time) dimension extends sufficiently to allow reasonable subsample estimates (see the appendix on the Political Analysis Web site).
27 Here, finally, shrinkage estimators may serve a practical purpose more often as, with both J and I intermediately sized, estimates may differ meaningfully from both within- and between-estimator extremes (see the appendix on the Political Analysis Web site).
| References |
|---|
|
|
|---|
Beck, N., and J. Katz. 1995. "What To Do (and Not to Do) with Time-Series-Cross-Section Data in Comparative Politics." American Political Science Review 89(3):634–647.
Beck, N., and J. Katz. 1996. "Nuisance or Substance: Specifying and Estimating Time-Series-Cross-Section Models." Political Analysis 6:1–36.
Beck, N., and J. Katz. 2005. "Random Coefficient Models for Time-Series-Cross-Section Data." Presented at the 2001 meetings of the Political Methodology Organization Section of the American Political Science Association.
Bowers, J., and K. Drake. 2005. "EDA for HLM: Visualization When Probabilistic Inference Fails." Political Analysis doi:10.1093/pan/mpi031.
Brambor, T., W. R. Clark, and M. Golder. 2005. "Understanding Interaction Models: Improving Empirical Analyses." Political Analysis doi:10.1093/pan/mpi014.
Davidson, R., and J. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Franzese, R., and C. Kam. 2005. Modeling and Interpreting Interactive Hypotheses in Regression Analysis: A Refresher and Some Practical Advice. Unpublished manuscript. (Available at www.personal.umich.edu/~franzese/Interactions_Michigan.030305.pdf.)
Franzese, R., and J. Hays. 2005. Spatial Econometric Models for Political Science. (Available at www.personal.umich.edu/~franzese/FranzeseHays.SpatialEcon.Book.pdf.)
Greene, W. H. 2003. Econometric Analysis. Upper Saddle River, NJ: Pearson Education, Inc.
Jusko, K. L., and P. Shively. 2005. "A Two-Step Strategy for the Analysis of Cross-National Public Opinion Data." Political Analysis doi:10.1093/pan/mpi030.
Leoni, E. 2005. "How to Analyze Multi-Country Survey Data: Results from Monte Carlo Experiments." Paper presented at the 2005 Midwest Political Science Association Conference.
Lewis, J., and D. Linzer. 2005. "Estimating Regression Models in which the Dependent Variable Is Based on Estimates." Political Analysis doi:10.1093/pan/mpi026.