


Political Analysis Advance Access published online on August 13, 2007

Political Analysis, doi:10.1093/pan/mpm017
All versions of this article: 16/1/41 (most recent), mpm017v1

© The Author 2007. Published by Oxford University Press on behalf of the Society for Political Methodology. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete-Data Approach

Kosuke Imai

Department of Politics, Princeton University, Princeton, NJ 08544
e-mail: kimai{at}princeton.edu (corresponding author)

Ying Lu

Department of Sociology, University of Colorado at Boulder, Boulder, CO 80309
e-mail: ying.lu{at}colorado.edu

Aaron Strauss

Department of Politics, Princeton University, Princeton, NJ 08544
e-mail: abstraus{at}princeton.edu

Ecological inference is a statistical problem where aggregate-level data are used to make inferences about individual-level behavior. In this article, we conduct a theoretical and empirical study of Bayesian and likelihood inference for 2 x 2 ecological tables by applying the general statistical framework of incomplete data.

We first show that the ecological inference problem can be decomposed into three factors:
- distributional effects, which address the possible misspecification of parametric modeling assumptions about the unknown distribution of missing data;
- contextual effects, which represent the possible correlation between missing data and observed variables; and
- aggregation effects, which are directly related to the loss of information caused by data aggregation.

We then examine how these three factors affect inference and offer new statistical methods to address each of them. To deal with distributional effects, we propose a nonparametric Bayesian model based on a Dirichlet process prior, which relaxes common parametric assumptions. We also identify the statistical adjustments necessary to account for contextual effects. Finally, although little can be done to cope with aggregation effects, we offer a method to quantify their magnitude in order to formally assess their severity. We use simulated and real data sets to empirically investigate the consequences of these three factors and to evaluate the performance of our proposed methods. C code implementing our proposed methods, along with an easy-to-use R interface, is publicly available (Imai, Lu, and Strauss, forthcoming).


    1. Introduction
Ecological inference is a statistical problem where aggregate-level data are used to make inferences about individual-level behavior. Although it was first studied by sociologists in the 1950s (Robinson 1950; Duncan and Davis 1953; Goodman 1953), recent years have witnessed resurgent interest in ecological inference among political methodologists and statisticians (see, e.g., Achen and Shively 1995; King 1997; King, Rosen, and Tanner 2004; Wakefield 2004a, and references therein). Much of the existing research, however, has focused on the development of new parametric models and the criticism of existing models and has generated numerous debates over the appropriateness of proposed methods and their use (see, e.g., Freedman et al. 1991; Grofman 1991; Cho 1998; Cho and Gaines 2004; Herron and Shotts 2004, and many others).

In this article, we conduct a theoretical and empirical study of Bayesian and likelihood inference for 2 x 2 ecological tables by applying the general statistical framework of incomplete (or missing) data (Heitjan and Rubin 1991).¹ First, we formulate ecological inference in 2 x 2 tables as a missing-data problem where only the weighted average of two unknown variables is observed (Section 2). This framework directly incorporates the deterministic bounds, which contain all information available from the data. It also allows researchers to use individual-level data whenever they are available. Within this general framework, we show that the ecological inference problem can be decomposed into three factors:
- distributional effects, which address the possible misspecification of parametric modeling assumptions about the unknown distribution of missing data;
- contextual effects, which represent the possible correlation between missing data and observed variables; and
- aggregation effects, which are directly related to the loss of information caused by data aggregation.

We then examine how each of these three factors affects inference and offer new statistical methods to address each of them. To deal with distributional effects, we extend a simple parametric model to a nonparametric Bayesian model based on a Dirichlet process prior (Section 3). One common feature of many existing models is the use of parametric assumptions. In the exchange between King (1999) and Freedman et al. (1998), King concludes that "open issues ... include ... flexible distributional and functional form specifications" (354). We take up this challenge by relaxing the distributional assumption and examine the relative advantages of the proposed nonparametric model through simulation studies and an empirical example. We also show that statistical adjustments for contextual effects can be made within these parametric and nonparametric models.

Although little can be done to cope with aggregation effects, we offer a method to quantify their magnitude within our parametric model (Section 4). The method measures the amount of information lost through data aggregation in ecological inference. Specifically, we compare the amount of information the observed aggregate-level data provide with the information one would obtain if the individual-level data were available. We do so in the context of both parameter estimation and hypothesis testing. Previous studies largely relied upon informal graphical and numerical summaries to examine the amount of information available in the observed data (e.g., King 1997; Gelman et al. 2001; Cho and Gaines 2004; Wakefield 2004a). In contrast, the proposed methods can be used to formally assess the severity of aggregation effects.

Finally, we evaluate the performance of our proposed methods and illustrate their use with the analysis of both simulated and real data sets (Section 5). C code implementing our proposed methods, along with an easy-to-use R interface, is publicly available as the R package eco (Imai, Lu, and Strauss forthcoming). The package is distributed through the Comprehensive R Archive Network (http://cran.r-project.org/).


    2. Theoretical Framework for Ecological Inference
We first introduce a general theoretical framework for the ecological inference problem in 2 x 2 tables. We show that ecological data can be viewed as coarse data, which are a special case of incomplete data. Following the general framework of Heitjan and Rubin (1991), we discuss the conditions under which valid ecological inferences can be made using likelihood-based models. This theoretical framework clarifies and formally identifies the modeling assumptions required for ecological inference. While demonstrating how to deal with common problems, the framework also provides insight into the fundamental difficulty inherent in ecological inference, which cannot be overcome by statistical adjustments.

2.1 Ecological Inference Problem in 2 x 2 Tables
In this article, we focus on ecological inference in 2 x 2 tables. Suppose, for example, that we observe the number of registered white and black voters in each geographical unit (e.g., a county). The election results reveal the total number of votes cast in each geographical unit. Given this information, we wish to infer the number of black and white voters who turned out. Table 1 presents this 2 x 2 ecological inference example, where counts are transformed into proportions. In typical political science examples, the number of voters within each geographical unit is large. Hence, many previous methods directly modeled proportions rather than counts (e.g., Goodman 1953; Freedman et al. 1991; King 1997).² We focus on models of proportions in this article.



 
Table 1. 2 x 2 ecological table for the racial voting example

 
For every geographical unit $i = 1, \ldots, n$, such a 2 x 2 ecological table is available. Given the total turnout rate $Y_i$ and the proportion of black voters $X_i$, one seeks to infer the proportions of black and white voters who turned out, $W_{i1}$ and $W_{i2}$, respectively. Although neither $W_{i1}$ nor $W_{i2}$ is observed, they satisfy a key deterministic relationship,

$$Y_i = X_i W_{i1} + (1 - X_i) W_{i2}. \tag{1}$$

That is, $Y_i$ is the observed weighted average of the two unknown variables, $W_{i1}$ and $W_{i2}$, with $X_i$ and $1 - X_i$ being the observed weights.

The goals of ecological inference are twofold. First, researchers may be interested in characterizing individual behavior at the population level. For example, they may wish to estimate the mean and variance of the joint or marginal (population) distributions of $W_1$ and $W_2$, or the distributions themselves. Second, since the internal cells of ecological tables are not observed, the estimation of the (sample) values of $W_{i1}$ and $W_{i2}$ for each unit $i$ is also of interest. We call the former population ecological inference and the latter sample ecological inference. In political science research, sample ecological inference is often emphasized more than population inference (e.g., King 1997). In other fields, however, such as epidemiological studies that assess disease risk factors through ecological data, population ecological inference is of primary importance.

If sample ecological inference is conducted within the frequentist statistical framework, $W_{i1}$ and $W_{i2}$ should not be treated as unknown parameters to be estimated. Otherwise, we would have to estimate $n$ pairs of parameters from $n$ observations. Obtaining additional observations would then yield no informational gain, because each new observation brings its own additional parameters to estimate. Such an approach yields an incidental parameter problem in which no consistent estimators can be constructed for $W_{i1}$ and $W_{i2}$ (Neyman and Scott 1948). Hence, $W_{i1}$ and $W_{i2}$ must be viewed as missing data to be predicted rather than parameters to be estimated. The distinction between sample and population inference, therefore, is critical for understanding the statistical properties of various frequentist ecological inference models.

2.2 Ecological Inference as a Coarse Data Problem
We now show that ecological inference in 2 x 2 tables can be viewed as a coarse data problem. Coarse data refer to a particular type of incomplete data that are neither entirely missing nor perfectly observed. Instead, we observe only a subset of the complete-data sample space in which the true unobserved data points lie. Some examples of coarse data include rounded, heaped, censored, and partially categorized data (Heitjan and Rubin 1991).

For ecological inference in 2 x 2 tables, the vector of internal cells $W_i = (W_{i1}, W_{i2})$ contains the variables of interest. However, these variables are not directly observed; only their weighted average $Y_i$ and the weight $X_i$ are. From equation (1), Duncan and Davis (1953) derive the sharp bounds for each of the unobserved variables, $W_{i1}$ and $W_{i2}$,

$$W_{i1} \in \left[\max\left(0, \frac{Y_i - (1 - X_i)}{X_i}\right), \min\left(1, \frac{Y_i}{X_i}\right)\right], \qquad W_{i2} \in \left[\max\left(0, \frac{Y_i - X_i}{1 - X_i}\right), \min\left(1, \frac{Y_i}{1 - X_i}\right)\right]. \tag{2}$$

Although these intervals reveal the possible values that $W_{i1}$ and $W_{i2}$ could take, they are often too wide to be informative for the purposes of applied researchers.

Ecological inference is a coarse data problem because the missing data $W_i = (W_{i1}, W_{i2})$ are only partially observed. The relationship between the observed data $(Y_i, X_i)$ and the missing data $W_i$ is characterized solely by equation (1). The random variable $X_i$ is called a coarsening variable, whereas $Y_i$ is called the coarsened data. This terminology reflects the fact that $X_i$ determines how much information $Y_i$ reveals about each of the missing data. For example, if there are many more black voters than white voters, then the aggregate turnout rate reveals more about black voters' turnout. In other words, if $X_i$ takes a value closer to 1, the bounds are likely to be narrow for $W_{i1}$ and wide for $W_{i2}$.
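As a concrete check on equation (2), the short Python sketch below computes the Duncan and Davis bounds for a single unit. The function name `duncan_davis_bounds` is our own illustrative choice, and the inputs are assumed to satisfy $0 < X_i < 1$ and $0 \le Y_i \le 1$. The loop in the usage block shows the pattern just described: as $X_i$ moves toward 1, the interval for $W_{i1}$ narrows while the interval for $W_{i2}$ widens to $[0, 1]$.

```python
def duncan_davis_bounds(x: float, y: float) -> tuple[tuple[float, float], tuple[float, float]]:
    """Sharp bounds on (W1, W2) implied by y = x*W1 + (1 - x)*W2 with W1, W2 in [0, 1].

    Assumes 0 < x < 1 and 0 <= y <= 1.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    # W2 <= 1 gives the lower bound on W1; W2 >= 0 gives the upper bound
    w1 = (max(0.0, (y - (1.0 - x)) / x), min(1.0, y / x))
    # symmetric argument with the roles of W1 and W2 exchanged
    w2 = (max(0.0, (y - x) / (1.0 - x)), min(1.0, y / (1.0 - x)))
    return w1, w2


if __name__ == "__main__":
    # Same turnout rate, increasingly black electorate
    for x in (0.2, 0.5, 0.8):
        (lo1, hi1), (lo2, hi2) = duncan_davis_bounds(x, 0.7)
        print(f"X={x:.1f}: W1 in [{lo1:.3f}, {hi1:.3f}], W2 in [{lo2:.3f}, {hi2:.3f}]")
```

With $Y_i = 0.7$, the case $X_i = 0.8$ gives $W_{i1} \in [0.625, 0.875]$ but leaves $W_{i2}$ unrestricted on $[0, 1]$.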

2.3 Three Key Factors in Ecological Inference
Next, we place ecological inference within the theoretical framework of coarse data developed by Heitjan and Rubin (1991) and formally identify the key factors that influence ecological inference. We consider likelihood-based inference, which has been a popular approach in the literature (e.g., King 1997; King, Rosen, and Tanner 1999; Rosen et al. 2001; Wakefield 2004a). We begin by defining the many-to-one mapping from the complete data to the observed (coarsened) data for each $i = 1, 2, \ldots, n$:

$$Y_i = \mathcal{M}(X_i, W_i) = X_i W_{i1} + (1 - X_i) W_{i2}.$$

Suppose that the density function of $W_i$ is given by $f(W_i \mid \zeta)$ with a vector of unknown parameters $\zeta$. Let $h(X_i \mid W_i, \gamma)$ denote the conditional distribution of $X_i$ given the unobserved data $W_i$ and a vector of unknown parameters $\gamma$. Then, the observed-data likelihood function can be written as

$$L_{\mathrm{obs}}(\zeta, \gamma \mid Y, X) = \prod_{i=1}^{n} \int_{\{W_i : Y_i = \mathcal{M}(X_i, W_i)\}} f(W_i \mid \zeta)\, h(X_i \mid W_i, \gamma)\, dW_i, \tag{3}$$

where $\zeta$ and $\gamma$ are assumed to be disjoint sets of parameters. Calculating the observed-data likelihood function in equation (3) requires integrating over the missing data $W_i$ on the region defined by the data coarsening mechanism, $Y_i = \mathcal{M}(X_i, W_i)$. In contrast, consider the complete-data likelihood function, that is, the likelihood function one would obtain if the missing data were completely observed. It is given by

$$L_{\mathrm{com}}(\zeta, \gamma \mid Y, X, W) = \prod_{i=1}^{n} f(W_i \mid \zeta)\, h(X_i \mid W_i, \gamma). \tag{4}$$

To make inferences based on $L_{\mathrm{obs}}(\zeta, \gamma \mid Y, X)$, we must specify the sampling distribution of the missing data $f(W_i \mid \zeta)$ as well as the conditional distribution of the coarsening variable $h(X_i \mid W_i, \gamma)$. In ecological inference, this incomplete-data framework allows us to formally identify the following three key factors.

The first factor is distributional effects. These refer to the effects that the (mis)specification of $f(W_i \mid \zeta)$ has on the resulting inference; in our running example, $f(W_i \mid \zeta)$ is the joint distribution of black and white turnout rates. The second factor is contextual effects, which concern the specification of $h(X_i \mid W_i, \gamma)$. In our running example, the proportion of black voters might be correlated with black and white turnout rates through neighborhood variables such as income and education.

The debate in the literature has focused almost exclusively on the possible misspecification of these two distributions. Unfortunately, since $W_i$ is not directly observed, detecting distributional and contextual effects is difficult in practice. For example, one can compare the marginal distribution of $Y$ against the (marginal) predictive distribution of $Y$ from the fitted model. Such an approach, however, cannot detect every misspecified model. A misspecified distribution of $W_i$ can still yield a marginal predictive distribution of $Y$ that is consistent with the observed data. In Section 4.3, we partially address this concern of undetectable model misspecification under parametric assumptions.

Finally, we study the third and most critical issue of ecological inference, namely, the loss of information that occurs due to data coarsening. We call this aggregation effects because it is data aggregation that makes ecological inference a fundamentally difficult statistical problem. Aggregation effects underlie both distributional and contextual effects. Data aggregation prevents researchers from detecting model misspecification with the diagnostic techniques available in the usual analysis of complete data. Although aggregation effects cannot be overcome by statistical adjustments, we show that it is possible to quantify the amount of missing information due to aggregation in ecological inference (Section 4).

2.4 Three Modeling Assumptions
Based on the theoretical framework introduced above, we identify three possible modeling assumptions for ecological inference and derive the general conditions under which valid ecological inferences can be drawn.

2.4.1 Assuming no contextual effect
First, we state the condition under which the stochastic coarsening mechanism can be ignored, that is, the condition under which the specification of $h(X_i \mid W_i, \gamma)$ is not required. In ecological inference, this corresponds to the condition under which contextual effects can be ignored. In our running example, this means that black and white turnout rates are jointly independent of the proportion of black voters. This is a strong assumption that often cannot be justified in practice. It nevertheless serves as a useful starting point for developing models under more general conditions. Heitjan and Rubin (1991) formally define this condition and call it coarsened at random (CAR). CAR generalizes the missing-at-random condition from the literature on inference with missing data.

Under CAR, if $\zeta$ and $\gamma$ are disjoint parameters, the inference about $\zeta$ does not depend on $\gamma$ and the specification of $h(X_i \mid W_i, \gamma)$ can be ignored. Heitjan and Rubin (1991) also show that CAR is the weakest condition under which it is appropriate to ignore the coarsening mechanism. Formally, a sufficient condition for $Y_i$ to be CAR is that $X_i$ and $W_i$ are independent, that is, $h(X_i \mid W_i, \gamma) = h(X_i \mid \gamma)$. Then, the observed-data likelihood function of equation (3) can be simplified as

$$L_{\mathrm{obs}}(\zeta \mid Y, X) \propto \prod_{i=1}^{n} \int_{\{W_i : Y_i = \mathcal{M}(X_i, W_i)\}} f(W_i \mid \zeta)\, dW_i.$$

Parametric models under this assumption have appeared in the literature (e.g., King 1997; Wakefield 2004a).
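To make the CAR likelihood concrete, the Python sketch below evaluates each unit's contribution numerically. It assumes, purely for illustration, the logit-normal specification of Section 3.1 for $f(W_i \mid \zeta)$ with $\zeta = (\mu, \Sigma)$. The integral runs along the line $Y_i = X_i W_{i1} + (1 - X_i) W_{i2}$, parameterized by $W_{i1}$ between its Duncan and Davis bounds and evaluated with the midpoint rule. The factor $1/(1 - X_i)$ is the Jacobian that turns this line integral into a density for $Y_i$ given $X_i$. The function names and the grid size `m` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def logit(p):
    return np.log(p) - np.log1p(-p)


def bvn_logpdf(z, mu, sigma):
    """Bivariate normal log density evaluated at each row of z (shape (m, 2))."""
    diff = z - mu
    prec = np.linalg.inv(sigma)
    quad = np.einsum("ij,jk,ik->i", diff, prec, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return -np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad


def car_unit_density(y, x, mu, sigma, m=2000):
    """p(y | x) under CAR with logit(W) ~ N(mu, sigma), by the midpoint rule in w1."""
    lo = max(0.0, (y - (1.0 - x)) / x)      # Duncan-Davis bounds for W1
    hi = min(1.0, y / x)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / m
    w1 = lo + h * (np.arange(m) + 0.5)      # midpoints stay off the segment ends
    w2 = (y - x * w1) / (1.0 - x)           # W2 implied by equation (1)
    w2 = np.clip(w2, 1e-12, 1.0 - 1e-12)    # guard against rounding at the ends
    z = np.column_stack([logit(w1), logit(w2)])
    # density of W on the unit square = normal density on logit scale x Jacobian
    log_f = bvn_logpdf(z, mu, sigma) - np.log(w1 * (1 - w1) * w2 * (1 - w2))
    return np.exp(log_f).sum() * h / (1.0 - x)


def car_loglik(y, x, mu, sigma):
    """Observed-data log-likelihood under CAR, summed over units."""
    return float(sum(np.log(car_unit_density(yi, xi, mu, sigma)) for yi, xi in zip(y, x)))


if __name__ == "__main__":
    mu = np.array([0.5, -0.5])
    sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
    print(car_loglik([0.4, 0.6, 0.35], [0.3, 0.7, 0.5], mu, sigma))
```

A useful sanity check is that `car_unit_density` integrates to one over $Y_i \in (0, 1)$ for a fixed $X_i$. This holds because changing variables from $Y_i$ back to $W_{i2}$ recovers the full integral of $f$ over the unit square.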

2.4.2 Modeling contextual effects with covariates
In many situations, $W_i$ and $X_i$ may not be independent, but this dependence can be modeled by controlling for observed covariates $Z_i$, which may or may not include $X_i$. Another motivation for this approach is the estimation of the conditional mean function of $W_i$ given $Z_i$ rather than its marginal mean. We refer to this modeling assumption as conditionally coarsened at random, or CCAR. In the context of our running example, one may assume that once we control for income and education levels, black and white turnout rates no longer depend on the proportion of black voters.

Formally, we assume that $W_i$ and $X_i$ are conditionally independent given $Z_i$, that is, $h(X_i \mid W_i, Z_i, \gamma) = h(X_i \mid Z_i, \gamma)$. If the assumption holds, the data are CAR given $Z_i$, and the observed-data likelihood can be written as

$$L_{\mathrm{obs}}(\zeta \mid Y, X, Z) \propto \prod_{i=1}^{n} \int_{\{W_i : Y_i = \mathcal{M}(X_i, W_i)\}} f(W_i \mid Z_i, \zeta)\, dW_i.$$

King (1997) and King, Rosen, and Tanner (1999) propose parametric models based on this assumption.

2.4.3 Modeling contextual effects without covariates
Finally, we consider a scenario where the CAR assumption is known to be violated but no covariate is available for which the CCAR assumption holds. Even when some covariates are available, researchers may be unwilling to make functional-form assumptions about a high-dimensional covariate space, because $W_i$ is not directly observed. Unless $(W_i, X_i, Z_i)$ are jointly observed for some units, one strategy is to minimize the modeling assumptions. This is done by focusing on the trivariate relationship among $W_{i1}$, $W_{i2}$, and $X_i$ without incorporating $Z_i$. In addition, one may wish to focus on the estimation of the marginal mean of $W_i$ rather than its conditional mean. We refer to this modeling assumption as not coarsened at random (NCAR). In the NCAR case, we directly model the data coarsening mechanism and specify the joint distribution $g(X_i, W_i \mid \zeta, \gamma) = f(W_i \mid \zeta)\, h(X_i \mid W_i, \gamma)$. The observed-data likelihood can be written as

$$L_{\mathrm{obs}}(\zeta, \gamma \mid Y, X) = \prod_{i=1}^{n} \int_{\{W_i : Y_i = \mathcal{M}(X_i, W_i)\}} g(X_i, W_i \mid \zeta, \gamma)\, dW_i.$$

To the best of our knowledge, no model under this assumption has been proposed in the literature.


    3. A Nonparametric Model of Ecological Inference
In this section, we introduce a Bayesian nonparametric model of ecological inference. The model deals with distributional effects (as well as contextual effects) by relaxing parametric assumptions. We start by describing a parametric model, similar to those proposed in the literature, and then show how to extend it to a nonparametric model.

3.1 A Parametric Base Model
Our first parametric model is based on the CAR assumption. A similar parametric model has appeared in the literature (King 1997; Wakefield 2004a). In particular, we model the logit transformation of the missing data using the bivariate normal distribution,

$$W_i^* \mid \mu, \Sigma \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \Sigma),$$

where $W_i^* = (W_{i1}^*, W_{i2}^*) = (\mathrm{logit}(W_{i1}), \mathrm{logit}(W_{i2}))$, $\mu$ is a (2 x 1) vector of population means, and $\Sigma$ is a (2 x 2) positive-definite variance matrix. The model allows $W_{i1}$ and $W_{i2}$ to be correlated with each other (through their logit transformations). This means that in the racial voting example, the turnout rates of black and white voters in each county may be correlated with one another.
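The following Python sketch simulates ecological data from this parametric base model, which shows what the analyst would and would not observe. It assumes, for illustration only, that $X_i$ follows a Uniform(0.05, 0.95) distribution independently of $W_i$, so that CAR holds. The parameter values in the usage block are arbitrary, and the function name is hypothetical. In practice only `x` and `y` would be available, and the true `w` always lies inside the Duncan and Davis bounds.

```python
import numpy as np


def simulate_car(n, mu, sigma, rng):
    """Draw n units from the parametric base model under CAR.

    logit(W_i) ~ N(mu, sigma); X_i ~ Uniform(0.05, 0.95) independently of W_i
    (an illustrative assumption); Y_i follows equation (1).
    """
    w_star = rng.multivariate_normal(mu, sigma, size=n)
    w = 1.0 / (1.0 + np.exp(-w_star))          # inverse logit
    x = rng.uniform(0.05, 0.95, size=n)         # independent of W, so CAR holds
    y = x * w[:, 0] + (1.0 - x) * w[:, 1]       # observed weighted average
    return x, y, w


if __name__ == "__main__":
    rng = np.random.default_rng(2007)
    x, y, w = simulate_car(200, np.array([0.0, 1.0]),
                           np.array([[1.0, 0.5], [0.5, 1.0]]), rng)
    lo1 = np.maximum(0.0, (y - (1.0 - x)) / x)  # bounds for W1 from the observed data
    hi1 = np.minimum(1.0, y / x)
    inside = np.mean((w[:, 0] >= lo1 - 1e-12) & (w[:, 0] <= hi1 + 1e-12))
    print(f"share of true W1 inside its bounds: {inside:.2f}")  # 1.00
    print(f"mean width of W1 bounds: {np.mean(hi1 - lo1):.3f}")
```

Replacing the uniform draw of `x` with one that depends on `w_star` would produce data that violate CAR, which is the setting addressed by the CCAR and NCAR extensions below.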

The above model can be extended to a Bayesian model by placing the following conjugate prior distribution on $(\mu, \Sigma)$,

$$\mu \mid \Sigma \sim \mathcal{N}\left(\mu_0, \frac{\Sigma}{\tau_0^2}\right), \qquad \Sigma \sim \mathrm{InvWish}(\nu_0, S_0^{-1}), \tag{5}$$

where $\mu_0$ is a (2 x 1) vector of the prior mean, $\tau_0$ is a scalar, $\nu_0$ is the prior degrees of freedom parameter, and $S_0$ is a (2 x 2) positive-definite prior scale matrix. When strong prior information is available from previous studies or elsewhere, we specify these prior parameters so that the prior knowledge can be approximated. When such information is not available, however, we consider a flat prior where the prior predictive distribution of $(W_1, W_2)$ is approximately uniform. This latter condition leads to our choice of the prior parameters for the parametric model: $\mu_0 = 0$, $S_0 = 10 I_2$, $\tau_0 = 2$, and $\nu_0 = 4$.

This parametric base model can easily be extended to analyses under the CCAR and NCAR assumptions. For example, under the CCAR assumption, the model becomes

$$W_i^* \mid Z_i, \beta, \Sigma \overset{\text{indep.}}{\sim} \mathcal{N}(Z_i^\top \beta, \Sigma),$$

where $\beta$ is a (k x 1) vector of coefficients, $Z_i$ is a (k x 2) matrix of covariates, and $\Sigma$ is the (2 x 2) positive-definite conditional variance matrix. In contrast, under NCAR, the model is specified as

$$(X_i^*, W_{i1}^*, W_{i2}^*)^\top \mid \eta, \Phi \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\eta, \Phi),$$

where $X_i^* = \mathrm{logit}(X_i)$, the mean vector $\eta$ is 3 x 1, and the covariance matrix $\Phi$ is a 3 x 3 positive-definite matrix.

3.2 A Nonparametric Model
Similar to other parametric models in the literature, the models introduced in Section 3.1 make specific distributional assumptions. To relax these assumptions, we apply a Dirichlet process prior and model the unknown population distribution as a mixture of bivariate normal distributions (Ferguson 1973).³ The resulting model is nonparametric in the sense that no distributional assumption is made, and its in-sample predictions respect the deterministic bounds. Recent developments in Markov chain Monte Carlo (MCMC) algorithms have enabled the use of Dirichlet process priors for Bayesian density estimation and other nonparametric and semiparametric problems (e.g., Escobar and West 1995; Mukhopadhyay and Gelfand 1997; Gill and Casella 2006). Dey, Müller, and Sinha (1998) provide an accessible introduction to this methodology.

Our basic idea is to use a (countably infinite) mixture of bivariate normal distributions to model the unknown distribution of $W$. Unlike in finite mixture models, the number of mixture components (or clusters) is not specified in advance and can grow as the number of data points increases. This allows for nonparametric estimation of an unknown distribution. In fact, each new draw of the data may come from either of two sources:
- one of the existing mixture components from which the other data points were generated; or
- a new distribution, which adds another component to the mixture.

The number of mixture components is controlled by a single positive scalar parameter, $\alpha$, called the concentration parameter. Our model specifies a prior distribution on $\alpha$ that results in a relatively large number of mixture components. Through posterior updating, we then learn about the number of clusters from the observed data.

Formally, we model the parameters $\{\mu_i, \Sigma_i\}_{i=1}^{n}$ with an unknown (random) distribution function $G$ rather than a known (fixed) one such as the normal/inverse-Wishart distribution. Note that the parameters now have subscript $i$, allowing for the possibility that the number of parameters grows as the number of observations grows (i.e., nonparametric estimation). We then place a prior distribution on $G$ over all possible probability measures. Such a prior distribution is called a Dirichlet process prior and is denoted by $G \sim \mathcal{D}(G_0, \alpha)$. Here $G_0(\cdot)$ is the known base prior distribution and is also the prior expectation of $G(\cdot)$; that is, $E(G(\mu, \Sigma)) = G_0(\mu, \Sigma)$ for all $(\mu, \Sigma)$ in its parameter space.

Ferguson (1973) established a key property of this prior. Given any measurable partition $(A_1, A_2, \ldots, A_k)$ of the support of $G_0$, the random vector of probabilities $(G(A_1), G(A_2), \ldots, G(A_k))$ follows a Dirichlet distribution with parameter $(\alpha G_0(A_1), \alpha G_0(A_2), \ldots, \alpha G_0(A_k))$. A large value of $\alpha$ suggests that $G$ is likely to be close to $G_0$. It is then likely to yield results similar to those obtained from the parametric model with the prior distribution $G_0$. On the other hand, a small value of $\alpha$ implies that $G$ is likely to place most of its probability mass on a few of the partition sets. This setup allows the unknown distribution function $G$ to be estimated nonparametrically from the data.
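The effect of $\alpha$ described above can be seen numerically. The Python sketch below is an illustration under stated assumptions. It uses the stick-breaking representation of the Dirichlet process (Sethuraman's construction, which this article does not discuss), truncated at a hypothetical `truncation` level. It reports only the mixture weights; the atoms $(\mu_k, \Sigma_k)$ would be drawn independently from $G_0$ and are omitted. With a small $\alpha$, a handful of atoms carry almost all of the mass. With a large $\alpha$, the mass is spread over many atoms, so $G$ more closely resembles the base distribution $G_0$.

```python
import numpy as np


def stick_breaking_weights(alpha, rng, truncation=2000):
    """Mixture weights of G ~ DP(G0, alpha), truncated after `truncation` sticks.

    V_k ~ Beta(1, alpha); weight_k = V_k * prod_{l<k} (1 - V_l).
    """
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))  # stick left before break k
    return v * remaining


def atoms_for_mass(weights, mass=0.99):
    """Smallest number of atoms whose combined weight reaches `mass`."""
    cumulative = np.cumsum(np.sort(weights)[::-1])
    return int(np.searchsorted(cumulative, mass) + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (0.5, 5.0, 50.0):
        w = stick_breaking_weights(alpha, rng)
        print(f"alpha={alpha:5.1f}: largest weight {w.max():.3f}, "
              f"atoms holding 99% of mass {atoms_for_mass(w)}")
```

The truncation only discards the mass left after the last break, which is negligible here. For $\alpha = 50$ the leftover stick after 2000 breaks is on the order of $e^{-40}$.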

We specify a Dirichlet process prior on the unknown distribution function of the population parameters, using the same conjugate normal/inverse-Wishart prior distribution as the base prior distribution. Finally, we place a gamma prior on the concentration parameter {alpha}. Then, our Bayesian nonparametric model is given by,

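$$W_i^* \mid \mu_i, \Sigma_i \overset{\text{indep.}}{\sim} \mathcal{N}(\mu_i, \Sigma_i), \qquad (\mu_i, \Sigma_i) \mid G \overset{\text{i.i.d.}}{\sim} G, \qquad G \mid \alpha \sim \mathcal{D}(G_0, \alpha), \qquad \alpha \sim \mathrm{Gamma}(a_0, b_0),$$

where under $G_0$, $(\mu_i, \Sigma_i)$ is distributed as

$$\mu_i \mid \Sigma_i \sim \mathcal{N}\left(\mu_0, \frac{\Sigma_i}{\tau_0^2}\right), \qquad \Sigma_i \sim \mathrm{InvWish}(\nu_0, S_0^{-1}).$$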

To illustrate how our model relates to a normal mixture, we follow Ferguson (1973) and Escobar and West (1995) to compute the conditional prior distribution $p(\mu_i, \Sigma_i \mid \mu_{(i)}, \Sigma_{(i)}, \alpha)$, where $\mu_{(i)} = \{\mu_1, \ldots, \mu_{i-1}, \mu_{i+1}, \ldots, \mu_n\}$ and $\Sigma_{(i)} = \{\Sigma_1, \ldots, \Sigma_{i-1}, \Sigma_{i+1}, \ldots, \Sigma_n\}$. The calculation yields

$$p(\mu_i, \Sigma_i \mid \mu_{(i)}, \Sigma_{(i)}, \alpha) = \alpha a_{n-1} G_0(\mu_i, \Sigma_i) + a_{n-1} \sum_{j \neq i} \delta_{(\mu_j, \Sigma_j)}(\mu_i, \Sigma_i), \tag{6}$$

where $\delta_{(\mu_j, \Sigma_j)}(\mu_i, \Sigma_i)$ is a degenerate distribution whose entire probability mass is concentrated at $(\mu_i, \Sigma_i) = (\mu_j, \Sigma_j)$, and $a_{n-1} = 1/(\alpha + n - 1)$. Equation (6) shows two things. First, given the other $(n - 1)$ values $(\mu_j, \Sigma_j)$, $j \neq i$, there is a positive probability of coincident values. Second, as $\alpha$ tends to $\infty$, the distribution approaches $G_0$. In other words, a new draw of the parameters can either take the same values as one of the existing parameter values or take new values drawn from the base distribution. The relative frequencies of these two events are governed by the concentration parameter $\alpha$.
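Equation (6) also suggests a direct way to simulate how many distinct parameter values (clusters) arise among $n$ draws. Because the draws are exchangeable, the same urn can be run sequentially. Each new draw joins an existing cluster with probability proportional to that cluster's size. Otherwise, with probability proportional to $\alpha$, it opens a new cluster with a fresh value from $G_0$. The Python sketch below tracks only the cluster sizes, since the cluster count does not depend on $G_0$; the function name is hypothetical.

```python
import numpy as np


def polya_urn_cluster_count(n, alpha, rng):
    """Number of distinct values among n sequential draws from the Polya urn.

    After i draws, the next one joins cluster k with probability n_k / (alpha + i)
    and starts a new cluster with probability alpha / (alpha + i).
    """
    sizes = []
    for i in range(n):
        probs = np.array(sizes + [alpha], dtype=float) / (alpha + i)
        k = rng.choice(len(probs), p=probs)
        if k == len(sizes):
            sizes.append(1)      # new value drawn from G0
        else:
            sizes[k] += 1        # coincides with an existing value
    return len(sizes)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    for alpha in (1.0, 10.0, 100.0):
        counts = [polya_urn_cluster_count(200, alpha, rng) for _ in range(100)]
        print(f"alpha={alpha:6.1f}: mean number of clusters {np.mean(counts):.1f}")
```

The simulated averages grow with $\alpha$, in line with the role of $\alpha$ as the parameter governing how often new values are drawn from $G_0$.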

Similarly, a future replication draw of $(\mu_{n+1}, \Sigma_{n+1})$, given $\mu = \{\mu_1, \ldots, \mu_n\}$ and $\Sigma = \{\Sigma_1, \ldots, \Sigma_n\}$, has the mixture distribution

$$p(\mu_{n+1}, \Sigma_{n+1} \mid \mu, \Sigma, \alpha) = \alpha a_n G_0(\mu_{n+1}, \Sigma_{n+1}) + a_n \sum_{i=1}^{n} \delta_{(\mu_i, \Sigma_i)}(\mu_{n+1}, \Sigma_{n+1}),$$

where $a_n = 1/(\alpha + n)$. We then compute the predictive distribution of a future observation $W_{n+1}^*$ given $(\mu, \Sigma, \alpha)$, which forms the basis of Bayesian density estimation. In particular, we evaluate $\int p(W_{n+1}^* \mid \mu_{n+1}, \Sigma_{n+1}, \alpha)\, dP(\mu_{n+1}, \Sigma_{n+1} \mid \mu, \Sigma, \alpha)$, which yields

$$p(W_{n+1}^* \mid \mu, \Sigma, \alpha) = \alpha a_n\, T_{\nu_0}(W_{n+1}^* \mid \mu_0, S) + a_n \sum_{i=1}^{n} \mathcal{N}(W_{n+1}^* \mid \mu_i, \Sigma_i), \tag{7}$$

where $T_{\nu_0}(\cdot \mid \mu_0, S)$ is a bivariate t distribution with $\nu_0$ degrees of freedom, location parameter $\mu_0$, and scale matrix $S = (\tau_0^2 + 1) S_0 / \{\tau_0^2 (1 + \nu_0)\}$. Equation (7) shows that when the value of $\alpha$ is small, the predictive distribution is approximately an equally weighted mixture of the $n$ normal distributions $\mathcal{N}(\mu_i, \Sigma_i)$. This setup resembles the standard kernel density estimator with a bivariate normal kernel. In particular, $\alpha$ plays a role similar to that of the bandwidth parameter, which controls the degree of smoothness.
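The mixture in equation (7) is easy to evaluate once draws of $(\mu_i, \Sigma_i)$ are available, for example from an MCMC sampler. The Python sketch below assumes such draws are given as lists `mus` and `sigmas` (hypothetical inputs). It uses the scale matrix $S$ exactly as stated above and writes out the normal and t densities with NumPy and the standard library. Setting `alpha` close to zero recovers the equally weighted normal mixture, mirroring the kernel density analogy.

```python
import math

import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    diff = np.asarray(x, dtype=float) - mean
    quad = diff @ np.linalg.solve(cov, diff)
    d = len(diff)
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** d * np.linalg.det(cov))


def mvt_pdf(x, df, loc, scale):
    """Multivariate t density with df degrees of freedom, location loc, scale matrix scale."""
    diff = np.asarray(x, dtype=float) - loc
    quad = diff @ np.linalg.solve(scale, diff)
    d = len(diff)
    log_norm = (math.lgamma((df + d) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * d * math.log(df * math.pi) - 0.5 * math.log(np.linalg.det(scale)))
    return math.exp(log_norm - 0.5 * (df + d) * math.log1p(quad / df))


def dp_predictive_density(w_star, mus, sigmas, alpha, mu0, S0, tau0, nu0):
    """Equation (7): alpha*a_n * t density + a_n * sum of normal densities."""
    n = len(mus)
    a_n = 1.0 / (alpha + n)
    S = (tau0 ** 2 + 1.0) * S0 / (tau0 ** 2 * (1.0 + nu0))  # scale matrix as stated in the text
    base = mvt_pdf(w_star, nu0, mu0, S)
    kernels = sum(mvn_pdf(w_star, m, s) for m, s in zip(mus, sigmas))
    return alpha * a_n * base + a_n * kernels


if __name__ == "__main__":
    mus = [np.zeros(2), np.ones(2)]
    sigmas = [np.eye(2), 0.5 * np.eye(2)]
    point = np.array([0.2, -0.1])
    for alpha in (0.01, 1.0, 100.0):
        dens = dp_predictive_density(point, mus, sigmas, alpha,
                                     np.zeros(2), 10.0 * np.eye(2), 2.0, 4.0)
        print(f"alpha={alpha:6.2f}: predictive density {dens:.4f}")
```

The two weights, $\alpha a_n$ and $n a_n$, always sum to one, so increasing $\alpha$ moves mass from the data-driven normal kernels toward the heavier-tailed base-measure term.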

We use a diffuse prior for the concentration parameter $\alpha$: $\mathrm{Gamma}(1, 0.1)$, with shape 1 and rate 0.1, which has a mean of 10 and a variance of 100. According to Antoniak (1974), the expected number of clusters given $\alpha$ and the sample size $n$ is approximately $\alpha \log(1 + n/\alpha)$. With this choice of prior distribution for $\alpha$ and $n = 200$, the prior expected number of clusters is approximately 27. In general, a sensitivity analysis should be conducted in order to assess the influence of prior specification on posterior inferences. Sensitivity analysis is especially important for the concentration parameter because it plays a critical role in density estimation with Dirichlet processes.
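The figure of roughly 27 clusters can be checked with a few lines of Python. The sketch below (function names hypothetical) does three things:
- evaluates Antoniak's approximation;
- compares it with the exact expected number of clusters under the Dirichlet process, $\sum_{i=1}^{n} \alpha/(\alpha + i - 1)$; and
- averages the approximation over the Gamma(1, 0.1) prior by the midpoint rule.

The upper integration limit and the number of steps are our own assumptions.

```python
import math


def expected_clusters_exact(alpha, n):
    """E[K | alpha, n] = sum_{i=1}^{n} alpha / (alpha + i - 1) for a Dirichlet process."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def expected_clusters_approx(alpha, n):
    """Antoniak's approximation alpha * log(1 + n / alpha)."""
    return alpha * math.log(1.0 + n / alpha)


def prior_expected_clusters(n, shape=1.0, rate=0.1, upper=60.0, steps=200_000):
    """Average the approximation over alpha ~ Gamma(shape, rate) by the midpoint rule.

    Integrates over t = rate * alpha, whose density is t^(shape-1) e^(-t) / Gamma(shape),
    truncated at t = upper.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        density = t ** (shape - 1.0) * math.exp(-t) / math.gamma(shape)
        total += expected_clusters_approx(t / rate, n) * density * h
    return total


if __name__ == "__main__":
    print(f"alpha=10, n=200: exact {expected_clusters_exact(10.0, 200):.2f}, "
          f"approximation {expected_clusters_approx(10.0, 200):.2f}")
    print(f"prior expectation under Gamma(1, 0.1), n=200: {prior_expected_clusters(200):.1f}")
```

At the prior mean $\alpha = 10$ the approximation gives about 30 clusters. Averaging over the prior gives a smaller value because $\alpha \log(1 + n/\alpha)$ is concave in $\alpha$.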

This nonparametric model can easily be extended to the analysis under the NCAR assumption by placing the following conjugate prior distribution on $(\eta, \Phi)$: $\eta \mid \Phi \sim \mathcal{N}(\eta_0, \Phi/\tau_0^2)$ and $\Phi \sim \mathrm{InvWish}(\nu_0, S_0^{-1})$. Here $\eta_0$ is the (3 x 1) vector of prior means, $\tau_0 > 0$ is a scale parameter, $\nu_0$ is the prior degrees of freedom parameter, and $S_0$ is the (3 x 3) positive-definite prior scale matrix. For the inverse-Wishart distribution to be proper, $\nu_0$ needs to be greater than 3.

In principle, therefore, our nonparametric model can provide flexible estimation of bivariate density functions for ecological inference problems. However, because we do not directly observe $W_{i1}$ and $W_{i2}$, density estimation for ecological inference is much more difficult than in the usual setting. Therefore, the bounds must be sufficiently informative for the nonparametric model to recover the underlying population distribution. We empirically investigate this issue through the analysis of both simulated and real data sets in Section 5.1.

3.3 Computational Strategies
Finally, we briefly discuss our computational strategies for fitting the proposed models. To obtain the maximum likelihood (ML) estimates for the parametric CAR and NCAR models, we develop an Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin 1977). To fit the CCAR model, we develop an Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin 1993). The details of these algorithms appear in Appendix A. The EM and ECM algorithms are general optimization techniques that are often useful for obtaining ML estimates in the presence of missing data. A main advantage of these algorithms is their numerical stability; in particular, the observed-data log likelihood never decreases from one iteration to the next. For Bayesian analysis, we develop MCMC algorithms for both parametric and nonparametric models. These MCMC algorithms are described in Appendix B.


    4. Quantifying the Aggregation Effects
In this section, under the theoretical framework described in Section 2, we show how to quantify the magnitude of the aggregation effects in the context of both parameter estimation and hypothesis testing. We do so by measuring the fraction of missing information under the parametric models proposed in Section 3.1. Our approach is to quantify the amount of information lost through data aggregation relative to the amount of information one would have if the individual-level data were observed.

4.1 A Measure of the Aggregation Effects in Parameter Estimation
To quantify the amount of the aggregation effects in parameter estimation, we use the missing-information principle of Orchard and Woodbury (1972), which states that the missing information is equal to the difference between the complete information and the observed information. Formally, Dempster, Laird, and Rubin (1977) prove the following key equality,

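$$I_{obs}(\hat\theta) = I_{com}(\hat\theta) - I_{mis}(\hat\theta),$$

where $\hat\theta$ is the ML estimate of the unknown parameters $\theta$ ($\theta = (\xi, \gamma)$ in our case) from the observed data. $I_{obs}(\hat\theta)$ represents the observed Fisher information matrix, defined by

$$I_{obs}(\hat\theta) = -\left.\frac{\partial^2 l_{obs}(\theta \mid Y, X)}{\partial\theta\,\partial\theta^\top}\right|_{\theta = \hat\theta}, \tag{8}$$

where $l_{obs}$ is the observed-data log-likelihood function based on equation (3). $I_{com}(\hat\theta)$ denotes the expected information matrix from the complete-data log-likelihood function, based on equation (4), and is given by

$$I_{com}(\hat\theta) = E\left[\left. -\frac{\partial^2 l_{com}(\theta \mid Y, X, W)}{\partial\theta\,\partial\theta^\top}\right|_{\theta = \hat\theta} \,\middle|\, Y, X, \hat\theta\right], \tag{9}$$

where the expectation is taken with respect to the distribution of the missing data $W$ given the observed data $(Y, X)$. Finally, $I_{mis}(\hat\theta)$ can be viewed as the missing information due to data aggregation and is defined as

$$I_{mis}(\hat\theta) = E\left[\left. -\frac{\partial^2 \log f(W \mid Y, X, \theta)}{\partial\theta\,\partial\theta^\top}\right|_{\theta = \hat\theta} \,\middle|\, Y, X, \hat\theta\right].$$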

To define a measure of missing information in multivariate settings, we use the diagonal elements of the matrix fraction of missing information, which compares the observed information with the complete information,

$$F_\theta = \operatorname{diag}\left[ I - I_{com}^{-1}(\hat\theta)\, I_{obs}(\hat\theta) \right]. \tag{10}$$

Then, the $i$th element of $F_\theta$ is an information-theoretic measure of the relative amount of missing information in the ML estimation of the $i$th element of the parameter vector $\theta$. In ecological inference, $F_\theta$ represents the amount of additional information the individual-level data would provide for the estimation of $\theta$, if they were available, relative to the information obtained from the observed aggregate data. Since the diagonal elements of the inverse of the observed information matrix equal the estimated asymptotic variances of the parameters, in the one-parameter case the fraction of missing information, $1 - I_{obs}/I_{com} = (V_{obs} - V_{com})/V_{obs}$ with $V = I^{-1}$, equals the increase in asymptotic variance due to missing data, expressed as a fraction of the observed-data variance.

Finally, it is also possible to summarize the amount of missing information in ecological inference by a scalar, rather than computing the fraction of missing information for each parameter. This can be done by computing the largest eigenvalue of the "matrix fraction" of missing information, $I - I_{com}^{-1}(\hat\theta)\, I_{obs}(\hat\theta)$, where $I$ represents the identity matrix. A larger eigenvalue indicates a greater amount of missing information.
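A minimal numpy sketch of both summaries, assuming the observed and complete information matrices have already been computed (for example, by the SEM algorithm discussed in Section 4.4). It takes the per-parameter measure to be the diagonal of the matrix fraction $I - I_{com}^{-1} I_{obs}$ and the scalar summary to be the largest eigenvalue of that matrix. The function name and example matrices are hypothetical.

```python
import numpy as np

def missing_information(I_obs, I_com):
    """Per-parameter fractions of missing information and the largest-eigenvalue summary."""
    I_obs = np.asarray(I_obs, dtype=float)
    I_com = np.asarray(I_com, dtype=float)
    k = I_obs.shape[0]
    # Matrix fraction of missing information: I - I_com^{-1} I_obs (= I_com^{-1} I_mis)
    M = np.eye(k) - np.linalg.solve(I_com, I_obs)
    per_parameter = np.diag(M).copy()
    # With I_com positive definite, M is similar to a symmetric matrix, so its eigenvalues are real
    largest = float(np.max(np.linalg.eigvals(M).real))
    return per_parameter, largest

# One parameter: I_obs = 2, I_com = 8, so V_obs = 0.5 and V_com = 0.125
f1, lam1 = missing_information([[2.0]], [[8.0]])
# Two parameters with diagonal information matrices
f2, lam2 = missing_information(np.diag([1.0, 5.0]), np.diag([4.0, 10.0]))
```

In the one-parameter case the result, 0.75, matches $(V_{obs} - V_{com})/V_{obs} = (0.5 - 0.125)/0.5$.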

4.2 A Measure of the Aggregation Effects in Hypothesis Testing
Kong, Meng, and Nicolae (2005) propose a general framework for quantifying the relative amount of missing information in hypothesis testing with incomplete data. We apply this methodology to ecological inference so that the fraction of missing information can be calculated for hypothesis testing. Kong, Meng, and Nicolae (2005) propose two measures of missing information in hypothesis testing: the fraction of missing information against and under a null hypothesis. In this article, we focus on the former because, as discussed by Kong, Meng, and Nicolae (2005), the latter may provide misleading inferences if the true values are far away from the null values.

Consider the null hypothesis $H_0: \theta = \theta_0$. The fraction of missing information against the null hypothesis is given by

$$F_H = 1 - \frac{l_{obs}(\hat\theta \mid Y, X) - l_{obs}(\theta_0 \mid Y, X)}{E\left[\, l_{com}(\hat\theta \mid Y, X, W) - l_{com}(\theta_0 \mid Y, X, W) \,\middle|\, Y, X, \hat\theta \,\right]}, \tag{11}$$

where the expectation is taken over the conditional distribution of the missing data $W$ given the observed data $(Y, X)$, evaluated at $\hat\theta$. This measure equals one minus the ratio of the logarithms of two likelihood ratio test statistics: the logarithm of the observed likelihood ratio statistic, based on the observed-data likelihood, is in the numerator, whereas the logarithm of the expected likelihood ratio statistic, based on the complete-data likelihood, is in the denominator.

The interpretation of the measure in equation (11) exactly parallels that of the fraction of missing information in parameter estimation (see equation 10). Kong, Meng, and Nicolae (2005) establish three key properties of this measure: (1) $F_H$ is a fraction, that is, $0 \le F_H \le 1$; (2) $F_H = 1$ if and only if the observed data cannot distinguish between $\hat\theta$ and $\theta_0$ at all, that is, the observed-data likelihood ratio equals 1, or $l_{obs}(\hat\theta \mid Y, X) = l_{obs}(\theta_0 \mid Y, X)$; and (3) $F_H = 0$ if and only if the missing data cannot distinguish between $\hat\theta$ and $\theta_0$ given the observed data, that is, the Kullback-Leibler information number, $E\left[\log \frac{f(W \mid Y, X, \hat\theta)}{f(W \mid Y, X, \theta_0)} \,\middle|\, Y, X, \hat\theta\right]$, equals 0.
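To make equation (11) concrete, consider a toy model unrelated to ecological inference: $n$ observations from $N(\mu, 1)$, $m$ further observations missing completely at random, and $H_0: \mu = \mu_0$. Given the data, each missing value is $N(\hat\mu, 1)$, so the expected complete-data log likelihood ratio has a closed form, and $F_H$ reduces to $m/(n+m)$, the share of units that are missing. The sketch computes it directly; the function name is hypothetical, and it assumes $\hat\mu \ne \mu_0$ so that the ratio is defined.

```python
def fraction_missing_info_test(y_obs, m_missing, mu0):
    """F_H for H0: mu = mu0 in a toy N(mu, 1) model with m_missing values MCAR."""
    n = len(y_obs)
    mu_hat = sum(y_obs) / n  # observed-data ML estimate

    def loglik(mu, ys):
        # N(mu, 1) log likelihood, up to an additive constant
        return -0.5 * sum((y - mu) ** 2 for y in ys)

    # Numerator of equation (11): observed-data log likelihood ratio
    obs_llr = loglik(mu_hat, y_obs) - loglik(mu0, y_obs)
    # Each missing y ~ N(mu_hat, 1), so E[-(y - mu)^2 / 2] = -(1 + (mu_hat - mu)^2) / 2;
    # the difference between mu_hat and mu0 is (mu_hat - mu0)^2 / 2 per missing value
    exp_mis_llr = m_missing * 0.5 * (mu_hat - mu0) ** 2
    # Denominator of equation (11): expected complete-data log likelihood ratio
    com_llr = obs_llr + exp_mis_llr
    return 1.0 - obs_llr / com_llr

f_h = fraction_missing_info_test([1.0, 2.0, 3.0], m_missing=1, mu0=0.0)  # m / (n + m) = 0.25
```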

4.2.1 Null hypothesis of linear constraints on marginal means
We first consider the null hypothesis of linear constraints on the marginal means of $W_i$ under the CAR and NCAR models. With $l$ linear constraints, the null hypothesis can be written as a system of $l$ linear equations, $H_0: A^\top \mu = a$, where $A$ is a known matrix with $l$ columns and $a$ is an $(l \times 1)$ vector of known constants. Under the CAR model, $\mu$ is a two-dimensional vector, whereas under the NCAR model it is three-dimensional. An important special case is the equality of the marginal means, $\mu_1 = \mu_2$, or equivalently $A = (1, -1)^\top$ and $a = 0$ under the CAR model and $A = (1, -1, 0)^\top$ and $a = 0$ under the NCAR model. For example, researchers may wish to test whether the turnout rates of whites and nonwhites are the same. To conduct the likelihood ratio test of $H_0$ and compute the associated fraction of missing information, we first obtain the ML estimates of $\theta$ under the constraint $A^\top\mu = a$ and then compare the observed-data log likelihood under this constraint with the corresponding value obtained without it.
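The constrained-versus-unconstrained comparison can be sketched in a complete-data setting where everything has a closed form: rows of $Y$ are i.i.d. bivariate normal with known covariance $\Sigma$, so the constrained ML estimate of $\mu$ under $A^\top\mu = a$ is the generalized least squares projection of the sample mean. This only illustrates the mechanics of the test; the article's test uses the observed-data likelihood of the ecological models, fit under the constraint. The data and function names are hypothetical, and the p-value uses the $\chi^2_1$ tail probability $\operatorname{erfc}(\sqrt{x/2})$, which is valid only for a single constraint ($l = 1$).

```python
import numpy as np
from math import erfc, sqrt

def constrained_mean_mle(Y, Sigma, A, a):
    """ML estimate of mu subject to A^T mu = a, rows of Y iid N(mu, Sigma), Sigma known."""
    ybar = Y.mean(axis=0)
    r = A.T @ ybar - a  # constraint violation at the unconstrained MLE
    return ybar - Sigma @ A @ np.linalg.solve(A.T @ Sigma @ A, r)

def loglik(mu, Y, Sigma):
    """Gaussian log likelihood, up to an additive constant."""
    D = Y - mu
    return -0.5 * np.sum((D @ np.linalg.inv(Sigma)) * D)

Y = np.array([[0.2, 0.9], [0.4, 0.7], [0.3, 0.8]])  # hypothetical complete data
Sigma = np.eye(2)
A = np.array([[1.0], [-1.0]])  # H0: mu_1 = mu_2
a = np.array([0.0])

mu_unc = Y.mean(axis=0)
mu_con = constrained_mean_mle(Y, Sigma, A, a)
lr_stat = 2 * (loglik(mu_unc, Y, Sigma) - loglik(mu_con, Y, Sigma))
p_value = erfc(sqrt(lr_stat / 2))  # chi-square(1) upper tail
```

Here the constrained estimate is (0.55, 0.55) and the statistic is 0.375; with aggregate data, the same comparison is made on $l_{obs}$ instead.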

4.2.2 Null hypothesis of linear constraints on regression coefficients
We next consider the null hypothesis of linear constraints on regression coefficients under the CCAR model. For example, one might be interested in testing the null hypothesis that a particular variable has no effect on the conditional means of both $W_{i1}^*$ and $W_{i2}^*$. With $l$ linear constraints, the null hypothesis can be expressed as a system of $l$ linear equations, $H_0: A^\top\beta = a$, where $A$ is a known $(k \times l)$ matrix and $a$ is an $l$-dimensional vector of constants.

4.3 Missing Information and Model Misspecification
The methods proposed above for quantifying the amount of missing information assume that researchers know the correct (likelihood-based) parametric model. Although most social scientists conduct their data analysis under such an assumption, the possibility of model misspecification is greater in ecological inference, and hence this is a potential concern. Because the individual-level data are partially missing, standard diagnostic tools, which require complete data, cannot be used to detect possible model misspecification. Thus, if the underlying complete-data model is incorrect, the resulting estimates of the fraction of missing information may also be misleading. This problem reflects the fundamental difficulty of statistical inference with missing data: inference may be sensitive to the modeling assumptions about the missing data. It is therefore no surprise that much of the methodological controversy in ecological inference centers on model misspecification.

Methodological research has only begun to address the problem of model uncertainty in ecological inference directly (e.g., Imai and King 2004). The methods we propose in this article, however, bear only an indirect relationship to the issue of model misspecification. Namely, a higher fraction of missing information implies a greater magnitude of possible incomplete-data bias resulting from local model misspecification, that is, misspecification that could not be detected even if the complete data were observed. Copas and Eguchi (2005) formalize this idea by showing that the magnitude of the standardized incomplete-data bias for a parameter $\theta$ resulting from such local model misspecification has an upper bound equal to Formula, where $\varepsilon$ represents the magnitude of local model misspecification and $F_\theta$ is the fraction of missing information in the estimation of $\theta$ as defined in equation (10). This result implies that the methods proposed in this section can alert applied researchers to the possibility of local model misspecification. However, the fraction of missing information does not reflect the degree to which the assumed model may be grossly misspecified.

In ecological inference, serious yet undetectable model misspecification may occur, in the sense that additional aggregate (coarse) data do not help detect misspecification of the individual-level (complete-data) model. In that case, the magnitude of bias is likely to exceed the above upper bound, and hence the fraction of missing information may underestimate the degree of model uncertainty.

4.4 Computational Strategies
To compute a measure of missing information under the CAR model, we apply the supplemented EM (SEM) algorithm (Meng and Rubin 1991) to compute the fraction of missing information defined in equation (10) and to estimate the asymptotic variance-covariance matrix of the ML estimates. In addition to its numerical stability, a principal advantage of the SEM algorithm is that it simply extends the EM procedure, obviating the need to develop a separate algorithm.
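The idea behind SEM can be shown in one dimension. The rate of the EM mapping at the ML estimate (the DM matrix, a scalar here) can be approximated numerically by perturbing the estimate and running a single EM step; in the scalar case this rate equals the fraction of missing information, and the observed-data variance is $V_{com}/(1 - DM)$. The toy problem (the mean of $N(\mu, 1)$ with $m$ values missing completely at random) and all function names are hypothetical; the authors' SEM computation for the CAR model is considerably more involved.

```python
def em_step(mu, y_obs, m_missing):
    """One EM iteration for the mean of N(mu, 1) with m_missing values MCAR."""
    n = len(y_obs)
    # E-step: each missing value is replaced by its conditional mean, mu
    # M-step: the MLE is the mean of the completed data
    return (sum(y_obs) + m_missing * mu) / (n + m_missing)

def run_em(y_obs, m_missing, mu=0.0, tol=1e-12, max_iter=10_000):
    for _ in range(max_iter):
        new = em_step(mu, y_obs, m_missing)
        if abs(new - mu) < tol:
            return new
        mu = new
    return mu

def sem_rate(mu_hat, y_obs, m_missing, delta=1e-4):
    """Numerical derivative of the EM map at the MLE (scalar DM)."""
    return (em_step(mu_hat + delta, y_obs, m_missing) - mu_hat) / delta

y_obs, m = [1.0, 2.0, 3.0], 1
mu_hat = run_em(y_obs, m)
dm = sem_rate(mu_hat, y_obs, m)   # fraction of missing information, m / (n + m)
v_com = 1.0 / (len(y_obs) + m)    # complete-data variance of the mean (sigma^2 = 1)
v_obs = v_com / (1.0 - dm)        # SEM-adjusted variance, approximately 1 / n
```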

Since the EM algorithm outputs the ML estimates of the transformed parameters, $\theta^* = (\mu_1, \mu_2, \log\sigma_1, \log\sigma_2, 0.5\log[(1+\rho)/(1-\rho)])$, we first compute the fraction of missing information for each of the transformed parameters. We use $\hat\theta^*$ to denote the ML estimates of $\theta^*$. It is also possible to present the fraction of missing information for parameters that applied researchers can easily interpret, rather than for the transformed parameters $\theta^*$, which are used purely for modeling and computational purposes. In this case, we use a first-order approximation to calculate the means, variances, and correlation of the original data, $W_{ij}$, as $\text{logit}^{-1}(\mu_j)$, $\sigma_j^2 e^{2\mu_j}/(1+e^{\mu_j})^4$, and $\rho$, respectively. We then use the chain rule and the invariance property of ML estimators to derive the expressions for the DM matrix and the expected information matrix, $I_{com}$, for the new parameters of interest. A similar estimation strategy can be used for the NCAR model as well.
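The first-order (delta-method) back-transformation can be written as a small helper. It takes a hypothetical vector of transformed estimates $\theta^* = (\mu_1, \mu_2, \log\sigma_1, \log\sigma_2, z)$, with $z = 0.5\log[(1+\rho)/(1-\rho)]$, and returns $\text{logit}^{-1}(\mu_j)$, $\sigma_j^2 e^{2\mu_j}/(1+e^{\mu_j})^4$, and $\rho = \tanh z$. The helper name is an assumption, and it does not compute the transformed DM matrix, which additionally requires the Jacobian of this map.

```python
import math

def back_transform(theta_star):
    """First-order approximations of E(W_j), Var(W_j), and rho from theta*."""
    mu1, mu2, log_s1, log_s2, z = theta_star
    out = {}
    for j, (mu, log_s) in enumerate([(mu1, log_s1), (mu2, log_s2)], start=1):
        sigma = math.exp(log_s)
        e = math.exp(mu)
        out[f"mean_W{j}"] = e / (1 + e)                    # logit^{-1}(mu_j)
        out[f"var_W{j}"] = sigma**2 * e**2 / (1 + e) ** 4  # delta method on the logit scale
    out["rho"] = math.tanh(z)  # inverts z = 0.5 * log[(1 + rho) / (1 - rho)]
    return out

summary = back_transform((0.0, 0.0, 0.0, math.log(2.0), 0.5 * math.log(3.0)))
```

For this input, both means are 0.5, the variances are 0.0625 and 0.25, and $\rho = 0.5$.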

For the CCAR model, we use the supplemented ECM (SECM) algorithm, which modifies the SEM algorithm to account for the use of conditional maximization (van Dyk, Meng, and Rubin 1995). Using the SECM algorithm, we first compute the fraction of missing information for the transformed parameters, $\theta^*$. If desired, we can also compute the fraction of missing information for the conditional mean on the logit scale, that is, $E(W_i^* \mid Z_i)$, or on the original scale, that is, $E(W_i \mid Z_i)$, using the first-order approximation and the invariance property of ML estimators.


    5. Simulation Studies and Empirical Examples
In this section, we evaluate the performance of the proposed methods and illustrate their use by analyzing both simulated and real data sets. Each of the following subsections focuses on one of the three key factors in ecological inference identified in Section 2.

5.1 Distributional Effects
5.1.1 A simulation study
To investigate distributional effects, we use the values of $X_i$ from the data set analyzed by Burden and Kimball (1998), which has a sample size of 361. Although this data set is not about racial voting, for simplicity we use the notation of Table 1 and refer to $X_i$ as the proportion of black voters and $Y_i$ as the overall turnout rate for each county $i$. The unknown inner cells $(W_{i1}, W_{i2})$ are the turnout rates among black and white voters, respectively. To construct different simulation settings, we draw $(W_{i1}, W_{i2})$ independently from each of the following three distributions while keeping the racial composition $X_i$ fixed.

Simulation I. $W_i^*$ is independently drawn from a bivariate normal distribution with mean (0, 1.4), variances (1, 0.5), and covariance 0.2, yielding average turnout of about 50% and 80% for black and white voters, respectively.
Simulation II. $W_i^*$ is independently drawn from a mixture of two bivariate normal distributions with mixing probabilities (0.6, 0.4). The first distribution has mean (–0.4, 1.4), variances (0.2, 0.1), and covariance 0. The second distribution has a different mean, (–0.4, –1.4), but the same covariance matrix. This yields average turnout of roughly 40% for black voters, approximately 80% for white voters in 60% of the counties, and about 20% for white voters in the remaining counties.
Simulation III. $W_i^*$ is independently drawn from a mixture of two bivariate normal distributions with mixing probabilities (0.6, 0.4). The first distribution has mean (–1.4, 1.4), variances (0.1, 0.1), and covariance 0. The second distribution has a different mean, (1.4, –1.4), but the same covariance matrix. In 60% of the counties, the average turnout is 20% for blacks and 80% for whites, whereas in the remaining counties this pattern is reversed.
In all three simulations, we assume no contextual effect. Note that in Simulation II only the marginal distribution of Wi2 is bimodal, whereas in Simulation III the marginal distributions of both Wi1 and Wi2 are bimodal. It is of particular interest to see whether the nonparametric method can recover such distributions.
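The three data-generating processes can be sketched with numpy as below. Because the Burden and Kimball (1998) values of $X_i$ are not reproduced here, the sketch draws a hypothetical $X_i$ uniformly on (0.05, 0.95); the outcome follows the accounting identity $Y_i = X_i W_{i1} + (1 - X_i) W_{i2}$ of a 2 × 2 table. The dictionary name, function name, and seed are illustrative.

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# (mixing probabilities, component means, component covariances) on the logit scale
DESIGNS = {
    "I": ([1.0], [[0.0, 1.4]], [[[1.0, 0.2], [0.2, 0.5]]]),
    "II": ([0.6, 0.4], [[-0.4, 1.4], [-0.4, -1.4]], [[[0.2, 0.0], [0.0, 0.1]]] * 2),
    "III": ([0.6, 0.4], [[-1.4, 1.4], [1.4, -1.4]], [[[0.1, 0.0], [0.0, 0.1]]] * 2),
}

def simulate(design, n, rng):
    probs, means, covs = DESIGNS[design]
    comp = rng.choice(len(probs), size=n, p=probs)  # mixture component of each county
    w_star = np.empty((n, 2))
    for k in range(len(probs)):
        idx = comp == k
        w_star[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    W = inv_logit(w_star)                # (W_i1, W_i2)
    X = rng.uniform(0.05, 0.95, size=n)  # hypothetical stand-in for the real X_i
    Y = X * W[:, 0] + (1 - X) * W[:, 1]  # observed aggregate outcome
    return X, Y, W

rng = np.random.default_rng(1)
X, Y, W = simulate("III", 361, rng)
```

In Simulation III roughly 60% of the draws of $W_{i1}$ fall near 0.2 and the rest near 0.8, which is the bimodality the nonparametric model is meant to recover.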

Figure 1 presents the tomography plots of the simulated data sets together with the true values of $W_i$. The graphs illustrate the bounds for $W_{i1}$ and $W_{i2}$, which can be obtained by projecting the tomography lines onto the horizontal and vertical axes. The average lengths of the bounds for $W_{i1}$ in Simulations I, II, and III are 0.55, 0.58, and 0.64, whereas those for $W_{i2}$ are 0.71, 0.73, and 0.78, respectively. Thus, in all three simulations, the bounds are not particularly informative.
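The bounds follow from the identity $Y_i = X_i W_{i1} + (1 - X_i) W_{i2}$ together with $0 \le W_{ij} \le 1$: solving for one cell and letting the other range over [0, 1] gives the interval for each cell. A minimal sketch, assuming $0 < X_i < 1$; the function name is hypothetical.

```python
import numpy as np

def bounds(X, Y):
    """Deterministic bounds on W1 and W2 implied by Y = X W1 + (1 - X) W2."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w1_lo = np.maximum(0.0, (Y - (1 - X)) / X)  # W2 = 1
    w1_hi = np.minimum(1.0, Y / X)              # W2 = 0
    w2_lo = np.maximum(0.0, (Y - X) / (1 - X))  # W1 = 1
    w2_hi = np.minimum(1.0, Y / (1 - X))        # W1 = 0
    return (w1_lo, w1_hi), (w2_lo, w2_hi)

(w1_lo, w1_hi), (w2_lo, w2_hi) = bounds([0.5, 0.2], [0.5, 0.9])
avg_len_w1 = np.mean(w1_hi - w1_lo)  # average bound length, as reported in the text
```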


Fig. 1 Tomography plots of Simulations I, II, and III. The solid lines illustrate the deterministic relationship of equation (2), and the dots represent the true values of (Wi1, Wi2) for 40 randomly selected counties from the Burden and Kimball (1998) data set.

 
Treating $X_i$ and $Y_i$ as observed and $W_i$ as unknown, we fit our parametric and nonparametric models and assess their relative performance for both sample and population inferences by examining in-sample and out-of-sample predictions, respectively. Table 2 numerically summarizes the in-sample predictive performance. In Simulations II and III, the (sample) root mean squared error (RMSE) of our nonparametric model is smaller than that of the parametric model. Nevertheless, even when the true distribution is bimodal, the in-sample predictions from our parametric model are reasonable, because they respect the bound conditions. The in-sample predictions based on ecological regression (Goodman 1953), $E(Y_i \mid X_i) = \alpha + \beta X_i$, yield larger bias and RMSE than those of the other two methods.
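Goodman's ecological regression rests on rewriting the identity as $Y_i = W_{i2} + (W_{i1} - W_{i2}) X_i$: if the $W_{ij}$ do not vary systematically with $X_i$, the intercept $\alpha$ estimates the average of $W_{i2}$ and $\alpha + \beta$ estimates the average of $W_{i1}$. A short least-squares sketch follows; the function name is hypothetical and the demonstration data are constructed so that the answer is known.

```python
import numpy as np

def goodman_regression(X, Y):
    """OLS of Y on X; returns (estimate for W1, estimate for W2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    design = np.column_stack([np.ones_like(X), X])
    (alpha, beta), *_ = np.linalg.lstsq(design, Y, rcond=None)
    # Y = W2 + (W1 - W2) X: intercept -> W2, intercept + slope -> W1
    return alpha + beta, alpha

X_demo = np.array([0.1, 0.3, 0.5, 0.9])
Y_demo = 0.5 * X_demo + 0.8 * (1 - X_demo)  # constant W1 = 0.5, W2 = 0.8
w1_hat, w2_hat = goodman_regression(X_demo, Y_demo)
```

Because the fitted line ignores the bounds, its estimates can fall outside [0, 1] when the constancy assumption fails, which is one reason it performs worse here.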


Table 2 In-sample predictive performance with different distributions of (W1, W2)

 
Finally, we examine the out-of-sample predictive performance, which matters for population inferences. Figure 2 compares the true distribution with the marginal densities estimated from the out-of-sample predictions of our models. In Simulation I, our nonparametric and parametric models give essentially identical estimates and approximate the marginal distributions well. Indeed, the number of clusters in the nonparametric model reduces to one, and in our setup the nonparametric model with one cluster is identical to the parametric model. This result is not surprising, given that this data set is generated from the parametric model. The other two simulations, however, demonstrate a clear advantage of the nonparametric model: it captures the bimodality of the marginal distributions, whereas, as expected, the parametric model fails to approximate the true distribution.


Fig. 2 Out-of-sample predictive performance with different distributions of (W1, W2). The true marginal distributions are shown as shaded areas. The solid line represents the estimated density from the parametric model, whereas the dashed line represents that from the nonparametric model.

 
5.1.2 Voter registration in U.S. Southern states
Next, we analyze voter registration data from 275 counties in four Southern states of the United States: Florida, Louisiana, North Carolina, and South Carolina. This data set was first studied by King (1997) and subsequently analyzed by others (e.g., King, Rosen, and Tanner 1999; Wakefield 2004b). For each county, $X_i$ represents the proportion of black voters, $Y_i$ denotes the registration rate, and $W_{i1}$ and $W_{i2}$ represent the registration rates of black and white voters, respectively. In this example, the true values of $W_{i1}$ and $W_{i2}$ are known, which allows us to compare the performance of our method with that of existing models.

Figure 3 presents a graphical summary of the data. The upper-left panel plots the true values of Wi1 and Wi2. The registration rates among white voters are high in many counties, with an average of 86%. In contrast, black registration rates are much lower, with an average of 56%. The sample variances of registration rates are 0.044 and 0.024 for black and white voters, respectively. The other two graphs in the upper panel are the scatterplots of the registration rates and the proportions for black and white voters. In this data set, the correlation between X and W1 is –0.08, whereas the correlation between X and W2 is only 0.01, implying minor contextual effects.


Fig. 3 Summary of the voter registration data from four U.S. Southern states. The upper-left graph is the scatterplot of the true values of Wi1 and Wi2. The upper-middle graph is the scatterplot of the black registration rate, Wi1, against the proportion of black voters, Xi. The solid line represents a LOWESS curve. The upper-right graph presents the same figure for white voters. The lower-left graph is the tomography plot with the true values indicated as dots. The lower-middle and lower-right graphs plot the bounds of W1 and W2, respectively.

 
The lower panel of Fig. 3 presents the tomography plot for a random subset of the counties. The bounds carry asymmetric information about W1 and W2: they are more informative for W2 than for W1. Moreover, in 30% of the counties the true value of W2 equals 1. As a result, the true values of the corresponding W1 lie at the lower end of their bounds. This may make in-sample prediction difficult, especially for counties whose bounds are wide.

By treating W1 and W2 as unknown, we fit both our parametric and nonparametric models to a subset of 250 counties. We also examine the model performance by adding the individual-level data of the remaining 25 counties. Finally, we compare the results with other methods in the literature, including the ecological regression, the linear and nonlinear neighborhood models (Freedman et al. 1991), the midpoints of bounds, King's EI model, and Wakefield's hierarchical model. To fit King's EI model, we use the publicly available software, EzI (version 2.7) by Benoit and King, with its default specifications. To fit Wakefield's binomial convolution model, we use his WinBUGS code (Wakefield 2004b), which fits the model based on normal approximation. We specify prior distributions such that the implied prior predictive distribution of Wi is approximately uniform. Specifically, we use µ0 ~ logistic(0, 1), µ1 ~ logistic(0, 1), {sigma}Formula ~ Formula (1,100), and {sigma}Formula ~ Formula (1,100). After 50,000 iterations, we discard the initial 20,000 draws and take every 10th draw.

Table 3 summarizes the in-sample predictive performance. For this data set, our nonparametric model substantially outperforms our parametric model on all three discrepancy measures (bias, RMSE, and mean absolute error), by a margin much greater than what we saw in our simulation examples. With the addition of the individual-level data, however, the in-sample predictions of the parametric model improve substantially. Furthermore, the predictions of the nonparametric model are also more accurate than those of existing methods on all three discrepancy measures. The performance of King's EI model and Wakefield's model is reasonable, but not as good as that of the nonparametric model. Finally, the neighborhood models do not work well in this application, and simply using the midpoint of a bound as an estimate gives better results than some methods.


Table 3 In-sample predictive performance of various models on voter registration data

 
For our two models, the posterior predictive distribution serves as a basis for population inferences. Figure 4 compares the out-of-sample predictive performance of our models, with and without the addition of individual-level data. In this application, the true distribution of W1 and W2 is unknown, so we approximate it by a kernel smoothing technique using the sample values. The nonparametric model estimates the marginal density of W2 very well, whereas its density estimate for W1 is slightly off. This is expected because the bounds of W2 are more informative than those of W1. In contrast, the estimated marginal densities based on our parametric model are not accurate. With the addition of the individual-level data, the nonparametric model now recovers the density of W1 and the density estimation of W2 is further improved. The parametric model still gives a poor estimate even after adding the individual-level data.
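A Gaussian kernel density estimate of the kind used to approximate the true densities might look like the sketch below. The kernel and bandwidth are not reported, so the Gaussian kernel and the normal-reference bandwidth $1.06\, s\, n^{-1/5}$ are assumptions; note also that a plain Gaussian kernel places some mass outside [0, 1] for rates near the boundaries. The function name is hypothetical.

```python
import numpy as np

def gaussian_kde(sample, grid, bandwidth=None):
    """Gaussian kernel density estimate of `sample`, evaluated at each point of `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the article's stated choice)
        bandwidth = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

sample = np.linspace(0.2, 0.8, 50)  # hypothetical registration rates
grid = np.linspace(-1.0, 2.0, 3001)
dens = gaussian_kde(sample, grid)
```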


Fig. 4 Out-of-sample predictive performance of selected models on voter registration data. The true density is represented by the shaded area. The solid and dashed lines represent the estimated density without and with the additional survey data information, respectively.

 
5.2 Contextual Effects
Next, we investigate the possibility of correcting contextual effects through a simulation study and an empirical investigation of the data set on race and literacy.

5.2.1 A simulation study
To avoid other confounding issues, we simulate a data set under a parametric assumption. We also assume that the covariate Z is an aggregate-level variable expressed as a proportion. We start by generating the logit-transformed values of $(W_{i1}, W_{i2}, X_i, Z_i)$, denoted by $(W_{i1}^*, W_{i2}^*, X_i^*, Z_i^*)$, with a sample size of 500. To do this, we first draw $Z_i^*$ independently from a univariate normal distribution with mean –0.85 and variance 0.5. We then compute $W_i^* = BZ_i^* + \varepsilon_{i1}$, where $B$ is a $(2 \times 2)$ matrix whose first diagonal element equals 0.85, whose second diagonal element equals –0.85, and whose off-diagonal elements equal 0, and $\varepsilon_{i1}$ is a $(2 \times 1)$ vector independently drawn from a bivariate normal distribution with mean (0, 0), variances (0.5, 0.5), and covariance 0.2. For simplicity, we do not include an intercept.

Next, we construct $X^*$ as a nonlinear function of $Z^*$. In particular, $X_i^* = 2Z_i^* + 0.5Z_i^{*2} + \varepsilon_{i2}$, where $\varepsilon_{i2}$ is an independent draw from a univariate normal distribution with mean 0 and variance 0.5. We then take the inverse-logit transformation of $W_i^*$, $X_i^*$, and $Z_i^*$ to obtain $W_i$, $X_i$, and $Z_i$. Finally, applying equation (1), we obtain the value of $Y_i$. To investigate the effect of model misspecification, we also generate a spurious covariate that is independent of Z. Its values are obtained by sampling independently from a normal distribution with mean 0 and variance 0.5 and then taking the inverse-logit transformation.
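The contextual-effects simulation can be sketched in numpy as follows. Because $Z_i^*$ is a scalar, the sketch reads $W_i^* = BZ_i^*$ with diagonal $B$ as $W_{i1}^* = 0.85 Z_i^*$ and $W_{i2}^* = -0.85 Z_i^*$ (an interpretation, not spelled out in the text), and it obtains $Y_i$ from the identity $Y_i = X_i W_{i1} + (1 - X_i) W_{i2}$. Variances are converted to standard deviations for numpy's `normal`. The function name and seed are hypothetical.

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_contextual(n=500, seed=0):
    rng = np.random.default_rng(seed)
    z_star = rng.normal(-0.85, np.sqrt(0.5), size=n)  # Z*_i with variance 0.5
    b = np.array([0.85, -0.85])                       # diagonal of B
    eps1 = rng.multivariate_normal([0.0, 0.0], [[0.5, 0.2], [0.2, 0.5]], size=n)
    w_star = z_star[:, None] * b[None, :] + eps1      # W*_i = B Z*_i + eps_i1
    x_star = 2 * z_star + 0.5 * z_star**2 + rng.normal(0.0, np.sqrt(0.5), size=n)
    z_spurious = inv_logit(rng.normal(0.0, np.sqrt(0.5), size=n))  # independent of Z
    W, X, Z = inv_logit(w_star), inv_logit(x_star), inv_logit(z_star)
    Y = X * W[:, 0] + (1 - X) * W[:, 1]
    return W, X, Y, Z, z_spurious

W, X, Y, Z, Z_spurious = simulate_contextual()
```

Because $X^*$ and both components of $W^*$ depend on $Z^*$, the draws reproduce the qualitative pattern reported next: $X$ correlates positively with $W_1$ and negatively with $W_2$.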

In this simulation example, X and W are correlated through Z. The sample correlation between X and W1 is 0.39 and that between X and W2 is –0.53. Moreover, the average bounds length for W1 is 0.7 and for W2 is 0.4, suggesting that W1 is more coarsened than W2. Finally, the sample means of W1 and W2 are 0.35 and 0.64, respectively.

To examine the performance of various models, we first fit the true model, which is the parametric CCAR model given Z. We then fit four other parametric models (CAR, CCAR given X, CCAR given the spurious covariate, and NCAR). To estimate the proposed Bayesian models, we adopt diffuse prior distributions. In particular, for the parametric CAR model, we use the same prior specifications described in Section 3.1. For the parametric CCAR model, the prior parameters are $B_0 = 0$, $A_0 = I_2$, $\nu_0 = 7$, and $S_0 = 10I_2$, whereas for the parametric NCAR model, our diffuse prior distribution is defined by $\eta_0 = 0$, $\tau_0 = 2$, $\nu_0 = 5$, and $S_0 = 13I_2$.

Table 4 presents the bias and RMSE of the in-sample predictions for each parametric model. As expected, when the correct covariate is controlled for, the CCAR model yields the smallest bias and RMSE. In contrast, incorrectly conditioning on the spurious covariate results in poor in-sample predictions. In this particular example, since the spurious covariate is independent of Z, conditioning on it does not correct the correlation between X and W, and the CCAR model therefore behaves like the CAR model. In other cases not shown here, when the spurious covariate is correlated with Z, the in-sample predictions can be even further from the true values than those of the CAR model. On the other hand, the parametric NCAR model performs reasonably well even though it does not incorporate covariate information. Since X is not a linear function of Z on the logit scale, the NCAR model, which assumes a trivariate normal distribution, is not correctly specified. Nevertheless, the precision of its in-sample predictions of W2 still improves substantially compared with those based on the CAR model and the misspecified CCAR models.


Table 4 In-sample predictive performance of the parametric models on simulated data when X and W are independent given Z

 
5.2.2 Race and literacy
In many studies, the CAR assumption is clearly violated. The straightforward solution in this scenario is to use the CCAR framework; however, this approach may not be feasible for various reasons. For instance, lack of knowledge of the underlying processes would leave the CCAR model vulnerable to misspecification. In other situations, the model may be known but the necessary variables may be unavailable. Therefore, it is of practical importance to consider the analysis under the NCAR assumption.

Here, we reexamine a classic ecological inference problem, black illiteracy rates in 1910, in order to assess the performance of the NCAR models. This problem was introduced by Robinson (1950), the first article to formally examine the ecological fallacy. Using an empirical example, Robinson (1950) demonstrates that aggregate- and individual-level correlations need not correspond. The original study was based on state-level data with only 48 observations. To examine this problem more closely, King (1997) coded county-level data from the paper records of the 1910 census. This extended data set contains 1040 counties. For each county, it includes the proportion of residents over 10 years of age who are black, $X_i$, the proportion who can read, $Y_i$, the county population size, $N_i$, and the true values of the black literacy rate, $W_1$, and the white literacy rate, $W_2$, whose sample means are 68% and 92%, respectively.

Following Robinson (1950), we compare the aggregate correlation between race and literacy with its individual-level counterpart. We first calculate the aggregate correlation as the sample correlation between $X_i$ and $Y_i$ across all counties. The resulting correlation, –0.733, is strongly negative. Using the true values of $W_i$ and the population of each county, we construct a race × literacy table that contains the total number of people in each race-by-literacy category, summed over all 1040 counties. We then compute Pearson's correlation coefficient for this 2 × 2 table, which measures the individual-level correlation between race and literacy. The resulting individual-level correlation is –0.339, indicating only a mild association between being black and being illiterate in 1910 compared with the aggregate correlation. As Robinson (1950) demonstrates, the large gap between the aggregate and individual correlations implies that we cannot simply use the former to infer the latter.
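Pearson's correlation for a 2 × 2 table of two binary indicators is the phi coefficient. The sketch below builds the four pooled cell counts from county-level $X_i$, $W_{i1}$, $W_{i2}$, and $N_i$ and returns the correlation between the indicators "black" and "literate"; the row and column layout and the function name are assumptions for illustration.

```python
import numpy as np

def individual_correlation(X, W1, W2, N):
    """Phi coefficient of the pooled race x literacy table (rows: black/white; cols: literate/illiterate)."""
    X, W1, W2, N = (np.asarray(v, dtype=float) for v in (X, W1, W2, N))
    a = np.sum(N * X * W1)              # black, literate
    b = np.sum(N * X * (1 - W1))        # black, illiterate
    c = np.sum(N * (1 - X) * W2)        # white, literate
    d = np.sum(N * (1 - X) * (1 - W2))  # white, illiterate
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical checks: complete separation gives -1; equal literacy rates give 0
r_sep = individual_correlation([0.5], [0.0], [1.0], [100])
r_ind = individual_correlation([0.3, 0.6], [0.8, 0.8], [0.8, 0.8], [100, 200])
```

Applied to the 1040 counties with their true $W_i$, this is the computation that yields the individual-level figure above; the aggregate correlation instead uses only $X_i$ and $Y_i$.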

In this data set, the black literacy rate is negatively correlated with the percentage of black population X (the sample correlation is –0.51), whereas the white literacy rate is only weakly correlated with X (the sample correlation is 0.17). This suggests that the CAR assumption is likely to be violated. Given the presence of this contextual effect, it is of interest to investigate whether models under the CAR assumption yield a biased estimate of the individual correlation, and whether the NCAR models can reduce such bias. Moreover, since the parametric assumption about the joint distribution of $(W_1^*, W_2^*, X^*)$ is rather strong, we also study whether the precision of in-sample predictions can be improved by using the nonparametric NCAR model. For comparison, we also fit the parametric and nonparametric models under the CAR assumption. To estimate the parametric CAR and NCAR models, we use the same diffuse prior specifications as in Section 3.1. For the nonparametric CAR and NCAR models, the corresponding diffuse prior distributions from the parametric models serve as the base prior distributions of the Dirichlet process priors. We also use a diffuse prior distribution for the concentration parameter, $\alpha \sim \text{Gamma}(1, 0.1)$. Table 5 presents the results. As expected, the NCAR models outperform the CAR models and the other models in terms of both bias and RMSE.

Finally, we estimate the individual correlation between race and literacy based on the in-sample predictions of our CAR and NCAR models as well as those of King's EI model and ecological regression. The results are shown in Table 6. The NCAR models perform best, yielding estimated individual correlations of –0.341 and –0.359. In particular, the estimate based on the nonparametric model, –0.341, is very close to the true observed correlation of –0.339. In contrast, the estimates based on the other models deviate further from the true value.


Table 6 Estimated individual correlations based on different models

 
5.3 Aggregation Effects
Although aggregation effects are inherent to ecological inference problems and cannot be remedied by statistical techniques, the analysis below quantifies the amount of missing information in the extended literacy data set analyzed in Section 5.2.

To keep the example simple, we model the literacy rates within the parametric framework under the CAR assumption. First, we estimate the parameters and quantify the amount of missing information for the entire data set without any survey data. Next, we supplement the data set with survey data for 5%, 10%, and 15% of the records. The survey data replace the aggregate data for the selected records, so the overall sample size stays constant at 1040. The supplemented records are chosen at random, and the simulation is repeated 20 times at each level of supplemental data. This design yields 60 simulations with survey data (3 levels × 20 replications), plus one simulation without.
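The simulation design can be laid out as a list of runs, each recording which records receive individual-level survey data. This is a sketch of the design only: the random-number scheme, the seed, and the names below are our assumptions, and the model fitting performed in each run is not shown.

```python
import random

N_RECORDS = 1040                # overall sample size stated in the text
LEVELS = (0.05, 0.10, 0.15)     # share of records replaced by survey data
REPLICATIONS = 20               # repetitions per level

def supplement_design(seed=0):
    """Return (level, indices) pairs; indices are the records whose aggregate
    data are replaced by individual-level survey data in that run."""
    rng = random.Random(seed)
    runs = [(None, frozenset())]                 # the single unsupplemented run
    for level in LEVELS:
        k = round(level * N_RECORDS)             # 52, 104, and 156 records
        for _ in range(REPLICATIONS):
            runs.append((level, frozenset(rng.sample(range(N_RECORDS), k))))
    return runs
```

Because the selected records are replaced rather than appended, every run keeps all 1040 records, and only the amount of disaggregated information changes.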

The results of the simulations are presented in Table 7. The literacy rate parameter estimates ($\hat{\mu}_1$ and $\hat{\mu}_2$) are reported on the logit scale (e.g., on the original scale, the estimated population black literacy rate in the unsupplemented example is 66%). The amount of missing information is greater for black literacy than for white literacy because, on average, blacks make up a smaller share of county populations, which yields wider bounds. As survey information is added to the data set, the percentage of missing information decreases monotonically. Furthermore, the point estimates of the parameters generally become more accurate as the amount of supplemental data increases. However, even when 15% of the records contain the actual disaggregated data, more than 50% of the information is missing for each parameter estimate, and the complete-data estimates (right-most column of Table 7) of two parameters (µ1 and σ2) lie outside their 95% confidence intervals.
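For a scalar parameter, the fraction of missing information is commonly defined as 1 − I_obs/I_com, the share of complete-data information that is not available in the observed data (Orchard and Woodbury 1972; Dempster, Laird, and Rubin 1977). The sketch below uses that scalar definition, with information taken as the inverse of the asymptotic variance, and also shows the inverse logit needed to read logit-scale estimates as proportions. It is illustrative only; the article's own definitions in Section 4 govern the numbers in Table 7.

```python
import math

def inv_logit(z):
    """Map a logit-scale estimate back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))

def fraction_missing_information(var_observed, var_complete):
    """Scalar fraction of missing information, 1 - I_obs / I_com.

    With information equal to inverse variance, I_obs / I_com equals
    var_complete / var_observed.
    """
    if var_observed < var_complete:
        raise ValueError("observed-data variance cannot be below the complete-data variance")
    return 1.0 - var_complete / var_observed
```

For example, an observed-data variance four times the complete-data variance means 75% of the information is missing, and a logit-scale value of about 0.663 corresponds to a rate of 66%.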


Table 7 Parameter estimates and fraction of missing information for varying levels of supplemental data for the race/literacy data set

In addition to parameter estimates, we quantify the amount of missing information in hypothesis testing. For this example, the null hypothesis is that the population white literacy rate and black literacy rate are equal, that is, H0: µ1 = µ2. This restriction is weaker than the "neighborhood" model, in which the literacy rates are equal within each county (Wi1 = Wi2). The (observed) likelihood ratio test statistic (twice the numerator of equation 11) is presented in the penultimate row of Table 7. The gap between the likelihoods under the constrained and unconstrained parameter estimates grows as more individual-level data become available, suggesting stronger evidence against the null hypothesis. Although the fraction of missing information for the hypothesis test starts at a relatively large 84%, it declines more steeply with additional supplemental data than it does for the parameter estimates.
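For intuition about the statistic whose missing information is being measured, its complete-data analogue (every Wi observed) has a closed form. If (Wi1*, Wi2*) are bivariate normal with unrestricted Σ, the one-to-one linear map to (Di, Wi2*), with Di = Wi1* − Wi2*, factors the likelihood so that H0: µ1 = µ2 restricts only the mean of Di. Twice the log likelihood ratio is then n log(1 + D̄²/s²), where s² is the ML variance of Di. This sketch covers the complete-data statistic only; the observed-data statistic in Table 7 requires the incomplete-data machinery of the article.

```python
import math

def lr_stat_equal_means(w1_star, w2_star):
    """Complete-data 2 * log LR for H0: mu1 = mu2, bivariate normal, unrestricted Sigma.

    w1_star, w2_star: logit-scale group-specific rates for each unit.
    """
    d = [a - b for a, b in zip(w1_star, w2_star)]
    n = len(d)
    dbar = sum(d) / n
    s2_hat = sum((di - dbar) ** 2 for di in d) / n   # ML variance of D (divisor n)
    return n * math.log(1.0 + dbar ** 2 / s2_hat)
```

The statistic is zero when the average difference is zero and grows with the standardized mean difference, mirroring the behavior described above as more individual-level data are added.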


    6. Concluding Remarks
In this article, we show that by formulating an ecological inference problem as an incomplete-data problem, the three key factors that influence ecological inference—aggregation, distributional, and contextual effects—can be formally identified. The proposed framework shows that although distributional and contextual effects can be adjusted by statistical methods, it is the data aggregation that causes the fundamental difficulty of ecological inference and makes the statistical adjustment of the other two factors difficult in practice.

We address each of these three factors. First, to deal with distributional effects, we extend our basic parametric model and propose a Bayesian nonparametric model for ecological inference in 2 x 2 tables. The simulation studies and an empirical example demonstrate that in general the nonparametric model outperforms parametric models by relaxing distributional assumptions. Second, we also demonstrate that contextual effects can be addressed under the proposed parametric and nonparametric models in a relatively straightforward manner. In particular, we show that this task can be accomplished even when extra covariate information is not available. Third, although aggregation effects cannot be statistically adjusted, we demonstrate how to quantify the information loss due to data aggregation in ecological inference. We offer computational methods to quantify the amount of missing information in the context of both parameter estimation and hypothesis testing.

It is important to emphasize that when aggregation effects are too severe and bounds are too wide, any ecological inference model, including our proposed methods, is likely to fail. In such situations, comparing the predictive distribution of Y from the fitted model with its observed marginal distribution may rule out some misspecified models, but the data will not contain enough information to identify the correct model specification.
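A plug-in version of this predictive check can be sketched as follows. We assume the parametric CAR model, logit(Wi) ~ N(µ, Σ), together with the accounting identity Yi = Xi Wi1 + (1 − Xi) Wi2, and treat the fitted (µ, Σ) as fixed; a full posterior predictive check would repeat the simulation over posterior draws. The two-sample Kolmogorov-Smirnov distance is our choice of discrepancy, not one prescribed by the article, and the function names are hypothetical.

```python
import math
import random
from bisect import bisect_right

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_y(x, mu, sigma, rng):
    """Draw Y_i = X_i * W_i1 + (1 - X_i) * W_i2 with logit(W_i) ~ N(mu, sigma)."""
    (s11, s12), (_, s22) = sigma
    l11 = math.sqrt(s11)                 # 2 x 2 Cholesky factor of sigma
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    ys = []
    for xi in x:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w1 = inv_logit(mu[0] + l11 * z1)
        w2 = inv_logit(mu[1] + l21 * z1 + l22 * z2)
        ys.append(xi * w1 + (1.0 - xi) * w2)
    return ys

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup |F_a - F_b| over the pooled points."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in a + b)
```

Usage: compare `ks_distance(simulate_y(x_obs, mu_hat, sigma_hat, random.Random(1)), y_obs)` across candidate models; a large distance flags a model whose implied marginal distribution of Y is inconsistent with the data, while a small one cannot confirm that the model is correct.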

Finally, the theoretical framework developed in this article applies more generally to R x C ecological inference problems where R ≥ 2 and C ≥ 2. However, since Wi is of higher dimension in these cases, modeling the three factors simultaneously and detecting possible model misspecification are even more challenging tasks for large ecological tables than for the 2 x 2 tables considered in this article. Although in theory our nonparametric modeling approach can be extended to larger ecological tables, we believe that such a modeling strategy may not work well in practice due to the lack of information in large ecological tables. Strong parametric assumptions may be necessary when making such inferences.


    Funding
National Science Foundation (SES–0550873); Princeton University Committee on Research in the Humanities and Social Sciences.


    Appendices: Computational Details
Appendix A: The EM and ECM algorithms
In this appendix, we describe the EM and ECM algorithms we developed to obtain the ML estimates of the proposed models. The EM algorithm starts with an arbitrary initial value of the parameters, θ(0), and alternates between the expectation step (E-step) and the maximization step (M-step) until a satisfactory degree of convergence is achieved. The ECM algorithm replaces the M-step with a sequence of conditional maximization (CM) steps, in which the parameters are divided into smaller subsets and each subset is maximized conditional on the current values of the other parameters.

A.1 E-step
At the (t + 1)th iteration, our E-step for the CAR model requires the integration of the complete-data log likelihood, that is, $l_{com} = \sum_{i=1}^{n} \log f(W_i \mid \zeta)$, with respect to the missing data, W, over its conditional distribution given the observed data, (Y, X), and the values of the parameters from the previous iteration, θ(t). Thus, we compute,

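$$Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^{n} E\{\log f(W_i \mid \zeta) \mid X_i, Y_i, \theta^{(t)}\} = -\frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{j=1}^{2}\sum_{j'=1}^{2}\sigma^{jj'}\left(S_{jj'}^{(t)} - \mu_j S_{j'}^{(t)} - \mu_{j'} S_j^{(t)} + n\mu_j\mu_{j'}\right) + c,$$
in which $\sigma^{jj'}$ is the $(j, j')$ element of $\Sigma^{-1}$ and $c$ collects terms that do not depend on the parameters,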
where, for j, j' = 1, 2, $S_j^{(t)} = \sum_{i=1}^{n} E(W_{ij}^* \mid X_i, Y_i, \theta^{(t)})$ and $S_{jj'}^{(t)} = \sum_{i=1}^{n} E(W_{ij}^* W_{ij'}^* \mid X_i, Y_i, \theta^{(t)})$ are the expected values of the sufficient statistics with respect to Wi* over its conditional distribution, p(Wi* | Yi, Xi, θ(t)).

Since $S_j^{(t)}$ and $S_{jj'}^{(t)}$ are not available in closed form, we use numerical integration to compute the following integral,

Formula (A1)
where m(Wi*) is a function determined by each of the sufficient statistics and κ(Wi* | θ(t)) is the kernel of the bivariate normal density function. Equation (A1) can be viewed as a line integral over a scalar field (e.g., Larson, Hostetler, and Edwards 2002). We express Wi* as a function of a new variable t ∈ (0, 1), that is, $W_{i1}^*(t) = \text{logit}[tW_{i1}^U + (1 - t)W_{i1}^L]$ and $W_{i2}^*(t) = \text{logit}[tW_{i2}^L + (1 - t)W_{i2}^U]$, where $W_{ij}^U = \sup W_{ij}$ and $W_{ij}^L = \inf W_{ij}$ are the upper and lower bounds of Wij given in equation (2). (Because Wi1 and Wi2 are linked by equation (1), Wi2 attains its lower bound where Wi1 attains its upper bound.) Then, we re-express the integral as,

Formula
where the integral is taken with respect to t ∈ (0, 1). This integral can be computed with any standard one-dimensional numerical integration routine over a finite interval. Furthermore, the accuracy of the numerical integration can be checked by computing E(Wi1 | Xi, Yi, θ(t)) and E(Wi2 | Xi, Yi, θ(t)) separately on the original scale and then checking whether equation (1) holds with these conditional expectations.
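The one-dimensional integration can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' C implementation: W_i1 moves linearly from its lower to its upper (Duncan-Davis) bound as t goes from 0 to 1, so W_i2 moves from its upper to its lower bound; each point is weighted by the bivariate logit-normal density on the original scale (the normal kernel κ divided by the logit Jacobian Wi1(1 − Wi1)Wi2(1 − Wi2)); and a midpoint rule with a fixed number of nodes is used. The function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def bvn_kernel(w_star, mu, sigma):
    """exp(-0.5 * d' Sigma^{-1} d) for a 2 x 2 covariance matrix sigma."""
    d1, d2 = w_star[0] - mu[0], w_star[1] - mu[1]
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad)

def conditional_expectation(m, x, y, mu, sigma, n_nodes=2000):
    """Approximate E{m(W*) | X = x, Y = y, (mu, sigma)}, assuming 0 < x < 1 and 0 < y < 1."""
    lo = max(0.0, (y - (1.0 - x)) / x)      # Duncan-Davis lower bound of W1
    hi = min(1.0, y / x)                     # Duncan-Davis upper bound of W1
    num = den = 0.0
    for k in range(n_nodes):
        t = (k + 0.5) / n_nodes              # midpoint rule never touches t = 0 or 1
        w1 = t * hi + (1.0 - t) * lo
        w2 = (y - x * w1) / (1.0 - x)        # accounting identity Y = X*W1 + (1 - X)*W2
        w_star = (logit(w1), logit(w2))
        # logit-normal density on the original scale: normal kernel / logit Jacobian
        weight = bvn_kernel(w_star, mu, sigma) / (w1 * (1 - w1) * w2 * (1 - w2))
        num += m(w_star) * weight
        den += weight
    return num / den
```

Passing m(w) = w[j] or m(w) = w[j] * w[k] and summing over units gives the expected sufficient statistics used in the M-step. Because every node lies on the line, the accounting identity holds for the original-scale expectations up to rounding, which is the accuracy check described above.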

The E-step of the NCAR model is similar to that of the CAR model. The difference lies in the conditional distribution of the missing data given the observed data and the parameter values from the previous iteration, p(Wi* | Yi, Xi, θ(t)). In particular, κ(Wi* | θ(t)) in equation (A1) is replaced with κ(Wi* | θ(t), Xi*), which is the kernel of the bivariate normal distribution with marginal means equal to $\mu_j^{(t)} + \rho_{j3}^{(t)}\sqrt{\sigma_j^{(t)}/\sigma_3^{(t)}}\,(X_i^* - \mu_3^{(t)})$ for j = 1, 2, marginal variances equal to $\sigma_1^{(t)}(1 - \rho_{13}^{(t)2})$ and $\sigma_2^{(t)}(1 - \rho_{23}^{(t)2})$, and correlation coefficient equal to $(\rho_{12}^{(t)} - \rho_{13}^{(t)}\rho_{23}^{(t)})/\sqrt{(1 - \rho_{13}^{(t)2})(1 - \rho_{23}^{(t)2})}$.
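The parameters of this conditional kernel follow from the usual formulas for conditioning a trivariate normal on one coordinate. The sketch below computes them from (µ1, µ2, µ3), the variances (σ1, σ2, σ3), and the three correlations; the argument layout is our own choice, and the variances are assumed positive with |ρj3| < 1.

```python
import math

def ncar_conditional(mu, var, rho, x_star):
    """Mean, variances, and correlation of (W1*, W2*) given X* = x_star.

    mu  = (mu1, mu2, mu3); var = (sigma1, sigma2, sigma3) are variances;
    rho = {(1, 2): rho12, (1, 3): rho13, (2, 3): rho23}.
    """
    r12, r13, r23 = rho[(1, 2)], rho[(1, 3)], rho[(2, 3)]
    z = x_star - mu[2]
    mean1 = mu[0] + r13 * math.sqrt(var[0] / var[2]) * z
    mean2 = mu[1] + r23 * math.sqrt(var[1] / var[2]) * z
    v1 = var[0] * (1 - r13 ** 2)             # conditional variances
    v2 = var[1] * (1 - r23 ** 2)
    r12_given_3 = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return (mean1, mean2), (v1, v2), r12_given_3
```

These values agree with the Schur-complement form Σ11 − Σ13 Σ33⁻¹ Σ31 of the conditional covariance; when ρ13 = ρ23 = 0 the kernel reduces to the CAR kernel.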

A.2 M-step
Once the expected values of the sufficient statistics are computed, the M-step is a straightforward application of the standard complete-data results for the bivariate normal distribution. Namely, for the CAR model, we have

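$$\mu_j^{(t+1)} = \frac{S_j^{(t)}}{n}, \qquad \sigma_j^{(t+1)} = \frac{T_{jj}^{(t)}}{n}, \qquad \rho_{12}^{(t+1)} = \frac{T_{12}^{(t)}}{\sqrt{T_{11}^{(t)}T_{22}^{(t)}}}, \qquad \text{(A2)}$$
where $T_{jj'}^{(t)} = S_{jj'}^{(t)} - S_j^{(t)}S_{j'}^{(t)}/n$ for j, j' = 1, 2.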
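Assuming the M-step takes the standard bivariate normal form (means S_j/n, variances T_jj/n, correlation T_12/√(T_11 T_22), with T_jj' = S_jj' − S_j S_j'/n), it is a few lines of arithmetic on the expected sufficient statistics. The function name is hypothetical.

```python
import math

def car_m_step(n, s1, s2, s11, s22, s12):
    """Closed-form bivariate normal M-step from expected sufficient statistics.

    Returns (mu1, mu2, sigma1, sigma2, rho12), where sigma_j are variances.
    """
    t11 = s11 - s1 * s1 / n       # centered (co)variance sums T_jj'
    t22 = s22 - s2 * s2 / n
    t12 = s12 - s1 * s2 / n
    return s1 / n, s2 / n, t11 / n, t22 / n, t12 / math.sqrt(t11 * t22)
```

With fully observed data the statistics are ordinary sums, so the step returns the sample means, the ML variances (divisor n), and the sample correlation; for three points (0, 1), (2, 3), (4, 2) the correlation is 0.5.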

The M-step of the NCAR model is also similar to that of the CAR model. First, the two parameters µ3 and σ3 do not need to be updated in each iteration because their ML estimates are available in closed form, that is, $\hat{\mu}_3 = \sum_{i=1}^{n} X_i^*/n$ and $\hat{\sigma}_3 = \sum_{i=1}^{n}(X_i^* - \hat{\mu}_3)^2/n$, respectively. Furthermore, µ1, µ2, σ1, σ2, and ρ12 can be updated in the same way as specified in equation (A2). The remaining correlation parameters are updated as $\rho_{j3}^{(t+1)} = (S_{j3}^{(t)}/n - \mu_j^{(t+1)}\hat{\mu}_3)/\sqrt{\sigma_j^{(t+1)}\hat{\sigma}_3}$, where $S_{j3}^{(t)} = \sum_{i=1}^{n} X_i^* E[W_{ij}^* \mid Y_i, X_i, \theta^{(t)}]$, for j = 1, 2. As with the CAR model, the convergence of the NCAR model is monitored in terms of the transformed parameters, µj, log σj, and 0.5 log[(1 + ρjj')/(1 – ρjj')] for all j ≠ j'.
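The correlation update for ρj3 can be sketched as the ML correlation between Wij* and Xi*, with the unobserved cross-products replaced by Xi* E[Wij* | Yi, Xi, θ(t)]. This is our reading of the update under the trivariate normal model, with hypothetical names; σ3 and σj are treated as variances.

```python
import math

def ncar_correlation_update(n, s_j, s_jj, s_j3, x_star):
    """Update rho_j3 from expected sufficient statistics (sketch).

    s_j  = sum_i E[W_ij*],  s_jj = sum_i E[W_ij*^2],
    s_j3 = sum_i X_i* E[W_ij*],  x_star = observed logit(X_i).
    """
    mu3 = sum(x_star) / n                               # closed-form ML mean of X*
    sigma3 = sum((x - mu3) ** 2 for x in x_star) / n    # closed-form ML variance of X*
    mu_j = s_j / n
    sigma_j = s_jj / n - mu_j ** 2                      # equals T_jj / n
    cov_j3 = s_j3 / n - mu_j * mu3
    return cov_j3 / math.sqrt(sigma_j * sigma3)
```

Because µ3 and σ3 depend only on observed data, they can be computed once outside the EM loop; they are recomputed here only to keep the sketch self-contained.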

A.3 CM-step
To conduct the CM-steps at the (t + 1)th iteration, we first maximize over the regression coefficients, β, given the conditional variance Σ(t), via

Formula (A3)
Given β(t+1), we update Σ as follows,

Formula (A4)

Finally, when monitoring convergence, we transform the variance parameters and the correlation parameter so that they are unbounded: we use the logarithm of the variances, that is, log σj for j = 1, 2, and Fisher's Z transformation of the correlation parameter, that is, 0.5 log[(1 + ρ)/(1 – ρ)], to improve the normal approximation.
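A convergence check on this transformed scale might look like the sketch below. The stopping rule (maximum absolute change below a tolerance) and the tolerance value are our assumptions; the article does not state its exact criterion.

```python
import math

def unconstrained(params):
    """Map (mu1, mu2, sigma1, sigma2, rho12) to an unbounded scale:
    means unchanged, log of the variances, Fisher's Z of the correlation."""
    mu1, mu2, s1, s2, rho = params
    return (mu1, mu2, math.log(s1), math.log(s2),
            0.5 * math.log((1 + rho) / (1 - rho)))

def converged(old, new, tol=1e-6):
    """Hypothetical rule: every transformed parameter changes by less than tol."""
    return max(abs(a - b) for a, b in zip(unconstrained(old), unconstrained(new))) < tol
```

Working on this scale keeps changes in variances and in correlations near ±1 comparable to changes in the means.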

Appendix B: The MCMC Algorithms
In this appendix, we describe our MCMC algorithms for fitting the proposed Bayesian parametric and nonparametric models. We focus on the CAR models, but similar algorithms can be applied to the NCAR models.

B.1 The parametric model
To sample from the joint posterior distribution p(Wi*, µ, Σ | Y, X), we construct a Gibbs sampler. First, we draw Wi from its conditional posterior density, which is proportional to,

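$$\frac{1}{W_{i1}(1 - W_{i1})W_{i2}(1 - W_{i2})}\exp\left\{-\frac{1}{2}(W_i^* - \mu)^\top\Sigma^{-1}(W_i^* - \mu)\right\} \qquad \text{(B1)}$$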
if both Wi1 and Wi2 lie in (0, 1); otherwise the density is equal to 0. Although equation (B1) is not the density of a standard distribution, it has bounded support because (Wi1, Wi2) lies on a bounded line segment. Therefore, we can use the inverse cumulative distribution function method by evaluating equation (B1) on a grid of equidistant points on the tomography line. Given a sample of Wi, we then obtain Wi* via the logit transformation. Alternatively, Metropolis-Hastings or importance sampling algorithms can be used, although they require separate tuning parameters or proposal densities for each observation.
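The grid-based inverse-CDF draw can be sketched as follows. We take the density on the line to be the bivariate logit-normal density restricted to the tomography line (the normal kernel in Wi* divided by the logit Jacobian), place equidistant grid points at cell midpoints in Wi1, and return the selected grid point; a finer grid or interpolation within the chosen cell would reduce the discretization. The function name and grid size are hypothetical.

```python
import bisect
import math
import random

def draw_w(x, y, mu, sigma, rng, n_grid=1000):
    """One approximate draw of (W1, W2) on the tomography line, assuming 0 < x < 1, 0 < y < 1."""
    lo = max(0.0, (y - (1.0 - x)) / x)              # bounds of W1
    hi = min(1.0, y / x)
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    points, cdf, total = [], [], 0.0
    for k in range(n_grid):
        w1 = lo + (k + 0.5) / n_grid * (hi - lo)     # equidistant interior grid points
        w2 = (y - x * w1) / (1.0 - x)               # stay on the line Y = X*W1 + (1 - X)*W2
        d1 = math.log(w1 / (1 - w1)) - mu[0]
        d2 = math.log(w2 / (1 - w2)) - mu[1]
        quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
        total += math.exp(-0.5 * quad) / (w1 * (1 - w1) * w2 * (1 - w2))
        points.append((w1, w2))
        cdf.append(total)                           # unnormalized cumulative sum
    u = rng.random() * total                        # invert the discrete CDF
    return points[bisect.bisect_left(cdf, u)]
```

Each draw satisfies the accounting identity exactly (up to rounding), and applying the logit to the returned pair gives Wi* for the next step of the sampler.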

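Next, we draw (µ, Σ) from their conditional posterior distributions. Note that the observed data (Yi, Xi) are redundant given Wi*. The augmented-data conditional posterior distribution has the form of a standard bivariate normal/inverse-Wishart model, $p(\mu, \Sigma \mid W^*) \propto p(\mu \mid \Sigma)p(\Sigma)\prod_{i=1}^{n} p(W_i^* \mid \mu, \Sigma)$. This implies that, conditional on W*, (µ, Σ) can be sampled from the following standard distributions, $\Sigma \mid W^* \sim \text{InvWishart}(\nu_0 + n, S_n^{-1})$ and $\mu \mid \Sigma, W^* \sim N(\mu_n, \Sigma/(\tau_0^2 + n))$, where $\mu_n = (\tau_0^2\mu_0 + n\bar{W}^*)/(\tau_0^2 + n)$ with $\bar{W}^* = \sum_{i=1}^{n} W_i^*/n$, and $S_n = S_0 + \sum_{i=1}^{n}(W_i^* - \bar{W}^*)(W_i^* - \bar{W}^*)^\top + \frac{\tau_0^2 n}{\tau_0^2 + n}(\bar{W}^* - \mu_0)(\bar{W}^* - \mu_0)^\top$.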
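A sketch of the conjugate update, assuming the prior µ | Σ ~ N(µ0, Σ/τ0²) and Σ ~ InvWishart(ν0, S0⁻¹) with E(Σ) = S0/(ν0 − 3). This parameterization is our inference from the bivariate t predictive density described in Appendix B.2 (ν0 − 1 degrees of freedom and scale S0(1 + τ0²)/{τ0²(ν0 − 1)}), which is what this prior implies; Section 3.1 of the article is authoritative. Only the posterior hyperparameters are computed; drawing from them needs an inverse-Wishart sampler, which is omitted.

```python
def niw_posterior(w_star, mu0, tau0_sq, nu0, s0):
    """Posterior hyperparameters for a bivariate normal/inverse-Wishart model.

    Returns (mu_n, tau_n^2, nu_n, S_n): Sigma | W* ~ InvWishart(nu_n, S_n^{-1}),
    mu | Sigma, W* ~ N(mu_n, Sigma / tau_n^2).
    """
    n = len(w_star)
    bar = [sum(w[k] for w in w_star) / n for k in range(2)]
    tau_n_sq = tau0_sq + n
    mu_n = [(tau0_sq * mu0[k] + n * bar[k]) / tau_n_sq for k in range(2)]
    shrink = tau0_sq * n / tau_n_sq
    s_n = [[s0[a][b]
            + sum((w[a] - bar[a]) * (w[b] - bar[b]) for w in w_star)   # within scatter
            + shrink * (bar[a] - mu0[a]) * (bar[b] - mu0[b])           # prior-mean term
            for b in range(2)] for a in range(2)]
    return mu_n, tau_n_sq, nu0 + n, s_n
```

The same function, applied to the units in a single cluster, gives the hyperparameters for the cluster-level updates of the nonparametric model.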

B.2 The nonparametric model
We construct a Gibbs sampler to sample from the joint posterior distribution p(W*, µ, Σ, α | Y). First, we independently sample Wi for each i and transform it to obtain Wi* as above, but with (µ, Σ) replaced by (µi, Σi) in equation (B1). Then, given the draw of W*, the augmented-data model can be estimated through a multivariate generalization of the density estimation method of Escobar and West (1995). In our Gibbs sampler, we sample (µi, Σi) given (µ(−i), Σ(−i), W*, α) for each i, where (µ(−i), Σ(−i)) denotes the parameters of all units other than i, and then update α based on the new values of (µi, Σi).

An application of the usual calculation due to Antoniak (1974) shows that the conditional posterior distribution of (µi, Σi) given (µ(−i), Σ(−i), Wi*) is the following mixture of Dirichlet processes,

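$$p(\mu_i, \Sigma_i \mid \mu_{(-i)}, \Sigma_{(-i)}, W_i^*) = q_0\, G_i(\mu_i, \Sigma_i) + \sum_{j=1, j \neq i}^{n} q_j\, \delta_{(\mu_j, \Sigma_j)}(\mu_i, \Sigma_i),$$
where $\delta_{(\mu_j, \Sigma_j)}$ denotes a point mass at (µj, Σj) and Gi(µi, Σi) is the posterior distribution under G0, which is a normal/inverse-Wishart distribution with components,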

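$$\mu_i \mid \Sigma_i, W_i^* \sim N\left(\frac{\tau_0^2\mu_0 + W_i^*}{\tau_0^2 + 1}, \frac{\Sigma_i}{\tau_0^2 + 1}\right), \qquad \Sigma_i \mid W_i^* \sim \text{InvWishart}\left(\nu_0 + 1, \left[S_0 + \frac{\tau_0^2}{\tau_0^2 + 1}(W_i^* - \mu_0)(W_i^* - \mu_0)^\top\right]^{-1}\right).$$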

Next, following West, Müller, and Escobar (1994), we derive the weights q0 and qj by computing the marginal (augmented-data) likelihood of Wi* under the base distribution G0 and the likelihood p(Wi* | µj, Σj), respectively,

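$$q_0 \propto \alpha \int p(W_i^* \mid \mu, \Sigma)\, dG_0(\mu, \Sigma), \qquad q_j \propto p(W_i^* \mid \mu_j, \Sigma_j), \quad j \neq i,$$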
where $\sum_{j=0, j \neq i}^{n} q_j = 1$. q0 is proportional to α times the bivariate t density, evaluated at Wi*, with (ν0 – 1) degrees of freedom, location parameter µ0, and scale matrix S0(1 + τ0²)/{τ0²(ν0 – 1)}. qj is proportional to the bivariate normal density with mean µj and variance Σj, evaluated at Wi*.
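These weights can be computed directly from the two densities just described. The sketch below uses the stated t predictive (ν0 − 1 degrees of freedom, location µ0, scale S0(1 + τ0²)/{τ0²(ν0 − 1)}) for q0 and normal densities for the other units, then normalizes; names are hypothetical, and the subsequent draw (from Gi when index 0 is chosen, otherwise copying (µj, Σj)) is not shown.

```python
import math

def quad_form(w, center, m):
    """(w - center)' M^{-1} (w - center) and det(M) for a 2 x 2 matrix m."""
    (a, b), (_, c) = m
    det = a * c - b * b
    d1, d2 = w[0] - center[0], w[1] - center[1]
    return (c * d1 * d1 - 2 * b * d1 * d2 + a * d2 * d2) / det, det

def bvn_density(w, mean, cov):
    q, det = quad_form(w, mean, cov)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def bvt_density(w, loc, scale, df):
    """Bivariate t density; in two dimensions the gamma-function ratio reduces to df / 2."""
    q, det = quad_form(w, loc, scale)
    return (1 + q / df) ** (-(df + 2) / 2) / (2 * math.pi * math.sqrt(det))

def urn_weights(w_i, others, alpha, mu0, tau0_sq, nu0, s0):
    """Normalized (q0, q_j for j != i); others = [(mu_j, Sigma_j), ...]."""
    df = nu0 - 1
    factor = (1 + tau0_sq) / (tau0_sq * df)
    scale = [[s0[r][k] * factor for k in range(2)] for r in range(2)]
    raw = [alpha * bvt_density(w_i, mu0, scale, df)]
    raw += [bvn_density(w_i, mu_j, sig_j) for mu_j, sig_j in others]
    total = sum(raw)
    return [r / total for r in raw]
```

A larger α shifts weight toward q0, so new clusters are opened more often; units whose current parameters fit Wi* well receive larger qj.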

Given these weights, we can approximate p(µ, Σ | W*) with a Gibbs sampler by sampling (µi, Σi) given (µ(−i), Σ(−i), Wi*) for each i. This step creates clusters of units, where units in the same cluster share the same values of the population parameters. At a particular iteration, we have J ≤ n clusters, the jth of which has nj units, with $\sum_{j=1}^{J} n_j = n$. Note that the number of clusters J can vary from one iteration to another. Bush and MacEachern (1996) recommend adding a "remixing" step to prevent the Gibbs sampler from repeatedly sampling a small set of values. In our application, we update the parameters (µi, Σi) using the newly configured cluster structure. That is, for each cluster j, we update the parameters (µj*, Σj*) by drawing them from the following conditional distribution,

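$$\Sigma_j^* \mid W^* \sim \text{InvWishart}(\nu_0 + n_j, \tilde{S}_j^{-1}), \qquad \mu_j^* \mid \Sigma_j^*, W^* \sim N\left(\frac{\tau_0^2\mu_0 + n_j\bar{W}_j^*}{\tau_0^2 + n_j}, \frac{\Sigma_j^*}{\tau_0^2 + n_j}\right),$$
where $\bar{W}_j^*$ is the mean of Wi* over the nj units in cluster j and $\tilde{S}_j = S_0 + \sum_{i \in j}(W_i^* - \bar{W}_j^*)(W_i^* - \bar{W}_j^*)^\top + \frac{\tau_0^2 n_j}{\tau_0^2 + n_j}(\bar{W}_j^* - \mu_0)(\bar{W}_j^* - \mu_0)^\top$. Given these new draws, we set µi = µj* and Σi = Σj* for each i that belongs to the jth cluster.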

Finally, to update α, we use the algorithm developed by Escobar and West (1995). Namely, the conditional posterior distribution of α has the form of the following gamma mixture,

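$$\alpha \mid \eta, J \sim \frac{\omega}{1 + \omega}\,\text{Gamma}(a_0 + J, b_0 - \log\eta) + \frac{1}{1 + \omega}\,\text{Gamma}(a_0 + J - 1, b_0 - \log\eta),$$
where ω = (a0 + J – 1)/{n(b0 – log η)}, (a0, b0) are the shape and rate parameters of the gamma prior on α, and η is a latent variable that follows a beta distribution, Beta(α + 1, n). This completes one cycle of our Gibbs sampler.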
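The α update of Escobar and West (1995) can be sketched with Python's standard `random` module. Following that paper, η is drawn from Beta(α + 1, n), with n the number of units, and the gamma components use a rate parameterization; because `random.gammavariate` takes a scale, the rate is inverted. We assume a0 > 0 so that both gamma shapes are positive; the function name is hypothetical.

```python
import math
import random

def update_alpha(alpha, n_clusters, n, a0, b0, rng):
    """One Escobar-West draw of the Dirichlet process concentration parameter."""
    eta = rng.betavariate(alpha + 1.0, n)            # latent eta ~ Beta(alpha + 1, n)
    rate = b0 - math.log(eta)                        # positive because 0 < eta < 1
    omega = (a0 + n_clusters - 1.0) / (n * rate)     # odds of the first mixture component
    if rng.random() < omega / (1.0 + omega):
        shape = a0 + n_clusters
    else:
        shape = a0 + n_clusters - 1.0
    return rng.gammavariate(shape, 1.0 / rate)       # gammavariate(shape, scale)
```

With the Gamma(1, 0.1) prior used in Section 5.2 (if 0.1 is read as a rate), a call would be `update_alpha(alpha, J, n, 1.0, 0.1, rng)` once per Gibbs cycle.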


Table 5 In-sample predictive performance of various models on literacy data when X and W are correlated

    Notes
Authors' note: This article is based in part on two working papers by Imai and Lu, "Parametric and Nonparametric Bayesian Models for Ecological Inference in 2 x 2 Tables" and "Quantifying Missing Information in Ecological Inference." Various versions of these papers were presented at the 2004 Joint Statistical Meetings, the Second Cape Cod Monte Carlo Workshop, the 2004 Annual Political Methodology Summer Meeting, and the 2005 Annual Meeting of the American Political Science Association. We thank anonymous referees, Larry Bartels, Wendy Tam Cho, Jianqing Fan, Gary King, Xiao-Li Meng, Kevin Quinn, Phil Shively, David van Dyk, Jon Wakefield, and seminar participants at New York University (the Northeast Political Methodology conference), at Princeton University (Economics Department and Office of Population Research), and at the University of Virginia (Statistics Department) for helpful comments.

1 See Cross and Manski (2002) and Judge, Miller, and Cho (2004) for alternative approaches, which are not based on the likelihood function.

2 See Brown and Payne (1986); King, Rosen, and Tanner (1999); and Wakefield (2004a) for models of counts.

3 See Imai and King (2004) for an alternative approach based on Bayesian model averaging.


    References

    Achen CH, Shively WP. Cross-level inference (1995) Chicago, IL: University of Chicago Press.

    Antoniak CE. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics (1974) 2:1152–74.

    Benoit K, King G. EzI: A(n easy) program for ecological inference (2003) Cambridge, MA: Harvard University. Available from: http://gking.harvard.edu (accessed August 8, 2007).

    Brown PJ, Payne CD. Aggregate data, ecological regression, and voting transitions. Journal of the American Statistical Association (1986) 81:452–60.

    Burden BC, Kimball DC. A new approach to the study of ticket splitting. American Political Science Review (1998) 92:533–44.

    Bush CA, MacEachern SN. A semiparametric Bayesian model for randomized block designs. Biometrika (1996) 83:275–85.

    Cho WKT. Iff the assumption fits...: A comment on the King ecological inference solution. Political Analysis (1998) 7:143–63.

    Cho WKT, Gaines BJ. The limits of ecological inference: The case of split-ticket voting. American Journal of Political Science (2004) 48:152–71.

    Copas J, Eguchi S. Local model uncertainty and incomplete-data bias. Journal of the Royal Statistical Society, Series B (Statistical Methodology) (2005) 67:459–513.

    Cross PJ, Manski CF. Regressions, short and long. Econometrica (2002) 70:357–68.

    Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B (Methodological) (1977) 39:1–37.

    Dey D, Müller P, Sinha D, eds. Practical nonparametric and semiparametric Bayesian statistics (1998) New York: Springer-Verlag.

    Duncan OD, Davis B. An alternative to ecological correlation. American Sociological Review (1953) 18:665–6.

    Escobar MD, West M. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association (1995) 90:577–88.

    Ferguson TS. A Bayesian analysis of some nonparametric problems. The Annals of Statistics (1973) 1:209–30.

    Freedman DA, Klein SP, Sacks J, Smyth CA, Everett CG. Ecological regression and voting rights (with discussion). Evaluation Review (1991) 15:673–816.

    Freedman DA, Ostland M, Roberts MR, Klein SP. Review of "A Solution to the Ecological Inference Problem". Journal of the American Statistical Association (1998) 93:1518–22.

    Gelman A, Park DK, Ansolabehere S, Price PN, Minnite LC. Models, assumptions and model checking in ecological regressions. Journal of the Royal Statistical Society, Series A (2001) 164:101–18.

    Gill J, Casella G. Markov chain Monte Carlo methods for models with nonparametric priors (2006) University of California, Davis: Technical report.

    Goodman L. Ecological regressions and behavior of individuals. American Sociological Review (1953) 18:663–6.

    Grofman B. Statistics without substance: A critique of Freedman et al. and Clark and Morrison. Evaluation Review (1991) 15:746–69.

    Heitjan DF, Rubin DB. Ignorability and coarse data. The Annals of Statistics (1991) 19:2244–53.

    Herron MC, Shotts KW. Logical inconsistency in EI-based second stage regressions. American Journal of Political Science (2004) 48:172–83.

    Imai K, King G. Did illegal overseas absentee ballots decide the 2000 U.S. presidential election? Perspectives on Politics (2004) 2:537–49.

    Imai K, Lu Y, Strauss A. eco: R package for ecological inference in 2 x 2 tables. Journal of Statistical Software (forthcoming).

    Judge GG, Miller DJ, Cho WKT. An information theoretic approach to ecological estimation and inference. In: King G, Rosen O, Tanner MA, eds. Ecological inference: New methodological strategies (2004) Cambridge: Cambridge University Press. 162–87.

    King G. A solution to the ecological inference problem: Reconstructing individual behavior from aggregate data (1997) Princeton, NJ: Princeton University Press.

    King G. Comment on "review of ‘a solution to the ecological inference problem’." Journal of the American Statistical Association (1999) 94:352–5.

    King G, Rosen O, Tanner MA. Binomial-beta hierarchical models for ecological inference. Sociological Methods & Research (1999) 28:61–90.

    King G, Rosen O, Tanner MA, eds. Ecological inference: New methodological strategies (2004) Cambridge: Cambridge University Press.

    Kong A, Meng X-L, Nicolae DL. Quantifying relative incomplete information for hypothesis testing in statistical and genetic studies (2005) Department of Statistics, Harvard University: Unpublished manuscript.

    Larson R, Hostetler RP, Edwards BH. Calculus: Early transcendental functions (2002) 3rd ed. Boston, MA: Houghton Mifflin Company.

    Meng X-L, Rubin DB. Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association (1991) 86:899–909.

    Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika (1993) 80:267–78.

    Mukhopadhyay S, Gelfand AE. Dirichlet process mixed generalized linear models. Journal of the American Statistical Association (1997) 92:633–9.

    Neyman J, Scott EL. Consistent estimation from partially consistent observations. Econometrica (1948) 16:1–32.

    Orchard T, Woodbury MA. A missing information principle: Theory and applications. Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability (1972) 1:697–715.

    Robinson WS. Ecological correlations and the behavior of individuals. American Sociological Review (1950) 15:351–7.

    Rosen O, Jiang W, King G, Tanner MA. Bayesian and frequentist inference for ecological inference: The R x C case. Statistica Neerlandica (2001) 55:134–56.

    van Dyk DA, Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: Computing the asymptotic variance. Statistica Sinica (1995) 5:55–75.

    Wakefield J. Ecological inference for 2 x 2 tables (with discussion). Journal of the Royal Statistical Society, Series A (2004a) 167:385–445.

    Wakefield J. Prior and likelihood choices in the analysis of ecological data. In: King G, Rosen O, Tanner MA, eds. Ecological inference: New methodological strategies (2004b) Cambridge: Cambridge University Press. 13–50.

    West M, Müller P, Escobar MD. Hierarchical priors and mixture models, with application in regression and density estimation. In: Smith AFM, Freeman PR, eds. Aspects of uncertainty: A tribute to D. V. Lindley (1994) London: John Wiley & Sons. 363–86.

