Political Analysis Advance Access originally published online on February 16, 2009
Political Analysis 2008 16(4):372-403; doi:10.1093/pan/mpn018
This article appears in the following Political Analysis issue: Special Issue: The Statistical Analysis of Political Text
Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict
Department of Political Science, Quantitative Social Science Initiative, The Pennsylvania State University, e-mail: burtmonroe{at}psu.edu (corresponding author)
Department of Political Science, Michigan State University, e-mail: colaresi{at}msu.edu
Department of Government and Institute for Quantitative Social Science, Harvard University, e-mail: kevin_quinn{at}harvard.edu
Entries in the burgeoning "text-as-data" movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended either to establish some target semantic concept (like the content of partisan frames), or to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.
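As an illustration of the kind of Bayesian shrinkage approach mentioned above, the following is a minimal sketch of one such word-selection statistic: a log-odds ratio of word use between two groups (say, two parties' speeches) smoothed by an informative Dirichlet prior and standardized by its approximate variance. It assumes word counts per group are already available as dictionaries. The function name `log_odds_dirichlet`, the `prior_scale` parameter, and the choice of a prior proportional to pooled word frequencies are illustrative assumptions, not necessarily the exact specification used in the article.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_i, counts_j, prior_scale=10.0):
    """Hypothetical helper: z-scored log-odds ratios of word use, group i vs. group j.

    counts_i, counts_j: mappings from word to count in each group.
    prior_scale (assumed parameter): total pseudo-counts in the Dirichlet prior,
    spread over the vocabulary in proportion to pooled word frequency.
    Positive z means the word is relatively more frequent in group i.
    """
    counts_i, counts_j = Counter(counts_i), Counter(counts_j)
    pooled = counts_i + counts_j  # Counter addition keeps only positive totals
    total = sum(pooled.values())
    # Informative prior (an assumption): alpha_w proportional to pooled frequency.
    alpha = {w: prior_scale * c / total for w, c in pooled.items()}
    alpha0 = sum(alpha.values())
    n_i, n_j = sum(counts_i.values()), sum(counts_j.values())

    z = {}
    for w, a in alpha.items():
        y_i, y_j = counts_i[w], counts_j[w]
        # Prior-smoothed log-odds of word w within each group.
        lo_i = math.log((y_i + a) / (n_i + alpha0 - y_i - a))
        lo_j = math.log((y_j + a) / (n_j + alpha0 - y_j - a))
        delta = lo_i - lo_j
        # Approximate variance of delta; it is large for rare words, shrinking their z.
        var = 1.0 / (y_i + a) + 1.0 / (y_j + a)
        z[w] = delta / math.sqrt(var)
    return z


# Usage with made-up toy counts (not data from the article):
scores = log_odds_dirichlet({"health": 20, "tax": 2, "the": 100},
                            {"health": 2, "tax": 20, "the": 100})
```

Sorting words by these z-scores gives a ranked list of group-distinctive terms. Because the variance term grows as counts fall, a rare word with the same usage ratio as a frequent one receives a smaller |z|, which is the sense in which the prior regularizes the ranking.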
Author's note: We would like to thank Mike Crespin, Jim Dillard, Jeff Lewis, Will Lowe, Mike MacKuen, Andrew Martin, Prasenjit Mitra, Phil Schrodt, Corwin Smidt, Denise Solomon, Jim Stimson, Anton Westveld, Chris Zorn, and participants in seminars at the University of North Carolina, Washington University, and Pennsylvania State University for helpful comments on earlier and related efforts. Any opinions, findings, and conclusions or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.