Text as Data
Text as data refers to the use of natural language processing and AI techniques to convert unstructured text into quantitative data for economics and social science research.
Context & Background
Text analysis has become increasingly important in economics, with applications ranging from analyzing Federal Reserve communications to measuring media sentiment. AI tools dramatically lower the barrier to entry for common text analysis tasks:
- Sentiment analysis: Measuring tone in earnings calls, policy statements, and news articles
- Topic modeling: Identifying themes in large document collections
- Named entity extraction: Pulling structured data (companies, people, dates) from text
- Classification: Categorizing documents by type, topic, or relevance
- Summarization: Condensing large text corpora into analyzable summaries
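To make "converting text into quantitative data" concrete, here is a minimal sketch of the simplest form of sentiment analysis: a dictionary-based tone score. This is a deliberately basic stand-in for the AI-based approaches described above, not a recommended method. The word lists, the `tone_score` function, and the two sample sentences are all hypothetical and chosen only for illustration; an actual study would use a validated lexicon or a model.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only; real studies would use
# a validated dictionary or a trained model instead.
POSITIVE = {"growth", "strong", "improve", "improved", "gain", "gains"}
NEGATIVE = {"weak", "decline", "declined", "loss", "losses", "risk", "risks"}


def tone_score(text: str) -> float:
    """Return (positive - negative word count) / total tokens; 0.0 if empty."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / len(tokens)


# Made-up example sentences standing in for earnings-call excerpts.
documents = {
    "call_a": "Revenue growth was strong and margins improved.",
    "call_b": "Demand was weak and we see downside risks.",
}

# One numeric score per document: unstructured text becomes a variable
# that can enter a regression or a time series.
scores = {name: tone_score(text) for name, text in documents.items()}

for name, score in scores.items():
    print(f"{name}: {score:+.3f}")
```

The output is one number per document, which is the basic shape of any text-as-data pipeline: the scoring function can be swapped for a classifier or an LLM call without changing what the downstream analysis receives.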
Practical Implications
- LLMs as annotators: Use AI to classify or score text, reducing the amount of manual coding required
- Validate against human coding: Compare AI classifications with human-coded samples
- Document your prompts: The prompt used for text analysis is a methodological choice, so report it alongside your other methods
- Consider scale: AI enables text analysis at scales previously impractical
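The validation step above can be sketched as a comparison of AI labels with human labels on the same sample of documents. The example below computes raw agreement and Cohen's kappa, which corrects agreement for what two annotators would reach by chance. The `cohen_kappa` function, the label names, and both label lists are assumptions made up for illustration; they are not results from any real study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Share of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight policy statements (illustrative data only).
human = ["hawkish", "dovish", "neutral", "hawkish",
         "dovish", "neutral", "hawkish", "dovish"]
model = ["hawkish", "dovish", "neutral", "dovish",
         "dovish", "neutral", "hawkish", "neutral"]

agreement = sum(h == m for h, m in zip(human, model)) / len(human)
kappa = cohen_kappa(human, model)

print(f"raw agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.3f}")
```

Reporting kappa alongside raw agreement matters when one category dominates, since high raw agreement can then arise largely by chance. What counts as acceptable agreement is a judgment call that depends on the task and should be stated with the results.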