LLM Collaboration and Reasoning -- Generative AI for Economic Research

Author/Source: Anton Korinek, December 2024 update to "Generative AI for Economic Research: Use Cases and Implications for Economists," published in the Journal of Economic Literature 61(4).
Original: https://www.genaiforecon.org/llm-collaboration.html

Key Ideas¶

The LLM frontier as of late 2024 features GPT-4o, Gemini 1.5 Pro, Grok-2, Claude 3.5 Sonnet, Llama 3.1, and Qwen 2.5 -- with remarkably small performance gaps between the top models and rapid catch-up by open-source alternatives.
Reasoning advances (OpenAI's o1 series) represent a paradigm shift: models now employ chain-of-thought and tree-search techniques via hidden reasoning tokens, dramatically improving performance on math, coding, and complex analytical tasks.
Workspace tools for interactive collaboration have proliferated: Claude Artifacts, ChatGPT Canvas, Advanced Data Analysis, NotebookLM, Cursor, and Microsoft/Google office integrations.
Autonomous computer use (Anthropic's Claude "Computer Use") gives LLMs the ability to control a desktop, representing a step toward full AI agency.
LLM-based research tools like EDSL (Expected Parrot Domain-Specific Language) for surveys/simulations and the AI Scientist for automated paper generation point toward future research automation.
Practical considerations: data confidentiality varies by provider (Anthropic is safest by default; Google warns against confidential input); watermarking (Google's SynthID); and reproducibility challenges from randomness, parallelization, and model deprecation.
LLM operating costs have declined 83-92% since GPT-4's launch while quality has steadily improved.

Summary¶

Anton Korinek's comprehensive update to his Journal of Economic Literature article surveys the state of LLM technology and its implications for economic research as of late 2024. The article is organized around several themes: an overview of leading models (both proprietary and open-source), advances in reasoning capabilities, new access modes and workspace environments, and practical considerations for researchers.

On the model landscape, Korinek documents that the gap between top proprietary models has narrowed significantly, with LLMs becoming "more of a commodity" in the words of Satya Nadella. Open-source models (Meta's Llama 3.1 and Alibaba's Qwen 2.5) have closed much of the gap with proprietary leaders, offering benefits of transparency, customizability, privacy, and reproducibility for research. The speed of progress is striking: GPT-4-level model costs have fallen 83-92% in under two years while quality has improved substantially.

The section on reasoning is particularly significant. Korinek explains how OpenAI's o1 series moves beyond token-by-token prediction toward genuine multi-step reasoning via chain-of-thought and tree-of-thought techniques. He provides a detailed example of o1-preview successfully log-linearizing a complex bond pricing equation -- a task far beyond earlier LLMs -- and coding a solution to the Ramsey growth model. These advances change the economics of AI itself: o1's heavy inference-time computation means higher variable costs compared to traditional LLMs, shifting the cost structure from training-dominated to inference-dominated.

The article surveys an expanding ecosystem of workspace tools that move beyond simple chat: Anthropic's Artifacts and Claude Analysis Tool, OpenAI's Canvas and Advanced Data Analysis, Google's NotebookLM, Cursor for AI-assisted coding, and office integrations from Microsoft and Google. It also covers emerging capabilities like real-time voice assistants, autonomous computer use, and dedicated research tools. Practical sections address data confidentiality across providers, watermarking of AI outputs, reproducibility challenges, and the author's view that mandatory AI disclosure requirements for journal submissions are unnecessary and potentially counterproductive.

Relevance to Economics Research¶

This is the most comprehensive academic survey of LLM capabilities relevant to economics research, published in the field's top survey journal. Korinek's detailed examples -- log-linearizing macro equations, solving the Ramsey model, generating structured economic data -- demonstrate concrete value for theory, empirical, and computational economics. The practical guidance on confidentiality, reproducibility, and model selection is immediately useful. The article's framework for understanding reasoning advances (chain-of-thought, tree-of-thought, o1-style inference scaling) helps researchers understand when and why to deploy different models for different research tasks.

LLM Collaboration and Reasoning -- Generative AI for Economic Research

Key Ideas¶

Summary¶

Relevance to Economics Research¶

Related Concepts¶

Related Summaries¶