WorldSeed: A Multi-Agent World Engine for Emergent Outcomes

  • Author/Source: Jie Ding, AIScientists-Dev, GitHub
  • Original: https://github.com/AIScientists-Dev/WorldSeed

  • Key Ideas

  • "Don't build a workflow. Seed a world." WorldSeed is a scene-agnostic multi-agent engine: define roles, rules, private information, actions, and consequences, then let agents interact until useful artifacts emerge.
  • The same engine runs production rooms, simulations, games, and fictional worlds; the scene is just YAML. The author's headline formula: rules + different agents + consequences → emergence.
  • Demo scenes:
    • Autoresearch — a cohort of specialist agents tackles an open research question. Goal: lower val_loss on a 5M GPT trained on TinyStories. In 11 hours: 100 hypotheses, 86 experiments, 72 peer-reviewed papers, val_loss down 24.7%. Every paper is end-to-end auditable (hypothesis → commit → experiment → verified result → citations → reviewer reasoning), forming a search-evolution graph.
    • AI Tool Pilot Lab — one agent studies a new API; builders create competing demos; critics reject generic output; audience agents judge usefulness; curator ships the strongest artifact with its trail of attempts.
    • AI Layoffs — a 30% layoff scenario where outgoing workers must "distill" their expertise into an AI Skill (honestly, or with backdoors); stayers absorb higher workload. Four characters with conflicting private agendas.
    • Teahouse Espionage — a fictional espionage world (same engine, different YAML) with an isometric map and an event-stream dashboard.
  • Emergent behaviors observed, not configured. In the Autoresearch run: the data specialist exhausted easy wins in her own area and started drafting hypotheses in teammates' territory (attention design, second-order optimization). Other agents stayed in their lanes. Nothing in the config told her to drift.
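The role drift described above can arise from nothing more than agents responding to payoffs. The toy sketch below is not WorldSeed code: the `Agent` class, the skill weights, and the per-area gain lists are hypothetical values chosen for illustration, and a greedy scoring rule stands in for an LLM agent's judgment. No rule mentions drifting; the data specialist moves on only because her own area runs dry.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    skill: dict                      # area -> effectiveness weight (hypothetical)
    history: list = field(default_factory=list)


def run(agents, gains, turns):
    """Each turn, every agent greedily takes the best remaining gain it can find.

    gains maps area -> list of remaining improvements, largest first.
    """
    for _ in range(turns):
        for agent in agents:
            # Expected payoff per area: skill weight times the best gain left there.
            scores = {
                area: agent.skill.get(area, 0.0) * remaining[0]
                for area, remaining in gains.items()
                if remaining
            }
            if not scores:
                return
            choice = max(scores, key=scores.get)
            gains[choice].pop(0)     # consequence: that improvement is used up
            agent.history.append(choice)


# Illustrative numbers only: the data area has few wins left, the others have many.
gains = {
    "data": [5.0, 2.0],
    "attention": [4.0, 3.0, 3.0, 2.0, 2.0, 2.0],
    "optimizer": [4.0, 3.0, 3.0, 3.0, 2.0, 2.0],
}
agents = [
    Agent("data_specialist", {"data": 1.0, "attention": 0.6, "optimizer": 0.5}),
    Agent("attention_specialist", {"attention": 1.0, "data": 0.1, "optimizer": 0.1}),
    Agent("optimizer_specialist", {"optimizer": 1.0, "data": 0.1, "attention": 0.1}),
]

run(agents, gains, turns=4)
for agent in agents:
    print(agent.name, agent.history)
```

With these numbers the data specialist works in data twice, then moves to attention and then to the optimizer, while the other two agents never leave their home areas. The drift is a consequence of diminishing returns in one area, not of any instruction.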

  • Summary

WorldSeed reframes multi-agent systems: it moves away from carefully orchestrated pipelines (the dominant pattern in tools like ARS or Haaland's Reviewer) and toward open-ended emergent simulation. The author's distinction between workflow and world is substantive: a workflow specifies the sequence of agent calls, whereas a world specifies only the rules and roles, and the sequence emerges from agent decisions plus consequence-driven feedback.
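The workflow/world distinction can be made concrete with a minimal sketch. Everything here is an assumption for illustration, not WorldSeed's actual API or scene schema: the `scene` dictionary layout, the `run_workflow` and `run_world` functions, and the hard-coded policies that stand in for LLM agents. In the workflow the caller writes down the order of steps; in the world the caller writes down only who may do what and what each action causes, and the order appears in the log afterwards.

```python
def run_workflow(steps, state):
    """Workflow: the author fixes the order of calls in advance."""
    for step in steps:
        step(state)
    return state


def run_world(scene, policies, state, ticks):
    """World: the scene lists only allowed actions and their consequences."""
    log = []
    for _ in range(ticks):
        for role, allowed in scene["actions"].items():
            action = policies[role](state)
            if action not in allowed:
                continue                          # rule check: ignore illegal moves
            scene["consequences"][action](state)  # apply the consequence to shared state
            log.append((role, action))
    return log


# Consequences mutate the shared state (all names are illustrative).
def build(s):
    s["pending"], s["quality"] = 1, 1

def polish(s):
    s["quality"] += 1

def accept(s):
    s["shipped"] += 1
    s["pending"], s["quality"] = 0, 0

def reject(s):
    s["rejections"] += 1

def wait(s):
    pass


scene = {
    "actions": {"builder": ["build", "polish"], "critic": ["accept", "reject", "wait"]},
    "consequences": {"build": build, "polish": polish, "accept": accept,
                     "reject": reject, "wait": wait},
}

# Stand-in policies: each role decides from the current state, not from a script.
def builder_policy(s):
    return "build" if s["pending"] == 0 else "polish"

def critic_policy(s):
    if s["pending"] == 0:
        return "wait"
    return "accept" if s["quality"] >= 2 else "reject"


def fresh_state():
    return {"pending": 0, "quality": 0, "shipped": 0, "rejections": 0}

pipeline_state = run_workflow([build, polish, accept], fresh_state())

state = fresh_state()
log = run_world(scene, {"builder": builder_policy, "critic": critic_policy}, state, ticks=3)
for entry in log:
    print(entry)
```

Both runs ship one artifact, but only the world run records a rejection followed by a revision, because the critic's choice depended on the state the builder left behind. Swapping in a different `scene` dictionary changes the world without touching `run_world`.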

The Autoresearch demo is the most consequential because it produces a measurable output (val_loss reduction) and a fully auditable artifact graph (papers, citations, experiments). The 24.7% val_loss reduction, reached in 11 hours across 86 experiments and 72 peer-reviewed papers, is a concrete data point for what emergent multi-agent ML research can produce given a clear objective and consequence loop. The role-drift observation, in which the data agent organically expanded into adjacent technical territory, is the kind of behavior that pipeline-oriented systems specifically prevent and that emergence-oriented systems are trying to elicit.
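One way to picture the auditable artifact graph is as a set of records that each point back to what they were derived from. The sketch below is an assumption about how such a graph could be represented, not WorldSeed's data model: the `Artifact` class, the `lineage` and `is_auditable` helpers, and the sample ids are all hypothetical. It walks from a review back to the originating hypothesis and checks that a paper's trail contains every link in the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    uid: str
    kind: str            # "hypothesis", "commit", "experiment", "result", "paper", "review"
    parents: tuple = ()  # uids this artifact derives from or cites


def lineage(uid, store):
    """Return uid and all of its ancestors, with parents listed before children."""
    seen, order = set(), []

    def visit(current):
        if current in seen:
            return
        seen.add(current)
        for parent in store[current].parents:
            visit(parent)
        order.append(current)        # post-order: appended after all ancestors

    visit(uid)
    return order


REQUIRED_KINDS = {"hypothesis", "commit", "experiment", "result"}


def is_auditable(paper_uid, store):
    """A paper counts as auditable here if its lineage contains every required kind."""
    kinds = {store[uid].kind for uid in lineage(paper_uid, store)}
    return REQUIRED_KINDS <= kinds


# Hypothetical sample trail: one full chain plus an earlier paper that is only cited.
records = [
    Artifact("h1", "hypothesis"),
    Artifact("c1", "commit", ("h1",)),
    Artifact("e1", "experiment", ("c1",)),
    Artifact("r1", "result", ("e1",)),
    Artifact("p0", "paper"),
    Artifact("p1", "paper", ("r1", "p0")),
    Artifact("rev1", "review", ("p1",)),
]
store = {a.uid: a for a in records}

print(lineage("rev1", store))
print(is_auditable("p1", store), is_auditable("p0", store))
```

Here `p1` passes the check because its trail reaches a hypothesis, commit, experiment, and result, while `p0`, which has no recorded parents, does not. Storing only parent links keeps each record small while still allowing the full trail to be reconstructed on demand.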

  • Relevance to Economics Research

WorldSeed offers economists three angles.

(1) Multi-agent simulation of economic phenomena. The WorldSeed framing maps cleanly onto the agent-based models (ABMs) long used in economics: agents with private information, asymmetric incentives, and consequence loops are the substrate of most theoretical ABMs, and LLM-driven agents may now be capable enough to produce qualitatively richer behavior than rule-based ABMs. The AI Layoffs scene is essentially a behavioral economics experiment on principal-agent dynamics with deception.

(2) Emergence vs. workflow design as a methodological choice for AI-assisted research pipelines. When do you want the auditable determinism of a pipeline, and when do you want emergent role-drift and cross-specialization? The Autoresearch demo's role-drift result suggests emergence may find directions that human-configured pipelines miss.

(3) Auditability infrastructure. WorldSeed builds the search-evolution graph that current pipeline tools mostly lack: every artifact links back to its hypothesis, experiment, and citations. That graph is a useful design pattern for any AI-assisted research workflow, whether emergent or pipelined.