Vox Deorum: LLMs Play Strategy Games—1,408 Civilization V Experiments Reveal Emergent Personality
- Olivia Johnson

- Dec 25, 2025
- 5 min read

Strategy games have long suffered from the same fundamental problem: the "Artificial Intelligence" isn't actually intelligent. It is a stack of if/else scripts that cheats to stay competitive. A recent breakthrough project, Vox Deorum, changes this dynamic by injecting Large Language Models directly into the decision-making loop of Sid Meier's Civilization V.
By bridging the gap between game states and generative models, Vox Deorum lets players face an LLM Civilization V opponent that thinks, plans, and negotiates. Recent tests spanning 1,408 automated matches between two models, OSS-120B and GLM-4.6, reveal that while these agents aren't beating grandmasters yet, they are surviving, adapting, and playing with distinct personalities that traditional game scripts cannot replicate.
Setting Up Vox Deorum for Local LLM Civilization V Matches

Before analyzing the data from the 1,408-game experiment, we need to address the practical side. How do you actually get an LLM Civilization V agent running on your machine? Based on current community testing and the Beta 0.4.6 documentation, the setup requires a specific approach to avoid technical bottlenecks.
Installation and Architecture
Vox Deorum isn't a standalone game; it operates as a bridge. The architecture links Civ 5 to a Community Patch DLL, which talks to a Bridge Service, then to an MCP (Model Context Protocol) server, and finally to the LLM.

To start, you need a legitimate copy of Civilization V with all expansions. The installer (currently Windows-only) handles the necessary dependencies, including Node.js and Python.
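As a mental model, the relay chain is three hand-offs. The Python sketch below is purely illustrative: every function and field name is invented for clarity and does not reflect the project's actual API.

```python
# Illustrative sketch of the Vox Deorum relay chain: the Community Patch DLL
# surfaces game events, the Bridge Service forwards them, and the MCP server
# packages them into prompt text for the LLM. All names here are hypothetical.

def dll_emit_event(turn: int, event: str) -> dict:
    """The DLL layer exposes an in-game happening as structured data."""
    return {"turn": turn, "event": event}

def bridge_forward(payload: dict) -> dict:
    """The Bridge Service relays DLL payloads onward, tagging their origin."""
    return {"source": "bridge", **payload}

def mcp_build_prompt(payloads: list[dict]) -> str:
    """The MCP server flattens queued events into text the LLM can read."""
    return "\n".join(f"Turn {p['turn']}: {p['event']}" for p in payloads)

events = [dll_emit_event(12, "Athens founded a new city"),
          dll_emit_event(12, "Barbarians spotted near Sparta")]
prompt = mcp_build_prompt([bridge_forward(e) for e in events])
```

The point of the chain is isolation: the game never talks to the model directly, so any OpenAI-compatible backend can sit at the far end.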
Configuring for Stability vs. Desync
A critical lesson from early adopters involves network stability. While the system technically supports standard multiplayer protocols, the heavy data traffic required to send game states to an LLM often causes synchronization issues ("Desyncs").
The fix: Run matches in "Hotseat" mode. This forces the game to process turns sequentially on the local machine, eliminating network lag caused by the LLM's processing time.
Handling the Token Load
One common hurdle is the sheer volume of data. An LLM Civilization V match generates massive prompts. You aren't just sending text; you are sending map data.
The Matrix Problem: Feeding a raw 56x36 map matrix to a model consumes upwards of 40,000 tokens per prompt. This is unsustainable for most local hardware.
The Event Solution: Vox Deorum solves this by describing the map implicitly through "Events" and tactical area descriptions rather than a brute-force coordinate grid.
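A back-of-the-envelope comparison shows why the event approach matters. Only the 56x36 dimensions and the roughly 40,000-token figure come from the article; the per-tile token estimate and the crude words-to-tokens ratio below are assumptions for illustration.

```python
# Rough comparison of the two map encodings: raw coordinate grid vs. event text.

ROWS, COLS = 36, 56           # map dimensions cited in the article
TOKENS_PER_TILE = 20          # assumed: terrain, yields, improvements, units...

# Brute-force grid: every tile serialized every turn.
matrix_tokens = ROWS * COLS * TOKENS_PER_TILE   # lands above the ~40k figure

# Event-based encoding: only what changed or matters tactically.
events = [
    "Your scout revealed a river valley east of your capital.",
    "Enemy archers hold the hill chokepoint north of Sparta.",
]
event_tokens = sum(len(e.split()) for e in events) * 2  # crude words-to-tokens ratio
```

Even with generous assumptions, the event encoding is orders of magnitude cheaper per turn, which is what makes local hardware viable at all.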
Local Hardware Reality
You do not need an H100 cluster to play. Users have successfully run smaller parameter models (like OSS-20B) locally. If you have a decent GPU with sufficient VRAM, you can host the AI opponent on your own hardware. For those prioritizing speed over privacy, the system allows connection to external APIs (like OpenRouter) via the WebUI configuration.
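Conceptually, the local-versus-API choice boils down to swapping one base URL for another, since local servers (llama.cpp, vLLM, and the like) and OpenRouter both expose OpenAI-compatible endpoints. The field names and local port below are illustrative, not Vox Deorum's actual WebUI schema.

```python
# Hypothetical backend-selection helper mirroring the two options the WebUI
# exposes: a local OpenAI-compatible server or a hosted API such as OpenRouter.

def backend_config(local: bool) -> dict:
    if local:
        return {
            "base_url": "http://localhost:8080/v1",  # e.g. llama.cpp serving OSS-20B
            "api_key": "not-needed-locally",         # local servers ignore the key
        }
    return {
        "base_url": "https://openrouter.ai/api/v1",  # hosted models, pay per token
        "api_key": "sk-or-...",                      # your OpenRouter key
    }
```

Because both endpoints speak the same protocol, switching between privacy (local) and speed (hosted) is a configuration change, not a code change.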
LLM Civilization V Performance: OSS-120B vs. GLM-4.6
The core of the recent Vox Deorum experiments involved pitting two heavyweights against the game environment: OSS-120B and GLM-4.6. The results shattered the assumption that LLMs would simply hallucinate their way to defeat. Both models achieved a survival rate of approximately 97.5%, effectively matching the 97.3% survival rate of standard algorithmic bots.
The Warmonger vs. The Builder
The most fascinating discovery wasn't that they survived, but how they played.
OSS-120B (The Aggressor): This model exhibited a distinct "personality" leaning toward domination. It recorded a 31.5% increase in Domination victories compared to the baseline. It actively neglected culture, showing a 23% drop in Cultural victories.
GLM-4.6 (The Strategist): In contrast, GLM-4.6 maintained a balanced profile, weighing conquest and culture equally.
This indicates that "personality" in an LLM Civilization V context isn't just flavor text—it is an emergent property of the model's training data. OSS-120B interprets the winning condition as a problem to be solved with force, while GLM sees multiple pathways.
Ideological Bias
Both models showed a statistically significant preference for the "Order" ideology, selecting it roughly 24% more often than "Freedom." In the game's mechanics, Order favors wide empires and production, concepts that perhaps align more easily with the logical structuring of LLM planning than the specialist-heavy nuances of Freedom.
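For readers who want to sanity-check a preference like this, a one-sample proportion z-test against a 50/50 null is the standard tool. The pick counts below are hypothetical (the article does not publish raw counts); only the test itself is standard statistics.

```python
from math import sqrt

# Hypothetical counts: 434 of 700 ideology picks going to Order, a 62/38 split,
# i.e. roughly a 24-point gap. The null hypothesis is a 50/50 coin flip.
def z_for_proportion(order_picks: int, total: int, p0: float = 0.5) -> float:
    p_hat = order_picks / total
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / total)

z = z_for_proportion(434, 700)  # well above the 1.96 cutoff for p < 0.05
```

At sample sizes in the hundreds of games, even modest preferences clear the significance bar easily, which is why the 1,408-match corpus matters.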
The Cost of Intelligence in Vox Deorum

Running an LLM Civilization V agent is not free, whether you pay in electricity or API credits. The tests revealed substantial data consumption.

A single turn in the late game can require processing 53,000 input tokens. While output tokens remain relatively stable (around 1,500), the context window grows linearly as history accumulates.
Based on late 2025 pricing via OpenRouter, a full standard-speed game costs approximately $0.86. This sounds low, but for developers scaling this technology, the bandwidth is the real killer. The extraction of "Facts" from the game state into Markdown, describing cities, units, and rules, is computationally expensive.

Future iterations of Vox Deorum will likely need to move toward "incremental updates" or vector-based memory to stop re-sending the entire world state every time a unit needs to move.
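To see how per-turn token counts translate into a per-game price, here is a rough cost model. The per-million-token rates, the assumed game length, and the linear context ramp are all placeholder assumptions; only the 53,000-input and ~1,500-output figures come from the article. With these placeholders, the estimate lands on the order of a dollar, the same ballpark as the quoted $0.86.

```python
# Rough per-game cost model. Rates below are placeholders, not real quotes.
IN_PRICE = 0.10 / 1_000_000    # assumed $ per input token (hosted small model)
OUT_PRICE = 0.30 / 1_000_000   # assumed $ per output token

def game_cost(turns: int = 330) -> float:
    """Estimate total API cost, assuming context grows linearly over the game."""
    total = 0.0
    for t in range(1, turns + 1):
        # Early turns are cheap; input ramps toward the ~53k late-game peak.
        in_tokens = 5_000 + (53_000 - 5_000) * t / turns
        total += in_tokens * IN_PRICE + 1_500 * OUT_PRICE
    return total
```

The model also makes the scaling pain visible: input tokens dominate the bill, which is exactly why incremental updates (sending diffs instead of the full world state) are the obvious next optimization.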
Future of Vox Deorum: Memory and Agency
Current LLM Civilization V implementations suffer from a "goldfish memory" problem. The model sees the current state, makes a move, and forgets why it made that move by the next turn. It relies entirely on the context window.

User feedback and developer roadmaps suggest three key areas for the next evolution:
Self-Reflection: Implementing a loop where the AI critiques its previous turn before making a new one.
Hierarchical Agents: Instead of one LLM controlling everything, a "General" agent could issue high-level orders ("Capture that city"), while smaller, cheaper models execute the tactical unit movements.
Personality Roleplay: Players want to feel like they are talking to Alexander the Great, not ChatGPT. Using the chat function to reinforce specific diplomatic personas is a high-priority user demand.
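The hierarchical-agent idea is easiest to see as two stub functions standing in for two differently sized models. Nothing below calls a real LLM; it only illustrates the proposed division of labor, with invented names throughout.

```python
# Sketch of the hierarchical-agent split: a "General" issues a high-level
# order and a cheap tactical layer expands it into per-unit moves.

def general_order(situation: str) -> str:
    # In a real system: an expensive large-model call with strategic context.
    return "Capture the city of Sparta"

def tactical_moves(order: str, units: list[str]) -> list[str]:
    # In a real system: a small, cheap model (or even plain scripting).
    target = order.rsplit(" ", 1)[-1]
    return [f"{u}: advance toward {target}" for u in units]

moves = tactical_moves(general_order("war declared by Greece"),
                       ["Swordsman", "Archer"])
```

The economic appeal is that the expensive model is consulted once per strategic decision, while the per-unit busywork, which dominates turn count, runs on something far cheaper.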
Conclusion
Vox Deorum proves that the era of scripted, cheating AI is ending. While we are still in the optimization phase—fighting token limits and context windows—the ability to play an LLM Civilization V match locally represents a massive leap forward. The models are not just playing; they are forming ideologies, choosing violence or peace based on their internal logic, and offering a glimpse into a future where every NPC has its own mind.
FAQ
What is the best way to run Vox Deorum without lag?
The most stable method is using "Hotseat" mode. This prevents network desynchronization (desync) errors by forcing the game to process turn logic sequentially on your local machine rather than over a network protocol.
Can I run an LLM Civilization V opponent on my own computer?
Yes, smaller models like OSS-20B have been verified to work on consumer hardware. However, for larger models like OSS-120B, you will likely need to use an API service or a high-end multi-GPU setup.
How does the AI know what the map looks like without seeing it?
Vox Deorum converts map data into text-based "Events" and descriptions of tactical areas. It avoids sending the raw map matrix, which would consume over 40,000 tokens per turn, and instead relies on implicit descriptive data.
Does the LLM actually play better than the standard AI?
Currently, LLMs have a similar survival rate (~97.5%) to standard AI but do not consistently beat them in victory points. Their strength lies in unpredictable, non-cheating gameplay rather than optimized min-maxing.
What ideologies do LLMs prefer in Civilization V?
Testing shows a strong bias toward the "Order" ideology. Models like GLM-4.6 and OSS-120B choose Order approximately 24% more frequently than Freedom, likely due to its production-focused bonuses.
Is Vox Deorum compatible with multiplayer?
Technically yes, as it uses standard game protocols. However, due to the latency introduced by LLM processing time, live multiplayer is prone to desyncs, making Hotseat or single-player the recommended format.


