From the very beginning, video games have been closely intertwined with the development of computer science. Over the past few decades, breakthroughs in real-time rendering and hardware have pushed games toward richer visuals, larger worlds, and more capable engines.
Yet the characters at the heart of the player experience have not advanced at the same pace. It remains one of the game industry's most persistent problems: making characters feel genuinely lifelike.
In the real world, the meaning of a journey is never determined by the destination, but by the people who travel with you. The same is true for virtual worlds. True immersion does not come from visual fidelity, but from the bond between you and these characters. They remember what you said, understand what you did, and in response, change their own fate and shape your story.
In traditional games, NPCs are driven by extensive hand-authored scripts and branching behavior trees, so the experience is largely pre-authored. This paradigm can deliver stable, controllable gameplay in the short term, but it has an inherent limit. As players spend more time and repeat the same kinds of interactions, novelty fades quickly, characters become predictable, the world feels less responsive, and players eventually get bored and leave.
For example, in The Elder Scrolls V: Skyrim, guards are designed to trigger different lines based on the player's game state, but players soon find themselves hearing the same familiar line again and again: "I used to be an adventurer like you. Then I took an arrow in the knee…" This frequent repetition eventually became a well-known gaming meme.
At Entropy Games, our goal is to build the next generation of games: AI-native experiences where every character has a coherent cognitive system—persistent memory, a stable personality, intent-aware conversation, meaningful actions, and a natural, lifelike voice.
These characters treat your pauses, pacing, choices—and even silence—as signals. They read and remember those cues, shaping what they say and do next. The story evolves through ongoing interaction rather than fixed branches. The result is a game that feels genuinely personal.
To get there, we need to bring frontier AI into games and integrate it tightly with the game world. That introduces several core challenges.
The biggest challenge is cost. High inference costs make broad adoption difficult. While per-token pricing has fallen rapidly over the past few years, it is still far from a level that commercial games can sustainably support. Today, even with the most affordable models, several hours of gameplay can cost anywhere from a few dollars to tens of dollars. That can exceed the purchase price of many games, making it economically impractical to bring advanced AI into games.
Latency is the second bottleneck. In a cloud inference architecture, every player request must be processed server-side and returned to the client, so responsiveness depends on network latency and connection stability. AI-driven logic forces players to wait several seconds, even though rendering and physics simulation run locally in real time. This puts the AI a few beats behind the rest of the game. NPC dialogue and actions arrive a few seconds late, making characters feel sluggish and out of sync.
Character consistency and out-of-character (OOC) drift. Today's AI models are designed for general-purpose conversation rather than for games. As a result, they lack a reliable sense of diegetic boundaries.
When asked to role-play as in-game characters, they are prone to drifting beyond the game's narrative constraints and the character's established background. A "medieval physician," for example, may suggest buying amoxicillin at a pharmacy or using Google Maps to locate the nearest hospital. Such breakdowns quickly undermine player trust and can shatter immersion.
Safety and hallucination risks. Large language model (LLM) hallucinations are especially damaging in games. The model may fabricate lore or narrative events that fall outside the established worldbuilding. What's worse, it may generate extreme content—hate, discrimination, explicit material, or graphic violence—leaving players deeply uncomfortable.
Speech models lack dialogue awareness. Even the most advanced systems still operate within a text-to-speech (TTS) paradigm: they convert written text into speech waveforms and primarily optimize for acoustic fidelity. In doing so, they largely overlook the implicit structure and cues that shape human dialogue, including turn-taking, interruptions, and delivery signals like timing, pacing, and tone.
Game dialogue places even greater demands on speech generation. An NPC's spoken response must not only reflect the player's emotion and intent but also align with the current in-game situation. Yet traditional TTS pipelines remain largely disconnected from dialogue context and game state, leaving NPC speech far from natural or lifelike.
Traditional behavior trees lack native support for LLM-driven deliberation. They pass around control-flow states—success, failure, running, aborted—but those signals never enter the language model's cognitive loop. As a result, the model may produce plausible dialogue without a reliable mechanism to drive NPC action selection or execution. An NPC might claim it has launched a magic orb, but the game requires concrete outcomes: a casting animation, in-world effects, and changes to the game state.
A similar disconnect exists at the narrative layer. While dialogue may be open-ended, the underlying plot often remains hard-locked to pre-authored story nodes. That leaves characters unable to use conversation and interaction history to advance the storyline in a dynamic and responsive way.
These challenges point to a simple conclusion: building an AI-native game is not primarily about using a smarter model. It is about engineering an end-to-end system that is cost-efficient, low-latency, controllable, and safe by design.
That system should be centered on the player experience and tightly integrated with core gameplay. AI games only become genuinely fun and scalable when AI moves beyond surface-level dialogue and becomes a reliable part of the narrative-and-action loop.
We begin by introducing Entropy Games' system architecture and show, step by step, how it solves the six most critical challenges of deploying AI in games. Next, we explain why a local-first approach is the only viable path to making AI characters fast, affordable, and sustainable in real gameplay. Finally, we cover the technical core: model and architecture decisions, performance trade-offs, and the system-level techniques required to ship AI in production games.
How NPCs think and stories evolve
Inspired by cognitive neuroscience, we built a coordinated, multi-module cognitive system for NPCs.
The system centers on a language model (LM) as its cognitive core. It runs in a closed loop with modules for perception, memory, context engineering, behavior control, and speech. At a higher level, an AI Narrative Director manages narrative pacing and player experience, turning hours of interaction into a coherent, controllable story.
In this framework, NPCs rely on a perception layer to take in both the in-world state and the player's language and actions. The memory system is split into short-term memory (STM) for immediate decisions and long-term memory (LTM) for longer-horizon decisions and stable character consistency.
We trained and deployed a Game-Native Language Model (GNLM) as the NPC's cognitive core. It maintains a stateful cognition loop that tracks the evolving game state and interaction history, enabling NPCs to adjust tone, response strategy, and actions in real time.
The behavior and cognition layers do not form a one-way decide-then-execute pipeline. Instead, they operate as a bidirectional feedback loop. The cognition layer issues action and speech directives, while the behavior layer continuously streams back execution state, outcomes, and in-world signals. Those signals immediately shape the next round of deliberation and generation, enabling iterative correction in real time.
The diagram above represents the gameplay loop as a graph of narrative nodes and paths. Each player–NPC interaction—dialogue, actions, shared events, and the resulting memory updates—directly shapes the NPC's subsequent decisions and behavior policy. The outcomes of the NPC's speech and actions then flow back into the cognitive modules and help the AI Narrative Director steer the narrative.
In our architecture, the AI Narrative Director functions as a player-experience management layer. It continuously monitors the player's game state and key player-experience signals, including objective progress, resource pressure, event density, and immersion cues.
Based on those signals, it adjusts event triggers, information reveals, conflict intensity, and narrative pacing without breaking world consistency or character motivation. The aim is to sustain long-term novelty and challenge within clear, designer-defined boundaries, so that different players can develop distinct stories under the same world rules.
How do we solve the toughest challenges and bring advanced AI into game worlds?
We now outline six core problems in game AI and explain how our system addresses them. This work is ongoing and, in our view, marks the start of a new paradigm for games. Over time, these technical directions and implementations will fundamentally reshape how games are built and played.
We run inference locally rather than in the cloud, cutting per-session inference costs from tens of dollars to effectively zero. Players pay no API fees tied to playtime.
To make on-device deployment the default, we built a dedicated inference runtime that hosts the language model, speech model, and a set of lightweight specialized models. The runtime is frame-aware and runs as part of the game loop. It coordinates execution and schedules inference within the frame-time budget to preserve smooth rendering and simulation.
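As a concrete illustration, here is a minimal sketch of frame-budget-aware decoding, with hypothetical `session` and `decode_one_token` names standing in for our engine-native runtime: the scheduler decodes tokens only while the current frame still has inference budget left, then yields back to rendering.

```python
import time

FRAME_BUDGET_MS = 2.0  # hypothetical per-frame slice reserved for inference

class FrameAwareScheduler:
    """Runs inside the game loop: decodes tokens only while the
    current frame still has inference budget, then yields."""

    def __init__(self, session):
        self.session = session  # an in-flight generation (prompt already prefilled)

    def tick(self):
        """Called once per rendered frame."""
        start = time.perf_counter()
        while not self.session.done:
            # Decode a single token; each step is cheap and interruptible.
            token = self.session.decode_one_token()
            self.session.emit(token)  # stream to dialogue UI / speech engine
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if elapsed_ms >= FRAME_BUDGET_MS:
                break  # yield to rendering/physics; resume on the next frame
```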
We further improve efficiency with quantization, mixed precision, and knowledge distillation to ensure reliable performance across consumer hardware. In addition, we integrate each model and its inference stack directly into the game runtime so the AI stays invisible to the player. In practice, this approach runs on NVIDIA RTX 30-, 40-, and 50-series GPUs and Apple Silicon Macs (M1–M4).
The entire stack is streaming-first by default, so NPC interactions stay responsive in real time. Players hear the first words of a response without waiting for the language or speech model to complete a full sentence. Game-specific models are kept lightweight enough to run in parallel. Together, these choices reduce end-to-end conversational latency to approximately 500 ms.
We are also exploring a full-duplex interaction mode. Games come with different audio and interaction constraints than voice assistants, especially in noisy and dynamic environments. For that reason, we plan to introduce full-duplex capabilities gradually and with care.
Character consistency requires targeted intervention rather than relying on a single training pass. We run multiple rounds of post-training tailored to game-specific requirements. These rounds enforce diegetic boundaries, improve memory reliability, handle pronouns and perspective correctly, and adhere strictly to developer-defined lore and narrative worlds. An internal evaluation pipeline measures these capabilities and tracks the best-performing checkpoints.
Context management is a separate but closely related challenge. We address it by training a game-specific retrieval model and using a dedicated memory module to manage both long-term and short-term NPC memory. Together, these components form our context-engineering stack. It keeps runtime context efficient, accurate, and genuinely useful during gameplay.
In practice, a stronger base model combined with effective context engineering can reduce lore violations and out-of-character behavior. However, it remains an open research problem in machine learning to make a model faithfully and consistently embody a developer-authored character. To push beyond current limits, we're starting interpretability work to study activation patterns across story branches and interaction contexts, so we can better understand what keeps a character in character.
Safety is our top priority. We treat safety as a design constraint, not an afterthought. We explicitly incorporate major risk categories, including hate, discrimination, explicit content, and violence, into our post-training pipeline. This reduces the likelihood that the model learns or reproduces harmful behaviors during play.
Looking ahead, we plan to integrate a dedicated safeguard model, similar in spirit to gpt-oss-safeguard, to screen high-risk cases before they reach the player and help ensure a consistently safe in-game experience.
We built our speech model from scratch so NPC voice interactions feel natural and lifelike. The model is designed to capture conversational nuance and in-game context. We also engineered our training pipeline and optimization objectives for both pre- and post-training to enable high-quality, context-aware speech at ultra-low latency.
This approach goes beyond traditional text-to-speech systems that synthesize audio from text alone. Given the game setting and the ongoing conversation, the model can adjust emotion, tone, and delivery to match both the moment and the character, creating a more immersive in-game dialogue experience.
For us, this is only the beginning. We are continuing to advance this work so the model can understand not only what players say, but also how they say it, including timing, speaking style, emotion, and engagement cues, in real time and at low latency. This helps the system better understand the player and deliver the most appropriate response for the situation.
We extend the traditional Behavior Tree (BT) into a Cognitive Tree (CT) that reasons over natural-language semantics. Execution state and player behavior signals—blocking, turning away, approaching, silence, and more—feed back into the AI's cognitive system as semantic inputs. NPCs then adapt not only to what was said, but also to what those in-world signals imply. Is the player's silence hesitation or refusal? Does following closely indicate trust, or a need for guidance? This bidirectional loop keeps language and action aligned.
For example, when an NPC says, "I'm going to cast a fireball," the game executes the cast and the player sees the fireball appear in the world. If the player interrupts the cast, the NPC can infer "the player doesn't want me to do this" and adjust its subsequent dialogue and behavior.
At the narrative level, the AI Narrative Director continuously monitors and aggregates player–NPC interaction and dialogue history. When it detects a drop in player engagement, it can inject new events and unlock additional story branches. This shifts the game's narrative logic so each player's story can unfold along a distinct trajectory.
Why local-first is the best solution for game AI
The fundamental flaw of an API-based cloud inference model is that it breaks the core economics of games: the longer players play, the more developers should benefit.
In many traditional games, marginal costs are close to zero. Whether a player puts in 100 hours or 1,000, server costs typically do not rise in proportion to playtime. This lets developers focus on what matters most: making the game more fun. As players spend more time, the community grows and more content gets created. That content attracts new players, creating a positive flywheel.
However, cloud-based AI inverts this logic. Each line of dialogue and each NPC reaction triggers a paid API call, so inference spending scales with interaction volume and session length. The more a player engages, the more the developer pays. In practice, this leaves developers with two options: meter AI usage or impose hard caps on AI-driven behavior. Both approaches degrade the player experience.
Local-first inference restores the natural economics of games. Marginal costs are effectively zero, so players can play as long as they want. Developers can use the same business models that work in traditional games—one-time purchase, subscription, or free-to-play with in-game purchases. They are no longer constrained by a cloud cost structure that rises with engagement.
With a local-first approach, we reduce both friction and the recurring inference cost of game AI to near zero. We keep the focus where it belongs: building a great game. Players buy the game, not tokens.
Beyond cost, local-first offers practical advantages that are easy to miss.
For players, on-device AI means the gameplay experience no longer requires permission from a remote service. There is no dependence on network stability, no API rate limiting, and no conversations cut short by timeouts. Whether a player is on a long-haul flight or in a family cabin, the full AI system can keep running.
Local games can offer something cloud games struggle to guarantee: permanence and ownership. Every year, long-running online games shut down their servers, and thousands of hours of play, communities, and shared memories can disappear overnight. A local-first architecture changes that dynamic. The game's lifespan is determined by players and the community, not by a company's operational decisions.
It also makes privacy the default. Dialogue and voice data stay on the player's device, without requiring them to accept mandatory data-collection terms.
All of this is seamless for players. There is no separate model download and no API key configuration. Developers ship the model weights and inference runtime with the game at release, making the system plug-and-play from day one.
Local-first can unlock a larger creator ecosystem. A game's fun comes not only from the content it ships with, but also from the creator community that grows around it. With the right UGC tools, developers and players can create surprising new things. Traditional games already show this compounding effect. Minecraft has evolved into a broad universe of worlds and playstyles, and Roblox hosts millions of creator-built experiences across a long tail of niche genres.
Local-first AI games have a stronger foundation for reaching comparable scale. The key advantage is cost: near-zero marginal inference cost makes creation and experimentation cheap. At the same time, modifiable weights make models easier to remix, extend, and iterate. Over time, that combination can expand the range of gameplay and creative expression.
A longer-term trend is also becoming clear. Parameter efficiency continues to improve. On many tasks, today's ~1B-parameter models can match or even surpass earlier 100B-scale models. This makes models that are smart enough yet small enough to run on-device increasingly practical.
In parallel, more hardware companies are prioritizing inference, and next-generation GPUs and NPUs are being optimized at the system level for inference workloads. With each generation, consumer GPUs keep raising the ceiling on what can run locally. Together, these shifts will continue to drive down the unit cost of on-device inference, moving local-first AI from a technical preference to the most economical and natural default for games.
To borrow Andrej Karpathy's analogy, the AI industry today resembles the mainframe era. A small number of companies operate massive GPU clusters and distribute AI capability to everyone through APIs. In the early days of computing, only large institutions, including governments, universities, and major companies, could afford computers, and everyone else had to share access to compute time. But this is not the end state.
As smaller models become more capable and consumer hardware continues to advance, demand will shift from a Model-as-a-Service paradigm to an Application-First paradigm built around specific products. This will move us into AI's personal computing era, where intelligence no longer lives only in the cloud but runs on your own device. It understands you, remembers you, and belongs to you.
Starting with games, Entropy Games is building the first wave of AI-native applications for this era.
Technical post
The Core Systems of AI-Native Games
Entropy Games Tech Team
This section describes our core technical innovations as three primary modules:
Game-Native Language Model (GNLM) serves as the cognitive core for NPCs. It pairs targeted post-training with a proprietary, game-specific retriever and dynamic context management. These components achieve ultra-low latency on consumer hardware while approaching leading performance on game-relevant tasks.
Game Context-Aware Speech Model (GCA) extends traditional TTS systems that synthesize audio from text alone. It conditions on dialogue history and in-game context, which lets it automatically adjust pacing, tone, and emotion.
The Cognitive Tree and AI Narrative Director act as the control layer for game logic. They establish a bidirectional interface between behavior trees and language models, enabling narratives to dynamically evolve through player interactions.
At runtime, the three modules run as one system. GNLM reasons over player intent and world state. GCA turns that intent into context-appropriate speech. The Cognitive Tree and AI Narrative Director then convert those decisions into executable actions and update a persistent narrative state.
Game-Native Language Model
The LM is the cognitive core of AI-native games. It largely determines whether player–NPC interactions feel realistic, natural, and engaging.
However, most LLMs are not designed for game worlds or in-character NPC dialogue. They are primarily optimized for human-facing assistants, coding agents, and general-purpose problem solving.
A central limitation is that they often fail to maintain the diegetic boundaries game characters require. Over long sessions, they struggle to stay consistent, to remain within developer-authored lore and role constraints, and to keep dialogue natural and engaging.
Cost and latency add further constraints. These models are expensive to run, and many still struggle to deliver the low-latency, real-time interactivity that games demand.
Offline benchmark strength alone does not guarantee a better player experience. To meet real on-device limits, we post-trained several open-weight base models and selected the checkpoint with the best end-to-end performance. We then trained a game-specific retrieval model and integrated it with our memory manager and dynamic prompt routing to keep in-game interactions stable and consistently high quality.
Experiments
We evaluated both base and post-trained checkpoints across three open-weight model families: Gemma 3, Qwen3, and Llama 3. Because the base checkpoints performed similarly in general conversation, we focused on post-training to develop the game-specific behaviors we require.
We ran multiple post-training rounds and scored each checkpoint on general dialogue quality as well as game-specific criteria. These criteria include style and tone, conversational naturalness, memory utilization, lore and backstory consistency, and action selection under strict output-format compliance. Our final model is a Qwen3-based checkpoint, quantized to INT4 for on-device deployment on consumer hardware.
We also observed that reasoning models and hybrid architectures reach our target behavior more reliably at smaller parameter scales.
Evaluation framework
Existing LM evaluations do not capture in-character roleplay or how an LM shapes moment-to-moment player experience in games. Most benchmarks focus on general tasks—assistant behavior, preference alignment, writing, math, and coding—and miss the boundaries that govern character behavior in a game world. We therefore designed a game-specific evaluation framework.
We measure in-character LM performance along three independent dimensions:
Character Fidelity: Whether the model stays within the game's lore and the character's backstory, maintains a consistent identity and persona, and follows the requirements of the current narrative beat. We also assess dialogue for naturalness and expressiveness.
Memory Utilization: Whether the model uses indexed memories appropriately—retrieving the right details when they matter and integrating them naturally into dialogue to maintain long-horizon consistency.
Action Alignment: Whether the model selects actions within the allowed action space, produces outputs the game engine can parse, and chooses actions that fit the current narrative and interaction context without breaking immersion.
For evaluation, we run the same context prompts on our post-trained model and several state-of-the-art models. We then shuffle and batch responses for blinded scoring. GPT-5.2 serves as the judge model on a 1–10 scale, and we normalize scores to produce the final benchmark.
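The scoring harness is conceptually simple. Below is a minimal sketch of the blinded protocol, with `judge_score` standing in for the GPT-5.2 judge call and each system wrapped as a plain function; the real pipeline adds batching and rubric prompts.

```python
import random
from statistics import mean

def blinded_benchmark(prompts, systems, judge_score):
    """systems: {name: fn(prompt) -> response}.
    judge_score: fn(prompt, response) -> float in [1, 10] (the judge-model call).
    Responses are shuffled so the judge never sees which system produced them."""
    raw = {name: [] for name in systems}
    for prompt in prompts:
        answers = [(name, fn(prompt)) for name, fn in systems.items()]
        random.shuffle(answers)                  # blind the judge to system identity
        for name, response in answers:
            raw[name].append(judge_score(prompt, response))
    # Normalize 1-10 judge scores to [0, 1] for the final benchmark.
    return {name: (mean(scores) - 1) / 9 for name, scores in raw.items()}
```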
Benchmark performance of our post-trained model and leading baseline models on Character Fidelity, Memory Utilization, and Action Alignment. Higher is better.
On our benchmark suite, we ranked third on Character Fidelity and fourth on both Memory Utilization and Action Alignment. These results suggest that with high-quality training data and targeted post-training, a game-specialized model can be competitive with state-of-the-art systems on core gameplay behaviors.
However, offline benchmarks are not a reliable proxy for player experience in live gameplay. We therefore ran a controlled in-person study with 53 compensated participants. To isolate language-model effects, we held the TTS model constant (Eleven v3, alpha) across all conditions and varied only the underlying LMs. Participants completed gameplay sessions and selected their preferred version; depending on the baseline, 65%–76% preferred the version driven by our model.
In-person preference study (N=53). Fixed TTS (Eleven v3, alpha); only the underlying LLM varies. Bars report preference win rate against baselines without vs. with reasoning (50% = parity).
In live gameplay, low player ratings for LLM-driven characters reflected poor gameplay fit more than raw model capability. Three issues recurred across sessions: (1) turn latency that lagged behind the game loop, leaving players waiting while the world kept moving; (2) emotionally flat, generic dialogue that weakened character tension and moment-to-moment engagement; and (3) out-of-character drift that violated diegetic boundaries and developer-authored lore.
We also saw a distinct failure mode in reasoning-oriented LLMs: high-confidence, assistant-style explanations that exceeded the character's perspective, making the AI more visible and breaking immersion.
Hallucination mitigation and safety guardrails
Most evaluation pipelines still rely on binary (0/1) metrics such as accuracy and pass rate. Under this scoring scheme, abstentions and IDK responses ("I don't know" / "I'm not sure") typically receive no credit. As a result, when a model lacks sufficient evidence, the incentives push it toward a plausible "best guess," which increases hallucination risk [1].
This failure mode is amplified in games. Even large models can violate world boundaries by inventing items, mechanics, or abilities that do not exist in the game world. They may also fabricate narrative events by referencing quests, story beats, or interaction history that never occurred. These confident but incorrect outputs can mislead player decisions, break narrative continuity and immersion, and degrade the overall experience.
To mitigate this, we introduced targeted post-training data covering insufficient-context cases, out-of-scope requests, and prompts unsupported by the game's lore. When evidence is missing, this training encourages more reliable in-character behavior—for example, admitting uncertainty, refusing safely, or asking key clarifying questions.
We observed that this post-training substantially increases the deferral rate on in-game hallucination stress-test prompts. Detailed results are shown in the figure below.
Stress-prompt evaluation (insufficient context, out of scope, or lore-unsupported). Deferral rate (%) is the share of responses that defer (IDK/clarify/refuse) rather than making unsupported assertions. Before = baseline checkpoint; After = + targeted deferral post-training (all else held constant). Higher is better.
During post-training, we introduced safety-focused datasets to reinforce compliant behavior in high-risk scenarios and reduce unsafe responses. In parallel, we are exploring a dedicated, policy-based safety classifier that can automatically detect and filter potentially harmful outputs and replace or rewrite them with harmless alternatives [2]. However, reliably running such a classifier inside a local game runtime at ultra-low latency remains a significant engineering challenge.
We also found that smaller models are often easier to tune to reduce hallucinations and unsafe outputs than larger models. They more readily detect when a response deviates from the character's background or the game's lore, and they are more likely to admit uncertainty or correct earlier missteps.
On-device deployment and acceleration
Real-time, on-device deployment requires low latency and stable frame times on consumer hardware. We applied system-level inference and runtime optimizations to meet these requirements.
First, we built an on-device inference engine purpose-built for real-time game workloads. It is tuned for low time-to-first-token (TTFT) and predictable frame-time overhead, using streaming-first execution, KV caching, and optimized attention kernels. On the model side, we apply INT4 quantization for efficient local execution and ship weights in formats compatible with mainstream on-device toolchains.
With this streaming-first runtime, the LM streams output continuously. We start displaying the dialogue text as soon as the first token arrives and update it continuously as tokens stream in. This reduces perceived latency and makes real-time dialogue more responsive.
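Our engine is custom, but the streaming pattern itself is easy to illustrate. The sketch below shows an equivalent loop using the open-source transformers API, with a stand-in checkpoint and a hypothetical `update_dialogue_ui` hook.

```python
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# Stand-in checkpoint for illustration; our deployed model is a custom Qwen3-based build.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", device_map="auto")

def stream_reply(prompt: str):
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)
    # use_cache=True keeps the KV cache, so each decode step only computes the new token.
    Thread(target=model.generate,
           kwargs=dict(**inputs, max_new_tokens=128,
                       use_cache=True, streamer=streamer)).start()
    for chunk in streamer:          # first chunk arrives at TTFT, not at completion
        update_dialogue_ui(chunk)   # hypothetical hook: append text to the dialogue box
```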
We benchmarked LM-only latency and throughput on RTX PCs under identical conditions against several strong baselines. In this setup, our model achieves 4 ms TTFT, 486 tokens/s throughput, and 265 ms end-to-end response latency (LM-only), outperforming all baselines in our evaluation. The model supports deployment across NVIDIA RTX 30-, 40-, and 50-series GPUs as well as Apple Silicon devices (M1–M4).
LM-only Latency & Throughput Comparison
Models
TTFT
TPS
End-to-End Latency
GNLM(ours/streaming/local)
4ms
486
265ms
Gemini 3 Flash Preview(Google Vertex Provider)
1300ms
64
3294ms
Claude Sonnet 4.5(Anthropic Provider)
2310ms
48
4966ms
Gemini 3 Pro Preview(Google Vertex Provider)
3910ms
80
5498ms
DeepSeek V3.2(DeepSeek Provider)
1560ms
26
6545ms
GPT-5.2(OpenAI Provider)
3850ms
40
7035ms
Benchmarked on NVIDIA RTX 5090. Non-local models were served via OpenRouter. End-to-end latency was measured with a fixed-output protocol: each model was prompted to reproduce an identical response verbatim to control output length. All numbers are LM-only.
Game-native retrieval model and memory framework
Higher benchmark scores do not necessarily translate into a better in-game experience. Games run on a developer-authored ruleset, an explicit world state, and a defined narrative canon. The LM can only produce reliable in-character dialogue and actions when it is grounded in complete character and world information—identity, motivations, relationships, timeline, and quest state.
Long-horizon continuity is equally important. The model must capture and organize memories from player dialogue, actions, and shared in-world events, then retrieve and use them naturally when they matter later. This keeps conversations coherent and helps sustain a durable player–NPC relationship. For example, if a player asks hours later, "Do you still have the sword I gave you last time?", the NPC should respond in a way that remains consistent and believable.
This level of grounding requires a game-specific retrieval model. Most general-purpose retrieval models are built for document and code search. They perform well on dense, self-contained text with clearer topical boundaries, but often struggle with in-game character dialogue and memory retrieval. Game dialogue is more context-dependent and frequently involves shifting coreference, colloquial phrasing, and pragmatic signals such as tone and emotion.
A player may refer to the same NPC across turns as "he," "that guy," and "you," or use sarcasm to express dissatisfaction ("You're really something."). At the same time, real-time gameplay leaves little latency budget for retrieval, and many off-the-shelf retrieval stacks become a bottleneck for both retrieval and immediate feedback.
We built a multi-stage retrieval stack for game settings, centered on two trained models: a bi-encoder for low-latency candidate retrieval and an in-house cross-encoder reranker for high-precision selection. Both models run as first-class components within our on-device runtime. Integrated with our NPC memory framework, the system continuously refreshes the working context during gameplay and preserves character consistency over long horizons.
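To make the shape of this stack concrete, here is a minimal sketch using off-the-shelf sentence-transformers checkpoints as stand-ins for our in-house bi-encoder and reranker.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Public checkpoints as stand-ins; any bi-encoder/cross-encoder pair shows the shape.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

memory = [
    "Player gave me their father's sword before the siege.",
    "Player mocked the captain at the tavern.",
    "We fought wolves together on the north road.",
]
memory_emb = bi_encoder.encode(memory, convert_to_tensor=True)  # precomputed offline

def recall(query: str, k_candidates: int = 2, k_final: int = 1):
    # Stage 1: cheap bi-encoder search over all memories (low latency).
    q_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, memory_emb, top_k=k_candidates)[0]
    candidates = [memory[h["corpus_id"]] for h in hits]
    # Stage 2: cross-encoder rescores query-memory pairs (high precision).
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: -p[1])
    return [c for c, _ in ranked[:k_final]]

print(recall("Do you still have the sword I gave you last time?"))
```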
Below, we report retrieval accuracy and latency under real in-game limits.
Memory retrieval is evaluated on batches of 200 real in-game queries using an on-device bi-encoder retriever with cross-encoder reranking. Hit@1 denotes top-1 ground-truth accuracy (higher is better). E2E latency is the mean end-to-end wall-clock time per query measured across the selection pipeline (encoding + retrieval + rerank) (lower is better).
Effective context engineering
Static prompts are too rigid for a game world whose state and conversational context change continuously, and they add unnecessary prefill overhead. We replaced the traditional static prompt module with a dynamic prompt management system that keeps the LM's context accurate while improving responsiveness.
We split the prompt system into modular components that activate dynamically based on runtime signals such as topic, intent, emotion, narrative phase, and relationship state.
A semantic prompt router infers a player's topic, intent, and emotional state in real time for each input, then activates the appropriate combination of modules. The routing layer draws on our in-house retrieval model and NPC memory framework, injecting only the most relevant background context and memories into the LM on demand. This preserves response accuracy while reducing irrelevant context and redundant tokens.
This design enables dynamic context orchestration with on-demand retrieval. Under the same 32k context window, we compress effective runtime context from ~28.4k tokens to ~3.8k tokens. This reduction lowers prefill compute, speeds up prompt processing, and improves in-game responsiveness.
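A simplified sketch of the routing idea, with hypothetical module names and helper functions (`classify`, `truncate_to_budget`); the production router is built on our in-house retriever and memory framework.

```python
# Hypothetical module catalog; real modules are authored per game.
PROMPT_MODULES = {
    "combat":      "Combat stance: terse, commanding tone...",
    "romance":     "Relationship context: warm, guarded tone...",
    "quest_intro": "Current beat: the caravan job must be offered this scene...",
}

def build_context(player_input, npc, classify, retrieve_memories, budget_tokens=3800):
    """classify: fn(text) -> {'topic', 'intent', 'emotion'} (semantic router).
    retrieve_memories: e.g. the bi-encoder/cross-encoder stack sketched above."""
    signals = classify(player_input)
    parts = [npc.persona_core]                         # always-on identity block
    if signals["topic"] in PROMPT_MODULES:             # activate only relevant modules
        parts.append(PROMPT_MODULES[signals["topic"]])
    parts += retrieve_memories(player_input)           # inject only matching memories
    context = "\n\n".join(parts)
    return truncate_to_budget(context, budget_tokens)  # hypothetical token-budget trim
```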
Our future work
The best player experience requires co-designing every module with the game environment as a single integrated system. The full stack must run locally at effectively zero marginal cost and ultra-low latency.
Our current training dataset is primarily English. We plan to expand multilingual support—including Chinese, Japanese, and Spanish—to reach a broader player base and enable a wider range of narrative styles.
We are also investing in game-native evaluation. Roleplay has become a major real-world use case for open models, accounting for ~52% of open-source token volume in OpenRouter's 100T-token usage study [3]. Yet the field still lacks datasets and evaluation protocols purpose-built for game characters. We propose a game-native evaluation framework that combines controlled offline tests with online, in-game evaluation. We will continue iterating until the metrics reflect player experience and drive models toward more lifelike characters.
A deeper bottleneck is data. Training game-native LMs remains difficult because high-quality, clean, domain-specific data is scarce, and synthetic data still does not reliably meet the quality bar required for production games. Yet games generate large volumes of grounded player interactions and feedback. We will expand our data pipelines and online-learning methods to convert real in-game feedback into training signals that directly improve model behavior.
Game Context-Aware Speech Model
Most TTS systems still treat dialogue as isolated, single-turn text-to-speech. Even when the audio is highly realistic—often difficult to distinguish from human recordings—these models are typically trained on turn-level clips where dialogue history and broader context are missing or heavily truncated.
This design limits how well a model can adapt delivery to prior turns and the immediate situation in which a line is spoken. In real conversation, speakers rely on more than words and prosody: discourse context, intent, and emotion shape timing, emphasis, and affect. Without that context, the same text can feel mismatched to the interaction, and delivery can drift away from what the scene calls for.
In games, voice is the most immediate channel of interaction between the player and a character. It strongly influences whether players come to like a character and is a key driver of social presence and immersion.
We introduce the Game Context-Aware Speech Model (GCA) to bring natural, authentic, context-aware voice into games. GCA models conversational context across dialogue history, capturing both what was said and how it was delivered, and uses that signal to adapt speech in real time. It adjusts emotion and speaking style to match the scene, producing a more lifelike conversational voice experience.
Challenges in transformer-based speech modeling
We use a Transformer architecture that jointly models text and speech. It leverages the model's native context window to incorporate dialogue history and conversational memory. Two key challenges remain:
Tokenization challenge. Audio has far higher bandwidth than text and a richer temporal structure. Speech is also a continuous-time waveform. Modeling speech at the sample level with a Transformer quickly produces prohibitively long sequences and substantial computational overhead.
A more practical approach is to use a neural audio codec as the tokenizer. The codec compresses continuous waveforms into a sequence of discrete tokens and can reconstruct audio when needed, turning speech generation into a tractable discrete sequence modeling problem.
Computational scalability challenge. Codec tokenization makes speech discrete, but it does not make it low-bandwidth. High-quality generation still requires predicting large amounts of fine-grained acoustic detail. Many codec-based representations operate at a fixed frame rate and emit multiple discrete codebook indices per frame via multi-codebook schemes.
Autoregressive generation must therefore predict multiple codebook tokens per frame, and the overall token workload scales roughly with S × N (frames × codebooks per frame). Making training and real-time synthesis practical requires additional efficiency strategies that reduce token throughput and prediction cost while preserving audio quality and intelligibility.
GCA architecture
GCA uses a two-stage Transformer architecture inspired by the hierarchical modeling idea in the RQ-Transformer [4]. A larger Backbone Transformer jointly models text and audio tokens and summarizes them into a context vector representing the current dialogue state. A smaller Depformer then conditions on this vector to generate the codec representation autoregressively.
At each timestep, it predicts the discrete Residual Vector Quantization (RVQ) tokens across 32 codebooks, generating them sequentially within each frame. The audio codec decoder reconstructs these tokens into speech, recovering fine-grained acoustic detail.
Both the Backbone and the Depformer follow a Llama 3–style Transformer. We use Mimi as our audio codec. Its split-RVQ representation runs at 12.5 Hz and emits one semantic codebook per frame carrying semantic and phonetic content, along with N−1 acoustic codebooks that encode fine-grained acoustic detail [5].
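Schematically, the per-frame generation loop looks like the sketch below; all module interfaces are simplified stand-ins, not our production code. Note the throughput implication: at 12.5 Hz with 32 codebooks, one second of audio corresponds to 12.5 × 32 = 400 codec tokens.

```python
import torch

FRAME_RATE_HZ = 12.5   # Mimi frame rate
N_CODEBOOKS = 32       # 1 semantic codebook + 31 acoustic codebooks per frame

def generate_frames(backbone, depformer, codec, text_tokens, n_frames):
    """Schematic two-stage loop: Backbone steps once per frame,
    Depformer runs an N-step inner loop across codebook depth."""
    audio_frames = []
    history = text_tokens                       # text and audio share one sequence
    for _ in range(n_frames):
        ctx = backbone(history)                 # context vector for this frame
        frame_tokens = []
        for depth in range(N_CODEBOOKS):
            logits = depformer(ctx, frame_tokens, depth)
            frame_tokens.append(torch.argmax(logits, dim=-1))  # greedy for clarity
        audio_frames.append(frame_tokens)
        history = extend(history, frame_tokens)  # hypothetical: append frame to sequence
    return codec.decode(audio_frames)            # RVQ tokens -> waveform
```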
Conventional pipelines treat language generation and speech synthesis as separate stages. We instead unify text tokens and RVQ-quantized audio tokens in a single multimodal autoregressive model and train them jointly. We also condition training on dialogue history, using a default context window of roughly 40 seconds, so the model can explicitly incorporate game context and conversational state.
The framework also supports multi-speaker settings. With LoRA-based post-training, we add lightweight, per-character adapters to a shared base model. This enables distinct, stable voices across many NPCs without training a separate model for each character.
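With the open-source peft library, the adapter-swapping pattern looks roughly like this (paths and adapter names are hypothetical):

```python
from peft import PeftModel

# base_model: the shared GCA backbone checkpoint (loaded elsewhere).
speech_model = PeftModel.from_pretrained(
    base_model, "adapters/blacksmith", adapter_name="blacksmith")
speech_model.load_adapter("adapters/innkeeper", adapter_name="innkeeper")

def speak(npc_name: str, *gen_args):
    speech_model.set_adapter(npc_name)   # swap voices without reloading the base
    return speech_model.generate(*gen_args)
```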
Training compute optimization
The Depformer remains the throughput bottleneck, even with RQ-style hierarchical modeling that reduces the Backbone's temporal cost. Within each frame, it autoregressively generates RVQ tokens across the codebook depth, creating an N-step inner loop. Compute and training-time activation memory therefore scale roughly linearly with N, and overall overhead grows on the order of O(B · S · N), where B is batch size and S is the number of frames.
We fix the semantic tokens and train the remaining audio tokens on a random 1/16 subset of frames. This produces no meaningful change in final acoustic test loss, consistent with prior observations [6].
Subjective listening tests, however, indicate perceptual degradation, including less stable utterance onsets and reduced fine-grained detail. We mitigate this by also fixing the first acoustic codebook token alongside the semantic token while keeping the same subsampling strategy for the remaining audio tokens. This retains most of the compute savings while improving perceived stability and detail.
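The resulting loss mask is easy to express directly; this sketch reflects the scheme described above, with the subsampling ratio and codebook count as parameters.

```python
import torch

def depformer_loss_mask(n_frames, n_codebooks=32, keep_ratio=1 / 16):
    """Per-(frame, codebook) mask of positions that contribute to the loss."""
    mask = torch.zeros(n_frames, n_codebooks, dtype=torch.bool)
    mask[:, 0] = True                      # semantic codebook: always trained
    mask[:, 1] = True                      # first acoustic codebook: always trained
    subset = torch.rand(n_frames) < keep_ratio
    mask[subset, 2:] = True                # remaining codebooks: random 1/16 of frames
    return mask
```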
Decoder codebook amortization: amortized Depformer training matches full training on test NLL.
Experiments and Scaling Law
Transformer-based speech models show smooth scaling trends consistent with empirical scaling laws for language modeling [7].
We curate roughly 1M hours of high-quality, publicly available speech data, predominantly in English, and run a standardized preprocessing pipeline with quality filtering, ASR transcription, and speaker diarization.
We train three model sizes—0.3B, 0.52B, and 1.2B parameters—and select the 0.52B model as our primary configuration to balance quality with on-device deployment limits and low-latency targets.
As parameter count increases, we see systematic gains in fine-grained acoustic fidelity. We also observe a clear inflection point once the training-token budget becomes commensurate with model capacity: beyond this point, marginal gains become markedly larger, suggesting that undertraining is the limiting factor below this threshold.
Speech does not simply inherit the text-only scaling recipe. Compute-optimal training for text Transformers implies a specific tokens-to-parameters tradeoff [8], but we find that this optimum shifts in the codec-tokenized speech setting. We use this shift to guide how we allocate compute between scaling model capacity and extending training tokens, improving both fidelity and training efficiency.
Compute scaling: test NLL decreases consistently with compute (curves smoothed for readability).
Real-time streaming inference
We built a dedicated streaming inference engine for the GCA speech model and run it alongside our LM inference engine as a unified on-device runtime.
The speech engine operates at audio-codec frame granularity, maintains persistent streaming state across dialogue turns, and stays tightly synchronized with the LM's streaming output. Preallocated memory pools and buffers let it synthesize and decode each new audio frame as soon as the next text chunk arrives, then start playback immediately without waiting for a full response.
This design delivers low time-to-first-audio (TTFA) and smooth, continuous speech during generation. It also keeps memory footprint and frame-time overhead within tight, predictable bounds.
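In outline, the speech side behaves like a producer-consumer loop over the LM's text stream; the sketch below uses hypothetical `gca_engine` and `audio_out` interfaces rather than our actual runtime API.

```python
import queue
import threading

text_chunks = queue.Queue()   # fed by the LM engine as tokens stream in

def speech_worker(gca_engine, audio_out):
    """Runs beside the game loop: synthesizes frame by frame as text arrives."""
    state = gca_engine.new_stream()            # persistent per-turn streaming state
    while True:
        chunk = text_chunks.get()
        if chunk is None:                      # end-of-turn sentinel from the LM
            audio_out.flush(state.finalize())
            break
        for frame in state.synthesize(chunk):  # one codec frame at a time
            audio_out.play(frame)              # playback starts at the first frame

# threading.Thread(target=speech_worker, args=(gca_engine, audio_out), daemon=True).start()
```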
We benchmarked speech latency and streaming stability on RTX PCs under the same real-time budget as in-game dialogue. Under these conditions, our GCA speech stack achieves 124 ms TTFA and sustains continuous streaming playback at real-time speed as tokens arrive from the LM.
Benchmarked on NVIDIA RTX 5090. Hosted baselines were invoked via their respective public APIs. For each model, we evaluated 200 utterances and report the median TTFA.
Full-duplex conversation is a promising direction for voice interaction. The model keeps listening while it speaks, enabling ultra-low-latency generation and supporting real-time interruptions, overlapping interjections, and backchanneling.
Games add additional requirements. The dialogue system must preserve the player experience while remaining explicitly controllable and predictable inside a complex, stateful runtime. We are actively exploring interaction paradigms that enable full-duplex dialogue in-game without sacrificing robustness or controllability.
Evaluation
We evaluate our final 0.52B model against frontier commercial speech models using two protocols: (1) offline objective metrics and (2) in-game preference collected during real gameplay.
Offline, our 0.52B GCA speech model achieves 2.4% word error rate (WER ↓) and 0.9454 speaker consistency (SIM ↑), close to the ground-truth ceiling under the same evaluation protocol.
WER (lower is better) is computed by transcribing with Whisper-Large-v3 and scoring against the reference text. SIM (higher is better) is the cosine similarity between WavLM speaker embeddings from the generated audio and the reference speaker; ground-truth audio defines the ceiling.
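A minimal sketch of this protocol using the public checkpoints named above (Whisper-Large-v3 for transcription, WavLM speaker embeddings for similarity); our harness adds resampling and batching.

```python
import torch
import jiwer
from transformers import pipeline, AutoFeatureExtractor, WavLMForXVector

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
fe = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
spk = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv")

def wer(generated_wav, reference_text: str) -> float:
    hypothesis = asr(generated_wav)["text"]
    return jiwer.wer(reference_text.lower(), hypothesis.lower())

def sim(generated_wav, reference_wav, sr=16000) -> float:
    inputs = fe([generated_wav, reference_wav], sampling_rate=sr,
                return_tensors="pt", padding=True)
    emb = spk(**inputs).embeddings
    return torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=-1).item()
```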
For subjective evaluation, we hold the upstream LM fixed and swap only the speech model under identical conditions. We recruited 53 compensated participants for in-person gameplay sessions and collected both ratings and preference choices. Eleven v3 (alpha) performs best on isolated offline clips. In live gameplay, our model achieves the highest preference win rate under the same in-game protocol.
In-person preference study (N=53). Upstream LM held fixed; only the speech model varies. Bars report preference win rate on offline clips vs. in-game sessions (50% = parity).
Achieving character fidelity and presence
In games, voice is not just a channel for information—it is part of a character's identity. Character fidelity and presence depend on both system quality and available context. We will continue refining our training and inference stack and, within our on-device latency and cost budgets, extend the effective context window so the model can condition on longer dialogue history and richer in-scene cues.
Data remains the primary factor that sets the quality ceiling. At our target bar, real conversational speech and performances recorded by professional voice actors consistently outperform synthetic data.
High-fidelity characters also benefit from role-specific coverage. This is closer to a "character cloning" paradigm than traditional voice cloning. The goal is to reproduce not only timbre, but also a character's speech patterns, pacing, emotional range and boundaries, and overall in-character consistency. We plan to share additional research findings and practical lessons learned in future work.
Our current GCA speech model is not yet directly grounded in the game's evolving world state, even though it can track conversational context and emotional shifts. More grounded delivery will require a finer-grained encoding of game state, along with stronger multimodal perception and fusion modules. We are actively advancing this direction to couple voice more tightly to the game world while preserving the robustness and controllability required in a production runtime.
Cognitive Tree & AI Narrative Director
AI-native games should not merely make NPCs "smarter" conversational partners. They should build worlds that players and AI inhabit together.
AI characters maintain their own goals, memories, and social relationships. They perceive world state and sustain ongoing interaction with the player and with one another, so narrative can emerge from those interactions rather than advancing only through predefined scripts.
Traditional games, by contrast, rely on designer-authored dialogue, behavior logic, and a fixed set of player interaction options. Many "AI-wrapped" games add LLMs at the dialogue layer while keeping conventional behavior and narrative architecture intact. Dialogue may feel more capable, but it often remains decoupled from the game world's state and simulation.
AI-native games need to be built differently. Player intent must enter the core loop. That creates a true "possibility space," where players and AI shape the story together through ongoing interaction—rather than following a predetermined script.
This vision becomes playable only if game logic is redesigned around these models. We introduce the Cognitive Tree (CT) to couple language and action in a continuous, bidirectional loop. We also introduce the AI Narrative Director so narrative can evolve dynamically through player–AI interaction. These components run as engine-level modules inside the runtime. They are scheduled alongside rendering and physics within the frame-time budget and bounded by the real-time constraints of the game loop.
Cognitive Tree
Behavior trees have been a standard architecture for NPC decision-making and control since their high-profile use in Halo 2 [9]. They are debuggable, predictable, and controllable, allowing developers to specify how an NPC should respond in each situation and keep gameplay aligned with intended design. This determinism is a cornerstone of commercial game development.
Classic behavior-tree determinism rests on three implicit assumptions: (1) player input can be discretized, so condition nodes evaluate boolean predicates; (2) action outcomes can be reduced to a small set of return statuses—typically Success, Failure, or Running—without carrying semantic meaning; and (3) the behavior system is execution-only, managing control flow rather than cognition.
In closed-domain settings with tightly constrained interaction, these assumptions largely hold. Enemy strategies can be hard-coded, dialogue branches can be exhaustively authored, and sufficiently complex rules can produce a convincing illusion of intelligence.
Natural-language interaction breaks this model. Once player interaction shifts from predefined options to free-form language, NPCs must infer intent and keep cognition and behavior tightly coupled. Under these conditions, all three assumptions fail.
Many LLM-based game AI systems retain the same structural split: cognition sits in a reasoning or dialogue layer, while execution remains controlled by a conventional behavior system, connected only by thin, ad hoc handoffs. The LLM often emits a behavior tree or script once and then exits the loop. Or reasoning and execution are split behind an asynchronous action interface. Dialogue can feel more natural, yet behavior often still reads as scripted.
We designed the Cognitive Tree to remove this structural split. CT treats the behavior tree itself as part of the cognitive architecture, not merely an execution layer driven by one-shot reasoning outputs. Each node has explicit semantics, and runtime execution state becomes a first-class signal the model can reason over directly. In this design, the model and the tree are two views of the same cognitive system.
CT's key innovation is a continuous, bidirectional cognitive pathway between the behavior tree and the language model. This keeps the model in the loop, rather than relying on one-shot generation or external replanning.
Semantic signal input: In a conventional behavior tree, condition nodes evaluate boolean variables. In CT, condition nodes read semantic signals directly from the AI runtime.
We define four sources of semantic signals: (1) the tree's own state (node outcomes, interruption reasons, retry counts), (2) player behavior (approach, blocking, interruptions, natural-language input), (3) narrative state (the current narrative beat, quest phase), and (4) world state (environment variables, physical limits).
Condition nodes therefore no longer test low-level numeric predicates like "distance < 5." They operate on inferred semantic signals such as "the player is avoiding me" or "the narrative is currently in a conflict phase."
Execution feedback loop: In a conventional behavior tree, action nodes return only discrete statuses such as Success, Failure, or Running. In CT, action nodes stream execution feedback to the model, exposing the full execution context. This context includes the outcome, interruption causes (timeouts, preemption by higher-priority tasks, player intervention), the execution trace, and retry history.
For example, if a player interrupts an NPC who's playing the guitar, a traditional system might simply increment a counter and jump to a pre-authored branch. CT instead feeds the full interruption context back to the model—including what the player said and did in the moment, and the current relationship state—so the NPC can infer whether the player is annoyed, curious, or trying to join in, and respond accordingly.
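In code, the contrast with Success/Failure/Running is straightforward. The sketch below is illustrative only; all engine and model interfaces (`ask`, `deliberate`, `run_animation`) are hypothetical names, not our production API.

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionFeedback:
    """What a CT action node streams back to the model, in place of a bare status."""
    outcome: str                 # "completed" | "interrupted" | "failed"
    cause: str | None            # e.g. "player_intervention", "timeout", "preempted"
    trace: list[str] = field(default_factory=list)   # steps actually executed
    retries: int = 0

def semantic_condition(model, runtime) -> bool:
    """A CT condition node: evaluates an inferred semantic signal,
    not a raw predicate like distance < 5."""
    return runtime.ask(model, "Is the player avoiding me right now?")

def play_guitar(npc, model, runtime):
    fb = npc.run_animation("play_guitar")            # returns ExecutionFeedback
    if fb.outcome == "interrupted":
        # Feed the full context back: what the player said and did in the
        # moment, plus relationship state, so the model can infer intent.
        reaction = model.deliberate(fb, runtime.player_context())
        npc.enqueue(reaction.speech, reaction.next_action)
```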
CT keeps the model and the behavior tree in-process, rather than coupling them via an external interface. In this division of labor, the model provides the mind, and the behavior tree provides the spine.
This division of labor is inspired by hierarchical motor control in neuroscience and sensorimotor integration in robotics. In biological motor control, cortical circuits support higher-level, goal-directed planning, while the spinal cord is not merely a passive executor. Spinal circuitry can organize coordinated motor patterns under physical constraints, and ongoing sensory feedback keeps higher-level control in the loop [10]. The result is a closed-loop system rather than a one-way command-and-execute pipeline.
The AI Narrative Director
AI-native games make interaction the primary layer for both narrative and gameplay, enabling a new class of experiences and reshaping the player–game relationship.
Rather than selecting from predefined options, players use natural language, in-world actions, and relationships to change world state. Instead of relying on fixed scripts for content delivery, AI-native games turn ongoing interaction into narrative progression within designer-defined structure and guardrails, keeping the system controllable and traceable.
Two structural bottlenecks still limit long-running, interaction-driven narrative in both traditional games and most current AI approaches.
Predictable pacing and content scalability are difficult to achieve simultaneously. Pacing requires a stable progression structure and disciplined state transitions. Scalability requires coverage across an expanding set of situations and possible player interactions.
Historically, AI Director systems have had to choose between the two. They either rely on a closed event pool to keep the experience curve stable, or rely on large volumes of hand-authored content to broaden coverage, with costs rising quickly as complexity grows. Some systems add dynamism locally while the macro narrative still runs on rails.
The deeper problem is that making "more possibilities" production-ready and maintainable pushes narrative systems back toward discrete triggers and branching conditions. Each additional decision point expands the downstream state space, creating combinatorial growth in the content and logic that must be implemented and validated. Long-horizon scaling quickly becomes unmanageable.
An LLM can dramatically expand what the game can generate. But without an explicit pacing model and anchored narrative beats as guardrails, pacing drifts and the player experience becomes uneven.
The second issue is more structural: traditional games rarely let players express intent in the way people naturally do. Narrative systems then struggle to make those interactions shape the story reliably over time.
In real life, intent is communicated not through buttons, but through language, tone, silence, and continuous social behavior—moving closer or pulling back, probing or avoiding, cooperating or confronting. These cues are the basis on which relationships form and stories evolve.
Most production narrative systems, however, primarily interpret discrete triggers and menu options, such as reaching a location, pressing an interaction prompt, and selecting a dialogue option. Player expression is either compressed into menu-style choices or falls into an unmodeled space the system cannot interpret, and the narrative ultimately advances along a predefined track.
We take a different approach. We are not optimizing for infinite narrative content. We are optimizing for open-ended play within designer-defined structure.
Our design principle is to anchor key narrative beats and let the path to each beat emerge through play. Designers define which moments must happen, which emotions must land, and which boundaries cannot be crossed. Within those constraints, the system lets each player use natural language and open-ended actions to arrive at those beats through different trajectories. This is similar to the narrative control room in Westworld.
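One way such anchored beats could be declared is sketched below; the `NarrativeBeat` structure and its fields are hypothetical, intended only to show that a beat pins down the "what" while leaving the "how" open to emergent play.

```python
from dataclasses import dataclass

@dataclass
class NarrativeBeat:
    """A designer-anchored moment: what must happen, not how the player
    gets there. Illustrative structure, not our authoring format."""
    beat_id: str
    must_happen: str            # the moment that is guaranteed to occur
    emotional_target: str       # the feeling this beat is designed to land
    hard_boundaries: list[str]  # lines the emergent path may never cross

BETRAYAL_REVEAL = NarrativeBeat(
    beat_id="act2_betrayal",
    must_happen="the mentor's betrayal is revealed to the player",
    emotional_target="shock, then resolve",
    hard_boundaries=["mentor cannot die before the reveal"],
)
```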
The Director does not author a fixed script for the player. Instead, it continuously infers player intent and updates which character reactions and events are available next, keeping the experience personalized while preserving pacing and the intended impact of the key narrative beats.
A believable sense of "controlled evolution" requires two visible capabilities. The world must reliably remember what has happened. The narrative must then organize those events into a compelling progression. We implement this through two core components: Shared Narrative Memory and the AI Narrative Director.
Shared Narrative Memory is the system's single source of truth for narrative state. It maintains a structured record of what the player has actually changed: world facts, the character relationship graph, a canonical log of key events, and the current narrative beat or phase. Before generating any dialogue or action, NPCs read from this shared state to obtain consistent context and guardrails, then decide what to say and do in the moment.
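A minimal sketch of that single source of truth, assuming a simple in-memory representation (the real system is richer; these names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class SharedNarrativeMemory:
    """Single source of truth for narrative state. Illustrative only."""
    world_facts: dict[str, object] = field(default_factory=dict)    # what is currently true
    relationships: dict[tuple[str, str], float] = field(default_factory=dict)  # relationship graph
    event_log: list[str] = field(default_factory=list)              # canonical record of key events
    current_beat: str = "prologue"                                  # active narrative phase

    def context_for(self, npc_id: str) -> dict:
        # Every NPC reads the same state before speaking or acting,
        # so facts stay consistent across the whole cast.
        return {
            "facts": self.world_facts,
            "relationships": {k: v for k, v in self.relationships.items() if npc_id in k},
            "recent_events": self.event_log[-10:],
            "beat": self.current_beat,
        }
```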
For players, this makes the world feel like it genuinely acknowledges their actions. Choices carry into the next scene. Costs already paid still matter. Trust earned with one character carries over to others and into the quest system.
Shared Narrative Memory replaces isolated per-NPC memory with a single shared record of what actually happened in the world. It avoids a common multi-agent failure mode: each character maintains an isolated state, quests and dialogue drift out of sync, and players are forced to restate what they already did. A shared foundation keeps facts consistent as the cast grows, makes long-horizon consequences traceable across characters, and provides a reliable basis for debugging and controllable narrative orchestration.
The AI Narrative Director handles the other half of the gameplay problem: not just whether the world remembers, but how remembered events become a coherent experience that keeps moving forward. This lets players influence outcomes through natural language and open-ended actions, rather than selecting from predefined branches and dialogue options.
At runtime, the AI Narrative Director tracks the player's dialogue and action trajectory. Using the world and relationship state in Shared Narrative Memory, it makes orchestration decisions within designer-defined narrative boundaries.
For example, it can decide when to push a relationship past a threshold, when to delay a key beat until conditions are credible, when to let a conversation resolve naturally into quest motivation, and when to escalate conflict or release pressure to maintain pacing. Each decision writes back, keeping story events aligned over time with the world's shared narrative state. This lets content scale without sacrificing key beats or lasting consequences.
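Putting the pieces together, one Director step might be sketched as follows; `infer_intent`, `choose`, and the candidate decisions stand in for orchestration logic this post does not specify.

```python
def director_step(director, memory, player_events, beats):
    """One orchestration step: read shared state, infer intent, decide
    within designer boundaries, write the decision back.
    All function names are illustrative."""
    intent = director.infer_intent(player_events, memory)

    decision = director.choose(
        intent=intent,
        state=memory,
        candidates=[
            "advance_relationship",   # push a relationship past a threshold
            "delay_beat",             # hold a key beat until conditions are credible
            "seed_quest_motivation",  # let a conversation resolve into a quest hook
            "adjust_tension",         # escalate conflict or release pressure
        ],
        constraints=beats,            # designer-defined narrative boundaries
    )

    memory.event_log.append(decision.summary)  # every decision writes back
    return decision
```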
The critical difference in AI-native games is that players can finally use human-native interaction—language, actions, and relationships—to shape a world that remembers, responds, and evolves. Instead of following the same scripted storyline, each player generates a unique causal trajectory and a personal version of the story within the same designer-defined structure, reaching carefully anchored key moments at the right time.
This enables a new gameplay paradigm: an entertainment form built around sustained, ongoing interaction.
Architecture comparison
Game AI has advanced rapidly in recent years, and several deployable approaches for AI NPCs have emerged. Character.AI popularized LLM-driven character chat, while NVIDIA ACE, Inworld, and Convai have pushed the ecosystem forward with infrastructure, character engines, and authoring workflows.
Yet most of these systems still follow a command-and-execute paradigm. The LLM is often treated as a smarter script generator: it issues commands through an external action interface to a separate behavior system. In effect, this replaces a rules engine with a language model without changing the underlying loop.
Our architecture rewires that loop. Cognitive Tree reframes the behavior tree as a first-class component of the game's cognitive core. The AI Narrative Director turns what the player says and does into narrative progression, anchored to designer-defined beats and guardrails. Perception, cognition, execution, and narrative no longer run as separate stages; they operate as a single, continuously coupled loop.
We implement this architecture end-to-end on-device in our runtime. We believe this is the most practical way to ship AI-native games at scale: it satisfies real-time latency, cost, robustness, and controllability requirements while enabling a new class of player-driven, emergent gameplay.
In service of play
Games are the only medium where the author must recede so the player can emerge. In most art forms, greatness comes from the creator's expression. In games, greatness exists only in the player's experience. That is what makes games uniquely powerful.
We put AI into the core gameplay loop to unlock that strength. The goal is to make the game world genuinely understand the player. NPCs do not need higher benchmark scores. They need to be characters the player wants to care about.
Next, we will push multi-agent narrative coordination further. Once multiple AI-driven NPCs share a world, coordination becomes the hard problem. The system needs a shared view of world state, coherent narrative progression, and social relationships that form through interaction.
Emergent narrative is becoming an increasingly active research frontier in AI games. However, two foundational problems remain open. First, we need to keep closing the gap between AI-generated content and the bar set by designer-authored content in shipped games. Second, the field needs a reproducible, quantitative evaluation loop to improve emergent content over time—what we call quality-aware emergence. Without this loop, production teams lack clear optimization targets and a reliable way to iterate.
Great games start with a clear definition of fun, and technology should amplify it. AI-native games inherit the same principle. Their enjoyment still rests on decades of game design craft—pacing, game feel, and emotional arcs. Simply adding AI doesn't create a great experience. It has to be shaped by a designer's intent and judgment, then validated through iteration.
We will keep pushing toward tighter integration between AI capabilities and design craft so they reinforce each other. The result is an experience that feels genuinely joyful from the moment the player picks up the controller.
Looking Ahead
Video games have reinvented themselves repeatedly—moving from arcades to home consoles, from 2D to 3D, and from offline play to persistent online worlds. We believe the next major shift will be AI-native games.
Our aim is to integrate AI in a way that reshapes the core player experience and unlocks the next wave of innovation in how games and characters are built. To make that practical, we're designing our game and NPC architecture around effectively zero marginal inference cost, low latency, intent-aware narrative direction, and a seamless, native gameplay feel.
This work extends beyond the game itself. We take a systems view of the full stack: training infrastructure, data and feedback loops that drive continual improvement, a measurement and evaluation framework that can evolve over time, and a runtime layer that synchronizes gameplay with on-device inference and orchestration.
We will keep advancing that runtime and working toward its broad adoption as a standard foundation for AI-native games—and, over time, for a broader on-device application ecosystem. Next, we will launch a publicly playable AI-native game, using it as a proving ground to evolve the inference runtime and developer tooling in tandem, while enabling a UGC community to form and new playstyles to emerge.
AI-native games are also a natural testbed for AGI. They can give agents a long-lived, real-world-like interactive environment and grounded feedback from real players. NPCs in these worlds need to learn and remember the adventures they share with players, then fold those memories into their own cognition. Prompts alone, hard-coded knowledge, and the context windows of Transformers are not enough; reaching this level requires architectural innovation.
Just as many of the most important breakthroughs in AI have emerged from games, we believe the next pivotal innovation will follow a similar path. We hope next-generation games can do more than deliver a transformative gameplay experience. They can also help catalyze the next phase of AI development, accelerate scientific progress and discovery, and ultimately deliver meaningful benefits to the world.