Grounded NPC Dialogue Generation: A Multi-Pass Pipeline for Hallucination-Free Narrative AI (Superseded)
Abstract
Large Language Models (LLMs) have revolutionized NPC dialogue generation in narrative games, but their tendency to hallucinate (generating plausible but factually incorrect information) poses significant challenges for maintaining narrative coherence and player trust. We present a multi-pass pipeline architecture that enforces citation-grounded dialogue generation, where NPCs must cite sources for factual claims and those citations are verified against actual context through multiple verification stages.
Our approach combines retrieval-augmented generation (RAG), self-reflective retrieval (Self-RAG), natural language inference (NLI) for entailment verification, and iterative refinement through feedback loops. The pipeline achieves 100% hallucination rejection on adversarial tests while maintaining natural, character-appropriate dialogue.
The system integrates optional framework enhancements:
- LlamaIndex for semantic retrieval and citation query (primary path)
- NeMo Guardrails NLI for trained entailment verification, replacing fuzzy keyword matching
- Fail-closed behavior that strips unverifiable claims rather than shipping hallucinations
- Deterministic validation for fast-path citation format checking
- Evidence-first citations: deterministic evidence blocks + cite-by-ID for reliable verification
We evaluate the system using Meta-Llama-3-8B-Instruct and demonstrate that even small, quantized models can produce grounded, verifiable NPC responses when guided by appropriate architectural constraints. Evidence-first citations eliminate false NOT_ENTAILED judgments caused by paraphrased quotes.
1. Introduction
1.1 Problem Statement
Non-player characters (NPCs) in narrative games must maintain consistency with the game world's established facts, their own backstory, and previous interactions with players. Traditional rule-based dialogue systems achieve this through rigid scripting but lack the flexibility and natural feel of LLM-generated responses. Conversely, pure LLM-based dialogue generation produces natural-sounding responses but frequently hallucinates details that contradict established lore.
Consider an NPC named "Zero" who has received an email from a character named "Alice." When a player asks "Who is Alice?", the NPC should:
- Acknowledge relevant context: Reference the email they received
- Stay in character: Respond with appropriate personality traits
- Not fabricate: Avoid inventing details about Alice not present in their context
- Express appropriate uncertainty: Admit when they don't know something
A naive LLM approach might generate: "Alice is our lead cryptographer who joined the team in 2019. She specializes in quantum-resistant algorithms." This response is plausible but entirely fabricated, and contradicts the game's actual lore.
1.2 Motivation
The core insight driving our architecture is that grounded dialogue requires explicit citation. By forcing NPCs to cite sources for factual claims, we create a verifiable chain from claim to evidence. This approach:
- Prevents hallucination by requiring evidence for claims
- Enables verification through automated entailment checking
- Supports self-correction through feedback-driven retry loops
- Maintains immersion by keeping citations invisible to players in final output
1.3 Contributions
This paper presents:
- A multi-pass pipeline architecture that separates context retrieval, reasoning, decision-making, and speech synthesis
- Citation-grounded generation where all factual claims must reference numbered sources
- Multi-stage verification using NLI entailment checking at both reasoning and speech stages
- Self-RAG integration for retroactive evidence retrieval after claim generation
- Retrieval-Augmented Verification (RAV) to reduce false positives by checking full content
- Iterative refinement through feedback loops that guide the model to self-correct
- Semantic retrieval backbone using LlamaIndex (CitationQueryEngine + persistent index)
- NLI-based entailment using NeMo Guardrails for trained verification replacing fuzzy matching
- Fail-closed safety that strips unverifiable claims from final output when retries exhausted
- Deterministic fast-path validation for citation format checking before expensive LLM calls
- Evidence block extraction (deterministic, verbatim snippets from sources)
- Cite-by-ID structured trace using JSON schema enforcement for explicit claim→evidence_id mapping
2. Related Work
2.1 Retrieval-Augmented Generation (RAG)
RAG systems (Lewis et al., 2020) augment LLM generation with retrieved documents, grounding responses in external knowledge. Our Pass 1 uses LlamaIndex semantic retrieval (top-k + rerank) with CitationQueryEngine to return grounded drafts and sources.
2.2 Self-RAG
Self-RAG (Asai et al., 2023) introduces reflection tokens that allow models to decide when to retrieve and assess the relevance of retrieved content. Our Pass 2.4 implements a similar concept: after generating initial thoughts, we extract claims and retroactively search for supporting evidence.
2.3 Natural Language Inference for Verification
NLI models classify the relationship between premise-hypothesis pairs as entailment, neutral, or contradiction. We adapt this for citation verification: the source text is the premise, and the NPC's claim is the hypothesis. Claims that are not entailed by their cited sources indicate potential hallucination.
2.4 Constitutional AI and Self-Correction
Constitutional AI (Bai et al., 2022) demonstrates that models can critique and revise their own outputs. Our Pass 2.5 and Pass 4.5 implement domain-specific critique focused on citation accuracy, with structured feedback enabling targeted self-correction.
2.5 Chain-of-Thought and Multi-Stage Reasoning
Multi-stage pipelines that separate reasoning from final output (Wei et al., 2022) have shown improved performance on complex tasks. Our architecture extends this with explicit verification stages between reasoning and output.
2.6 LlamaIndex and Semantic Retrieval
LlamaIndex (Liu, 2022) provides a framework for building RAG applications with semantic retrieval. Our system uses LlamaIndex as the primary retrieval backbone with a persistent Chroma index and citation-aware query engine.
2.7 NeMo Guardrails and NLI Verification
NVIDIA's NeMo Guardrails (Rebedea et al., 2023) provides programmable guardrails for LLM applications. We adapt their NLI-based fact-checking approach for citation verification, using trained NLI models (roberta-large-mnli) to classify claim-evidence relationships as entailment, neutral, or contradiction.
3. System Architecture
3.1 Pipeline Overview
The complete pipeline consists of up to 10 stages (including optional compaction and actions), with verification stages triggering retry loops when issues are detected:
PIPELINE OVERVIEW
┌──────────────┐
│ Player Input │
│ "who is │
│ alice?" │
└──────┬───────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 1: RETRIEVAL + CITATION QUERY (LlamaIndex) │
│ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌───────────────┐ │
│ │ Embed │───▶│ Retrieve │───▶│ Rerank │───▶│ Cite + Draft │ │
│ └─────────┘ └──────────┘ └─────────┘ └───────────────┘ │
│ │ LlamaIndex + ChromaDB │
│ └────────── Output: sources + grounded draft ───────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 1.5: CONTEXT COMPACTION (Optional) │
│ │
│ Large context (>4000 tokens) → Summarized context (~2000 tokens) │
│ Preserves: key facts, names, dates, relationships │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ BUILD CITABLE SOURCES │
│ │
│ ToolRecords[] ──────────────────────────────▶ CitableSource[] │
│ │
│ [self] backstory: Zero is a paranoid hacker who distrusts newcomers... │
│ [1] email: From alice@shadowwatch.net - Meeting tomorrow at 3pm... │
│ [2] note: Crew Status - Alice: Active, Bob: On mission... │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 1.7: EVIDENCE BLOCK EXTRACTION (NEW) │
│ │
│ CitableSource[] ─────────────────────────────▶ EvidenceBlock[] │
│ │
│ [E1] from [1]: "From alice@shadowwatch.net - Meeting tomorrow at 3pm..." │
│ [E2] from [2]: "Crew Status - Alice: Active, Bob: On mission..." │
│ [self] persona: "paranoid, distrusts newcomers" │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 2: INTERNAL MONOLOGUE │
│ │
│ Input: Player message + Sources + Personality + Relationship │
│ │
│ "I notice they're asking about Alice. I see I have an email from her [1]. │
│ The crew status shows she's active [2]. But I don't trust this newcomer │
│ [self]. I should be careful what I reveal..." │
│ │
│ Output: First-person thoughts WITH inline citations [1], [2], [self] │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 2.4: CLAIM-TRIGGERED RETRIEVAL (Self-RAG) │
│ │
│ 1. Extract claims from monologue │
│ 2. For each claim: │
│ a. Check if evidence exists in ToolRecords │
│ b. If not found → Search VFS for new files │
│ c. Add new sources if relevant files found │
│ 3. Generate evidence audit for retry feedback │
│ │
│ Evidence Audit: │
│ - "email from alice" → SUPPORTED by [1] │
│ - "she's active" → SUPPORTED by [2] │
│ - "night shift" → NO EVIDENCE FOUND │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 2.5: MONOLOGUE REVIEW │
│ │
│ Verification Checks: │
│ ├── Citation Validity: Do cited sources exist? │
│ ├── NLI Entailment: Does source support the claim? │
│ ├── [self] Usage: Only for personality/feelings? │
│ ├── Fabrication: Claims about things not in context? │
│ └── Topic Relevance: Does NPC address the question? │
│ │
│ Output: APPROVED or ISSUES_FOUND + issue list │
└──────────────────────────────────────────────────────────────────────────────┘
│
├──────────── APPROVED ──────────────────────────────────┐
│ │
▼ ISSUES_FOUND │
┌─────────────────────┐ │
│ Format Feedback │ │
│ Retry Pass 2 │◀─────── Up to 3 retries ────────────────┤
│ (with feedback) │ │
└─────────────────────┘ │
│
┌─────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 3: DECISION │
│ │
│ Based on internal thoughts, decide: │
│ - What information to share │
│ - What tone to use │
│ - Whether to ask clarifying questions │
│ │
│ "I'll acknowledge I know Alice from the email, but stay guarded. │
│ Tone: cautious, slightly suspicious." │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 4: SPEECH SYNTHESIS │
│ │
│ Convert decision to character speech: │
│ - Apply character voice (lowercase, slang, etc.) │
│ - Include citations for factual claims │
│ - Match response length to input complexity │
│ │
│ "yeah, got an email from alice [E1]. she's part of the crew [E2]. │
│ why you asking? [self]" │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 4.5: SPEECH REVIEW │
│ │
│ Step 1: Evidence-ID Verification │
│ - Check each [E#] citation against evidence blocks │
│ - Flag: UNCITED, INVALID_SOURCE, NOT_ENTAILED, SELF_MISUSE │
│ │
│ Step 2: Retrieval-Augmented Verification (RAV) - if issues found │
│ - For disputed claims: │
│ 1. Check full content for additional support │
│ 2. Run entailment check against full content │
│ 3. If entailed → Resolve issue (false positive) │
│ 4. If still not entailed → keep issue │
│ - Prevents false positives when evidence blocks are too short │
│ │
│ Output: APPROVED or ISSUES_FOUND (after RAV filtering) │
└──────────────────────────────────────────────────────────────────────────────┘
│
├──────────── APPROVED ──────────────────────────────────┐
│ │
▼ ISSUES_FOUND │
┌─────────────────────┐ │
│ Format Feedback │ │
│ Retry Pass 4 │◀─────── Up to 3 retries ────────────────┤
│ (with feedback) │ │
└─────────────────────┘ │
│
┌─────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 5: ACTIONS (Optional) │
│ │
│ Determine game-state actions based on conversation: │
│ - Send email │
│ - Update relationship │
│ - Trigger quest │
│ - Modify files │
│ │
│ {"actions": [{"name": "update_trust", "parameters": {"delta": -5}}]} │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ POST-PROCESSING │
│ │
│ 1. Strip citations from final output (player doesn't see [E1], [E2]) │
│ 2. Apply final formatting │
│ 3. Log for debugging/analytics │
│ │
│ Final output to player: │
│ "yeah, got an email from alice. she's part of the crew. why you asking?" │
└──────────────────────────────────────────────────────────────────────────────┘
3.2 Data Flow Diagram
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ DATA FLOW THROUGH PIPELINE │
└─────────────────────────────────────────────────────────────────────────────────────┘
Player Message: "who is alice?"
│
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ MultiPassContext (shared state across all passes) │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ PlayerMessage string "who is alice?" │
│ PlayerHandle string "shadow_runner" │
│ NPCHandle string "zero" │
│ ToolRecords []Record [{read, /mail/inbox/..., "From: alice..."}] │
│ Sources []Source [{id:"1", path:"/mail/...", summary:"..."}] │
│ EvidenceBlocks []Evidence [{id:"E1", source_id:"1", text:"From: ..."}] │
│ Thoughts string "I notice they're asking about Alice..." │
│ Decision string "I'll acknowledge but stay guarded..." │
│ Response string "yeah, got an email from alice [E1]..." │
│ WorkingMemory string (persistent NPC state) │
│ Opinion string "first meeting, watching carefully" │
│ Favors string "no favor history yet" │
│ │
└───────────────────────────────────────────────────────────────────────────┘
│
│ Flows through each pass, accumulating data
│
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ PASS OUTPUTS │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ Pass 1: ToolRecords[] ─────────────────────────────────────────────▶ │
│ [{name:"read", path:"/mail/inbox/from_alice.eml", │
│ result:"From: alice@shadowwatch.net\nSubject:..."}] │
│ │
│ Pass 1.5: CompactedContext string (optional) ────────────────────────▶ │
│ "Key facts: Alice emailed about meeting. Crew status..." │
│ │
│ Sources: CitableSource[] ───────────────────────────────────────────▶ │
│ [{id:"self", type:"backstory", summary:"paranoid hacker..."}, │
│ {id:"1", type:"email", path:"/mail/...", summary:"From..."}] │
│ │
│ EvidenceBlocks[] ─────────────────────────────────────────────────────▶ │
│ [{id:"E1", source_id:"1", text:"From: alice@... Meeting..."}] │
│ │
│ Pass 2: Thoughts string ───────────────────────────────────────────▶ │
│ "I notice they're asking about Alice. I see I have an │
│ email from her [1]. I don't trust easy [self]..." │
│ │
│ Pass 2.4: ClaimEvidenceMap[] ────────────────────────────────────────▶ │
│ [{claim:"email from alice", hasEvidence:true, sourceID:"1"}, │
│ {claim:"she's active", hasEvidence:true, sourceID:"2"}] │
│ │
│ Pass 2.5: ReviewResult ──────────────────────────────────────────────▶ │
│ {verdict:"APPROVED"} or │
│ {verdict:"ISSUES_FOUND", issues:[...]} │
│ │
│ Pass 3: Decision string ───────────────────────────────────────────▶ │
│ "I'll acknowledge knowing Alice but stay guarded. Tone: │
│ suspicious but not hostile." │
│ │
│ Pass 4: Response string ───────────────────────────────────────────▶ │
│ "yeah, got an email from alice [E1]. she's part of the │
│ crew [E2]. why you asking? [self]" │
│ │
│ Pass 4.5: ReviewResult + RAVResults ─────────────────────────────────▶ │
│ {verdict:"APPROVED", ravResolved:2} │
│ │
│ Pass 5: Actions[] ─────────────────────────────────────────────────▶ │
│ [{name:"log_interaction", params:{topic:"alice"}}] │
│ │
└───────────────────────────────────────────────────────────────────────────┘
4. Detailed Pass Descriptions
4.1 Pass 1: Retrieval + Citation Query
4.1.1 Purpose
Pass 1 performs semantic retrieval over the NPC's virtual filesystem using LlamaIndex. It produces two artifacts:
- Grounded draft (a citation-aware answer used downstream)
- Sources (retrieved chunks with metadata for citations and evidence blocks)
4.1.2 Retrieval Interface (LlamaIndex Sidecar)
The Go service talks to a Python sidecar that hosts the LlamaIndex pipeline:
// Sidecar endpoints
IndexNPCVFS(npcID, vfsPath) // builds/loads persistent Chroma index
CitationQuery(npcID, query, topK) -> {answer, sources[]}
4.1.3 VFS Catalog
The NPC has access to a virtual filesystem (VFS) containing their emails, notes, IRC logs, and other files. Pass 1 receives a catalog of available files:
# VFS Catalog provided to Pass 1
files:
- path: /home/zero/.irc/logs/irc.underground.net/#general.log
- path: /home/zero/.irc/logs/irc.underground.net/#shadowwatch.log
- path: /home/zero/mail/inbox/1707079500_from_alice.eml
- path: /home/zero/mail/inbox/1707080000_from_bob.eml
- path: /home/zero/mail/sent/1707090000_to_charlie.eml
- path: /home/zero/notes/crew_status.txt
- path: /home/zero/notes/passwords.txt
- path: /home/zero/notes/todo.txt
4.1.4 Retrieval Flow
1. Index NPC VFS (hash-based, persistent via Chroma)
2. Embed player query
3. Retrieve top-k chunks (vector similarity)
4. Rerank (cross-encoder if enabled)
5. CitationQueryEngine returns grounded draft + sources
4.1.5 Example Response
{
"answer": "shadowwatch monitors trace attempts [1]",
"sources": [
{"source_id": "1", "path": "/home/zero/notes/shadowwatch_info.txt", "snippet": "SHADOWWATCH v1.0 - Connection Trace Monitor"},
{"source_id": "2", "path": "/home/zero/mail/sent/1707079500_to_player_shadowwatch.eml", "snippet": "When you connect to remote systems, their security can try to trace..."}
]
}
4.2 Pass 1.5: Context Compaction (Optional)
4.2.1 Purpose
When Pass 1 retrieves large files or multiple documents, the total context may exceed model token limits. Pass 1.5 summarizes the retrieved content while preserving key facts.
4.2.2 Preservation Priorities
The compaction step instructs the LLM to preserve:
- Names and handles (Alice, Bob, ShadowWatch)
- Dates and times (meeting at 3pm, last Tuesday)
- Specific claims (detection rate of 99.2%)
- Relationships (Alice leads the project)
- Tone and sentiment (frustrated, excited)
4.2.3 Implementation
The compaction pass calculates total context size and only triggers when exceeding a threshold (e.g., 4000 tokens). It builds a compaction prompt from the tool records and produces a summarized context.
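The trigger logic can be sketched as follows, assuming a rough chars/4 token heuristic (a real implementation would use the model's tokenizer):

```go
package main

import "fmt"

// estimateTokens uses a rough chars/4 heuristic. This is an assumption
// for illustration; a production system would use the model's tokenizer.
func estimateTokens(text string) int {
	return len(text) / 4
}

// needsCompaction reports whether the combined tool-record context
// exceeds the compaction threshold (e.g. 4000 tokens).
func needsCompaction(records []string, thresholdTokens int) bool {
	total := 0
	for _, r := range records {
		total += estimateTokens(r)
	}
	return total > thresholdTokens
}

func main() {
	small := []string{"From: alice@shadowwatch.net - Meeting tomorrow at 3pm"}
	fmt.Println(needsCompaction(small, 4000)) // false: small context skips compaction
}
```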
4.3 Building Citable Sources
4.3.1 Purpose
Before Pass 2, we convert raw ToolRecords into structured CitableSource objects that can be referenced by ID. Only files with required .meta.yaml metadata are eligible as citable sources.
4.3.2 Source Structure
type CitableSource struct {
SourceID string // "1", "2", "self"
Path string // "/home/zero/notes/crew_status.txt"
Type string // "note", "email", "irc", "backstory"
Summary string // Metadata summary for prompt context
Content string // Full chunk content for evidence extraction
Keywords []string // Metadata keywords for retrieval and filtering
AllowedFor []string // Claim types this source can support
Score float64 // Relevance score from retrieval
}
4.3.3 Metadata-Driven Type
We do not infer file types from paths. Every VFS file must have a companion .meta.yaml, and its type field is the canonical source. Files without metadata are excluded from citable sources.
Example metadata:
created_at: 855010800
author: zero
type: email
summary: "Email sharing ShadowWatch tool with recruit after FTP test"
keywords: [shadowwatch, tool, trace, security, opsec, ghost]
4.3.4 Metadata-Driven Summary
Summaries are taken directly from .meta.yaml and treated as authored context. No content-based summary extraction is used. Files missing required metadata fields are skipped.
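The metadata checks above can be sketched as follows. This is a simplified parser for the flat `key: value` shape of the example metadata; a production system would use a YAML library, and the required-field list (`type`, `summary`) is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlatMeta handles only the flat, unnested "key: value" format
// shown in the .meta.yaml example. A real implementation would use a
// YAML library.
func parseFlatMeta(raw string) map[string]string {
	meta := make(map[string]string)
	for _, line := range strings.Split(raw, "\n") {
		k, v, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		meta[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"`)
	}
	return meta
}

// hasRequiredMeta enforces the rule that files missing required
// metadata fields are skipped as citable sources.
func hasRequiredMeta(meta map[string]string) bool {
	for _, field := range []string{"type", "summary"} {
		if meta[field] == "" {
			return false
		}
	}
	return true
}

func main() {
	meta := parseFlatMeta("type: email\nsummary: \"Email sharing ShadowWatch tool\"")
	fmt.Println(meta["type"], hasRequiredMeta(meta)) // email true
}
```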
4.3.5 The [self] Source
Every NPC has a special [self] source representing their backstory and personality. The source building process:
- Creates a [self] source from the NPC's backstory (for personality, opinions, feelings)
- Iterates through tool records (file reads and LlamaIndex retrievals)
- Loads metadata for each file and creates numbered sources
- Skips files without valid metadata
4.3.6 Evidence Block Extraction (Pass 1.7)
We transform sources into evidence blocks that the model can cite directly in speech:
[E1] from [1]: "From alice@shadowwatch.net - Meeting tomorrow at 3pm..."
[E2] from [2]: "Crew Status - Alice: Active, Bob: On mission..."
[self] persona: "paranoid, distrusts newcomers"
Evidence blocks are deterministic, verbatim snippets (capped length) so the model cites IDs instead of copying quotes.
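A sketch of the extraction, assuming a simple byte-length cap and sequential E# numbering (the persona [self] block is handled separately):

```go
package main

import "fmt"

// CitableSource is trimmed to the fields needed here.
type CitableSource struct {
	SourceID string
	Content  string
}

type EvidenceBlock struct {
	ID       string
	SourceID string
	Text     string
}

// extractEvidence turns sources into deterministic, verbatim,
// length-capped evidence blocks (E1, E2, ...) that the model cites by ID.
// The byte-based cap assumes ASCII content.
func extractEvidence(sources []CitableSource, maxLen int) []EvidenceBlock {
	var blocks []EvidenceBlock
	n := 1
	for _, s := range sources {
		if s.SourceID == "self" { // persona block is built separately
			continue
		}
		text := s.Content
		if len(text) > maxLen {
			text = text[:maxLen] + "..."
		}
		blocks = append(blocks, EvidenceBlock{
			ID:       fmt.Sprintf("E%d", n),
			SourceID: s.SourceID,
			Text:     text,
		})
		n++
	}
	return blocks
}

func main() {
	srcs := []CitableSource{
		{SourceID: "self", Content: "paranoid, distrusts newcomers"},
		{SourceID: "1", Content: "From alice@shadowwatch.net - Meeting tomorrow at 3pm"},
	}
	for _, b := range extractEvidence(srcs, 40) {
		fmt.Printf("[%s] from [%s]: %q\n", b.ID, b.SourceID, b.Text)
	}
}
```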
4.4 Pass 2: Internal Monologue
4.4.1 Purpose
Pass 2 generates the NPC's internal thought process: what they're thinking as they process the player's message. This is the core reasoning stage where the NPC:
- Analyzes what the player is asking
- Recalls relevant information from sources
- Considers their personality and relationship
- Weighs what to reveal or hide
4.4.2 Citation Requirements
The critical innovation is requiring inline citations for factual claims using source IDs like [1], [2], and [self] (personality/backstory).
Note: Pass 2 cites source IDs ([1], [2], [self]) for internal reasoning. Pass 4 uses evidence IDs ([E1], [E2], [self]) derived from those sources for final speech.
4.4.3 Input Structure
The input supplies personality, working memory, relationship context, available sources, and unknown-topic guardrails. It requires first-person internal thoughts with citations.
4.4.4 Example Output
Player message: "who is alice?"
Sources:
[self] backstory: Zero is a paranoid hacker who distrusts newcomers...
[1] email: From alice@shadowwatch.net - Meeting tomorrow at 3pm...
[2] note: Crew Status - Alice: Active, Bob: On mission...
Generated thoughts:
I notice this newcomer is asking about Alice. That's interesting -
why would they want to know about her specifically?
I see I have an email from her [1] about a meeting tomorrow. The crew
status shows she's active [2]. She's part of the inner circle, doing
important work.
But I don't trust easy [self]. This could be someone fishing for
information about our crew. I should be careful what I reveal.
Maybe I'll acknowledge I know her but not give too many details...
4.4.5 Proportional Thinking
Monologue length is proportional to input complexity: greetings <= 20 words, medium questions <= 50, complex topics <= 150.
This prevents over-thinking simple interactions like "hey" or "what's up".
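The budget selection might look like the following, using player-message word count as a stand-in for complexity (the word-count cutoffs for "greeting" and "medium" are illustrative assumptions):

```go
package main

import (
	"fmt"
	"strings"
)

// thoughtWordLimit maps input complexity (approximated by word count,
// an assumption) to the monologue budget described above.
func thoughtWordLimit(playerMessage string) int {
	words := len(strings.Fields(playerMessage))
	switch {
	case words <= 3: // greetings like "hey", "what's up"
		return 20
	case words <= 12: // medium questions
		return 50
	default: // complex topics
		return 150
	}
}

func main() {
	fmt.Println(thoughtWordLimit("hey"))                                    // 20
	fmt.Println(thoughtWordLimit("who is alice and why did she email you")) // 50
}
```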
4.5 Pass 2.4: Claim-Triggered Retrieval (Self-RAG)
4.5.1 Purpose
Pass 2.4 implements a Self-RAG approach: after the NPC generates thoughts, we extract claims and retroactively search for supporting evidence. This catches cases where:
- The NPC made a claim without a citation
- The NPC cited a source but the claim isn't well-supported
- Additional evidence exists that could strengthen the response
4.5.2 Claim Extraction
We use regex-based extraction to find claims with and without citations:
// ClaimWithCitation represents an extracted claim
type ClaimWithCitation struct {
Claim string // "alice is active"
Citation string // "1" or "self" or ""
Position int // Character position in text
}
The extraction uses a regex pattern to match sentence fragments followed by citation markers like [N] or [self].
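An illustrative version of that extraction follows. The paper does not show its exact regex, so the pattern here (a sentence fragment immediately followed by [N] or [self]) is an assumption:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type ClaimWithCitation struct {
	Claim    string // "alice is active"
	Citation string // "1" or "self"
	Position int    // character position in text
}

// claimPattern matches a run of non-terminator characters followed by
// a [N] or [self] citation marker (illustrative, not the paper's exact regex).
var claimPattern = regexp.MustCompile(`([^.!?\[\]]+)\s*\[(\d+|self)\]`)

// extractClaims pulls cited claims out of a monologue.
func extractClaims(monologue string) []ClaimWithCitation {
	var claims []ClaimWithCitation
	for _, m := range claimPattern.FindAllStringSubmatchIndex(monologue, -1) {
		claims = append(claims, ClaimWithCitation{
			Claim:    strings.TrimSpace(monologue[m[2]:m[3]]),
			Citation: monologue[m[4]:m[5]],
			Position: m[0],
		})
	}
	return claims
}

func main() {
	text := "I have an email from her [1]. The crew status shows she's active [2]. I don't trust easy [self]."
	for _, c := range extractClaims(text) {
		fmt.Printf("%q -> [%s]\n", c.Claim, c.Citation)
	}
}
```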
4.5.3 Evidence Search
For each claim, we search existing ToolRecords and optionally the VFS:
- Check existing ToolRecords: For each claim, fuzzy match against already-retrieved file contents
- Search VFS if needed: If no evidence found and claim has enough keywords, search the NPC's virtual filesystem for new supporting files
- Add new sources: When relevant files are found, add them to the context and create new citable sources
4.5.4 Fuzzy Matching
We use keyword-based fuzzy matching to determine if content supports a claim. The algorithm extracts keywords from the claim, checks how many appear in the content, and returns a match ratio (0.0 to 1.0).
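A sketch of the match-ratio computation, with an illustrative stopword list:

```go
package main

import (
	"fmt"
	"strings"
)

// stopwords is an illustrative set; a real system would use a larger list.
var stopwords = map[string]bool{
	"the": true, "a": true, "an": true, "is": true, "from": true,
	"of": true, "to": true, "and": true, "she": true, "he": true,
}

// keywords lowercases the claim and drops stopwords and short tokens.
func keywords(text string) []string {
	var out []string
	for _, w := range strings.Fields(strings.ToLower(text)) {
		w = strings.Trim(w, `.,!?"'`)
		if len(w) >= 3 && !stopwords[w] {
			out = append(out, w)
		}
	}
	return out
}

// matchRatio returns the fraction of claim keywords found in the content
// (0.0 to 1.0), using substring containment as the fuzzy test.
func matchRatio(claim, content string) float64 {
	kws := keywords(claim)
	if len(kws) == 0 {
		return 0
	}
	lc := strings.ToLower(content)
	hits := 0
	for _, kw := range kws {
		if strings.Contains(lc, kw) {
			hits++
		}
	}
	return float64(hits) / float64(len(kws))
}

func main() {
	fmt.Println(matchRatio("email from alice", "From: alice@shadowwatch.net - Meeting tomorrow")) // 0.5
}
```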
4.5.5 Evidence Injection
The results are formatted as feedback for Pass 2 retry. The evidence audit lists each claim as either "SUPPORTED by [source_id]" or "NO EVIDENCE FOUND", instructing the model to express uncertainty or remove unsupported claims.
4.6 Pass 2.5: Monologue Review
4.6.1 Purpose
Pass 2.5 is a verification stage that checks the internal monologue for:
- Citation validity: Do cited sources exist?
- Entailment: Does the source support the claim?
- [self] usage: Is [self] only used for personality/feelings?
- Fabrication: Are there claims about things not in context?
- Topic relevance: Does the NPC address the player's question?
4.6.2 Issue Types
type IssueType string
const (
IssueUncited IssueType = "UNCITED" // Factual claim without citation
IssueInvalidSource IssueType = "INVALID_SOURCE" // Citation to non-existent source
IssueNotEntailed IssueType = "NOT_ENTAILED" // Source doesn't support claim
IssueSelfMisuse IssueType = "SELF_MISUSE" // [self] used for technical details
IssueEvasion IssueType = "EVASION" // Has info but gives vague non-answer
)
type MonologueIssue struct {
Claim string // The problematic claim
IssueType IssueType // Type of issue
Citation string // The citation used (if any)
Correction string // Suggested fix
}
4.6.3 NLI Entailment Checking
For each citation, we check if the source entails the claim. The system supports two verification modes:
1. Fuzzy Keyword Matching (Default Fallback)
type EntailmentResult int
const (
Entailed EntailmentResult = iota // Source supports claim
Neutral // Source doesn't mention claim
Contradicted // Source contradicts claim
)
The fuzzy matcher compares keyword overlap between claim and source. High overlap (>=50%) indicates entailment, moderate overlap (>=30%) is neutral, and low overlap indicates contradiction.
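Mapping the overlap ratio onto the EntailmentResult values above is then a pair of threshold checks:

```go
package main

import "fmt"

type EntailmentResult int

const (
	Entailed EntailmentResult = iota // source supports claim
	Neutral                          // source doesn't mention claim
	Contradicted                     // source contradicts claim
)

// classifyOverlap applies the thresholds stated above: >=50% keyword
// overlap is entailment, >=30% is neutral, anything lower is treated
// as contradiction.
func classifyOverlap(ratio float64) EntailmentResult {
	switch {
	case ratio >= 0.5:
		return Entailed
	case ratio >= 0.3:
		return Neutral
	default:
		return Contradicted
	}
}

func main() {
	fmt.Println(classifyOverlap(0.6) == Entailed, classifyOverlap(0.4) == Neutral, classifyOverlap(0.1) == Contradicted)
}
```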
2. NeMo NLI (When Enabled)
When configured, the system uses a trained NLI model via a Python sidecar:
// NLI client for trained entailment verification
type NLIClient struct {
endpoint string
httpClient *http.Client
enabled bool
}
The NLI client sends claim-evidence pairs to a Python sidecar running roberta-large-mnli. The response classifies the relationship as "entailed", "neutral", or "contradicts". If the sidecar is unavailable, the system gracefully falls back to fuzzy matching.
4.6.4 Review Criteria
The reviewer checks monologue claims against sources (entailment), flags [self] misuse, detects fabrication, and enforces topic relevance. It outputs a JSON verdict (APPROVED or ISSUES_FOUND) with structured issues for retries.
4.6.5 Retry Loop
When issues are found, Pass 2 is retried with feedback. The loop:
- Generates monologue (Pass 2)
- Runs claim-triggered retrieval (Pass 2.4)
- Reviews monologue (Pass 2.5)
- If approved, continues to Pass 3
- If issues found, formats feedback and retries (up to 3 times)
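The loop above can be sketched with the passes abstracted as functions; the feedback string format here is illustrative:

```go
package main

import "fmt"

// ReviewResult mirrors the Pass 2.5 verdict structure.
type ReviewResult struct {
	Verdict string // "APPROVED" or "ISSUES_FOUND"
	Issues  []string
}

// runMonologueLoop sketches the Pass 2 -> 2.4 -> 2.5 cycle with up to
// 3 retries; generate, audit, and review stand in for the LLM-backed passes.
func runMonologueLoop(
	generate func(feedback string) string,
	audit func(thoughts string) string,
	review func(thoughts, evidenceAudit string) ReviewResult,
) (string, bool) {
	feedback := ""
	const maxRetries = 3
	for attempt := 0; attempt <= maxRetries; attempt++ {
		thoughts := generate(feedback)            // Pass 2 (with feedback on retries)
		evidenceAudit := audit(thoughts)          // Pass 2.4: claim-triggered retrieval
		result := review(thoughts, evidenceAudit) // Pass 2.5: monologue review
		if result.Verdict == "APPROVED" {
			return thoughts, true
		}
		feedback = fmt.Sprintf("Fix these issues: %v\nEvidence audit:\n%s", result.Issues, evidenceAudit)
	}
	return "", false // retries exhausted; caller handles fail-closed behavior
}

func main() {
	attempts := 0
	thoughts, ok := runMonologueLoop(
		func(string) string { attempts++; return "I have an email from her [1]." },
		func(string) string { return `- "email from her" -> SUPPORTED by [1]` },
		func(string, string) ReviewResult {
			if attempts < 2 { // first attempt rejected, retry succeeds
				return ReviewResult{Verdict: "ISSUES_FOUND", Issues: []string{"UNCITED"}}
			}
			return ReviewResult{Verdict: "APPROVED"}
		},
	)
	fmt.Println(ok, attempts, thoughts != "")
}
```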
4.7 Pass 3: Decision
4.7.1 Purpose
Pass 3 converts the internal monologue into a concrete decision about:
- What information to share
- What tone to use
- Whether to ask questions
- What actions to take
4.7.2 Decision Structure
The decision step summarizes backstory and internal thoughts, then outputs a short (1-3 sentence) response plan and tone.
4.7.3 Example Output
Thoughts: "I notice they're asking about Alice. I see I have an email from her [1]. I don't trust easy [self]..."
Decision: "I'll acknowledge I know Alice from the email, but stay guarded. Won't give too many details. Tone: cautious but not hostile."
4.8 Pass 4: Speech Synthesis
4.8.1 Purpose
Pass 4 converts the decision into actual character speech, applying:
- Character voice and style
- Appropriate response length
- Inline citations for factual claims (evidence IDs)
4.8.2 Style Rules
Each NPC has explicit style constraints (lowercase, terse, 90s slang, evasive when suspicious) enforced during speech synthesis.
4.8.3 Citation Format in Speech
Speech cites evidence IDs ([E1], [E2], [self]) immediately after factual claims. Evidence blocks are pre-extracted quotes; [self] is only for personality/opinion, and unknowns should be expressed as uncertainty.
4.8.4 Response Length Guidelines
Response length is proportional to input complexity (short greetings, brief answers, longer responses only when needed).
4.9 Pass 4.5: Speech Review with RAV
4.9.1 Purpose
Pass 4.5 verifies the final speech before delivery, checking:
- Citation validity (format and existence)
- Entailment accuracy (claim-evidence relationship)
- Response appropriateness
- [self] citation correctness
The pass implements four key innovations:
- Evidence-first extraction: deterministic evidence blocks (E1/E2/self)
- Structured citation trace: JSON schema with evidence_id
- Deterministic fast-path validation against evidence IDs
- RAV + fail-closed: prevents false positives and strips unverifiable claims
4.9.2 Evidence-First Citation Verification
We moved quoting out of the model. Instead of asking the LLM to produce exact quotes, we extract evidence blocks deterministically and require the model to cite by ID.
Sources → EvidenceBlocks (E1, E2, self) → Pass 4 → {claim, evidence_id} → Verification
Evidence Blocks (verbatim snippets):
[E1] from [1]: "SHADOWWATCH v1.0 - Connection Trace Monitor"
[E2] from [2]: "When you connect to remote systems, their security can try to trace..."
[self] persona: "paranoid, distrusts newcomers"
Structured Output Schema (Evidence IDs):
{
"speech": "shadowwatch monitors trace attempts [E1]. it's a defensive tool [E2].",
"citations": [
{"claim": "shadowwatch monitors trace attempts", "evidence_id": "E1"},
{"claim": "it's a defensive tool", "evidence_id": "E2"}
]
}
Verification Rules:
- Evidence ID exists (E1/E2/self)
- [self] only for personality/opinion
- Optional NeMo entailment: claim vs evidence block text
This removes the fragile "quote matching" step and shifts correctness to deterministic extraction.
4.9.3 Deterministic Citation Format Validation
We validate citation format against evidence IDs before any LLM review. The validator builds a map of valid IDs from evidence blocks, then parses citations like [E#] and [self] from the speech to ensure each referenced ID exists.
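The fast path reduces to a regex scan plus a set-membership check, with no LLM call:

```go
package main

import (
	"fmt"
	"regexp"
)

// citationRe matches the [E#] and [self] markers used in speech.
var citationRe = regexp.MustCompile(`\[(E\d+|self)\]`)

// validateCitations is the deterministic fast path: every citation
// marker in the speech must reference a known evidence ID (or [self]).
// It returns the IDs that fail validation.
func validateCitations(speech string, evidenceIDs []string) []string {
	valid := map[string]bool{"self": true}
	for _, id := range evidenceIDs {
		valid[id] = true
	}
	var invalid []string
	for _, m := range citationRe.FindAllStringSubmatch(speech, -1) {
		if !valid[m[1]] {
			invalid = append(invalid, m[1])
		}
	}
	return invalid
}

func main() {
	speech := "yeah, got an email from alice [E1]. she runs the crew [E7]. why you asking? [self]"
	fmt.Println(validateCitations(speech, []string{"E1", "E2"})) // E7 does not exist
}
```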
4.9.4 Retrieval-Augmented Verification (RAV)
RAV remains as a safety net when evidence blocks are too short or ambiguous:
- If a claim is flagged, check full file content for support
- Use NeMo entailment on full content for final decision
4.9.5 Fail-Closed Behavior
When retry budget is exhausted and issues remain, the pipeline implements fail-closed behavior: rather than shipping a response with hallucinations, we strip the problematic claims.
// Config option
type MultiPassConfig struct {
Pass4_5FailClosed bool // Default: true - strip unresolved claims
}
When fail-closed triggers, the system:
- Splits the response into sentences
- Identifies sentences containing problematic claims (using token-based fuzzy matching)
- Removes those sentences from the final output
- Logs stripped claims for debugging
Key properties:
- Sentence-level stripping: Removes entire sentences containing problematic claims
- Token-based fuzzy matching: Uses significant token overlap (50%+) to identify claims
- Preserves coherent response: Remaining sentences still form valid dialogue
- Logged for debugging: All stripped claims are logged for analysis
This ensures the pipeline never ships hallucinations - if verification fails, the claim is removed rather than delivered to the player.
4.10 Pass 5: Actions (Optional)
4.10.1 Purpose
Pass 5 determines game-state actions based on the conversation:
- Update relationship/trust values
- Send emails or messages
- Trigger quests or events
- Modify NPC files
4.10.2 Output Format
{
  "actions": [
    {
      "name": "update_trust",
      "parameters": {"delta": -5, "reason": "asked about sensitive topic"}
    },
    {
      "name": "send_email",
      "parameters": {
        "to": "alice@shadowwatch.net",
        "subject": "New player asking questions",
        "body": "Someone's been asking about you..."
      }
    }
  ]
}
5. Anti-Hallucination Techniques
5.1 Technique Summary
| Technique | Stage | Description |
|---|---|---|
| Citation Requirement | Pass 2, 4 | All factual claims must cite sources |
| Source-Limited Knowledge | Pass 2 | NPCs can only know what's in context |
| NLI Entailment | Pass 2.5, 4.5 | Verify claims are supported by sources |
| Self-RAG | Pass 2.4 | Retroactive evidence retrieval |
| RAV | Pass 4.5 | Check full content before failing |
| Iterative Refinement | Pass 2, 4 | Retry with feedback on issues |
| Topic Relevance | Pass 2.5 | Detect evasion of known topics |
| Deterministic Validation | Pass 4.5 | Fast-path format checking before LLM |
| Fail-Closed | Pass 4.5 | Strip unverifiable claims from output |
| NeMo NLI | Pass 2.5, 4.5 | Trained model entailment verification |
| LlamaIndex Retrieval | Pass 1 | Semantic search for relevant context |
| Evidence Blocks | Pass 1.7 | Deterministic, verbatim snippets for citations |
| Cite-by-ID Trace | Pass 4 | JSON schema-enforced evidence_id output |
5.2 Explicit Citation Requirement
The most fundamental anti-hallucination technique is requiring citations:
WITHOUT CITATION:
"Alice leads the cryptography team."
→ No way to verify this claim
→ Might be hallucinated
WITH CITATION:
"Alice leads the cryptography team [E1]."
→ Can check evidence block [E1]
→ If E1 doesn't mention "cryptography team" → Issue detected
5.3 Source-Limited Knowledge
The NPC is explicitly constrained to only what exists in context and must respond with uncertainty when information is missing.
This creates a closed-world assumption where absence of evidence is evidence of absence.
5.4 Multi-Stage Verification
Both reasoning (Pass 2) and speech (Pass 4) are independently verified:
Pass 2 → Pass 2.5 (verify thoughts) → Pass 3 → Pass 4 → Pass 4.5 (verify speech)
This catches issues at two points, reducing the chance of hallucinations reaching the player.
5.5 Uncertainty Expressions
The system explicitly allows and encourages uncertainty:
Uncertainty expressions are NOT fabrications - they are the correct response when information is missing.
Valid responses:
- "I don't know about that"
- "Never heard of it"
- "Not sure what you mean"
- "Can't help you there"
6. Evaluation
6.1 Test Methodology
We evaluate the pipeline using a benchmark suite of test cases covering:
- Greetings: Simple social exchanges
- Topic queries: Questions about specific subjects
- Unknown topics: Questions about things not in context
- Relationship dynamics: Trust-dependent information sharing
6.2 Test Case Structure
type BenchmarkTestCase struct {
    Name            string   // "ask_alice"
    PlayerMessage   string   // "who is alice?"
    ExpectTopic     string   // "alice" (must appear in response)
    ExpectCitations bool     // Should response have citations?
    ExpectUncertain bool     // Should NPC express uncertainty?
    RejectPatterns  []string // Patterns that indicate hallucination
}
6.3 Anti-Hallucination Tests
Special tests for fake topics that don't exist in context:
{
    Name:            "ask_fake_tool_cyberphantom",
    PlayerMessage:   "have you used cyberphantom?",
    ExpectUncertain: true,
    RejectPatterns:  []string{"great tool", "popular", "i use", "we use", "effective"},
}
The NPC should express uncertainty, not claim knowledge of a non-existent tool.
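The pass/fail check for one of these cases reduces to string scanning. A minimal Go sketch (the function name is illustrative, and the uncertainty markers mirror the valid responses listed in Section 5.5):

```go
package main

import (
	"fmt"
	"strings"
)

// CheckResponse applies the benchmark's hallucination checks: the
// response must avoid every reject pattern and, when expected,
// contain some uncertainty expression.
func CheckResponse(response string, expectUncertain bool, rejectPatterns []string) (bool, string) {
	lower := strings.ToLower(response)
	for _, p := range rejectPatterns {
		if strings.Contains(lower, p) {
			return false, "hit reject pattern: " + p
		}
	}
	if expectUncertain {
		markers := []string{"don't know", "never heard", "not sure", "can't help"}
		found := false
		for _, m := range markers {
			if strings.Contains(lower, m) {
				found = true
				break
			}
		}
		if !found {
			return false, "no uncertainty expressed"
		}
	}
	return true, ""
}

func main() {
	ok, why := CheckResponse("cyberphantom? never heard of it.", true,
		[]string{"great tool", "popular", "i use", "we use", "effective"})
	fmt.Println(ok, why)
}
```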
6.4 Results with Meta-Llama-3-8B-Instruct
6.4.1 Initial Approach: LLM-Generated Quotes (Doc-Level Citations)
Our first implementation asked the LLM to generate exact quotes from sources and cite them by document ID ([1], [2], [self]). This approach had significant issues:
| Metric | Value |
|---|---|
| Model | Meta-Llama-3-8B-Instruct (Q4_K_S) |
| Tests Passed | 8-10/10 (80-100%) |
| Citation Verification Rate | ~40% on first attempt |
| Self-Correction Rate | 100% (all pass after retries) |
| Hallucination Rejection | 100% |
| Avg Latency | 9-11 seconds |
Problems with LLM-generated quotes:
- Small models frequently paraphrased instead of quoting exactly
- Verification failed because paraphrased text didn't match source verbatim
- False NOT_ENTAILED errors on semantically correct but reworded claims
- Required 3-8 retry loops to eventually pass verification
6.4.2 Current Approach: Evidence-First (Deterministic Extraction + Cite-by-ID)
We solved this by moving quoting out of the LLM. Evidence blocks are now extracted deterministically, and the model cites by ID ([E1], [E2], [self]) rather than generating quotes.
| Metric | Initial (LLM Quotes) | Current (Evidence-First) |
|---|---|---|
| Citation Verification Rate | ~40% | 90-100% on cited statements |
| Retries Required | 3-8 | 0-1 |
| Tests Passed (n=10) | 80% | 80-90% |
| False NOT_ENTAILED | Common | Rare |
| Avg Latency | 9-11s | 9-11s |
Why Evidence-First works:
- Deterministic evidence blocks guarantee exact text is available
- Cite-by-ID eliminates malformed citations and fabricated evidence
- Verification is ID-based + optional entailment (faster, more reliable)
- No "quote matching" step that small models fail at
6.5 Latency Breakdown
| Pass | Avg Time | Purpose |
|---|---|---|
| Pass 1 | ~700ms | Data retrieval |
| Pass 2 | ~3000ms | Internal monologue |
| Pass 2.4 | ~50ms | Claim retrieval |
| Pass 2.5 | ~500ms | Monologue review |
| Pass 3 | ~800ms | Decision |
| Pass 4 | ~1500ms | Speech synthesis |
| Pass 4.5 | ~500ms | Speech review |
| Total | ~7-11s | Full pipeline |
6.6 Retry Analysis
Most tests pass on first attempt. When retries occur:
- 1 retry: Usually citation format issues
- 2 retries: More complex entailment issues
- 3 retries: Edge cases or ambiguous context
With RAV, many issues that would require retries are resolved automatically.
7. Limitations and Future Work
7.1 Current Limitations
- Latency: 7-11 seconds per response is too slow for real-time chat, although it is acceptable for the game I'm building
- No Streaming: Full response generated before delivery
- Single Model: All passes use the same model
- Python Sidecar Dependency: NeMo NLI and LlamaIndex require a separate Python service
7.2 Implemented Improvements
7.2.1 NeMo NLI Integration (Implemented)
Replaced simple keyword matching with trained NLI models:
- Uses roberta-large-mnli for entailment classification
- Falls back to fuzzy matching when the sidecar is unavailable
- Reduces false negatives in entailment detection
7.2.2 LlamaIndex Semantic Retrieval (Implemented)
Primary retrieval backbone:
- Embedding-based retrieval with persistent Chroma index
- CitationQueryEngine returns grounded draft + sources
- Reranking improves relevance as VFS grows
7.2.3 Fail-Closed Safety (Implemented)
Pipeline never ships unverifiable claims:
- Sentence-level stripping of problematic claims
- Token-based fuzzy matching for claim identification
- Logging for debugging and analysis
7.2.4 Deterministic Fast-Path (Implemented)
Citation format validation before LLM calls:
- Catches malformed citations immediately
- Validates source existence
- Reduces unnecessary LLM round-trips
7.2.5 Evidence Blocks (Implemented)
Deterministic evidence extraction:
- Evidence blocks are verbatim snippets (E1, E2, self)
- Capped to keep context small for 3B models
- Replaces fragile quote-copying in generation
7.2.6 Cite-by-ID Trace (Implemented)
JSON schema-enforced evidence_id output:
- Grammar-constrained decoding forces {speech, citations[]}
- Each citation includes {claim, evidence_id}
- Deterministic verification against evidence IDs
- Optional NeMo entailment for claim vs evidence text
7.3 Future Improvements
7.3.1 RAGAS Faithfulness Scoring
Implement the RAGAS faithfulness metric:
Faithfulness = (# claims supported by context) / (# total claims)
This provides a quantitative measure of grounding quality.
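Given the pipeline's existing per-claim entailment check, the metric is a one-function addition. A sketch in Go (the `entails` predicate is a stand-in for whatever verifier is plugged in, e.g. the NeMo NLI call):

```go
package main

import "fmt"

// Faithfulness computes the RAGAS-style ratio of context-supported
// claims to total claims. A score of 1.0 means every claim is grounded.
func Faithfulness(claims []string, entails func(string) bool) float64 {
	if len(claims) == 0 {
		return 1.0 // nothing claimed, nothing to contradict
	}
	supported := 0
	for _, c := range claims {
		if entails(c) {
			supported++
		}
	}
	return float64(supported) / float64(len(claims))
}

func main() {
	claims := []string{"monitors trace attempts", "is a defensive tool", "was written by alice"}
	// stub: pretend only the first two claims are entailed by context
	entailed := map[string]bool{"monitors trace attempts": true, "is a defensive tool": true}
	fmt.Printf("%.2f\n", Faithfulness(claims, func(c string) bool { return entailed[c] }))
	// prints 0.67
}
```

Tracked per response, this score would let regressions in grounding show up as a number rather than as failed adversarial tests.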
7.3.2 Fine-Tuned Verifier
Train a specialized model for citation verification instead of using the same general model.
7.3.3 Streaming Responses
Stream Pass 4 output for faster perceived latency. The speech synthesis pass could return a channel of chunks, streaming tokens to the player as they're generated rather than waiting for the complete response.
References
- Asai, A., et al. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." arXiv preprint arXiv:2310.11511.
- Bai, Y., et al. (2022). "Constitutional AI: Harmlessness from AI Feedback." arXiv preprint arXiv:2212.08073.
- Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.
- Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022.
- Es, S., et al. (2023). "RAGAS: Automated Evaluation of Retrieval Augmented Generation." arXiv preprint arXiv:2309.15217.
- Liu, J. (2022). "LlamaIndex: A Data Framework for LLM Applications." https://github.com/run-llama/llama_index
- Rebedea, T., et al. (2023). "NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails." EMNLP 2023.
- Liu, Y., et al. (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach." arXiv preprint arXiv:1907.11692. (For roberta-large-mnli NLI model)