Grounded NPC Dialogue Generation: A Multi-Pass Pipeline for Hallucination-Free Narrative AI (Superseded)
Abstract
Large Language Models (LLMs) have revolutionized NPC dialogue generation in narrative games, but their tendency to hallucinate (generating plausible but factually incorrect information) poses significant challenges for maintaining narrative coherence and player trust. We present a multi-pass pipeline architecture that enforces citation-grounded dialogue generation, where NPCs must cite sources for factual claims and those citations are verified against actual context through multiple verification stages.
Our approach combines retrieval-augmented generation (RAG), self-reflective retrieval (Self-RAG), natural language inference (NLI) for entailment verification, and iterative refinement through feedback loops. The pipeline achieves 100% hallucination rejection on adversarial tests while maintaining natural, character-appropriate dialogue.
The system integrates optional framework enhancements:
- LlamaIndex for semantic retrieval and citation query (primary path)
- NeMo Guardrails NLI for trained entailment verification, replacing fuzzy keyword matching
- Fail-closed behavior that strips unverifiable claims rather than shipping hallucinations
- Deterministic validation for fast-path citation format checking
- Evidence-first citations: deterministic evidence blocks + cite-by-ID for reliable verification
We evaluate the system using Meta-Llama-3-8B-Instruct and demonstrate that even small, quantized models can produce grounded, verifiable NPC responses when guided by appropriate architectural constraints. Evidence-first citations eliminate false NOT_ENTAILED judgments caused by paraphrased quotes.
1. Introduction
1.1 Problem Statement
Non-player characters (NPCs) in narrative games must maintain consistency with the game world's established facts, their own backstory, and previous interactions with players. Traditional rule-based dialogue systems achieve this through rigid scripting but lack the flexibility and natural feel of LLM-generated responses. Conversely, pure LLM-based dialogue generation produces natural-sounding responses but frequently hallucinates details that contradict established lore.
Consider an NPC named "Zero" who has received an email from a character named "Alice." When a player asks "Who is Alice?", the NPC should:
- Acknowledge relevant context: Reference the email they received
- Stay in character: Respond with appropriate personality traits
- Not fabricate: Avoid inventing details about Alice not present in their context
- Express appropriate uncertainty: Admit when they don't know something
A naive LLM approach might generate: "Alice is our lead cryptographer who joined the team in 2019. She specializes in quantum-resistant algorithms." This response is plausible but entirely fabricated, and contradicts the game's actual lore.
1.2 Motivation
The core insight driving our architecture is that grounded dialogue requires explicit citation. By forcing NPCs to cite sources for factual claims, we create a verifiable chain from claim to evidence. This approach:
- Prevents hallucination by requiring evidence for claims
- Enables verification through automated entailment checking
- Supports self-correction through feedback-driven retry loops
- Maintains immersion by keeping citations invisible to players in final output
1.3 Contributions
This paper presents:
- A multi-pass pipeline architecture that separates context retrieval, reasoning, decision-making, and speech synthesis
- Citation-grounded generation where all factual claims must reference numbered sources
- Multi-stage verification using NLI entailment checking at both reasoning and speech stages
- Self-RAG integration for retroactive evidence retrieval after claim generation
- Retrieval-Augmented Verification (RAV) to reduce false positives by checking full content
- Iterative refinement through feedback loops that guide the model to self-correct
- Semantic retrieval backbone using LlamaIndex (CitationQueryEngine + persistent index)
- NLI-based entailment using NeMo Guardrails for trained verification replacing fuzzy matching
- Fail-closed safety that strips unverifiable claims from final output when retries exhausted
- Deterministic fast-path validation for citation format checking before expensive LLM calls
- Evidence block extraction (deterministic, verbatim snippets from sources)
- Cite-by-ID structured trace using JSON schema enforcement for explicit claim→evidence_id mapping
2. Related Work
2.1 Retrieval-Augmented Generation (RAG)
RAG systems (Lewis et al., 2020) augment LLM generation with retrieved documents, grounding responses in external knowledge. Our Pass 1 uses LlamaIndex semantic retrieval (top-k + rerank) with CitationQueryEngine to return grounded drafts and sources.
2.2 Self-RAG
Self-RAG (Asai et al., 2023) introduces reflection tokens that allow models to decide when to retrieve and assess the relevance of retrieved content. Our Pass 2.4 implements a similar concept: after generating initial thoughts, we extract claims and retroactively search for supporting evidence.
2.3 Natural Language Inference for Verification
NLI models classify the relationship between premise-hypothesis pairs as entailment, neutral, or contradiction. We adapt this for citation verification: the source text is the premise, and the NPC's claim is the hypothesis. Claims that are not entailed by their cited sources indicate potential hallucination.
2.4 Constitutional AI and Self-Correction
Constitutional AI (Bai et al., 2022) demonstrates that models can critique and revise their own outputs. Our Pass 2.5 and Pass 4.5 implement domain-specific critique focused on citation accuracy, with structured feedback enabling targeted self-correction.
2.5 Chain-of-Thought and Multi-Stage Reasoning
Multi-stage pipelines that separate reasoning from final output (Wei et al., 2022) have shown improved performance on complex tasks. Our architecture extends this with explicit verification stages between reasoning and output.
2.6 LlamaIndex and Semantic Retrieval
LlamaIndex (Liu, 2022) provides a framework for building RAG applications with semantic retrieval. Our system uses LlamaIndex as the primary retrieval backbone with a persistent Chroma index and citation-aware query engine.
2.7 NeMo Guardrails and NLI Verification
NVIDIA's NeMo Guardrails (Rebedea et al., 2023) provides programmable guardrails for LLM applications. We adapt their NLI-based fact-checking approach for citation verification, using trained NLI models (roberta-large-mnli) to classify claim-evidence relationships as entailment, neutral, or contradiction.
3. System Architecture
3.1 Pipeline Overview
The complete pipeline consists of up to 10 stages (including optional compaction and actions), with verification stages triggering retry loops when issues are detected:
PIPELINE OVERVIEW
┌──────────────┐
│ Player Input │
│ "who is │
│ alice?" │
└──────┬───────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 1: RETRIEVAL + CITATION QUERY (LlamaIndex) │
│ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌───────────────┐ │
│ │ Embed │───▶│ Retrieve │───▶│ Rerank │───▶│ Cite + Draft │ │
│ └─────────┘ └──────────┘ └─────────┘ └───────────────┘ │
│ │ LlamaIndex + ChromaDB │
│ └────────── Output: sources + grounded draft ───────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 1.5: CONTEXT COMPACTION (Optional) │
│ │
│ Large context (>4000 tokens) → Summarized context (~2000 tokens) │
│ Preserves: key facts, names, dates, relationships │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ BUILD CITABLE SOURCES │
│ │
│ ToolRecords[] ──────────────────────────────▶ CitableSource[] │
│ │
│ [self] backstory: Zero is a paranoid hacker who distrusts newcomers... │
│ [1] email: From alice@shadowwatch.net - Meeting tomorrow at 3pm... │
│ [2] note: Crew Status - Alice: Active, Bob: On mission... │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 1.7: EVIDENCE BLOCK EXTRACTION (NEW) │
│ │
│ CitableSource[] ─────────────────────────────▶ EvidenceBlock[] │
│ │
│ [E1] from [1]: "From alice@shadowwatch.net - Meeting tomorrow at 3pm..." │
│ [E2] from [2]: "Crew Status - Alice: Active, Bob: On mission..." │
│ [self] persona: "paranoid, distrusts newcomers" │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 2: INTERNAL MONOLOGUE │
│ │
│ Input: Player message + Sources + Personality + Relationship │
│ │
│ "I notice they're asking about Alice. I see I have an email from her [1]. │
│ The crew status shows she's active [2]. But I don't trust this newcomer │
│ [self]. I should be careful what I reveal..." │
│ │
│ Output: First-person thoughts WITH inline citations [1], [2], [self] │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 2.4: CLAIM-TRIGGERED RETRIEVAL (Self-RAG) │
│ │
│ 1. Extract claims from monologue │
│ 2. For each claim: │
│ a. Check if evidence exists in ToolRecords │
│ b. If not found → Search VFS for new files │
│ c. Add new sources if relevant files found │
│ 3. Generate evidence audit for retry feedback │
│ │
│ Evidence Audit: │
│ - "email from alice" → SUPPORTED by [1] │
│ - "she's active" → SUPPORTED by [2] │
│ - "night shift" → NO EVIDENCE FOUND │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 2.5: MONOLOGUE REVIEW │
│ │
│ Verification Checks: │
│ ├── Citation Validity: Do cited sources exist? │
│ ├── NLI Entailment: Does source support the claim? │
│ ├── [self] Usage: Only for personality/feelings? │
│ ├── Fabrication: Claims about things not in context? │
│ └── Topic Relevance: Does NPC address the question? │
│ │
│ Output: APPROVED or ISSUES_FOUND + issue list │
└──────────────────────────────────────────────────────────────────────────────┘
│
├──────────── APPROVED ──────────────────────────────────┐
│ │
▼ ISSUES_FOUND │
┌─────────────────────┐ │
│ Format Feedback │ │
│ Retry Pass 2 │◀─────── Up to 3 retries ────────────────┤
│ (with feedback) │ │
└─────────────────────┘ │
│
┌─────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 3: DECISION │
│ │
│ Based on internal thoughts, decide: │
│ - What information to share │
│ - What tone to use │
│ - Whether to ask clarifying questions │
│ │
│ "I'll acknowledge I know Alice from the email, but stay guarded. │
│ Tone: cautious, slightly suspicious." │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 4: SPEECH SYNTHESIS │
│ │
│ Convert decision to character speech: │
│ - Apply character voice (lowercase, slang, etc.) │
│ - Include citations for factual claims │
│ - Match response length to input complexity │
│ │
│ "yeah, got an email from alice [E1]. she's part of the crew [E2]. │
│ why you asking? [self]" │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 4.5: SPEECH REVIEW │
│ │
│ Step 1: Evidence-ID Verification │
│ - Check each [E#] citation against evidence blocks │
│ - Flag: UNCITED, INVALID_SOURCE, NOT_ENTAILED, SELF_MISUSE │
│ │
│ Step 2: Retrieval-Augmented Verification (RAV) - if issues found │
│ - For disputed claims: │
│ 1. Check full content for additional support │
│ 2. Run entailment check against full content │
│ 3. If entailed → Resolve issue (false positive) │
│ 4. If still not entailed → keep issue │
│ - Prevents false positives when evidence blocks are too short │
│ │
│ Output: APPROVED or ISSUES_FOUND (after RAV filtering) │
└──────────────────────────────────────────────────────────────────────────────┘
│
├──────────── APPROVED ──────────────────────────────────┐
│ │
▼ ISSUES_FOUND │
┌─────────────────────┐ │
│ Format Feedback │ │
│ Retry Pass 4 │◀─────── Up to 3 retries ────────────────┤
│ (with feedback) │ │
└─────────────────────┘ │
│
┌─────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ PASS 5: ACTIONS (Optional) │
│ │
│ Determine game-state actions based on conversation: │
│ - Send email │
│ - Update relationship │
│ - Trigger quest │
│ - Modify files │
│ │
│ {"actions": [{"name": "update_trust", "parameters": {"delta": -5}}]} │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ POST-PROCESSING │
│ │
│ 1. Strip citations from final output (player doesn't see [E1], [E2]) │
│ 2. Apply final formatting │
│ 3. Log for debugging/analytics │
│ │
│ Final output to player: │
│ "yeah, got an email from alice. she's part of the crew. why you asking?" │
└──────────────────────────────────────────────────────────────────────────────┘
3.2 Data Flow Diagram
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ DATA FLOW THROUGH PIPELINE │
└─────────────────────────────────────────────────────────────────────────────────────┘
Player Message: "who is alice?"
│
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ MultiPassContext (shared state across all passes) │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ PlayerMessage string "who is alice?" │
│ PlayerHandle string "shadow_runner" │
│ NPCHandle string "zero" │
│ ToolRecords []Record [{read, /mail/inbox/..., "From: alice..."}] │
│ Sources []Source [{id:"1", path:"/mail/...", summary:"..."}] │
│ EvidenceBlocks []Evidence [{id:"E1", source_id:"1", text:"From: ..."}] │
│ Thoughts string "I notice they're asking about Alice..." │
│ Decision string "I'll acknowledge but stay guarded..." │
│ Response string "yeah, got an email from alice [E1]..." │
│ WorkingMemory string (persistent NPC state) │
│ Opinion string "first meeting, watching carefully" │
│ Favors string "no favor history yet" │
│ │
└───────────────────────────────────────────────────────────────────────────┘
│
│ Flows through each pass, accumulating data
│
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ PASS OUTPUTS │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ Pass 1: ToolRecords[] ─────────────────────────────────────────────▶ │
│ [{name:"read", path:"/mail/inbox/from_alice.eml", │
│ result:"From: alice@shadowwatch.net\nSubject:..."}] │
│ │
│ Pass 1.5: CompactedContext string (optional) ────────────────────────▶ │
│ "Key facts: Alice emailed about meeting. Crew status..." │
│ │
│ Sources: CitableSource[] ───────────────────────────────────────────▶ │
│ [{id:"self", type:"backstory", summary:"paranoid hacker..."}, │
│ {id:"1", type:"email", path:"/mail/...", summary:"From..."}] │
│ │
│ EvidenceBlocks[] ─────────────────────────────────────────────────────▶ │
│ [{id:"E1", source_id:"1", text:"From: alice@... Meeting..."}] │
│ │
│ Pass 2: Thoughts string ───────────────────────────────────────────▶ │
│ "I notice they're asking about Alice. I see I have an │
│ email from her [1]. I don't trust easy [self]..." │
│ │
│ Pass 2.4: ClaimEvidenceMap[] ────────────────────────────────────────▶ │
│ [{claim:"email from alice", hasEvidence:true, sourceID:"1"}, │
│ {claim:"she's active", hasEvidence:true, sourceID:"2"}] │
│ │
│ Pass 2.5: ReviewResult ──────────────────────────────────────────────▶ │
│ {verdict:"APPROVED"} or │
│ {verdict:"ISSUES_FOUND", issues:[...]} │
│ │
│ Pass 3: Decision string ───────────────────────────────────────────▶ │
│ "I'll acknowledge knowing Alice but stay guarded. Tone: │
│ suspicious but not hostile." │
│ │
│ Pass 4: Response string ───────────────────────────────────────────▶ │
│ "yeah, got an email from alice [E1]. she's part of the │
│ crew [E2]. why you asking? [self]" │
│ │
│ Pass 4.5: ReviewResult + RAVResults ─────────────────────────────────▶ │
│ {verdict:"APPROVED", ravResolved:2} │
│ │
│ Pass 5: Actions[] ─────────────────────────────────────────────────▶ │
│ [{name:"log_interaction", params:{topic:"alice"}}] │
│ │
└───────────────────────────────────────────────────────────────────────────┘
4. Detailed Pass Descriptions
4.1 Pass 1: Retrieval + Citation Query
4.1.1 Purpose
Pass 1 performs semantic retrieval over the NPC's virtual filesystem using LlamaIndex. It produces two artifacts:
- Grounded draft (a citation-aware answer used downstream)
- Sources (retrieved chunks with metadata for citations and evidence blocks)
4.1.2 Retrieval Interface (LlamaIndex Sidecar)
The Go service talks to a Python sidecar that hosts the LlamaIndex pipeline:
// Sidecar endpoints
IndexNPCVFS(npcID, vfsPath) // builds/loads persistent Chroma index
CitationQuery(npcID, query, topK) -> {answer, sources[]}
4.1.3 VFS Catalog
The NPC has access to a virtual filesystem (VFS) containing their emails, notes, IRC logs, and other files. Pass 1 receives a catalog of available files:
# VFS Catalog provided to Pass 1
files:
- path: /home/zero/.irc/logs/irc.underground.net/#general.log
- path: /home/zero/.irc/logs/irc.underground.net/#shadowwatch.log
- path: /home/zero/mail/inbox/1707079500_from_alice.eml
- path: /home/zero/mail/inbox/1707080000_from_bob.eml
- path: /home/zero/mail/sent/1707090000_to_charlie.eml
- path: /home/zero/notes/crew_status.txt
- path: /home/zero/notes/passwords.txt
- path: /home/zero/notes/todo.txt
4.1.4 Retrieval Flow
1. Index NPC VFS (hash-based, persistent via Chroma)
2. Embed player query
3. Retrieve top-k chunks (vector similarity)
4. Rerank (cross-encoder if enabled)
5. CitationQueryEngine returns grounded draft + sources
4.1.5 Example Response
{
"answer": "shadowwatch monitors trace attempts [1]",
"sources": [
{"source_id": "1", "path": "/home/zero/notes/shadowwatch_info.txt", "snippet": "SHADOWWATCH v1.0 - Connection Trace Monitor"},
{"source_id": "2", "path": "/home/zero/mail/sent/1707079500_to_player_shadowwatch.eml", "snippet": "When you connect to remote systems, their security can try to trace..."}
]
}
4.2 Pass 1.5: Context Compaction (Optional)
4.2.1 Purpose
When Pass 1 retrieves large files or multiple documents, the total context may exceed model token limits. Pass 1.5 summarizes the retrieved content while preserving key facts.
4.2.2 Preservation Priorities
The compaction step instructs the LLM to preserve:
- Names and handles (Alice, Bob, ShadowWatch)
- Dates and times (meeting at 3pm, last Tuesday)
- Specific claims (detection rate of 99.2%)
- Relationships (Alice leads the project)
- Tone and sentiment (frustrated, excited)
4.2.3 Implementation
The compaction pass calculates total context size and only triggers when exceeding a threshold (e.g., 4000 tokens). It builds a compaction prompt from the tool records and produces a summarized context.
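The trigger logic can be sketched as follows, assuming a rough chars/4 token heuristic (a real implementation would use the model's tokenizer):

```go
package main

import "fmt"

// estimateTokens uses a rough chars/4 heuristic. This is an assumption
// for illustration; a production system would use the model's tokenizer.
func estimateTokens(text string) int {
	return len(text) / 4
}

// needsCompaction reports whether the combined tool-record context
// exceeds the compaction threshold (e.g. 4000 tokens).
func needsCompaction(records []string, thresholdTokens int) bool {
	total := 0
	for _, r := range records {
		total += estimateTokens(r)
	}
	return total > thresholdTokens
}

func main() {
	small := []string{"From: alice@shadowwatch.net - Meeting tomorrow at 3pm"}
	fmt.Println(needsCompaction(small, 4000)) // false: small context skips compaction
}
```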
4.3 Building Citable Sources
4.3.1 Purpose
Before Pass 2, we convert raw ToolRecords into structured CitableSource objects that can be referenced by ID. Only files with required .meta.yaml metadata are eligible as citable sources.
4.3.2 Source Structure
type CitableSource struct {
SourceID string // "1", "2", "self"
Path string // "/home/zero/notes/crew_status.txt"
Type string // "note", "email", "irc", "backstory"
Summary string // Metadata summary for prompt context
Content string // Full chunk content for evidence extraction
Keywords []string // Metadata keywords for retrieval and filtering
AllowedFor []string // Claim types this source can support
Score float64 // Relevance score from retrieval
}
4.3.3 Metadata-Driven Type
We do not infer file types from paths. Every VFS file must have a companion .meta.yaml, and its type field is the canonical source. Files without metadata are excluded from citable sources.
Example metadata:
created_at: 855010800
author: zero
type: email
summary: "Email sharing ShadowWatch tool with recruit after FTP test"
keywords: [shadowwatch, tool, trace, security, opsec, ghost]
4.3.4 Metadata-Driven Summary
Summaries are taken directly from .meta.yaml and treated as authored context. No content-based summary extraction is used. Files missing required metadata fields are skipped.
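The metadata checks above can be sketched as follows. This is a simplified parser for the flat `key: value` shape of the example metadata; a production system would use a YAML library, and the required-field list (`type`, `summary`) is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlatMeta handles only the flat, unnested "key: value" format
// shown in the .meta.yaml example. A real implementation would use a
// YAML library.
func parseFlatMeta(raw string) map[string]string {
	meta := make(map[string]string)
	for _, line := range strings.Split(raw, "\n") {
		k, v, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		meta[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"`)
	}
	return meta
}

// hasRequiredMeta enforces the rule that files missing required
// metadata fields are skipped as citable sources.
func hasRequiredMeta(meta map[string]string) bool {
	for _, field := range []string{"type", "summary"} {
		if meta[field] == "" {
			return false
		}
	}
	return true
}

func main() {
	meta := parseFlatMeta("type: email\nsummary: \"Email sharing ShadowWatch tool\"")
	fmt.Println(meta["type"], hasRequiredMeta(meta)) // email true
}
```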
4.3.5 The [self] Source
Every NPC has a special [self] source representing their backstory and personality. The source building process:
- Creates a [self] source from the NPC's backstory (for personality, opinions, feelings)
- Iterates through tool records (file reads and LlamaIndex retrievals)
- Loads metadata for each file and creates numbered sources
- Skips files without valid metadata
4.3.6 Evidence Block Extraction (Pass 1.7)
We transform sources into evidence blocks that the model can cite directly in speech:
[E1] from [1]: "From alice@shadowwatch.net - Meeting tomorrow at 3pm..."
[E2] from [2]: "Crew Status - Alice: Active, Bob: On mission..."
[self] persona: "paranoid, distrusts newcomers"
Evidence blocks are deterministic, verbatim snippets (capped length) so the model cites IDs instead of copying quotes.
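A sketch of the extraction, assuming a simple byte-length cap and sequential E# numbering (the persona [self] block is handled separately):

```go
package main

import "fmt"

// CitableSource is trimmed to the fields needed here.
type CitableSource struct {
	SourceID string
	Content  string
}

type EvidenceBlock struct {
	ID       string
	SourceID string
	Text     string
}

// extractEvidence turns sources into deterministic, verbatim,
// length-capped evidence blocks (E1, E2, ...) that the model cites by ID.
// The byte-based cap assumes ASCII content.
func extractEvidence(sources []CitableSource, maxLen int) []EvidenceBlock {
	var blocks []EvidenceBlock
	n := 1
	for _, s := range sources {
		if s.SourceID == "self" { // persona block is built separately
			continue
		}
		text := s.Content
		if len(text) > maxLen {
			text = text[:maxLen] + "..."
		}
		blocks = append(blocks, EvidenceBlock{
			ID:       fmt.Sprintf("E%d", n),
			SourceID: s.SourceID,
			Text:     text,
		})
		n++
	}
	return blocks
}

func main() {
	srcs := []CitableSource{
		{SourceID: "self", Content: "paranoid, distrusts newcomers"},
		{SourceID: "1", Content: "From alice@shadowwatch.net - Meeting tomorrow at 3pm"},
	}
	for _, b := range extractEvidence(srcs, 40) {
		fmt.Printf("[%s] from [%s]: %q\n", b.ID, b.SourceID, b.Text)
	}
}
```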
4.4 Pass 2: Internal Monologue
4.4.1 Purpose
Pass 2 generates the NPC's internal thought process: what they're thinking as they process the player's message. This is the core reasoning stage where the NPC:
- Analyzes what the player is asking
- Recalls relevant information from sources
- Considers their personality and relationship
- Weighs what to reveal or hide
4.4.2 Citation Requirements
The critical innovation is requiring inline citations for factual claims using source IDs like [1], [2], and [self] (personality/backstory).
Note: Pass 2 cites source IDs ([1], [2], [self]) for internal reasoning. Pass 4 uses evidence IDs ([E1], [E2], [self]) derived from those sources for final speech.
4.4.3 Input Structure
The input supplies personality, working memory, relationship context, available sources, and unknown-topic guardrails. It requires first-person internal thoughts with citations.
4.4.4 Example Output
Player message: "who is alice?"
Sources:
[self] backstory: Zero is a paranoid hacker who distrusts newcomers...
[1] email: From alice@shadowwatch.net - Meeting tomorrow at 3pm...
[2] note: Crew Status - Alice: Active, Bob: On mission...
Generated thoughts:
I notice this newcomer is asking about Alice. That's interesting -
why would they want to know about her specifically?
I see I have an email from her [1] about a meeting tomorrow. The crew
status shows she's active [2]. She's part of the inner circle, doing
important work.
But I don't trust easy [self]. This could be someone fishing for
information about our crew. I should be careful what I reveal.
Maybe I'll acknowledge I know her but not give too many details...
4.4.5 Proportional Thinking
Monologue length is proportional to input complexity: greetings <= 20 words, medium questions <= 50, complex topics <= 150.
This prevents over-thinking simple interactions like "hey" or "what's up".
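The budget selection might look like the following, using player-message word count as a stand-in for complexity (the word-count cutoffs for "greeting" and "medium" are illustrative assumptions):

```go
package main

import (
	"fmt"
	"strings"
)

// thoughtWordLimit maps input complexity (approximated by word count,
// an assumption) to the monologue budget described above.
func thoughtWordLimit(playerMessage string) int {
	words := len(strings.Fields(playerMessage))
	switch {
	case words <= 3: // greetings like "hey", "what's up"
		return 20
	case words <= 12: // medium questions
		return 50
	default: // complex topics
		return 150
	}
}

func main() {
	fmt.Println(thoughtWordLimit("hey"))                                    // 20
	fmt.Println(thoughtWordLimit("who is alice and why did she email you")) // 50
}
```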
4.5 Pass 2.4: Claim-Triggered Retrieval (Self-RAG)
4.5.1 Purpose
Pass 2.4 implements a Self-RAG approach: after the NPC generates thoughts, we extract claims and retroactively search for supporting evidence. This catches cases where:
- The NPC made a claim without a citation
- The NPC cited a source but the claim isn't well-supported
- Additional evidence exists that could strengthen the response
4.5.2 Claim Extraction
We use regex-based extraction to find claims with and without citations:
// ClaimWithCitation represents an extracted claim
type ClaimWithCitation struct {
Claim string // "alice is active"
Citation string // "1" or "self" or ""
Position int // Character position in text
}
The extraction uses a regex pattern to match sentence fragments followed by citation markers like [N] or [self].
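An illustrative version of that extraction follows. The paper does not show its exact regex, so the pattern here (a sentence fragment immediately followed by [N] or [self]) is an assumption:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type ClaimWithCitation struct {
	Claim    string // "alice is active"
	Citation string // "1" or "self"
	Position int    // character position in text
}

// claimPattern matches a run of non-terminator characters followed by
// a [N] or [self] citation marker (illustrative, not the paper's exact regex).
var claimPattern = regexp.MustCompile(`([^.!?\[\]]+)\s*\[(\d+|self)\]`)

// extractClaims pulls cited claims out of a monologue.
func extractClaims(monologue string) []ClaimWithCitation {
	var claims []ClaimWithCitation
	for _, m := range claimPattern.FindAllStringSubmatchIndex(monologue, -1) {
		claims = append(claims, ClaimWithCitation{
			Claim:    strings.TrimSpace(monologue[m[2]:m[3]]),
			Citation: monologue[m[4]:m[5]],
			Position: m[0],
		})
	}
	return claims
}

func main() {
	text := "I have an email from her [1]. The crew status shows she's active [2]. I don't trust easy [self]."
	for _, c := range extractClaims(text) {
		fmt.Printf("%q -> [%s]\n", c.Claim, c.Citation)
	}
}
```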
4.5.3 Evidence Search
For each claim, we search existing ToolRecords and optionally the VFS:
- Check existing ToolRecords: For each claim, fuzzy match against already-retrieved file contents
- Search VFS if needed: If no evidence found and claim has enough keywords, search the NPC's virtual filesystem for new supporting files
- Add new sources: When relevant files are found, add them to the context and create new citable sources
4.5.4 Fuzzy Matching
We use keyword-based fuzzy matching to determine if content supports a claim. The algorithm extracts keywords from the claim, checks how many appear in the content, and returns a match ratio (0.0 to 1.0).
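A sketch of the match-ratio computation, with an illustrative stopword list:

```go
package main

import (
	"fmt"
	"strings"
)

// stopwords is an illustrative set; a real system would use a larger list.
var stopwords = map[string]bool{
	"the": true, "a": true, "an": true, "is": true, "from": true,
	"of": true, "to": true, "and": true, "she": true, "he": true,
}

// keywords lowercases the claim and drops stopwords and short tokens.
func keywords(text string) []string {
	var out []string
	for _, w := range strings.Fields(strings.ToLower(text)) {
		w = strings.Trim(w, `.,!?"'`)
		if len(w) >= 3 && !stopwords[w] {
			out = append(out, w)
		}
	}
	return out
}

// matchRatio returns the fraction of claim keywords found in the content
// (0.0 to 1.0), using substring containment as the fuzzy test.
func matchRatio(claim, content string) float64 {
	kws := keywords(claim)
	if len(kws) == 0 {
		return 0
	}
	lc := strings.ToLower(content)
	hits := 0
	for _, kw := range kws {
		if strings.Contains(lc, kw) {
			hits++
		}
	}
	return float64(hits) / float64(len(kws))
}

func main() {
	fmt.Println(matchRatio("email from alice", "From: alice@shadowwatch.net - Meeting tomorrow")) // 0.5
}
```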
4.5.5 Evidence Injection
The results are formatted as feedback for Pass 2 retry. The evidence audit lists each claim as either "SUPPORTED by [source_id]" or "NO EVIDENCE FOUND", instructing the model to express uncertainty or remove unsupported claims.
4.6 Pass 2.5: Monologue Review
4.6.1 Purpose
Pass 2.5 is a verification stage that checks the internal monologue for:
- Citation validity: Do cited sources exist?
- Entailment: Does the source support the claim?
- [self] usage: Is [self] only used for personality/feelings?
- Fabrication: Are there claims about things not in context?
- Topic relevance: Does the NPC address the player's question?
4.6.2 Issue Types
type IssueType string
const (
IssueUncited IssueType = "UNCITED" // Factual claim without citation
IssueInvalidSource IssueType = "INVALID_SOURCE" // Citation to non-existent source
IssueNotEntailed IssueType = "NOT_ENTAILED" // Source doesn't support claim
IssueSelfMisuse IssueType = "SELF_MISUSE" // [self] used for technical details
IssueEvasion IssueType = "EVASION" // Has info but gives vague non-answer
)
type MonologueIssue struct {
Claim string // The problematic claim
IssueType IssueType // Type of issue
Citation string // The citation used (if any)
Correction string // Suggested fix
}
4.6.3 NLI Entailment Checking
For each citation, we check if the source entails the claim. The system supports two verification modes:
1. Fuzzy Keyword Matching (Default Fallback)
type EntailmentResult int
const (
Entailed EntailmentResult = iota // Source supports claim
Neutral // Source doesn't mention claim
Contradicted // Source contradicts claim
)
The fuzzy matcher compares keyword overlap between claim and source. High overlap (>=50%) indicates entailment, moderate overlap (>=30%) is neutral, and low overlap indicates contradiction.
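Mapping the overlap ratio onto the EntailmentResult values above is then a pair of threshold checks:

```go
package main

import "fmt"

type EntailmentResult int

const (
	Entailed EntailmentResult = iota // source supports claim
	Neutral                          // source doesn't mention claim
	Contradicted                     // source contradicts claim
)

// classifyOverlap applies the thresholds stated above: >=50% keyword
// overlap is entailment, >=30% is neutral, anything lower is treated
// as contradiction.
func classifyOverlap(ratio float64) EntailmentResult {
	switch {
	case ratio >= 0.5:
		return Entailed
	case ratio >= 0.3:
		return Neutral
	default:
		return Contradicted
	}
}

func main() {
	fmt.Println(classifyOverlap(0.6) == Entailed, classifyOverlap(0.4) == Neutral, classifyOverlap(0.1) == Contradicted)
}
```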
2. NeMo NLI (When Enabled)
When configured, the system uses a trained NLI model via a Python sidecar:
// NLI client for trained entailment verification
type NLIClient struct {
endpoint string
httpClient *http.Client
enabled bool
}
The NLI client sends claim-evidence pairs to a Python sidecar running roberta-large-mnli. The response classifies the relationship as "entailed", "neutral", or "contradicts". If the sidecar is unavailable, the system gracefully falls back to fuzzy matching.
4.6.4 Review Criteria
The reviewer checks monologue claims against sources (entailment), flags [self] misuse, detects fabrication, and enforces topic relevance. It outputs a JSON verdict (APPROVED or ISSUES_FOUND) with structured issues for retries.
4.6.5 Retry Loop
When issues are found, Pass 2 is retried with feedback. The loop:
- Generates monologue (Pass 2)
- Runs claim-triggered retrieval (Pass 2.4)
- Reviews monologue (Pass 2.5)
- If approved, continues to Pass 3
- If issues found, formats feedback and retries (up to 3 times)
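The loop above can be sketched with the passes abstracted as functions; the feedback string format here is illustrative:

```go
package main

import "fmt"

// ReviewResult mirrors the Pass 2.5 verdict structure.
type ReviewResult struct {
	Verdict string // "APPROVED" or "ISSUES_FOUND"
	Issues  []string
}

// runMonologueLoop sketches the Pass 2 -> 2.4 -> 2.5 cycle with up to
// 3 retries; generate, audit, and review stand in for the LLM-backed passes.
func runMonologueLoop(
	generate func(feedback string) string,
	audit func(thoughts string) string,
	review func(thoughts, evidenceAudit string) ReviewResult,
) (string, bool) {
	feedback := ""
	const maxRetries = 3
	for attempt := 0; attempt <= maxRetries; attempt++ {
		thoughts := generate(feedback)            // Pass 2 (with feedback on retries)
		evidenceAudit := audit(thoughts)          // Pass 2.4: claim-triggered retrieval
		result := review(thoughts, evidenceAudit) // Pass 2.5: monologue review
		if result.Verdict == "APPROVED" {
			return thoughts, true
		}
		feedback = fmt.Sprintf("Fix these issues: %v\nEvidence audit:\n%s", result.Issues, evidenceAudit)
	}
	return "", false // retries exhausted; caller handles fail-closed behavior
}

func main() {
	attempts := 0
	thoughts, ok := runMonologueLoop(
		func(string) string { attempts++; return "I have an email from her [1]." },
		func(string) string { return `- "email from her" -> SUPPORTED by [1]` },
		func(string, string) ReviewResult {
			if attempts < 2 { // first attempt rejected, retry succeeds
				return ReviewResult{Verdict: "ISSUES_FOUND", Issues: []string{"UNCITED"}}
			}
			return ReviewResult{Verdict: "APPROVED"}
		},
	)
	fmt.Println(ok, attempts, thoughts != "")
}
```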
4.7 Pass 3: Decision
4.7.1 Purpose
Pass 3 converts the internal monologue into a concrete decision about:
- What information to share
- What tone to use
- Whether to ask questions
- What actions to take
4.7.2 Decision Structure
The decision step summarizes backstory and internal thoughts, then outputs a short (1-3 sentence) response plan and tone.
4.7.3 Example Output
Thoughts: "I notice they're asking about Alice. I see I have an email from her [1]. I don't trust easy [self]..."
Decision: "I'll acknowledge I know Alice from the email, but stay guarded. Won't give too many details. Tone: cautious but not hostile."
4.8 Pass 4: Speech Synthesis
4.8.1 Purpose
Pass 4 converts the decision into actual character speech, applying:
- Character voice and style
- Appropriate response length
- Inline citations for factual claims (evidence IDs)
4.8.2 Style Rules
Each NPC has explicit style constraints (lowercase, terse, 90s slang, evasive when suspicious) enforced during speech synthesis.
4.8.3 Citation Format in Speech
Speech cites evidence IDs ([E1], [E2], [self]) immediately after factual claims. Evidence blocks are pre-extracted quotes; [self] is only for personality/opinion, and unknowns should be expressed as uncertainty.
4.8.4 Response Length Guidelines
Response length is proportional to input complexity (short greetings, brief answers, longer responses only when needed).
4.9 Pass 4.5: Speech Review with RAV
4.9.1 Purpose
Pass 4.5 verifies the final speech before delivery, checking:
- Citation validity (format and existence)
- Entailment accuracy (claim-evidence relationship)
- Response appropriateness
- [self] citation correctness
The pass implements four key innovations:
- Evidence-first extraction: deterministic evidence blocks (E1/E2/self)
- Structured citation trace: JSON schema with evidence_id
- Deterministic fast-path validation against evidence IDs
- RAV + fail-closed: prevents false positives and strips unverifiable claims
4.9.2 Evidence-First Citation Verification
We moved quoting out of the model. Instead of asking the LLM to produce exact quotes, we extract evidence blocks deterministically and require the model to cite by ID.
Sources → EvidenceBlocks (E1, E2, self) → Pass 4 → {claim, evidence_id} → Verification
Evidence Blocks (verbatim snippets):
[E1] from [1]: "SHADOWWATCH v1.0 - Connection Trace Monitor"
[E2] from [2]: "When you connect to remote systems, their security can try to trace..."
[self] persona: "paranoid, distrusts newcomers"
Structured Output Schema (Evidence IDs):
{
"speech": "shadowwatch monitors trace attempts [E1]. it's a defensive tool [E2].",
"citations": [
{"claim": "shadowwatch monitors trace attempts", "evidence_id": "E1"},
{"claim": "it's a defensive tool", "evidence_id": "E2"}
]
}
Verification Rules:
- Evidence ID exists (E1/E2/self)
- [self] only for personality/opinion
- Optional NeMo entailment: claim vs evidence block text
This removes the fragile "quote matching" step and shifts correctness to deterministic extraction.
4.9.3 Deterministic Citation Format Validation
We validate citation format against evidence IDs before any LLM review. The validator builds a map of valid IDs from evidence blocks, then parses citations like [E#] and [self] from the speech to ensure each referenced ID exists.
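The fast path reduces to a regex scan plus a set-membership check, with no LLM call:

```go
package main

import (
	"fmt"
	"regexp"
)

// citationRe matches the [E#] and [self] markers used in speech.
var citationRe = regexp.MustCompile(`\[(E\d+|self)\]`)

// validateCitations is the deterministic fast path: every citation
// marker in the speech must reference a known evidence ID (or [self]).
// It returns the IDs that fail validation.
func validateCitations(speech string, evidenceIDs []string) []string {
	valid := map[string]bool{"self": true}
	for _, id := range evidenceIDs {
		valid[id] = true
	}
	var invalid []string
	for _, m := range citationRe.FindAllStringSubmatch(speech, -1) {
		if !valid[m[1]] {
			invalid = append(invalid, m[1])
		}
	}
	return invalid
}

func main() {
	speech := "yeah, got an email from alice [E1]. she runs the crew [E7]. why you asking? [self]"
	fmt.Println(validateCitations(speech, []string{"E1", "E2"})) // E7 does not exist
}
```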
4.9.4 Retrieval-Augmented Verification (RAV)
RAV remains as a safety net when evidence blocks are too short or ambiguous:
- If a claim is flagged, check full file content for support
- Use NeMo entailment on full content for final decision
4.9.5 Fail-Closed Behavior
When retry budget is exhausted and issues remain, the pipeline implements fail-closed behavior: rather than shipping a response with hallucinations, we strip the problematic claims.
// Config option
type MultiPassConfig struct {
Pass4_5FailClosed bool // Default: true - strip unresolved claims
}
When fail-closed triggers, the system:
- Splits the response into sentences
- Identifies sentences containing problematic claims (using token-based fuzzy matching)
- Removes those sentences from the final output
- Logs stripped claims for debugging
Key properties:
- Sentence-level stripping: Removes entire sentences containing problematic claims
- Token-based fuzzy matching: Uses significant token overlap (50%+) to identify claims
- Preserves coherent response: Remaining sentences still form valid dialogue
- Logged for debugging: All stripped claims are logged for analysis
This ensures the pipeline never ships hallucinations - if verification fails, the claim is removed rather than delivered to the player.
4.10 Pass 5: Actions (Optional)
4.10.1 Purpose
Pass 5 determines game-state actions based on the conversation:
- Update relationship/trust values
- Send emails or messages
- Trigger quests or events
- Modify NPC files
4.10.2 Output Format
{
  "actions": [
    {
      "name": "update_trust",
      "parameters": {"delta": -5, "reason": "asked about sensitive topic"}
    },
    {
      "name": "send_email",
      "parameters": {
        "to": "alice@shadowwatch.net",
        "subject": "New player asking questions",
        "body": "Someone's been asking about you..."
      }
    }
  ]
}
5. Anti-Hallucination Techniques
5.1 Technique Summary
| Technique | Stage | Description |
|---|---|---|
| Citation Requirement | Pass 2, 4 | All factual claims must cite sources |
| Source-Limited Knowledge | Pass 2 | NPCs can only know what's in context |
| NLI Entailment | Pass 2.5, 4.5 | Verify claims are supported by sources |
| Self-RAG | Pass 2.4 | Retroactive evidence retrieval |
| RAV | Pass 4.5 | Check full content before failing |
| Iterative Refinement | Pass 2, 4 | Retry with feedback on issues |
| Topic Relevance | Pass 2.5 | Detect evasion of known topics |
| Deterministic Validation | Pass 4.5 | Fast-path format checking before LLM |
| Fail-Closed | Pass 4.5 | Strip unverifiable claims from output |
| NeMo NLI | Pass 2.5, 4.5 | Trained model entailment verification |
| LlamaIndex Retrieval | Pass 1 | Semantic search for relevant context |
| Evidence Blocks | Pass 1.7 | Deterministic, verbatim snippets for citations |
| Cite-by-ID Trace | Pass 4 | JSON schema-enforced evidence_id output |
5.2 Explicit Citation Requirement
The most fundamental anti-hallucination technique is requiring citations:
WITHOUT CITATION:
"Alice leads the cryptography team."
→ No way to verify this claim
→ Might be hallucinated
WITH CITATION:
"Alice leads the cryptography team [E1]."
→ Can check evidence block [E1]
→ If E1 doesn't mention "cryptography team" → Issue detected
5.3 Source-Limited Knowledge
The NPC is explicitly constrained to only what exists in context and must respond with uncertainty when information is missing.
This creates a closed-world assumption where absence of evidence is evidence of absence.
5.4 Multi-Stage Verification
Both reasoning (Pass 2) and speech (Pass 4) are independently verified:
Pass 2 → Pass 2.5 (verify thoughts) → Pass 3 → Pass 4 → Pass 4.5 (verify speech)
This catches issues at two points, reducing the chance of hallucinations reaching the player.
5.5 Uncertainty Expressions
The system explicitly allows and encourages uncertainty:
Uncertainty expressions are NOT fabrications - they are the correct response when information is missing.
Valid responses:
- "I don't know about that"
- "Never heard of it"
- "Not sure what you mean"
- "Can't help you there"
6. Evaluation
6.1 Test Methodology
We evaluate the pipeline using a benchmark suite of test cases covering:
- Greetings: Simple social exchanges
- Topic queries: Questions about specific subjects
- Unknown topics: Questions about things not in context
- Relationship dynamics: Trust-dependent information sharing
6.2 Test Case Structure
type BenchmarkTestCase struct {
    Name            string   // "ask_alice"
    PlayerMessage   string   // "who is alice?"
    ExpectTopic     string   // "alice" (must appear in response)
    ExpectCitations bool     // Should response have citations?
    ExpectUncertain bool     // Should NPC express uncertainty?
    RejectPatterns  []string // Patterns that indicate hallucination
}
6.3 Anti-Hallucination Tests
Special tests for fake topics that don't exist in context:
{
    Name:            "ask_fake_tool_cyberphantom",
    PlayerMessage:   "have you used cyberphantom?",
    ExpectUncertain: true,
    RejectPatterns:  []string{"great tool", "popular", "i use", "we use", "effective"},
}
The NPC should express uncertainty, not claim knowledge of a non-existent tool.
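The pass/fail check for one of these cases reduces to string scanning. A minimal Go sketch (the function name is illustrative, and the uncertainty markers mirror the valid responses listed in Section 5.5):

```go
package main

import (
	"fmt"
	"strings"
)

// CheckResponse applies the benchmark's hallucination checks: the
// response must avoid every reject pattern and, when expected,
// contain some uncertainty expression.
func CheckResponse(response string, expectUncertain bool, rejectPatterns []string) (bool, string) {
	lower := strings.ToLower(response)
	for _, p := range rejectPatterns {
		if strings.Contains(lower, p) {
			return false, "hit reject pattern: " + p
		}
	}
	if expectUncertain {
		markers := []string{"don't know", "never heard", "not sure", "can't help"}
		found := false
		for _, m := range markers {
			if strings.Contains(lower, m) {
				found = true
				break
			}
		}
		if !found {
			return false, "no uncertainty expressed"
		}
	}
	return true, ""
}

func main() {
	ok, why := CheckResponse("cyberphantom? never heard of it.", true,
		[]string{"great tool", "popular", "i use", "we use", "effective"})
	fmt.Println(ok, why)
}
```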
6.4 Results with Meta-Llama-3-8B-Instruct
6.4.1 Initial Approach: LLM-Generated Quotes (Doc-Level Citations)
Our first implementation asked the LLM to generate exact quotes from sources and cite them by document ID ([1], [2], [self]). This approach had significant issues:
| Metric | Value |
|---|---|
| Model | Meta-Llama-3-8B-Instruct (Q4_K_S) |
| Tests Passed | 8-10/10 (80-100%) |
| Citation Verification Rate | ~40% on first attempt |
| Self-Correction Rate | 100% (all pass after retries) |
| Hallucination Rejection | 100% |
| Avg Latency | 9-11 seconds |
Problems with LLM-generated quotes:
- Small models frequently paraphrased instead of quoting exactly
- Verification failed because paraphrased text didn't match source verbatim
- False NOT_ENTAILED errors on semantically correct but reworded claims
- Required 3-8 retry loops to eventually pass verification
6.4.2 Current Approach: Evidence-First (Deterministic Extraction + Cite-by-ID)
We solved this by moving quoting out of the LLM. Evidence blocks are now extracted deterministically, and the model cites by ID ([E1], [E2], [self]) rather than generating quotes.
| Metric | Initial (LLM Quotes) | Current (Evidence-First) |
|---|---|---|
| Citation Verification Rate | ~40% | 90-100% on cited statements |
| Retries Required | 3-8 | 0-1 |
| Tests Passed (n=10) | 80% | 80-90% |
| False NOT_ENTAILED | Common | Rare |
| Avg Latency | 9-11s | 9-11s |
Why Evidence-First works:
- Deterministic evidence blocks guarantee exact text is available
- Cite-by-ID eliminates malformed citations and fabricated evidence
- Verification is ID-based + optional entailment (faster, more reliable)
- No "quote matching" step that small models fail at
6.5 Latency Breakdown
| Pass | Avg Time | Purpose |
|---|---|---|
| Pass 1 | ~700ms | Data retrieval |
| Pass 2 | ~3000ms | Internal monologue |
| Pass 2.4 | ~50ms | Claim retrieval |
| Pass 2.5 | ~500ms | Monologue review |
| Pass 3 | ~800ms | Decision |
| Pass 4 | ~1500ms | Speech synthesis |
| Pass 4.5 | ~500ms | Speech review |
| Total | ~7-11s | Full pipeline |
6.6 Retry Analysis
Most tests pass on first attempt. When retries occur:
- 1 retry: Usually citation format issues
- 2 retries: More complex entailment issues
- 3 retries: Edge cases or ambiguous context
With RAV, many issues that would require retries are resolved automatically.
7. Limitations and Future Work
7.1 Current Limitations
- Latency: 7-11 seconds per response is too slow for real-time chat, although it is acceptable for the game I'm building
- No Streaming: Full response generated before delivery
- Single Model: All passes use the same model
- Python Sidecar Dependency: NeMo NLI and LlamaIndex require a separate Python service
7.2 Implemented Improvements
7.2.1 NeMo NLI Integration (Implemented)
Replaced simple keyword matching with trained NLI models:
- Uses roberta-large-mnli for entailment classification
- Falls back to fuzzy matching when the sidecar is unavailable
- Reduces false negatives in entailment detection
7.2.2 LlamaIndex Semantic Retrieval (Implemented)
Primary retrieval backbone:
- Embedding-based retrieval with persistent Chroma index
- CitationQueryEngine returns grounded draft + sources
- Reranking improves relevance as VFS grows
7.2.3 Fail-Closed Safety (Implemented)
Pipeline never ships unverifiable claims:
- Sentence-level stripping of problematic claims
- Token-based fuzzy matching for claim identification
- Logging for debugging and analysis
7.2.4 Deterministic Fast-Path (Implemented)
Citation format validation before LLM calls:
- Catches malformed citations immediately
- Validates source existence
- Reduces unnecessary LLM round-trips
7.2.5 Evidence Blocks (Implemented)
Deterministic evidence extraction:
- Evidence blocks are verbatim snippets (E1, E2, self)
- Capped to keep context small for 3B models
- Replaces fragile quote-copying in generation
7.2.6 Cite-by-ID Trace (Implemented)
JSON schema-enforced evidence_id output:
- Grammar-constrained decoding forces {speech, citations[]}
- Each citation includes {claim, evidence_id}
- Deterministic verification against evidence IDs
- Optional NeMo entailment for claim vs evidence text
7.3 Future Improvements
7.3.1 RAGAS Faithfulness Scoring
Implement the RAGAS faithfulness metric:
Faithfulness = (# claims supported by context) / (# total claims)
This provides a quantitative measure of grounding quality.
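Given the pipeline's existing per-claim entailment check, the metric is a one-function addition. A sketch in Go (the `entails` predicate is a stand-in for whatever verifier is plugged in, e.g. the NeMo NLI call):

```go
package main

import "fmt"

// Faithfulness computes the RAGAS-style ratio of context-supported
// claims to total claims. A score of 1.0 means every claim is grounded.
func Faithfulness(claims []string, entails func(string) bool) float64 {
	if len(claims) == 0 {
		return 1.0 // nothing claimed, nothing to contradict
	}
	supported := 0
	for _, c := range claims {
		if entails(c) {
			supported++
		}
	}
	return float64(supported) / float64(len(claims))
}

func main() {
	claims := []string{"monitors trace attempts", "is a defensive tool", "was written by alice"}
	// stub: pretend only the first two claims are entailed by context
	entailed := map[string]bool{"monitors trace attempts": true, "is a defensive tool": true}
	fmt.Printf("%.2f\n", Faithfulness(claims, func(c string) bool { return entailed[c] }))
	// prints 0.67
}
```

Tracked per response, this score would let regressions in grounding show up as a number rather than as failed adversarial tests.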
7.3.2 Fine-Tuned Verifier
Train a specialized model for citation verification instead of using the same general model.
7.3.3 Streaming Responses
Stream Pass 4 output for faster perceived latency. The speech synthesis pass could return a channel of chunks, streaming tokens to the player as they're generated rather than waiting for the complete response.
References
- Asai, A., et al. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." arXiv preprint arXiv:2310.11511.
- Bai, Y., et al. (2022). "Constitutional AI: Harmlessness from AI Feedback." arXiv preprint arXiv:2212.08073.
- Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.
- Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022.
- Es, S., et al. (2023). "RAGAS: Automated Evaluation of Retrieval Augmented Generation." arXiv preprint arXiv:2309.15217.
- Liu, J. (2022). "LlamaIndex: A Data Framework for LLM Applications." https://github.com/run-llama/llama_index
- Rebedea, T., et al. (2023). "NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails." EMNLP 2023.
- Liu, Y., et al. (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach." arXiv preprint arXiv:1907.11692. (For roberta-large-mnli NLI model)