Case Study · GL-001

Coherence Diagnostic Engine

Measuring the gap between what organizations say and what the world sees. Built on local compute. Governed by one rule.

AR-001 · Governing Constraint
"Automation may observe, summarize, and suggest. Automation may not decide."

The Problem

Organizations project a narrative from the center — press releases, job postings, earnings calls. The edge — customers, employees, media, social — tells a different story.

The distance between these two stories is where coherence breaks down. Most companies can't measure this gap because nothing in their stack is designed to surface it. Surveys measure satisfaction. Sentiment tools measure mood. Nothing measures structural alignment between what an organization claims and what actually shows up in the world.

The Coherence Triangle

Three dimensions, weighted by structural importance:

Truth (55% weight): Does the center narrative match edge reality? Measured by comparing claims from official sources against observations from the outside world.

Authority (45% weight): Does the entity's voice carry weight and consistency? Signal strength, message discipline, and whether the center speaks with one voice.

Continuity (Phase 2): Temporal consistency across collection periods. Not yet implemented; it requires multiple data windows to measure drift over time.

Overall coherence = Truth × 0.55 + Authority × 0.45. Continuity will factor in once multi-period collection is live.
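The weighted sum above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual scoring code; the function name and the assumption that dimension scores live on a 0–1 scale are mine.

```python
# Weights as stated in the case study. Continuity is excluded
# until multi-period collection is live.
TRUTH_WEIGHT = 0.55
AUTHORITY_WEIGHT = 0.45

def coherence_score(truth: float, authority: float) -> float:
    """Combine per-dimension scores (assumed 0-1) into overall coherence."""
    return truth * TRUTH_WEIGHT + authority * AUTHORITY_WEIGHT

# Strong truth alignment, weaker authority signal:
score = coherence_score(0.8, 0.6)  # 0.8*0.55 + 0.6*0.45 = 0.71
```

Because Truth carries the larger weight, a gap between claims and edge reality drags the overall score down faster than weak message discipline does.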

The Pipeline

Four stages. Each produces structured, schema-validated outputs. The full pipeline runs on a single DGX Spark with no external API calls.

The Adversarial Skeptic

The scoring step includes a built-in challenge mechanism. Every finding is produced, then attacked by a Skeptic agent that argues against it. Only findings that survive the challenge are sustained and enter the record.

This isn't a gimmick — it's structural quality control. If the sustain rate drops below 60%, the problem is in extraction quality, not the Skeptic. The Skeptic is a load-bearing constraint that prevents the pipeline from producing findings it can't defend.
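The challenge-then-gate flow can be sketched as follows. Everything here is illustrative: in the real pipeline the Skeptic is an LLM agent, stubbed below with a placeholder evidence check, and the `Finding` type and 60% floor are my labels for the mechanism the text describes.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str
    evidence: list[str]

def skeptic_sustains(finding: Finding) -> bool:
    """Stub for the Skeptic agent: argue against the finding and
    sustain it only if the evidence survives. Placeholder criterion."""
    return len(finding.evidence) >= 2

def score_findings(findings: list[Finding]) -> list[Finding]:
    """Gate findings through the Skeptic and enforce the sustain floor."""
    sustained = [f for f in findings if skeptic_sustains(f)]
    rate = len(sustained) / len(findings) if findings else 0.0
    if rate < 0.60:
        # Per the design rule above: a low sustain rate signals an
        # extraction-quality problem, not a Skeptic problem.
        raise RuntimeError(f"Sustain rate {rate:.0%} below the 60% floor")
    return sustained
```

The point of the gate is that only sustained findings enter the record; an unsustained finding is discarded rather than flagged for later review.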

The Infrastructure

Everything runs on local hardware. No data leaves the network. No API calls to cloud inference providers.

DGX Spark: NVIDIA Grace Blackwell GB10, 128 GB unified memory. All inference runs here via Ollama; primary compute for extraction, scoring, and synthesis.

qwen3:32b: The proven stable model. 32.8B parameters, Q4_K_M quantization, batch size of 1. Validated across 13 pipeline runs.

ChromaDB: Vector database indexing canon documents and pipeline memory. Embeddings generated with nomic-embed-text.

Synology NAS: Filesystem-first truth store; source data, schemas, and markdown canon live here. 10 GbE backbone to all nodes.

Structural Constraints

Not all limitations can be solved by model tuning. Some are structural: the pipeline tells you what evidence it doesn't have.

These aren't bugs. They're binding constraints that enforce honesty about what the data can and cannot support.

What It Doesn't Do

It doesn't decide what findings mean. It doesn't recommend actions. It doesn't access proprietary or private data — all sources are publicly available.

AR-001 isn't bolted onto this system. It's baked into the pipeline architecture. The Skeptic doesn't just challenge findings for quality — it enforces the principle that the system produces evidence, not conclusions.

Why This Matters

The default assumption is that AI should optimize decisions and automate judgment. Coherence takes the opposite position: the diagnostic should surface what's there, challenge its own findings, and hand structured evidence to a human who decides what it means.

This is what Decision & Responsibility Infrastructure™ looks like when it's built on actual machines instead of slides.

Building something similar?

If you're thinking about AI governance, diagnostic infrastructure, or human-in-the-loop systems, I'd welcome a conversation.