Spaces:
Sleeping
DeepTrust Architecture
This document provides a detailed technical overview of the DeepTrust Research Agent architecture, covering the state machine design, data flow, and implementation patterns.
System Overview
DeepTrust is a research automation system that orchestrates an LLM through a multi-stage workflow. The system decomposes research questions into executable plans, validates them against policy, executes tool calls, and synthesizes results into reports.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FRONTEND (app/page.tsx) β
β βββββββββββββββ βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Model Card β β Chat + Context panel β β
β β (SSE load) β β β’ Messages (user / assistant with word-by-word) β β
β ββββββββ¬βββββββ β β’ Knowledge drop zone (files, URLs, notes) β β
β β β β’ Quick-action chips, preview prompts β β
β β β β’ Reasoning trace (node summaries) β β
β β ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ β
βββββββββββΌββββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β API LAYER β
β βββββββββββββββββββββββ βββββββββββββββββββββββββββββββββββββββ β
β β GET/POST β β POST /api/research β β
β β /api/model/load β β (SSE: event + data per research step β β
β β (SSE progress) β βββββββββββββββββββ¬ββββββββββββββββββββ β
β βββββββββββ¬ββββββββββββ β β
β β β β
β β βββββββββββββββββββββββββββββββββββββββ β
β β β POST /api/research/approve β β
β β β (resume after HITL approval) β β
β β βββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββΌββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AGENT CORE (lib/agent) β
β βββββββββββββββββββ βββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β LLM Client β β StateGraph β β
β β (worker thread) β β βββββββββββ βββββββββββ ββββββββββββββββ β β
β β loadModel() β β β Thinker ββββ Auditor ββββ Tool Executor β β β
β β chatComplete() ββββΌββββββββ²βββββ ββββββ¬βββββ ββββββββ¬ββββββββ β β
β βββββββββββββββββββ β β β β β β
β β ββββββββββββββ βΌ β β
β β βββββββββββββββββ β β
β β β Synthesizer β β β
β β βββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
State Machine Design
LangGraph StateGraph
The agent is implemented as a LangGraph StateGraph<ResearchState>. LangGraph provides:
- Typed State Channels: Each state field has a merge strategy
- Conditional Routing: Functions determine the next node based on state
- Checkpointing: State snapshots enable pause/resume workflows
- Streaming: Events are yielded as nodes complete
State Flow
Initial State
β
βΌ
βββββββββββ
β thinker β ββββββββββββββββββββββββββββββββ
ββββββ¬βββββ β
β produces plan β
βΌ β
βββββββββββ β
β auditor β β
ββββββ¬βββββ β
β β
βββ verdict: "rejected" βββββββββββββββ
β (planRevisionCount++)
β
βββ planRevisionCount >= max βββββββΊ END
β
βββ verdict: "approved"
β
βΌ
βββββββββββββ
β hitl_gate β ββββ interrupt() pauses here
βββββββ¬ββββββ
β humanApproved = true (external)
βΌ
βββββββββββββββββ
β tool_executor β βββββββββββββββ
βββββββββ¬ββββββββ β
β β
βββ more steps ββββββββββ
β
βββ all steps done
β
βΌ
βββββββββββββββ
β synthesizer β
ββββββββ¬βββββββ
β
βΌ
END
Channel Merge Strategies
LangGraph channels define how incoming state updates merge with existing state:
| Field | Strategy | Rationale |
|---|---|---|
reasoning |
Append + cap | Nodes pass one new entry per update; reducer concatenates and keeps the last 20 (MAX_REASONING_ENTRIES in state.ts) |
| All others | Replace | Last-write-wins for scalar values |
channels: {
reasoning: {
value: (existing, incoming) => {
const merged = [...(existing ?? []), ...(incoming ?? [])];
const cap = 20; // MAX_REASONING_ENTRIES in state.ts
return merged.length > cap ? merged.slice(merged.length - cap) : merged;
},
default: () => [],
},
plan: { value: (_, n) => n },
// ...
}
appendReasoning returns a single-element array so the reducer performs one append per node; this avoids duplicating the full history on every write.
Data Schemas
Type System Philosophy
All data structures use Zod for runtime validation. TypeScript types are inferred from Zod schemas, ensuring a single source of truth.
// Schema definition
export const ResearchStep = z.object({
id: z.string().uuid(),
tool: z.enum(["web_search"]),
input: z.string().min(1),
rationale: z.string(),
output: z.string().optional(),
});
// Type inference (no manual duplication)
export type ResearchStep = z.infer<typeof ResearchStep>;
Schema Hierarchy
ResearchState (root)
βββ threadId: UUID
βββ userQuery: string
βββ plan: ResearchPlan | null
β βββ objective: string
β βββ steps: ResearchStep[]
β β βββ id: UUID
β β βββ tool: enum
β β βββ input: string
β β βββ rationale: string
β β βββ output?: string
β βββ estimatedTokenBudget: number
β βββ revision: number
βββ auditResult: AuditResult | null
β βββ verdict: enum
β βββ policyViolations: string[]
β βββ suggestions: string[]
βββ reasoning: ReasoningEntry[]
β βββ node: enum
β βββ timestamp: datetime
β βββ summary: string
β βββ rawThought?: string
βββ status: enum
Node Implementation Patterns
Node Function Signature
All nodes follow the same pattern:
async function nodeName(state: ResearchState): Promise<Partial<ResearchState>> {
// 1. Read required state
// 2. Perform computation (LLM calls, tool execution, etc.)
// 3. Return partial state update
}
LangGraph merges the returned partial state into the existing state using channel strategies.
Prompt Engineering Pattern
Each LLM-calling node structures prompts for reliable JSON output:
const system = `
You are the [Role] node of DeepTrust.
[Brief description of responsibility]
Return ONLY a valid JSON object matching:
{
"field1": type,
"field2": type
}
Rules:
- [Constraint 1]
- [Constraint 2]
- Do not include markdown fences or prose outside JSON.
`;
const userMessage = `[Contextual input]`;
const raw = await chatComplete(system, userMessage);
const parsed = extractJSON(raw);
const validated = Schema.parse(parsed);
Error Handling Pattern
Nodes append to the reasoning trace even on failure, enabling debugging:
async function node(state: ResearchState) {
try {
// ... main logic
return { /* success state */ };
} catch (error) {
const reasoning = appendReasoning(state, {
node: "node_name",
summary: `Error: ${error.message}`,
});
return {
status: "failed",
errorMessage: error.message,
reasoning,
};
}
}
Routing Logic
Conditional Edges
LangGraph addConditionalEdges accepts a router function that returns the next node name:
graph.addConditionalEdges("auditor", routeAfterAudit, {
thinker: "thinker",
hitl_gate: "hitl_gate",
[END]: END,
});
The mapping object defines legal transitions. If the router returns a key not in the map, LangGraph throws an error.
Router Functions
Routers are pure functions that inspect state:
function routeAfterAudit(state: ResearchState): "thinker" | "hitl_gate" | typeof END {
// Safety ceiling check
if (state.planRevisionCount >= state.maxPlanRevisions) {
return END;
}
// Rejection triggers revision
if (state.auditResult?.verdict !== "approved") {
return "thinker";
}
// Approval proceeds to HITL
return "hitl_gate";
}
LLM Integration Layer
Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β llm/index.ts β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Module-level State β
β βββ generatorPromise: Promise<TextGenerationPipeline> β
β βββ isModelLoaded: boolean β
β βββ currentProgress: number β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Exports β
β βββ loadModel(onProgress?) β Promise<Pipeline> β
β βββ chatComplete(system, user) β Promise<string> β
β βββ getModelStatus() β ModelProgress β
β βββ MODEL_ID: string β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Lazy Loading Pattern
The model is loaded once and reused across all requests:
let generatorPromise: Promise<TextGenerationPipeline> | null = null;
export function loadModel(): Promise<TextGenerationPipeline> {
if (generatorPromise) return generatorPromise; // Return cached promise
generatorPromise = pipeline("text-generation", MODEL_ID, {
progress_callback: handleProgress,
});
return generatorPromise;
}
Progress Streaming
The Hugging Face Transformers library supports progress callbacks:
pipeline("text-generation", MODEL_ID, {
progress_callback: (data: { status: string; file?: string; progress?: number }) => {
// data.status: "download", "progress", "ready"
// data.file: current file being downloaded
// data.progress: 0-1 fraction
},
});
API Design
Research Streaming: Server-Sent Events (SSE)
Research results are streamed as Server-Sent Events so the client can distinguish event types and get low-latency, non-blocking updates. The response uses Content-Type: text/event-stream and each message has an event name plus a data payload (JSON).
Why SSE (not NDJSON): Standard SSE gives a single, well-understood protocol for streaming; event names (start, research, error) allow the client to handle each kind of message without guessing. Proxies and browsers handle SSE well, and we can add ping/heartbeat later without changing the wire format.
Wire format:
event: start
data: {"node":"_start","state":{"status":"started","plan":{...},"reasoning":[...]}}
event: research
data: {"node":"thinker","state":{"status":"thinking","plan":{...}}}
event: research
data: {"node":"auditor","state":{"status":"awaiting_approval","auditResult":{...}}}
event: research
data: {"node":"synthesizer","state":{"status":"complete","finalReport":"..."}}
event: error
data: {"node":"_error","state":{"status":"failed","errorMessage":"..."}}
Server (route): The route encodes each event with event: <name>\ndata: <JSON>\n\n and enqueues into a ReadableStream, then closes the stream when the graph run finishes or throws.
Client: The client reads the response body with getReader(), accumulates chunks, splits on \n\n to get full SSE messages, then for each message parses the event: line and the data: line (JSON). Events of type research (and start) are appended to the events list; the last event carrying finalReport is used to drive the word-by-word streaming animation in the chat.
Model Loading Protocol
Model load also uses SSE for download progress:
data: {"status":"downloading","progress":25,"file":"model.onnx"}\n\n
data: {"status":"downloading","progress":50,"file":"model.onnx"}\n\n
data: {"status":"ready","progress":100,"modelId":"...","dtype":"q4"}\n\n
SSE format: data: prefix, JSON body, double newline (\n\n) between events.
Frontend Architecture
The workspace (app/page.tsx) is built for a Cursor/Gemini-like flow: immediate feedback, non-blocking streaming, and clear separation between chat, context, and observability.
Layout and Responsibilities
| Area | Purpose |
|---|---|
| Chat | User messages and assistant replies. Assistant messages show a shimmer placeholder while waiting, then the final report is revealed word-by-word for a live-conversation feel. |
| Context panel | Knowledge drop zone: drag-and-drop PDFs, notes, and URL references. Items are indexed in IndexedDB via lib/knowledge (browser embeddings + chunks); on submit, retrieve(query) runs client-side and the request sends retrievedContext + contextUrls to POST /api/research. |
| Model card | Model selection, load/progress, status pill (Ready / Loading / Error). Uses the same SSE pattern as model load API. |
| Reasoning trace | Scrollable list of the latest reasoning summaries per node so you can follow the graphβs flow while the chat shows the final answer. |
Optimistic UI and Streaming Flow
- On submit: The client immediately appends a user message and an assistant placeholder (with shimmer) to the chat and sets
isStreaming = true. No wait for the first byte. - SSE consumption:
POST /api/researchis read withresponse.body.getReader(). Chunks are decoded and split on\n\n. Each SSE message is parsed forevent:anddata:;startandresearchevents are appended to the events list. - Final report: When an event contains
state.finalReport, that text is stored and a word-by-word animation is started for the latest assistant message: a timer (e.g. every 40ms) reveals the next word until the full report is shown, thenisStreamingis cleared. - Abort/cleanup: A ref holds an
AbortControllerfor the in-flight request; starting a new run aborts the previous one and clears the streaming timer so only one βliveβ reply runs at a time.
Client-side knowledge (browser βvector DBβ)
RAG is implemented under lib/knowledge/ and runs only in the browser (see README βClient-side knowledge storeβ).
- Storage: IndexedDB database
deeptrust-knowledge:documents(id, type: file | url | note, label, optional url) andchunks(id, documentId, text, embedding float array, span indices). IndexbyDocumenton chunks supports cascading delete. - Embeddings:
@xenova/transformersfeature-extractionwithXenova/all-MiniLM-L6-v2, mean-pooled and normalized; cosine similarity in JS for scoring. Distinct from the serverβs chat LLM (worker thread). - Ingestion: PDFs β extract text β
chunkText(~500 chars, overlap) β embed each chunk β persist. Notes β same. URLs β one chunk embeddingURL: β¦(no network fetch of page body in v1). - Retrieval:
retrieve(userQuery)embeds the query, scores all chunks, returns top-K (8) concatenated asretrievedContextand deduped URL list ascontextUrls. - API contract:
POST /api/researchbody includesquery,retrievedContext,contextUrls. The route passes them intorunResearchoptions βResearchState.knowledgeContext/contextUrlsfor Thinker and Synthesizer. Original files are not uploadedβonly retrieved snippet text crosses the wire.
Quick Actions and Starter Cards
- Quick-action chips below the input (e.g. local knowledge + cited web results, approve-before-tools, auditable source trail, on-device vs network) set or extend the query and optionally trigger a run.
- Starter cards in the empty state show example prompts aligned with privacy-first research (e.g. local inference, verifiable sources) and populate the input or start a run when clicked.
Why This Structure
- Single page: All controls (model, context, chat, trace) stay on one screen to reduce context switching and match a βflow stateβ tool.
- SSE end-to-end: Both research and model load use SSE so the client has one mental model: stream events, parse by type, update UI.
- Word-by-word: The synthesizer returns the full report in one event; animating it word-by-word on the client gives a streaming feel without changing the backend contract.
File Organization
Separation of Concerns
| Directory | Responsibility |
|---|---|
lib/agent/nodes/ |
Individual node implementations |
lib/agent/llm/ |
LLM client abstraction |
lib/agent/utils/ |
Shared utilities (JSON extraction, policy loading) |
lib/agent/ |
Graph construction, state schemas, routing |
lib/knowledge/ |
Client-only RAG: IndexedDB, Xenova embeddings, chunking, retrieve (import from browser only) |
app/api/ |
HTTP endpoints |
app/ |
React UI components |
Import Hierarchy
app/api/research/route.ts
βββ @/lib/agent (public API)
βββ graph.ts
βββ state.ts
βββ nodes/index.ts
β βββ thinker.ts β llm, utils, state
β βββ auditor.ts β llm, utils, state
β βββ ...
βββ routing.ts β state
Checkpointing and Persistence
MemorySaver (Development)
Default checkpointer stores state in memory. State is lost on server restart.
import { MemorySaver } from "@langchain/langgraph";
const checkpointer = new MemorySaver();
Production Persistence
For production, swap to a persistent checkpointer:
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";
const checkpointer = await PostgresSaver.fromConnString(process.env.DATABASE_URL);
Thread-Based State
Each research session has a unique threadId. The checkpointer keys state by thread:
const config = { configurable: { thread_id: initialState.threadId } };
// Stream with checkpointing
for await (const event of graph.stream(initialState, config)) { ... }
// Resume after HITL: pass a Command with `resume` so interrupt() in hitl_gate can proceed,
// and set humanApproved in the same step
for await (const event of graph.stream(
new Command({ resume: true, update: { humanApproved: true } }),
config
)) { ... }
Security Considerations
Policy Enforcement
The Auditor node validates plans against POLICY.md before execution. Policy rules should cover:
- Data access restrictions
- External request limits
- Allowed tool types
- Content guidelines
Tool execution
web_search: Fetches HTML results from DuckDuckGo by default (no API key). IfGOOGLE_CSE_API_KEYandGOOGLE_CSE_CXare set, uses Google Custom Search JSON API instead. Timeouts and bounded result counts limit external HTTP usage.
Further hardening (rate limits, allowlists) can be added as deployment needs grow.
Input Validation
All state mutations pass through Zod schemas, preventing malformed data from propagating.
Future Considerations
Potential Enhancements
- Persistent Checkpointing: PostgreSQL or Redis for production state storage
- More tools: Document fetch (e.g. Playwright), optional code execution, or alternate search backends (Tavily, SearXNG)
- Multi-Model Support: Router to select appropriate model per task complexity
- Observability: OpenTelemetry traces for node-level metrics
- Parallel Tool Execution: Execute independent steps concurrently