
	# G.U.I.D.E. — Technical Architecture & Specification

	## 1. Overview

	G.U.I.D.E. (Grievance Utility for Information Extraction, Drafting and Enrichment)
	is a four-layer, spec-driven system built for consumer complaint resolution.
	Every component has a clear contract; layers communicate through defined interfaces
	so each piece can be tested and replaced independently.

	```
	┌───────────────────────────────────────────────────────────────────┐
	│                          GRADIO FRONTEND                          │
	│   Chat · Verify (HITL) · Documents · Draft · Escalation · About   │
	└──────────────────────────┬────────────────────────────────────────┘
	                           │ HTTP (REST)
	┌──────────────────────────▼────────────────────────────────────────┐
	│                          FASTAPI BACKEND                          │
	│  /api/session   /api/message   /api/upload                        │
	│  /api/session/{id}/validate-entities   /api/status                │
	└──┬──────────────────┬─────────────────────────────────────────────┘
	   │                  │
	   │ Step 1 (local)   │ Step 2 (external API, after redaction)
	   │                  │
	┌──▼──────────────┐  ┌▼─────────────────────────────────────────────┐
	│  PRESIDIO       │  │  CLAUDE MANAGED AGENT (CMA)                  │
	│  PIIRedactor    │  │                                              │
	│  (runs locally) │  │  Tools:                                      │
	│                 │  │  - classify_domain()  ──► DomainClassifier   │
	│  Redacts:       │  │  - extract_entities() ──► EvidenceNER        │
	│    PERSON       │  │  - process_document() ──► OCR / ViT + NER    │
	│    PHONE_NUMBER │  │  - draft_complaint()  ──► Claude (internal)  │
	│    EMAIL_ADDRESS│  │  - recommend_action() ──► NextActionPredictor│
	│    CREDIT_CARD  │  │                                              │
	│    IN_AADHAAR   │  │  HITL gate: agent pauses before drafting     │
	│    IN_PAN       │  │  and requests user confirmation of entities  │
	│    IBAN_CODE    │  │                                              │
	│    ...          │  │  Memory: per-user session state              │
	└─────────────────┘  └────────────┬─────────────────────────────────┘
	                                  │
	                     ┌────────────▼─────────────────────────────┐
	                     │           DEEP LEARNING LAYER            │
	                     │                                          │
	                     │  1. DomainClassifier (DistilBERT)        │
	                     │  2. EvidenceNER (DistilBERT tokens)      │
	                     │  3. DocumentViT (ViT image encoder)      │
	                     │  4. NextActionPredictor (MLP)            │
	                     └────────────┬─────────────────────────────┘
	                                  │
	                     ┌────────────▼───────────────┐
	                     │  DOCUMENT PROCESSOR        │
	                     │  Tesseract OCR             │
	                     │  pdfplumber (PDF parse)    │
	                     │  PIL (image pre-process)   │
	                     │  ViT (image understanding) │
	                     └────────────────────────────┘
	```

	---

	## 2. Component Specifications

	### 2.0 Privacy Preprocessing — Microsoft Presidio

	This layer runs locally before any message is forwarded to Claude or any
	external service. It is the first step in the pipeline.

	\| Attribute \| Value \|
	\|--------------\|-----------------------------------------------------------\|
	\| Library \| `presidio-analyzer` + `presidio-anonymizer` \|
	\| NLP engine \| spaCy `en_core_web_lg` (local, no network call) \|
	\| Trigger \| Every `/api/session/{id}/message` call \|
	\| Entity types \| `PERSON`, `PHONE_NUMBER`, `EMAIL_ADDRESS`, `CREDIT_CARD`, `IBAN_CODE`, `US_BANK_NUMBER`, `IN_AADHAAR`, `IN_PAN`, `IN_VEHICLE_REGISTRATION` \|
	\| Replacement \| Each detected span → `<ENTITY_TYPE>` placeholder \|
	\| Output \| `RedactionResult(redacted_text, pii_types_found, pii_redacted)` \|
	\| Failure mode \| On any error, original text is returned unchanged (fail-open so pipeline is never blocked) \|

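	Why local?
	The user's name, account number, and Aadhaar UID must never leave the device
	in plaintext. Running Presidio in-process (in the same Python server as the API)
	ensures redaction happens before any bytes are written to the Anthropic API.
	The one exception is the fail-open failure mode listed above: if redaction itself
	raises an error, the original text is forwarded unchanged.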

	What the DL models receive:
	The local DL models (DomainClassifier, EvidenceNER) are called by Claude through
	tool calls — they therefore also receive redacted text. The EvidenceNER model is
	still effective because it targets structural patterns (amounts, dates, reference
	IDs) that are not redacted by Presidio.
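
The contract in the table above can be illustrated without Presidio installed. What follows is a minimal sketch, not the project's `PIIRedactor`: a two-pattern regex detector (an assumption, standing in for Presidio's analyzer) feeds a `redact()` function that writes `<ENTITY_TYPE>` placeholders and fails open as specified. The `detect` parameter and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)


@dataclass
class RedactionResult:
    redacted_text: str
    pii_types_found: List[str] = field(default_factory=list)
    pii_redacted: bool = False


# Stand-in patterns (assumption): the real system delegates detection to Presidio.
_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{10}\b"),
}


def regex_detect(text: str) -> List[Span]:
    """Return detected spans, in the shape an analyzer would produce."""
    spans: List[Span] = []
    for entity_type, pattern in _PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity_type))
    return spans


def redact(text: str, detect: Callable[[str], List[Span]] = regex_detect) -> RedactionResult:
    """Replace each detected span with <ENTITY_TYPE>; fail open on any error."""
    try:
        # Replace right to left so earlier offsets stay valid
        # (assumes the detected spans do not overlap).
        spans = sorted(detect(text), key=lambda s: s[0], reverse=True)
        redacted = text
        for start, end, entity_type in spans:
            redacted = redacted[:start] + f"<{entity_type}>" + redacted[end:]
        types = sorted({entity_type for _, _, entity_type in spans})
        return RedactionResult(redacted, types, bool(spans))
    except Exception:
        # Fail-open, as documented: the original text passes through unchanged.
        return RedactionResult(text, [], False)
```

Passing the detector in as a parameter keeps the fail-open wrapper testable on its own; in the real pipeline that slot would be filled by the Presidio analyzer and anonymizer.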

	---

	### 2.1 Deep Learning Layer

	#### 2.1.1 DomainClassifier

	\| Attribute \| Value \|
	\|-----------------\|----------------------------------------------------\|
	\| Architecture \| `distilbert-base-uncased` + linear classification head \|
	\| Task \| Multi-class text classification \|
	\| Classes (6) \| `ecommerce`, `telecom`, `banking`, `cibil`, `insurance`, `general` \|
	\| Training data \| CFPB Consumer Complaint Database (3M+ rows) — one-time download from Kaggle. Save as `data/raw/complaints.csv`. \|
	\| Script \| `python -m src.classifier.train --cfpb_csv data/raw/complaints.csv --output_dir models/domain_classifier` \|
	\| Input \| Redacted complaint text (string, max 512 tokens) \|
	\| Output \| `DomainResult(domain: str, confidence: float, all_probs: dict, low_confidence: bool)` \|
	\| Confidence threshold \| `0.50` — results below this set `low_confidence=True` \|
	\| Low-confidence path \| Agent asks user one clarifying domain question; does not proceed until user confirms \|
	\| Keyword fallback \| Used when no checkpoint exists; always returns `confidence=0.0`, `low_confidence=True` \|
	\| Fine-tune time \| ~30 min CPU / ~5 min GPU (T4) \|
	\| Library \| HuggingFace `transformers` + `datasets` \|

	Why DistilBERT?
	DistilBERT is about 40% smaller and 60% faster than BERT-base while retaining
	roughly 97% of its language-understanding performance. For a project with limited
	compute, that makes it a practical starting point.
	The CFPB product labels are remapped onto our 6 classes before training.

	Low-confidence handling:
	`general` is the intentional catch-all class, so the model never returns an error, only a
	domain and a probability. However, when no class receives a high probability (for example,
	because the complaint text is too short or ambiguous), the winning domain is unreliable.
	When `confidence < 0.50`, the `low_confidence` flag is set and the CMA agent pauses to ask
	the user one clarifying question ("Is this about e-commerce, telecom, banking, credit score,
	insurance, or other?") before continuing. The user's answer overrides the model's suggestion
	and is stored with `domain_source = "user_confirmed"` so later tools know the domain is
	authoritative.
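
The threshold logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `src/classifier/predict.py`: `to_domain_result()` and `resolve_domain()` are hypothetical helpers, and the `"model"` and `"needs_clarification"` source values are invented for the example (only `"user_confirmed"` comes from the spec).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.50  # from the specification table


@dataclass
class DomainResult:
    domain: str
    confidence: float
    all_probs: Dict[str, float]
    low_confidence: bool


def to_domain_result(all_probs: Dict[str, float]) -> DomainResult:
    """Pick the most probable class and flag it if it falls below the threshold."""
    domain = max(all_probs, key=all_probs.get)
    confidence = all_probs[domain]
    return DomainResult(domain, confidence, all_probs, confidence < CONFIDENCE_THRESHOLD)


def resolve_domain(
    result: DomainResult, user_answer: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Return (domain, domain_source); the user's answer wins when confidence is low."""
    if not result.low_confidence:
        return result.domain, "model"
    if user_answer is None:
        return None, "needs_clarification"  # agent must ask before continuing
    return user_answer, "user_confirmed"
```

For example, probabilities of 0.41 / 0.30 / 0.29 produce `low_confidence=True`, and the domain stays unresolved until the user answers the clarifying question.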

	---

	#### 2.1.2 EvidenceNER

	\| Attribute \| Value \|
	\|--------------\|------------------------------------------------------------\|
	\| Architecture \| `distilbert-base-uncased` with token classification head \|
	\| Task \| Named Entity Recognition (NER) on complaint text \|
	\| Entity types \| `ORG`, `AMOUNT`, `DATE`, `REF_ID`, `ACCOUNT`, `PERSON` \|
	\| Training \| ~4,000 synthetic complaint sentences generated in-memory by `src/ner/train.py` (no download needed). Optionally augmented with CoNLL-2003 via HuggingFace if internet is available (maps PER→PERSON, ORG→ORG; discards LOC/MISC). \|
	\| Script \| `python -m src.ner.train --output_dir models/evidence_ner` \|
	\| Input \| Redacted text (from user or OCR) \|
	\| Output \| List of `{text, label, start, end, confidence}` spans \|

	Entities and their use in drafting:
	\| Entity \| Example \| Used for \|
	\|----------\|---------------------------------\|-----------------------------\|
	\| ORG \| "Flipkart", "HDFC Bank" \| Complaint addressee \|
	\| AMOUNT \| "₹4,299", "Rs. 1,200" \| Financial loss quantified \|
	\| DATE \| "12 March 2024", "last Tuesday" \| Incident timeline \|
	\| REF_ID \| "Order #OD-2930291", "TXN123" \| Evidence reference \|
	\| ACCOUNT \| "XXXX-1234", "loan account" \| Dispute target \|
	\| PERSON \| "customer care executive" \| Named witness/contact \|
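
To show how the span list might feed the draft, here is a small sketch that keeps the highest-confidence span per label and maps it to a draft field. The field names in `DRAFT_FIELD` and the function `spans_to_draft_fields()` are hypothetical; the spec only fixes the span schema `{text, label, start, end, confidence}`.

```python
from typing import Dict, List

# Hypothetical mapping from NER label to a field name used by the drafter.
DRAFT_FIELD = {
    "ORG": "addressee",
    "AMOUNT": "amount",
    "DATE": "incident_date",
    "REF_ID": "reference",
    "ACCOUNT": "account",
    "PERSON": "contact",
}


def spans_to_draft_fields(spans: List[dict], min_confidence: float = 0.0) -> Dict[str, str]:
    """Keep the most confident span for each label and key it by draft field."""
    best: Dict[str, dict] = {}
    for span in spans:
        label = span["label"]
        if label not in DRAFT_FIELD or span["confidence"] < min_confidence:
            continue
        if label not in best or span["confidence"] > best[label]["confidence"]:
            best[label] = span
    return {DRAFT_FIELD[label]: span["text"] for label, span in best.items()}
```

Keeping one value per label is a simplification; a complaint with several amounts or dates would need a list per field instead.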

	---

	#### 2.1.3 DocumentViT

	\| Attribute \| Value \|
	\|--------------\|---------------------------------------------------------------\|
	\| Architecture \| Vision Transformer (`google/vit-base-patch16-224` fine-tuned)\|
	\| Task \| Structured evidence extraction from document images \|
	\| Input \| Scanned receipt / bill / screenshot (PIL Image) \|
	\| Output \| List of `{text, label, confidence}` spans (the NER schema without `start`/`end` offsets) \|
	\| When used \| After OCR; ViT runs as a complementary pass on image-type docs\|
	\| Library \| HuggingFace `transformers` (`ViTForImageClassification` + custom head) \|

	Why ViT alongside OCR?
	Tesseract OCR excels at clean printed text but struggles with handwriting, logos,
	and table structures. The ViT model, fine-tuned on receipt and bill images, is intended
	to classify image regions and extract amount, date, and provider fields directly, which
	is especially useful for blurry screenshots and poorly scanned documents.

	---

	#### 2.1.4 NextActionPredictor

	\| Attribute \| Value \|
	\|---------------\|-------------------------------------------------------------------\|
	\| Architecture \| 2-hidden-layer MLP (12-dim input → 64 → 64 → 6) \|
	\| Input \| 12-dim feature vector: domain one-hot (6) + entity flags (5) + prior_contact (1) \|
	\| Output \| Ranked list of `{action, authority, url, confidence}` \|
	\| Actions \| 6 classes: `company_support`, `nch`, `trai`, `rbi_ombudsman`, `irdai`, `legal` \|
	\| Training data \| ~6,000 synthetic (domain, entity_flags, prior_contact → action) examples generated in-memory from `DOMAIN_ACTION_PRIORS`; no download needed. Trains in < 30 seconds on CPU. \|
	\| Script \| `python -m src.next_action.train --output_dir models/next_action` \|
	\| Fallback \| If no checkpoint exists, `DOMAIN_ACTION_PRIORS` rule-based mapping is used so the pipeline always works. \|
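
The 12-dimensional input layout (6 + 5 + 1) can be made concrete with a short sketch. The name `build_feature_vector()` appears in `src/next_action/model.py`, but the signature below and the choice of which five entity labels become flags are assumptions; the spec does not list them.

```python
from typing import Iterable, List

DOMAINS = ["ecommerce", "telecom", "banking", "cibil", "insurance", "general"]
# Assumption: the spec says "entity flags (5)" without naming the five labels.
ENTITY_FLAGS = ["ORG", "AMOUNT", "DATE", "REF_ID", "ACCOUNT"]


def build_feature_vector(
    domain: str, entity_labels: Iterable[str], prior_contact: bool
) -> List[float]:
    """Return domain one-hot (6) + entity presence flags (5) + prior_contact (1)."""
    if domain not in DOMAINS:
        domain = "general"  # unknown domains fall back to the catch-all class
    present = set(entity_labels)
    one_hot = [1.0 if d == domain else 0.0 for d in DOMAINS]
    flags = [1.0 if label in present else 0.0 for label in ENTITY_FLAGS]
    return one_hot + flags + [1.0 if prior_contact else 0.0]
```

A telecom complaint with an `ORG` and a `DATE` span and prior contact maps to a vector with ones at positions 1, 6, 8, and 11.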

	Escalation routing logic:
	\| Domain \| Primary Authority \| Secondary \|
	\|-----------\|---------------------------\|--------------------\|
	\| E-commerce\| Company support → NCH \| Consumer Forum \|
	\| Telecom \| Company support → TRAI \| NCH \|
	\| Banking \| Company support → RBI BO \| Banking Ombudsman \|
	\| CIBIL \| Bureau direct → RBI BO \| SEBI (if investment)\|
	\| Insurance \| Company support → IRDAI \| Insurance Ombudsman\|
	\| General \| Company support → NCH \| Consumer Forum \|
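
The rule-based fallback can be pictured as a lookup over the primary routes in this table. The sketch below is hypothetical: the real `DOMAIN_ACTION_PRIORS` may hold weights rather than ordered lists, and two mappings are assumptions made for the example ("Bureau direct" is treated as `company_support`, and "Consumer Forum" as `legal`).

```python
from typing import Dict, List

# Assumed structure: ordered action names per domain, taken from the routing table.
DOMAIN_ACTION_PRIORS: Dict[str, List[str]] = {
    "ecommerce": ["company_support", "nch", "legal"],
    "telecom": ["company_support", "trai", "nch"],
    "banking": ["company_support", "rbi_ombudsman"],
    "cibil": ["company_support", "rbi_ombudsman"],
    "insurance": ["company_support", "irdai"],
    "general": ["company_support", "nch", "legal"],
}


def rule_based_actions(domain: str) -> List[dict]:
    """Return the ranked escalation path for a domain, defaulting to 'general'."""
    actions = DOMAIN_ACTION_PRIORS.get(domain, DOMAIN_ACTION_PRIORS["general"])
    return [{"action": action, "rank": rank} for rank, action in enumerate(actions, start=1)]
```

Because the lookup always falls back to `general`, the pipeline returns an escalation path even when no trained checkpoint is available.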

	---

	### 2.2 Claude Managed Agent (CMA)

	The CMA is the orchestration layer. It maintains per-user session state
	(conversation history, extracted entities, uploaded docs, draft versions) and
	decides at each turn which tool to invoke.

	Key constraint: Claude only ever sees redacted text (PII replaced with
	`<ENTITY_TYPE>` placeholders by Presidio before the API call). This is documented
	in the system prompt so Claude knows not to try to recover original values.

	#### Agent System Prompt Summary
	```
	You are G.U.I.D.E., an expert consumer complaint assistant.
	PII has already been redacted locally — work with placeholders as-is.

	Rules:
	1. Always classify the domain first using classify_domain().
	   • If low_confidence=false (≥ 0.50): store domain and proceed.
	   • If low_confidence=true (< 0.50 or keyword fallback): ask the user ONE
	     clarifying question ("Is this about e-commerce, telecom, banking, credit
	     score, insurance, or other?") before continuing. Store domain_source=
	     "user_confirmed" when the domain comes from the user.
	   • If classify_domain() errors: same clarifying question as above.
	2. Ask ONE targeted follow-up question at a time if information is missing.
	3. If documents are uploaded, always run process_document before drafting.
	4. HITL gate: Before calling draft_complaint, present extracted details
	   as a numbered summary and ask the user to confirm them. Wait for
	   [USER CONFIRMED] message before proceeding.
	5. Never draft until domain, provider, date, amount, prior contact, and
	   desired resolution are all known AND user-confirmed.
	6. Generate drafts in formal English: Subject / To / Body / From.
	7. Always recommend the next escalation step with specific portal URLs.
	```

	#### Tool Specifications

	\| Tool \| Input \| Output \| Calls \|
	\|-------------------\|------------------------------\|---------------------------------\|-------------------\|
	\| `classify_domain` \| `complaint_text: str` \| `DomainResult` \| DL DomainClassifier \|
	\| `extract_entities`\| `text: str` \| `List[Entity]` \| DL EvidenceNER \|
	\| `process_document`\| `file_path: str` \| `{raw_text, entities}` \| OCR + ViT + EvidenceNER \|
	\| `draft_complaint` \| `complaint_context: dict` \| `ComplaintDraft` \| Claude (internal) \|
	\| `recommend_action`\| `domain: str, entities: dict`\| `List[EscalationAction]` \| DL NextAction \|
	\| `store_memory` \| `key: str, value: any` \| `None` \| CMA Memory Store \|
	\| `get_memory` \| `key: str` \| `any` \| CMA Memory Store \|
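
As an illustration of how one row of this table could become a tool definition plus a dispatcher, here is a minimal sketch. The dictionary follows the Anthropic Messages API tool format (`name`, `description`, `input_schema`); the handler is a stub standing in for the real DomainClassifier, and the body of `execute_tool()` is an assumption about what `src/agent/tools.py` might do, not a copy of it.

```python
import json
from typing import Any, Callable, Dict

CLASSIFY_DOMAIN_TOOL = {
    "name": "classify_domain",
    "description": "Classify redacted complaint text into one of six domains.",
    "input_schema": {
        "type": "object",
        "properties": {"complaint_text": {"type": "string"}},
        "required": ["complaint_text"],
    },
}


def _classify_domain_stub(complaint_text: str) -> Dict[str, Any]:
    """Stub handler: mirrors the keyword-fallback result shape from section 2.1.1."""
    return {"domain": "general", "confidence": 0.0, "all_probs": {}, "low_confidence": True}


TOOL_HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "classify_domain": _classify_domain_stub,
}


def execute_tool(name: str, tool_input: Dict[str, Any]) -> str:
    """Run a tool by name and return a JSON string for the tool_result block."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return json.dumps(handler(**tool_input))
    except Exception as exc:
        # Return the error as data so the agent can react (e.g. ask the user).
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON rather than raising matches rule 1 of the system prompt, where a failing `classify_domain()` leads to a clarifying question instead of a crash.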

	#### CMA Decision Flow (per user turn)

	```
	User message received
	│
	▼ ── PRESIDIO (API layer, before agent) ───────────────
	PII redacted locally → redacted_text forwarded to Claude
	│
	▼
	Is domain known? ──No──► call classify_domain() ──► store in memory
	│
	Yes
	│
	▼
	Are minimum fields complete? ──No──► ask ONE follow-up question
	(provider, date, amount, ref)
	│
	Yes
	│
	▼
	Was a document uploaded? ──Yes──► call process_document() ──► merge entities
	│
	No
	│
	▼ ── HITL GATE ──────────────────────────────────────────
	Present extracted details summary → ask user to confirm
	│
	Wait for [USER CONFIRMED] message (from /validate-entities endpoint)
	│
	▼
	Has user confirmed entities? ──Yes──► call draft_complaint() ──► show draft
	│
	▼
	Has user asked next steps? ──Yes──► call recommend_action() ──► show escalation
	```
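
The flow above can be read as a small state machine. The sketch below is one possible encoding, assuming a hypothetical `SessionState` dataclass and string action names; in the real system the decision is made by the agent following its system prompt, not by a hard-coded function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Minimum fields named in the flow diagram.
REQUIRED_FIELDS = ("provider", "date", "amount", "ref")


@dataclass
class SessionState:
    domain: Optional[str] = None
    fields: Dict[str, str] = field(default_factory=dict)
    pending_documents: List[str] = field(default_factory=list)
    user_confirmed: bool = False
    draft_ready: bool = False
    asked_next_steps: bool = False


def next_step(state: SessionState) -> str:
    """Return the next action for this turn, checked in the order of the diagram."""
    if state.domain is None:
        return "classify_domain"
    missing = [name for name in REQUIRED_FIELDS if name not in state.fields]
    if missing:
        return f"ask_follow_up:{missing[0]}"  # ONE question at a time
    if state.pending_documents:
        return "process_document"
    if not state.user_confirmed:
        return "hitl_confirm"  # HITL gate: wait for [USER CONFIRMED]
    if not state.draft_ready:
        return "draft_complaint"
    if state.asked_next_steps:
        return "recommend_action"
    return "wait_for_user"
```

Checking the conditions in a fixed order makes the HITL gate impossible to skip: `draft_complaint` is unreachable until `user_confirmed` is set.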

	---

	### 2.3 Human-in-the-Loop (HITL) Validation

	After all required fields are collected and before draft generation, the system
	pauses and requires explicit user confirmation of the extracted entities.

	\| Step \| Component \| Description \|
	\|------\|-----------\|-------------\|
	\| 1 \| CMA \| Presents extracted entities as a numbered summary in chat \|
	\| 2 \| Frontend \| Populates the Verify Entities tab with pre-filled editable fields \|
	\| 3 \| User \| Reviews, edits any incorrect value, and clicks "Confirm & Generate Draft" \|
	\| 4 \| API \| `POST /api/session/{id}/validate-entities` sends verified entities to CMA \|
	\| 5 \| CMA \| Receives `[USER CONFIRMED]` message and calls `draft_complaint()` \|

	Why HITL?
	PII redaction replaces some values with placeholders (e.g., a name becomes
	`<PERSON>`). The HITL step lets the user check each extracted value and supply a
	readable label wherever a placeholder would otherwise appear in the final draft
	(e.g., the complainant's name rather than just `<PERSON>`), improving both accuracy
	and trust in the generated complaint.
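
Steps 3 to 5 of the table can be sketched as two small helpers: one merges the user's edits over the extracted values, the other builds the message the agent waits for. Both function names and the numbered message layout are assumptions; the spec only fixes the `[USER CONFIRMED]` marker.

```python
from typing import Dict


def apply_user_edits(extracted: Dict[str, str], edits: Dict[str, str]) -> Dict[str, str]:
    """Overlay non-empty user edits on the extracted entities."""
    confirmed = dict(extracted)
    for key, value in edits.items():
        if value.strip():  # an empty field means "keep the extracted value"
            confirmed[key] = value.strip()
    return confirmed


def confirmation_message(confirmed: Dict[str, str]) -> str:
    """Format the verified entities as the [USER CONFIRMED] message for the agent."""
    lines = ["[USER CONFIRMED]"]
    for index, (key, value) in enumerate(confirmed.items(), start=1):
        lines.append(f"{index}. {key}: {value}")
    return "\n".join(lines)
```

Treating blank edits as "no change" means a user who only corrects one field does not accidentally erase the others.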

	---

	### 2.4 Document Processor

	\| Feature \| Implementation \|
	\|--------------\|---------------------------------------\|
	\| PDF parsing \| `pdfplumber` (text-native PDFs) \|
	\| Image OCR \| `pytesseract` + `Pillow` (pre-process)\|
	\| ViT pass \| `google/vit-base-patch16-224` fine-tuned on receipt/bill images \|
	\| Pre-process \| Greyscale → adaptive threshold → deskew \|
	\| Output \| Clean extracted text + NER entities \|
	\| Formats \| PDF, PNG, JPG, JPEG, WEBP \|
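
The supported formats imply a routing decision before any parsing happens. A minimal sketch, assuming a hypothetical `choose_pipeline()` helper and made-up pipeline names (`"pdf_text"`, `"image_ocr_vit"`):

```python
from pathlib import Path

PDF_SUFFIXES = {".pdf"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def choose_pipeline(file_path: str) -> str:
    """Route a document by extension: PDFs to text parsing, images to OCR + ViT."""
    suffix = Path(file_path).suffix.lower()
    if suffix in PDF_SUFFIXES:
        return "pdf_text"  # pdfplumber path
    if suffix in IMAGE_SUFFIXES:
        return "image_ocr_vit"  # Pillow pre-process, Tesseract, then ViT pass
    raise ValueError(f"unsupported document format: {suffix or '(none)'}")
```

Routing on the extension alone is a simplification; scanned PDFs without a text layer would still need the OCR path.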

	---

	### 2.5 FastAPI Backend

	\| Endpoint \| Method \| Description \|
	\|---------------------------------------\|--------\|----------------------------------------\|
	\| `/api/health` \| GET \| Health check (all components) \|
	\| `/api/session/create` \| POST \| Create new CMA session \|
	\| `/api/session/{id}/message` \| POST \| Send message → Presidio redact → agent \|
	\| `/api/session/{id}/upload` \| POST \| Upload a document to session \|
	\| `/api/session/{id}/validate-entities` \| POST \| HITL: submit user-confirmed entities \|
	\| `/api/session/{id}/history` \| GET \| Retrieve conversation history \|
	\| `/api/classify` \| POST \| Direct DL classification (debug) \|
	\| `/api/extract` \| POST \| Direct NER extraction (debug) \|
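
A client would call these endpoints with JSON bodies. The sketch below only builds the requests (it does not send them) using the standard library; the port comes from the setup section, while the payload field names `message` and `entities` are assumptions, since the Pydantic schemas are not shown here.

```python
import json
from typing import Any, Dict
from urllib import request

BASE_URL = "http://localhost:8000"


def build_post(path: str, payload: Dict[str, Any]) -> request.Request:
    """Build a JSON POST request for the backend without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def message_request(session_id: str, text: str) -> request.Request:
    # Hypothetical body shape: {"message": "..."}
    return build_post(f"/api/session/{session_id}/message", {"message": text})


def validate_entities_request(session_id: str, entities: Dict[str, str]) -> request.Request:
    # Hypothetical body shape: {"entities": {...}}
    return build_post(f"/api/session/{session_id}/validate-entities", {"entities": entities})
```

To send a request, pass it to `urllib.request.urlopen()` while the FastAPI server is running.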

	---

	### 2.6 Gradio Frontend

	Tabs:
	1. Chat — Conversational interface; shows 🔒 privacy badge when PII is redacted
	2. Verify Entities — HITL panel: editable entity fields + "Confirm & Generate Draft"
	3. Documents — Drag-and-drop upload; shows extracted entities
	4. Complaint Draft — Rendered complaint with copy/download
	5. Escalation Guide — Recommended authorities with portal links
	6. About — Architecture diagram, model cards, tech stack

	---

	## 3. Technology Stack

	\| Layer \| Technology \| Reason \|
	\|--------------------\|--------------------------------\|-----------------------------------------------\|
	\| Launcher \| `start.py` (stdlib only) \| Single script — trains all models then starts servers \|
	\| Privacy \| Microsoft Presidio + spaCy \| Local PII redaction, no cloud call \|
	\| DL Models \| HuggingFace Transformers \| Industry standard for NLP + ViT \|
	\| Classifier data \| CFPB dataset (Kaggle, one-time)\| 3M+ real complaints, public license \|
	\| NER data \| Synthetic in-memory \| Template-generated; no download required \|
	\| NextAction data \| Synthetic in-memory \| Generated from domain priors; no download \|
	\| Agent \| Anthropic CMA (default `claude-sonnet-4-6`, set via `GUIDE_MODEL`) \| Stateful, tool-using agent \|
	\| Backend \| FastAPI + Uvicorn \| Async, fast, OpenAPI auto-docs \|
	\| Frontend \| Gradio 4.x \| ML-native UI, file upload, chat \|
	\| OCR \| pytesseract + pdfplumber \| Proven, open-source \|
	\| ViT doc model \| HuggingFace ViT \| Image-based evidence extraction \|
	\| Env \| Python 3.10+ \| Required by CMA SDK \|
	\| Config \| python-dotenv \| Secure API key management \|
	\| Notebooks (optional) \| Jupyter \| EDA and demo only; not required to run system \|

	---

	## 4. Data Flow — End to End

	```
	User types: "Rahul Sharma — Flipkart hasn't refunded ₹4,299 for order OD-123
	cancelled 3 weeks ago. My phone is 9876543210."
	│
	▼ ── LOCAL ONLY ─────────────────────────────────────────────
	Presidio PIIRedactor detects: PERSON("Rahul Sharma"), PHONE("9876543210")
	Redacted: "<PERSON> — Flipkart hasn't refunded ₹4,299 for order OD-123
	cancelled 3 weeks ago. My phone is <PHONE_NUMBER>."
	│
	▼ ── EXTERNAL API ────────────────────────────────────────────
	FastAPI → CMA session with redacted text
	│
	Claude CMA agent processes redacted message
	│
	├──► classify_domain(...)
	│ └──► DomainClassifier → {domain: "ecommerce", conf: 0.97}
	│
	├──► extract_entities(...)
	│ └──► EvidenceNER → [ORG:"Flipkart", AMOUNT:"₹4,299",
	│ REF_ID:"OD-123", DATE:"3 weeks ago"]
	│
	└──► HITL gate: "I have extracted the following — please confirm:
	- Company: Flipkart
	- Amount: ₹4,299
	- Order ID: OD-123
	- Date: 3 weeks ago
	Is this correct?"
	│
	User reviews in "Verify Entities" tab → edits if needed → clicks Confirm
	│
	▼
	POST /validate-entities → [USER CONFIRMED] → draft_complaint()
	│
	ComplaintDraft generated and shown in Draft tab
	│
	recommend_action(domain="ecommerce") → [NCH, Consumer Forum]
	│
	Gradio renders: Draft · Evidence table · Escalation panel
	```

	---

	## 5. Project Directory Structure

	```
	Project_ResolveAI/
	│
	├── start.py                      ← SINGLE ENTRY POINT: trains all models, then starts servers
	│
	├── docs/
	│   ├── abstract.md               ← project abstract (G.U.I.D.E.)
	│   └── architecture.md           ← this file (spec)
	│
	├── src/
	│   ├── __init__.py
	│   ├── privacy/                  # Presidio PII redaction (runs before any external call)
	│   │   ├── __init__.py
	│   │   └── redactor.py           ← PIIRedactor singleton
	│   │
	│   ├── classifier/               # DL Domain Classifier (DistilBERT)
	│   │   ├── model.py              ← DomainClassifier, CFPB_PRODUCT_MAP, LABEL2ID
	│   │   ├── dataset.py            ← load_cfpb_csv(), clean_complaint_text(), ComplaintDataset
	│   │   ├── train.py              ← CLI: --cfpb_csv, --output_dir, --epochs, --batch_size
	│   │   └── predict.py            ← DomainPredictor singleton, classify_domain()
	│   │
	│   ├── ner/                      # DL NER model (DistilBERT token classifier)
	│   │   ├── model.py              ← EvidenceNER, NER_LABELS, NER_LABEL2ID
	│   │   ├── train.py              ← Generates synthetic data in-memory; CLI: --output_dir
	│   │   └── predict.py            ← NERPredictor singleton, extract_entities()
	│   │
	│   ├── next_action/              # Next-action MLP predictor
	│   │   ├── model.py              ← NextActionMLP, DOMAIN_ACTION_PRIORS, build_feature_vector()
	│   │   ├── train.py              ← Generates synthetic features; CLI: --output_dir, --epochs
	│   │   └── predict.py            ← NextActionPredictor singleton (MLP or rule-based fallback)
	│   │
	│   ├── document_processor/       # OCR + PDF parsing + ViT
	│   │   ├── ocr.py                ← Tesseract + pdfplumber pipeline
	│   │   └── vit_extractor.py      ← ViT-based image evidence extraction
	│   │
	│   ├── agent/                    # Claude CMA integration
	│   │   ├── tools.py              ← Tool definitions (JSON Schema) + execute_tool()
	│   │   ├── prompts.py            ← SYSTEM_PROMPT (HITL gate rule, privacy context)
	│   │   └── session.py            ← AgentManager singleton, send_message()
	│   │
	│   └── api/                      # FastAPI application
	│       ├── main.py               ← Lifespan: Presidio → DL models → CMA agent
	│       ├── routes.py             ← /message (Presidio→agent), /validate-entities (HITL)
	│       └── schemas.py            ← Pydantic models incl. HITL + pii_redacted fields
	│
	├── ui/
	│   └── app.py                    ← Gradio: Chat · Verify · Docs · Draft · Escalation · About
	│
	├── notebooks/                    # OPTIONAL: EDA and interactive demos only
	│   ├── 01_data_exploration.ipynb ← Explore CFPB dataset, save processed CSV
	│   ├── 02_classifier_training.ipynb
	│   ├── 04_cma_agent_demo.ipynb
	│   └── 05_end_to_end_demo.ipynb
	│
	├── data/
	│   ├── raw/                      ← Place CFPB complaints.csv here (one-time download)
	│   ├── processed/                ← Output of EDA notebook; not required for training
	│   └── sample_complaints/        ← Synthetic domain-specific CSVs for augmentation
	│
	├── models/                       ← Created by training; populated by start.py
	│   ├── domain_classifier/        ← best_model.pt + tokenizer files
	│   ├── evidence_ner/             ← best_model.pt + tokenizer files
	│   ├── document_vit/             ← ViT checkpoint
	│   └── next_action/              ← best_model.pt
	│
	├── CLAUDE.md                     ← Guidance for Claude Code
	├── requirements.txt              ← presidio-analyzer, presidio-anonymizer, spacy, torch, etc.
	├── .env.example                  ← Template: copy to .env and add ANTHROPIC_API_KEY
	├── .gitignore
	└── README.md
	```

	---

	## 6. Setup and Running

	### Step 1 — Get an Anthropic API key

	1. Go to https://console.anthropic.com → Sign up (free tier available)
	2. Navigate to API Keys → Create Key
	3. Copy the key (shown only once)
	4. In the project root, copy the template and fill in your key:
	```
	cp .env.example .env
	# then edit .env:
	ANTHROPIC_API_KEY=sk-ant-...
	```
	Never commit `.env` to git (listed in `.gitignore`).

	### Step 2 — Install dependencies

	```bash
	pip install -r requirements.txt
	python -m spacy download en_core_web_lg # Presidio NLP model (local, ~750 MB)
	```

	### Step 3 — Download CFPB data (first run only)

	The DomainClassifier requires the CFPB Consumer Complaint Database:

	- Download from Kaggle: `consumer-complaint-database` dataset
	- Save the CSV to `data/raw/complaints.csv`
	- Size: ~600 MB; one-time download; not committed to git

	This is only needed to train the classifier. The NER and NextAction models generate their training data in-memory automatically.

	### Step 4 — Run (single command)

	```bash
	# First run — trains all models then starts servers:
	python start.py --cfpb_csv data/raw/complaints.csv

	# After first run — models already trained, skip training:
	python start.py --no-train

	# Force retrain everything:
	python start.py --cfpb_csv data/raw/complaints.csv --train

	# Train only (no servers):
	python start.py --cfpb_csv data/raw/complaints.csv --train-only
	```

	When running, `start.py` will:

	1. Validate `.env` and `ANTHROPIC_API_KEY`
	2. Train DomainClassifier (~30 min CPU / ~5 min GPU T4) — skipped if checkpoint exists
	3. Train EvidenceNER (~10 min CPU) — skipped if checkpoint exists
	4. Train NextActionMLP (< 30 sec CPU) — skipped if checkpoint exists
	5. Start FastAPI at `http://localhost:8000` (Swagger docs at `/docs`)
	6. Start Gradio UI at `http://localhost:7860`

	Both servers print to the same terminal, prefixed with `[API]` or `[UI]`. Press Ctrl+C to stop everything cleanly.
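
The "skipped if checkpoint exists" behaviour in steps 2 to 4 can be sketched with the standard library alone. This is an illustration, not the contents of `start.py`: the flag names come from the commands above, while `needs_training()`, `parse_args()`, and the use of `best_model.pt` as the checkpoint marker for every model are assumptions.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def needs_training(model_dir: Path, force: bool, skip: bool) -> bool:
    """Decide whether a model should be trained on this run."""
    if skip:  # --no-train: never train
        return False
    if force:  # --train: always retrain
        return True
    # Default: train only when no checkpoint has been saved yet.
    return not (model_dir / "best_model.pt").exists()


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train models, then start servers.")
    parser.add_argument("--cfpb_csv", help="Path to the CFPB complaints CSV")
    parser.add_argument("--train", action="store_true", help="Force retraining")
    parser.add_argument("--no-train", action="store_true", help="Skip all training")
    parser.add_argument("--train-only", action="store_true", help="Train, then exit")
    return parser.parse_args(argv)
```

With this logic, a first run with no `models/` directory trains everything, and later runs start the servers directly unless `--train` is passed.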

	---

	## 7. Development Phases

	\| Phase \| Deliverable \| Status \|
	\|-------\|------------------------------------------------\|--------------\|
	\| 0 \| Abstract submitted (G.U.I.D.E.) \| Done ✓ \|
	\| 1 \| Architecture + spec (`docs/architecture.md`) \| Done ✓ \|
	\| 2 \| Project scaffold + environment setup \| Done ✓ \|
	\| 3 \| Presidio PII redaction layer (`src/privacy/`) \| Done ✓ \|
	\| 4 \| DL: DomainClassifier — model, dataset, train \| Done ✓ \|
	\| 5 \| DL: EvidenceNER + NextActionMLP — model, train \| Done ✓ \|
	\| 6 \| DL: ViT document extractor (fine-tuning) \| In progress \|
	\| 7 \| CMA agent + tools integration \| Done ✓ \|
	\| 8 \| Document processor (OCR + ViT pipeline) \| Done ✓ \|
	\| 9 \| FastAPI backend (HITL endpoint, schemas) \| Done ✓ \|
	\| 10 \| Gradio UI (Verify tab, privacy badge, HITL) \| Done ✓ \|
	\| 11 \| Single launcher (`start.py`) + CLAUDE.md \| Done ✓ \|
	\| 12 \| Integration testing + demo notebooks \| Remaining \|
	\| 13 \| Final report + presentation \| Remaining \|