Spaces: ZedLow/Constrained-Financial-RAG

ZedLow committed on Feb 6

Commit 80ff8d2 (verified) · 1 Parent(s): f464881

Update README.md

Files changed (1)

README.md +176 -10

@@ -1,13 +1,179 @@
 ---
-title: Finance RAG Analyst
-emoji: 📉
-colorFrom: purple
-colorTo: green
-sdk: gradio
-sdk_version: 6.5.1
-app_file: app.py
-pinned: false
-license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🛡️ Financial RAG with Entity-Aware Routing & Vision-Native Tables
+**Portfolio project – Master 1 Artificial Intelligence**
+Design of a **constrained Financial RAG pipeline** focused on reducing hallucinations and accidental cross-document errors **through architecture, not prompt engineering**.
+---
+## ⚠️ Scope
+This is a **controlled demo system** operating on a small “golden dataset” (6 pages from the 2023 annual reports of **Apple** and **Microsoft**).
+The goal is to demonstrate **system design choices and failure modes**, not dataset scale or production readiness.
+---
+## Why this project exists
+Most RAG failures in finance are **structural**, not model-related:
+- Documents from different companies are mixed due to semantic similarity
+- Tables are flattened by OCR, breaking numeric alignment
+- Models answer confidently even when data is missing or ambiguous
+This project explores how **explicit constraints and routing** can eliminate entire classes of errors — and where those constraints **still break**.
+---
+## Design Philosophy
+### Constraints > Prompts
+Instead of asking the model to “be careful”, the system:
+- restricts what can be retrieved
+- restricts what can be answered
+- rejects unsupported or ambiguous queries
+The objective is **inspectability and predictable failure**, not conversational flexibility.
+---
+## Key Properties
+- 🧠 **Entity-aware routing (CPU)**
+  Company entities are detected before retrieval.
+- 🔒 **Corpus-level filtering**
+  Single-entity queries are routed to a single company corpus.
+- 👁️ **Vision-native table reasoning**
+  Financial tables are processed as images to preserve structure (no OCR flattening).
+- 🚫 **Explicit refusal**
+  Out-of-scope or ambiguous queries fail loudly instead of hallucinating.
+- 🔍 **Transparent pipeline**
+  Retrieval, reranking, and source pages are fully visible in the UI.
 ---
+## Architecture Overview
+```mermaid
+flowchart LR
+    Q[User Query] --> R{"Entity Router (GLiNER • CPU)"}
+    R -->|Apple| A[Apple corpus only]
+    R -->|Microsoft| M[Microsoft corpus only]
+    R -->|Other / none| X["Reject (out of scope / ambiguous)"]
+    A --> D["Dense retrieval (gte-Qwen2-7B)"]
+    M --> D
+    D --> K["Reranker (bge-reranker-v2-m3)"]
+    K --> V["Vision reasoning (Qwen2-VL)"]
+    V --> Y[Grounded answer]
+```
+## Observed System Behavior (Important)
+This section reflects **actual behavior observed in the demo**, not idealized guarantees.
+- Single-entity queries are routed to the corresponding company corpus only
+- Explicit multi-entity queries may trigger cross-company reasoning
+- Source-constrained prompts are not strictly enforced
+- Out-of-scope entities (Google, Tesla, etc.) are explicitly rejected
+- Implicit references (e.g. “Cupertino-based company”) are rejected
+The system prevents **accidental corpus mixing**, not intentional multi-entity analysis.
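
The routing behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual router: a plain alias table stands in for the GLiNER detector, and the names `SUPPORTED`, `OUT_OF_SCOPE`, `Route`, and `route` are hypothetical. Rejecting a query that mixes a supported and an unsupported company is also an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical alias table standing in for the GLiNER entity detector.
SUPPORTED = {"apple": "Apple", "microsoft": "Microsoft"}
# Hypothetical examples of recognizable but unsupported companies.
OUT_OF_SCOPE = {"google", "tesla"}


@dataclass
class Route:
    corpora: list  # company corpora that retrieval may search
    reason: str    # why the query was accepted or rejected


def route(query: str) -> Route:
    # Only explicit company names count; implicit references
    # ("Cupertino-based company") produce no match and are rejected.
    words = set(re.findall(r"[a-z]+", query.lower()))
    supported = sorted(SUPPORTED[w] for w in words if w in SUPPORTED)
    if words & OUT_OF_SCOPE:
        # Assumption: any out-of-scope company rejects the whole query.
        return Route([], "rejected: out-of-scope entity")
    if not supported:
        return Route([], "rejected: no explicit supported entity")
    if len(supported) == 1:
        return Route(supported, "single entity: one corpus only")
    return Route(supported, "explicit multi-entity: cross-company allowed")


if __name__ == "__main__":
    for q in ("What was Apple's net income?",
              "Compare Apple and Microsoft revenue",
              "What is Tesla's revenue?",
              "Revenue of the Cupertino-based company?"):
        print(q, "->", route(q))
```

Keeping the decision in a single pure function makes each refusal path easy to inspect and test in isolation.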
 ---
+## What the system guarantees (by construction)
+- No hallucination outside the provided documents
+- No answers for unsupported entities
+- No numeric invention (values must appear in source pages)
+- Clear refusal when data is missing or ambiguous
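
One way a "values must appear in source pages" rule could be checked after generation is sketched below. It assumes a text transcription of each retrieved page is available for verification, which the README does not state (the pipeline itself reads pages as images). The function name `ungrounded_numbers` and the sample figures are hypothetical, illustrative values.

```python
import re

# Matches integers and decimals with optional thousands separators, e.g. 1,200 or 3.5
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def normalize(num: str) -> str:
    """Drop thousands separators so '1,200' and '1200' compare equal."""
    return num.replace(",", "")


def ungrounded_numbers(answer: str, source_pages: list) -> list:
    """Return numbers in the answer that appear on none of the source pages."""
    allowed = {
        normalize(m) for page in source_pages for m in NUMBER.findall(page)
    }
    return [m for m in NUMBER.findall(answer) if normalize(m) not in allowed]


if __name__ == "__main__":
    # Illustrative page text and answer, not real report figures.
    pages = ["Total revenue 1,200 1,150", "Net income 300"]
    answer = "Revenue was 1,200 and net income was 310."
    print(ungrounded_numbers(answer, pages))
```

A non-empty result would be a reason to refuse or regenerate rather than return the answer. The check is deliberately strict: derived values such as sums or percentages would also be flagged.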
+---
+## Technical Stack
+- **Entity Routing:** GLiNER (CPU)
+- **Dense Retrieval:** gte-Qwen2-7B
+- **Reranking:** BAAI/bge-reranker-v2-m3
+- **Vision Reasoning:** Qwen2-VL-2B
+- **UI:** Gradio (Hugging Face Spaces, ZeroGPU)
+**Design choice:**
+Embeddings are recomputed on-the-fly to keep the system **stateless and fully inspectable**.
+---
+## Demo
+### ▶ Live Demo (Recommended)
+Deployed on **Hugging Face Spaces** using ZeroGPU.
+Link provided on CV.
+### 💻 Local Execution
+Requires a GPU with ~24 GB VRAM.
+```bash
+git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+cd YOUR_SPACE_NAME
+pip install -r requirements.txt
+python app.py
+```
+## Known Limitations & Trade-offs
+### 1. Entity detection ≠ entity role understanding
+The router detects *which* entities are mentioned, not *how* they are used.
+Example:
+> “According to Apple documents, what is Microsoft’s revenue?”
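+This query may still be answered with Microsoft’s figures: the router sees both entities but does not treat “According to Apple documents” as a source restriction, so answering the factual question takes priority over the stated constraint.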
+**Production fix:** dependency parsing or semantic role labeling.
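
Short of full dependency parsing, a much simpler pattern-based heuristic can already separate the *source* role from the *target* role for phrasings like the example above. The sketch below is that lighter heuristic, not semantic role labeling. The trigger phrases and the names `split_roles` and `violates_source_constraint` are assumptions made for illustration.

```python
import re

COMPANIES = ("Apple", "Microsoft")
_NAMES = "|".join(COMPANIES)

# Hypothetical trigger phrases that mark a company as the *source* of evidence.
SOURCE_PATTERN = re.compile(
    rf"\b(?:according to|based on|using|from)\s+(?:the\s+)?({_NAMES})"
    r"(?:'s|’s)?\s+(?:documents?|reports?|filings?)",
    re.IGNORECASE,
)


def split_roles(query: str):
    """Return (source_company_or_None, companies mentioned elsewhere)."""
    m = SOURCE_PATTERN.search(query)
    source = m.group(1).capitalize() if m else None
    # Remove the source phrase so its company is not counted as a target.
    remainder = query[: m.start()] + query[m.end():] if m else query
    targets = [
        c for c in COMPANIES if re.search(rf"\b{c}\b", remainder, re.IGNORECASE)
    ]
    return source, targets


def violates_source_constraint(query: str) -> bool:
    """True when the query asks about a company other than its stated source."""
    source, targets = split_roles(query)
    return source is not None and any(t != source for t in targets)


if __name__ == "__main__":
    q = "According to Apple documents, what is Microsoft’s revenue?"
    print(split_roles(q), violates_source_constraint(q))
```

A router using this check could reject the adversarial query instead of answering it. Paraphrases outside the trigger list would still slip through, which is why a parser-based approach is the more robust fix.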
+---
+### 2. Explicit multi-entity reasoning is allowed
+Queries that intentionally involve multiple companies may produce cross-company aggregation.
+This is a **design choice**, not a bug.
+---
+### 3. Stateless retrieval
+No vector database is used.
+**Trade-off:**
+- Higher latency
+- Maximum transparency
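
The stateless trade-off can be illustrated with a small sketch in which every page is re-embedded on each query and ranked in memory, with no index or vector database. The toy bag-of-words `embed` function is a stand-in for the dense embedding model, and the short text snippets are hypothetical placeholders for the real page content. Both are assumptions made to keep the example self-contained.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a dense embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, pages: dict, top_k: int = 2) -> list:
    """Re-embed every page on each call and return (score, page_id) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), page_id) for page_id, text in pages.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical page descriptions, for illustration only.
    pages = {
        "p1": "net sales by product iphone mac ipad",
        "p2": "operating expenses research and development",
        "p3": "net income and earnings per share",
    }
    print(retrieve("iphone net sales", pages, top_k=2))
```

Because nothing is cached, the cost of each query grows with the number of pages. That is tolerable for a six-page corpus and is the source of the higher latency noted above.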
+---
+## What this project demonstrates
+- Ability to design constrained RAG pipelines
+- Understanding of LLM failure modes
+- Practical multimodal reasoning on structured financial data
+- Clear separation between routing logic, retrieval, prompting, and UI
+- Engineering honesty about system guarantees vs limitations
+---
+## Context
+Built as part of a **Master 1 in Artificial Intelligence**.
+Learning-focused portfolio project — **not** a production financial advisory system.