# FinEE v2.0 - Upgrade Roadmap
## From Extraction Engine to Intelligent Financial Agent
### Current vs Target Comparison
| Dimension | Current State | Target State | Priority |
|-----------|--------------|--------------|----------|
| **Base Model** | Phi-3 Mini (3.8B) | Llama 3.1 8B / Qwen2.5 7B | P0 |
| **Training Data** | 456 samples | 100K+ distilled samples | P0 |
| **Output Format** | Token extraction | Instruction-following JSON | P0 |
| **Context** | None | RAG + Knowledge Graph | P1 |
| **Interaction** | Single-turn | Multi-turn agent | P1 |
| **Input Types** | Email only | SMS + Email + PDF + Images | P1 |
| **Accuracy** | ~70% (estimated) | 95%+ (measured) | P0 |
---
## Phase 1: Foundation (Week 1-2)
### 1.1 Model Upgrade
- [ ] Download Llama 3.1 8B Instruct
- [ ] Download Qwen2.5 7B Instruct (backup)
- [ ] Benchmark both on the finance extraction task
- [ ] Set up quantization pipeline (4-bit, 8-bit); see the sketch below
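A minimal sketch of the 4-bit leg of that pipeline, using `transformers` with `bitsandbytes`; the model id and dtype choices here are reasonable defaults to adjust during benchmarking, not fixed decisions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed HF repo id

# NF4 4-bit weights with bf16 compute; for the 8-bit variant,
# use BitsAndBytesConfig(load_in_8bit=True) instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```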
### 1.2 Training Data Expansion
- [x] Generate 100K synthetic samples (DONE ✅)
- [ ] Distill from GPT-4/Claude for complex cases
- [x] Add real data from user (2,419 SMS samples ✅)
- [ ] Create validation set (10K samples)
- [ ] Create test set (5K unseen samples)
### 1.3 Instruction Format
```json
{
  "system": "You are a financial entity extractor...",
  "instruction": "Extract entities from this message",
  "input": "<bank SMS or email>",
  "output": {
    "amount": 2500.00,
    "type": "debit",
    "merchant": "Swiggy",
    "category": "food",
    "date": "2026-01-12",
    "reference": "123456789012"
  }
}
```
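As a sanity check on the format, here is a small helper that wraps one labeled message into this schema; the helper name and the truncated system prompt are illustrative, not fixed:

```python
import json

SYSTEM_PROMPT = "You are a financial entity extractor..."  # truncated as above

def to_instruction_sample(raw_message: str, entities: dict) -> dict:
    """Wrap one labeled message into the instruction schema above."""
    return {
        "system": SYSTEM_PROMPT,
        "instruction": "Extract entities from this message",
        "input": raw_message,
        "output": entities,
    }

sample = to_instruction_sample(
    "INR 2500.00 debited for Swiggy on 12-01-26 Ref 123456789012",
    {"amount": 2500.00, "type": "debit", "merchant": "Swiggy",
     "category": "food", "date": "2026-01-12", "reference": "123456789012"},
)
print(json.dumps(sample, ensure_ascii=False))
```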
---
## Phase 2: Multi-Modal Support (Week 3-4)
### 2.1 Input Types
- [x] SMS Parser (DONE ✅)
- [x] Email Parser (DONE ✅)
- [ ] PDF Statement Parser
  - Use `pdfplumber` for text extraction
  - Table detection with `camelot`
  - OCR fallback with `pytesseract`
- [ ] Image/Screenshot Parser
  - OCR with `EasyOCR` or `PaddleOCR`
  - Vision model for structured extraction
### 2.2 Bank Statement Processing
```
PDF Input → Text Extraction → Table Detection →
Row Parsing → Entity Extraction → Transaction List
```
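A sketch of the first three stages using `pdfplumber`; column mapping is bank-specific and the `pytesseract` OCR fallback is omitted here:

```python
import pdfplumber

def extract_statement_rows(path: str) -> list[list[str]]:
    """PDF -> text/table extraction -> raw rows for entity extraction."""
    rows: list[list[str]] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                for row in table:
                    # pdfplumber may return None cells; keep non-empty rows
                    if row and any(cell and cell.strip() for cell in row):
                        rows.append(row)
    return rows
```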
### 2.3 Image Processing Pipeline
```
Image → OCR → Text Blocks → Layout Analysis →
Entity Extraction → Structured Output
```
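The OCR stage could look like the following with `EasyOCR` (the 0.4 confidence cut-off is an arbitrary placeholder); layout analysis and entity extraction happen downstream:

```python
import easyocr

# One reader instance per process; loading the models is the slow part.
reader = easyocr.Reader(["en"], gpu=False)

def ocr_text_blocks(image_path: str) -> list[str]:
    # readtext() yields (bounding_box, text, confidence) triples
    results = reader.readtext(image_path)
    return [text for _, text, confidence in results if confidence > 0.4]
```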
---
## Phase 3: RAG + Knowledge Graph (Week 5-6)
### 3.1 Knowledge Base
- Merchant database (10K+ Indian merchants)
- Bank template patterns
- Category taxonomy
- UPI VPA mappings
### 3.2 RAG Architecture
```
Query → Retrieve Similar Transactions →
Augment Context → Generate Extraction
```
### 3.3 Knowledge Graph
```
[Merchant: Swiggy] --is_a--> [Category: Food Delivery]
                   --accepts--> [Payment: UPI, Card]
                   --typical_amount--> [Range: 100-2000]
```
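One lightweight way to hold this graph in memory is `networkx`; the relation labels mirror the example above, while the attribute name `typical_amount` as a node attribute is an assumption:

```python
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_node("Swiggy", typical_amount=(100, 2000))  # assumed attribute name
kg.add_edge("Swiggy", "Food Delivery", relation="is_a")
kg.add_edge("Swiggy", "UPI", relation="accepts")
kg.add_edge("Swiggy", "Card", relation="accepts")

# Lookup: the category a merchant maps to
category = [v for _, v, d in kg.out_edges("Swiggy", data=True)
            if d["relation"] == "is_a"]
```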
### 3.4 Vector Store
- Use Qdrant/ChromaDB for transaction embeddings
- Enable semantic search over past transactions
- Support "transactions like this" queries (see the sketch below)
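A minimal ChromaDB version of that flow, using its default embedding function (a finance-tuned embedder would replace it later); the collection name and documents are illustrative:

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
txns = client.create_collection(name="transactions")

txns.add(
    ids=["t1", "t2"],
    documents=[
        "debit 2500.00 Swiggy food 2026-01-12",
        "debit 180.00 BMTC transport 2026-01-10",
    ],
)

# "transactions like this" -> nearest neighbours by embedding
similar = txns.query(query_texts=["food delivery order"], n_results=2)
```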
---
## Phase 4: Multi-Turn Agent (Week 7-8)
### 4.1 Agent Capabilities
```python
class FinancialAgent:
    """Planned agent surface; method bodies are still to be implemented."""

    def extract_entities(self, message: str) -> dict: ...
    def categorize_spending(self, transactions: list[dict]) -> dict: ...
    def detect_anomalies(self, transactions: list[dict]) -> list[dict]: ...
    def generate_report(self, period: str) -> str: ...
    def answer_question(self, question: str, context: dict) -> str: ...
```
### 4.2 Conversation Flow
```
User: "How much did I spend on food last month?"
Agent: [Retrieves transactions] → [Filters by category] →
       [Aggregates amounts] → "You spent ₹12,450 on food"
User: "Compare with previous month"
Agent: [Uses conversation context] → [Retrieves both months] →
       "December: ₹12,450, November: ₹9,800 (+27%)"
```
### 4.3 Tool Use
- Calculator for aggregations
- Date parser for time queries
- Budget tracker integration
- Export to CSV/Excel
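These tools could be exposed to the agent through a simple registry; the names and signatures below are illustrative assumptions, not a committed interface:

```python
from datetime import date, timedelta

def sum_amounts(transactions: list[dict]) -> float:
    """Calculator tool: aggregate transaction amounts."""
    return sum(t["amount"] for t in transactions)

def parse_period(text: str, today: date | None = None) -> tuple[date, date]:
    """Date-parser tool: only handles 'last month', for illustration."""
    today = today or date.today()
    if "last month" in text.lower():
        last_prev = today.replace(day=1) - timedelta(days=1)
        return last_prev.replace(day=1), last_prev
    raise ValueError(f"unsupported period: {text!r}")

TOOLS = {"calculator": sum_amounts, "date_parser": parse_period}
```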
---
## Phase 5: Production Deployment (Week 9-10)
### 5.1 Model Optimization
- [ ] GGUF quantization for llama.cpp
- [ ] ONNX export for faster inference
- [ ] vLLM for batch processing
- [ ] MLX optimization for Apple Silicon
### 5.2 API Design
```
# FastAPI endpoints
POST /extract         # Single message extraction
POST /extract/batch   # Batch extraction
POST /parse/pdf       # PDF statement parsing
POST /parse/image     # Image OCR + extraction
POST /chat            # Multi-turn agent
GET  /analytics       # Spending analytics
```
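As a shape check, a minimal FastAPI sketch of the single-message route; the request model and the stubbed `run_extractor` call are assumptions:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="FinEE v2.0")

class ExtractRequest(BaseModel):
    message: str

@app.post("/extract")
def extract(req: ExtractRequest) -> dict:
    # run_extractor is a placeholder for the fine-tuned model call
    return run_extractor(req.message)

def run_extractor(message: str) -> dict:
    raise NotImplementedError  # wired up once the model is served
```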
### 5.3 Deployment Options
- Docker container
- Hugging Face Spaces (demo)
- Modal/Replicate (serverless)
- Self-hosted with vLLM
---
## Technical Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                       FinEE v2.0 Agent                        │
├──────────────────────────────────────────────────────────────┤
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │   SMS    │  │  Email   │  │   PDF    │  │  Image   │      │
│  │  Parser  │  │  Parser  │  │  Parser  │  │   OCR    │      │
│  └─────┬────┘  └─────┬────┘  └─────┬────┘  └─────┬────┘      │
│        │             │             │             │           │
│        └─────────────┴──────┬──────┴─────────────┘           │
│                             ▼                                │
│                      ┌──────────────┐                        │
│                      │ Preprocessor │                        │
│                      └──────┬───────┘                        │
│                             ▼                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                      RAG Pipeline                      │  │
│  │  ┌─────────┐  ┌───────────┐  ┌────────────┐            │  │
│  │  │ Vector  │  │ Knowledge │  │  Merchant  │            │  │
│  │  │  Store  │  │   Graph   │  │  Database  │            │  │
│  │  └────┬────┘  └─────┬─────┘  └─────┬──────┘            │  │
│  │       └─────────────┼──────────────┘                   │  │
│  └─────────────────────┼──────────────────────────────────┘  │
│                        ▼                                     │
│            ┌───────────────────────┐                         │
│            │  Llama 3.1 8B / Qwen  │                         │
│            │   Instruction-Tuned   │                         │
│            └───────────┬───────────┘                         │
│                        ▼                                     │
│            ┌───────────────────────┐                         │
│            │      JSON Output      │                         │
│            │  + Confidence Score   │                         │
│            └───────────────────────┘                         │
└──────────────────────────────────────────────────────────────┘
```
---
## Model Selection Analysis
| Model | Size | Speed | Quality | License | Choice |
|-------|------|-------|---------|---------|--------|
| Llama 3.1 8B | 8B | Fast | Excellent | Llama 3.1 Community | ✅ Primary |
| Qwen2.5 7B | 7B | Fast | Excellent | Apache 2.0 | ✅ Backup |
| Mistral 7B | 7B | Fast | Good | Apache 2.0 | Alternative |
| Phi-3 Medium | 14B | Medium | Excellent | MIT | Future |
### Why Llama 3.1 8B?
1. **Instruction following** - Best in class for its size
2. **Structured output** - Reliable JSON generation
3. **Context length** - 128K tokens (headroom for future RAG)
4. **Quantization** - Excellent 4-bit performance
5. **Ecosystem** - Wide support (vLLM, llama.cpp, MLX)
---
## Training Strategy
### Stage 1: Supervised Fine-tuning (SFT)
```
Base: Llama 3.1 8B Instruct
Data: 100K synthetic + 2.4K real
Method: LoRA (rank=16, alpha=32)
Epochs: 3
```
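The LoRA line above could map to a `peft` config like this; `target_modules` and the dropout value are assumptions to tune for the chosen base model:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                       # rank, as above
    lora_alpha=32,
    lora_dropout=0.05,          # assumed value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# `model` is the (quantized) base model from the Phase 1 sketch
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
```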
### Stage 2: DPO (Direct Preference Optimization)
```
Create preference pairs:
- Chosen: Correct extraction with confidence
- Rejected: Partial/incorrect extraction
Objective: Improve extraction precision
```
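One preference pair in the column layout TRL's `DPOTrainer` expects (`prompt` / `chosen` / `rejected`); the message and values are made up for illustration:

```python
pair = {
    "prompt": "Extract entities: 'INR 2500.00 debited for Swiggy Ref 1234'",
    "chosen": '{"amount": 2500.00, "type": "debit", "merchant": "Swiggy"}',
    "rejected": '{"amount": 2500.00, "type": "credit", "merchant": null}',
}
```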
### Stage 3: RLHF (Optional)
```
Reward model based on:
- JSON validity
- Field accuracy
- Merchant identification
- Category correctness
```
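A rule-based stand-in for that reward model might combine the criteria like this; the 0.25/0.75 weighting and field list are assumptions:

```python
import json

def reward(prediction: str, gold: dict) -> float:
    """Score one model output against gold labels."""
    try:
        pred = json.loads(prediction)
    except json.JSONDecodeError:
        return 0.0  # invalid JSON earns nothing
    fields = ["amount", "type", "merchant", "category"]
    hits = sum(pred.get(f) == gold.get(f) for f in fields)
    return 0.25 + 0.75 * hits / len(fields)  # 0.25 for valid JSON alone
```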
---
## Metrics & Benchmarks
### Extraction Accuracy
- **Amount**: Target 99%+
- **Type (debit/credit)**: Target 98%+
- **Merchant**: Target 90%+
- **Category**: Target 85%+
- **Reference**: Target 95%+
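Measuring those targets reduces to per-field exact-match accuracy over (prediction, gold) pairs; a minimal sketch, assuming both sides use the Phase 1 JSON schema:

```python
def field_accuracy(pairs: list[tuple[dict, dict]], field: str) -> float:
    """Exact-match accuracy for one field over (prediction, gold) pairs."""
    if not pairs:
        return 0.0
    return sum(pred.get(field) == gold.get(field) for pred, gold in pairs) / len(pairs)

# e.g. field_accuracy(eval_pairs, "amount") should reach the 0.99 target
```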
### System Metrics
- Latency: <100 ms per extraction
- Throughput: >100 msgs/sec
- Memory: <8 GB (quantized)
---
## Next Steps (Immediate)
1. [ ] Download Llama 3.1 8B Instruct
2. [ ] Create instruction-format training data
3. [ ] Set up LoRA fine-tuning pipeline
4. [ ] Run first training experiment
5. [ ] Benchmark against the current Phi-3 model