# FinEE v2.0 - Upgrade Roadmap
## From Extraction Engine to Intelligent Financial Agent

### Current vs Target Comparison

| Dimension | Current State | Target State | Priority |
|-----------|---------------|--------------|----------|
| **Base Model** | Phi-3 Mini (3.8B) | Llama 3.1 8B / Qwen2.5 7B | P0 |
| **Training Data** | 456 samples | 100K+ distilled samples | P0 |
| **Output Format** | Token extraction | Instruction-following JSON | P0 |
| **Context** | None | RAG + Knowledge Graph | P1 |
| **Interaction** | Single-turn | Multi-turn agent | P1 |
| **Input Types** | Email only | SMS + Email + PDF + Images | P1 |
| **Accuracy** | ~70% (estimated) | 95%+ (measured) | P0 |

---

## Phase 1: Foundation (Week 1-2)
### 1.1 Model Upgrade
- [ ] Download Llama 3.1 8B Instruct
- [ ] Download Qwen2.5 7B Instruct (backup)
- [ ] Benchmark both on finance extraction task
- [ ] Set up quantization pipeline (4-bit, 8-bit)

### 1.2 Training Data Expansion
- [x] Generate 100K synthetic samples (DONE ✅)
- [ ] Distill from GPT-4/Claude for complex cases
- [x] Add real data from user (2,419 SMS samples ✅)
- [ ] Create validation set (10K samples)
- [ ] Create test set (5K unseen samples)
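
To keep the 5K test set genuinely unseen, one option is to drop duplicate messages before splitting, so the same SMS cannot land in both training and test data. The sketch below is a minimal version under stated assumptions: each sample is a dict with an `"input"` field (as in the instruction format in 1.3), duplicates are detected after lowercasing and collapsing whitespace, and `normalize`, `split_dataset`, and the seed value are hypothetical.

```python
import random
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies of the
    same message compare as equal (hypothetical normalization rule)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def split_dataset(samples, n_val=10_000, n_test=5_000, seed=42):
    """Drop duplicate inputs, shuffle deterministically, then carve out the
    test and validation sets; the remainder becomes training data."""
    seen, unique = set(), []
    for sample in samples:
        key = normalize(sample["input"])
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    if n_val + n_test >= len(unique):
        raise ValueError("not enough unique samples for the requested splits")
    random.Random(seed).shuffle(unique)  # fixed seed keeps splits reproducible
    test = unique[:n_test]
    val = unique[n_test:n_test + n_val]
    train = unique[n_test + n_val:]
    return train, val, test
```

Because the synthetic samples come from templates, exact-match deduplication may still leave near-duplicates that differ only in amounts or reference numbers; a stricter key (for example, masking digits) is a possible extension.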

### 1.3 Instruction Format
```json
{
  "system": "You are a financial entity extractor...",
  "instruction": "Extract entities from this message",
  "input": "<bank SMS or email>",
  "output": {
    "amount": 2500.00,
    "type": "debit",
    "merchant": "Swiggy",
    "category": "food",
    "date": "2026-01-12",
    "reference": "123456789012"
  }
}
```
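
A record in this format can be assembled, and a model's raw JSON reply checked against it, with the standard library alone. This is a sketch that assumes the six output fields above are the complete schema, that `amount` is numeric, that `type` is either `debit` or `credit`, and that `date` is ISO `YYYY-MM-DD`; `REQUIRED_FIELDS`, `make_record`, and `parse_output` are hypothetical names.

```python
import json
from datetime import date

# Assumed schema: field name -> accepted Python type(s) after json.loads.
REQUIRED_FIELDS = {
    "amount": (int, float),
    "type": str,
    "merchant": str,
    "category": str,
    "date": str,
    "reference": str,
}

SYSTEM_PROMPT = "You are a financial entity extractor..."  # placeholder, as in the example


def make_record(message: str, entities: dict) -> dict:
    """Wrap one labelled message in the system/instruction/input/output format."""
    return {
        "system": SYSTEM_PROMPT,
        "instruction": "Extract entities from this message",
        "input": message,
        "output": entities,
    }


def parse_output(raw: str) -> dict:
    """Parse a model reply; raise ValueError if it breaks the assumed schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(data[field], expected) or isinstance(data[field], bool):
            raise ValueError(f"bad type for field: {field}")
    if data["type"] not in ("debit", "credit"):
        raise ValueError("type must be 'debit' or 'credit'")
    date.fromisoformat(data["date"])  # raises ValueError unless YYYY-MM-DD
    return data
```

Running the same validator over generated training data and over model replies keeps both sides of the pipeline tied to one schema definition.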

---

## Phase 2: Multi-Modal Support (Week 3-4)
### 2.1 Input Types
- [x] SMS Parser (DONE ✅)
- [x] Email Parser (DONE ✅)
- [ ] PDF Statement Parser
  - Use `pdfplumber` for text extraction
  - Table detection with `camelot`
  - OCR fallback with `pytesseract`
- [ ] Image/Screenshot Parser
  - OCR with `EasyOCR` or `PaddleOCR`
  - Vision model for structured extraction

### 2.2 Bank Statement Processing
```
PDF Input → Text Extraction → Table Detection →
Row Parsing → Entity Extraction → Transaction List
```
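
The Row Parsing step is where statement layouts differ most between banks. A minimal sketch, assuming table detection has already produced rows of strings in a hypothetical `Date | Narration | Debit | Credit | Balance` layout with `DD/MM/YYYY` dates and comma-grouped amounts; real statements will likely need one column mapping per bank template, and `to_amount` and `parse_rows` are illustrative names.

```python
from datetime import datetime
from decimal import Decimal


def to_amount(cell: str):
    """Convert '2,500.00' to Decimal('2500.00'); an empty cell means no amount."""
    cell = cell.replace(",", "").strip()
    return Decimal(cell) if cell else None


def parse_rows(rows):
    """Turn extracted table rows into transaction dicts. Rows whose first cell
    is not a DD/MM/YYYY date (headers, opening balance, footers) are skipped."""
    transactions = []
    for date_cell, narration, debit, credit, balance in rows:
        try:
            day = datetime.strptime(date_cell.strip(), "%d/%m/%Y").date()
        except ValueError:
            continue  # not a transaction row
        debit_amt = to_amount(debit)
        transactions.append({
            "date": day.isoformat(),
            "description": " ".join(narration.split()),  # collapse wrapped text
            "type": "debit" if debit_amt is not None else "credit",
            "amount": debit_amt if debit_amt is not None else to_amount(credit),
            "balance": to_amount(balance),
        })
    return transactions
```

`Decimal` is used instead of `float` so that amounts and running balances add up exactly, which matters when reconciling against the statement's closing balance.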

### 2.3 Image Processing Pipeline
```
Image → OCR → Text Blocks → Layout Analysis →
Entity Extraction → Structured Output
```
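
For the Layout Analysis step, a common first pass is to group OCR text blocks into visual lines by vertical position and then read each line left to right. This sketch assumes each block has already been reduced to a hypothetical `(text, x, y)` tuple giving its top-left corner in pixels, and uses an illustrative 10-pixel tolerance; EasyOCR and PaddleOCR both return bounding boxes, but the conversion to this shape is left out.

```python
def group_into_lines(blocks, y_tolerance=10):
    """Cluster (text, x, y) blocks whose y is within y_tolerance of the current
    line's first block, then join each line's words in left-to-right order."""
    lines = []  # each entry: [anchor_y, list of (x, text)]
    for text, x, y in sorted(blocks, key=lambda b: b[2]):
        if lines and abs(y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    return [" ".join(t for _, t in sorted(words)) for _, words in lines]
```

Usage: `group_into_lines([("debited", 120, 52), ("Rs 2,500", 10, 50)])` reads the two blocks back as one line, `"Rs 2,500 debited"`, which can then go to entity extraction like an ordinary SMS.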

---

## Phase 3: RAG + Knowledge Graph (Week 5-6)
### 3.1 Knowledge Base
- Merchant database (10K+ Indian merchants)
- Bank template patterns
- Category taxonomy
- UPI VPA mappings
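
At lookup time these sources chain together: a UPI VPA resolves to a merchant, and the merchant resolves to a category in the taxonomy. A minimal in-memory sketch; the VPAs, merchant entries, and category names are made-up placeholders, `resolve_vpa` is a hypothetical name, and a real merchant database would live in a proper store rather than Python dicts.

```python
# Hypothetical sample entries; real VPAs and taxonomy labels will differ.
VPA_TO_MERCHANT = {
    "swiggy@examplebank": "Swiggy",
    "irctc@examplebank": "IRCTC",
}
MERCHANT_CATEGORY = {
    "Swiggy": "food",
    "IRCTC": "travel",
}


def resolve_vpa(vpa: str):
    """Return (merchant, category) for a VPA, or (None, None) if it is unknown.
    VPAs are compared case-insensitively, ignoring surrounding whitespace."""
    merchant = VPA_TO_MERCHANT.get(vpa.strip().lower())
    if merchant is None:
        return None, None
    return merchant, MERCHANT_CATEGORY.get(merchant, "uncategorized")
```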

### 3.2 RAG Architecture
```
Query → Retrieve Similar Transactions →
Augment Context → Generate Extraction
```
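
The Augment Context step can be as simple as prepending retrieved, already-labelled transactions to the prompt as few-shot examples. This sketch assumes retrieval has already returned a list of `(message, extraction)` pairs (for example, from the vector store in 3.4); the prompt wording, the cap of three examples, and `build_prompt` are illustrative, not a fixed template.

```python
import json


def build_prompt(message: str, retrieved, max_examples: int = 3) -> str:
    """Prepend up to max_examples retrieved (message, extraction) pairs as
    worked examples, then append the new message to be extracted."""
    parts = ["Extract entities from the final message as JSON.", ""]
    for past_message, extraction in retrieved[:max_examples]:
        parts.append(f"Message: {past_message}")
        # sort_keys gives a stable field order across examples
        parts.append(f"Output: {json.dumps(extraction, sort_keys=True)}")
        parts.append("")
    parts.append(f"Message: {message}")
    parts.append("Output:")
    return "\n".join(parts)
```

Capping the number of examples keeps the prompt short even when retrieval returns many near-identical transactions.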

### 3.3 Knowledge Graph
```
[Merchant: Swiggy] --is_a--> [Category: Food Delivery]
                   --accepts--> [Payment: UPI, Card]
                   --typical_amount--> [Range: 100-2000]
```
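
The graph above can start as a plain list of (subject, relation, object) triples, which is already enough to answer questions such as "is this amount unusual for this merchant?". A sketch under that assumption; storing the amount range as a `(low, high)` tuple, and the names `TRIPLES`, `objects`, and `is_typical_amount`, are hypothetical choices.

```python
# Edges from the example above, stored as (subject, relation, object) triples.
TRIPLES = [
    ("Swiggy", "is_a", "Food Delivery"),
    ("Swiggy", "accepts", "UPI"),
    ("Swiggy", "accepts", "Card"),
    ("Swiggy", "typical_amount", (100, 2000)),
]


def objects(subject: str, relation: str):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def is_typical_amount(merchant: str, amount: float):
    """True/False if the merchant has a typical_amount range; None if unknown."""
    ranges = objects(merchant, "typical_amount")
    if not ranges:
        return None
    return any(low <= amount <= high for low, high in ranges)
```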

### 3.4 Vector Store
- Use Qdrant/ChromaDB for transaction embeddings
- Enable semantic search for similar transactions
- Support for "transactions like this" queries
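
Underneath Qdrant or ChromaDB, a "transactions like this" query is a nearest-neighbour search, commonly by cosine similarity over embeddings. The brute-force version below only illustrates that ranking on toy three-dimensional vectors; how embeddings are produced is out of scope, the ids are made up, and vector databases typically replace the linear scan with an approximate index at scale.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, index, k=2):
    """Return the ids of the k stored vectors closest to `query`.
    `index` maps transaction id -> embedding vector."""
    ranked = sorted(index, key=lambda tid: cosine(query, index[tid]), reverse=True)
    return ranked[:k]
```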

---

## Phase 4: Multi-Turn Agent (Week 7-8)
### 4.1 Agent Capabilities
```python
class FinancialAgent:
    """Planned agent interface; method bodies are not implemented yet."""

    def extract_entities(self, message) -> dict: ...
    def categorize_spending(self, transactions) -> dict: ...
    def detect_anomalies(self, transactions) -> list: ...
    def generate_report(self, period) -> str: ...
    def answer_question(self, question, context) -> str: ...
```
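
As one example of what a method body might look like, `detect_anomalies` could start as a simple statistical rule before any model is involved. This sketch flags a transaction when its amount exceeds the merchant's median by more than a multiple of the median absolute deviation (MAD); the rule, the threshold of 5, the 10% floor on the scale, and the minimum history of 3 transactions are all illustrative assumptions, not the planned design.

```python
from collections import defaultdict
from statistics import median


def detect_anomalies(transactions, threshold=5.0, min_history=3):
    """Flag transactions whose amount sits more than `threshold` robust-scale
    units above the median amount for the same merchant."""
    by_merchant = defaultdict(list)
    for txn in transactions:
        by_merchant[txn["merchant"]].append(txn["amount"])

    scales = {}
    for merchant, amounts in by_merchant.items():
        if len(amounts) < min_history:
            continue  # too little history to judge this merchant
        med = median(amounts)
        mad = median([abs(a - med) for a in amounts])
        # Floor the scale at 10% of the median so identical past amounts
        # (MAD == 0) do not make every small difference look anomalous.
        scales[merchant] = (med, max(mad, 0.1 * med))

    return [
        txn for txn in transactions
        if txn["merchant"] in scales
        and txn["amount"] - scales[txn["merchant"]][0] > threshold * scales[txn["merchant"]][1]
    ]
```

Median and MAD are used rather than mean and standard deviation because a single large outlier barely moves them, so the suspicious transaction can stay in the history it is judged against.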

### 4.2 Conversation Flow
```
User: "How much did I spend on food last month?"
Agent: [Retrieves transactions] → [Filters by category] →
       [Aggregates amounts] → "You spent ₹12,450 on food"

User: "Compare with previous month"
Agent: [Uses conversation context] → [Retrieves both months] →
       "December: ₹12,450, November: ₹9,800 (+27%)"
```
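
Behind the two answers above are two small computations: sum the food transactions per month, then take the relative change, (12,450 - 9,800) / 9,800 ≈ 0.27, which is the +27% in the example. A sketch assuming each transaction carries an ISO `date` string, a `category`, and an `amount`; the function names are hypothetical and the sample data in the check below is made up to reproduce those totals.

```python
from collections import defaultdict


def monthly_totals(transactions, category):
    """Sum amounts per 'YYYY-MM' month for one category."""
    totals = defaultdict(float)
    for txn in transactions:
        if txn["category"] == category:
            totals[txn["date"][:7]] += txn["amount"]  # 'YYYY-MM-DD' -> 'YYYY-MM'
    return dict(totals)


def percent_change(current, previous):
    """Relative change from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100
```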

### 4.3 Tool Use
- Calculator for aggregations
- Date parser for time queries
- Budget tracker integration
- Export to CSV/Excel
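
Two of these tools are small enough to sketch with the standard library: resolving "last month" to a concrete date range, and writing extracted transactions to CSV (Excel opens CSV files directly; native `.xlsx` output would need a third-party library such as openpyxl). Only the phrase "last month" is handled here, and `last_month_range`, `to_csv`, and the default column list are hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def last_month_range(today: date):
    """First and last day of the calendar month before `today`."""
    last_day = today.replace(day=1) - timedelta(days=1)  # day before the 1st
    return last_day.replace(day=1), last_day


def to_csv(transactions, fields=("date", "merchant", "category", "amount")):
    """Serialise transaction dicts to CSV text, ignoring any extra keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(transactions)
    return buffer.getvalue()
```

Stepping back one day from the first of the month avoids hard-coding month lengths, so leap years and 30/31-day months come out right.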

---

## Phase 5: Production Deployment (Week 9-10)
### 5.1 Model Optimization
- [ ] GGUF quantization for llama.cpp
- [ ] ONNX export for faster inference
- [ ] vLLM for batch processing
- [ ] MLX optimization for Apple Silicon

### 5.2 API Design
```
# FastAPI endpoints
POST /extract          # Single message extraction
POST /extract/batch    # Batch extraction
POST /parse/pdf        # PDF statement parsing
POST /parse/image      # Image OCR + extraction
POST /chat             # Multi-turn agent
GET  /analytics        # Spending analytics
```

### 5.3 Deployment Options
- Docker container
- Hugging Face Spaces (demo)
- Modal/Replicate (serverless)
- Self-hosted with vLLM

---

## Technical Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      FinEE v2.0 Agent                       │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │
│  │   SMS    │   │  Email   │   │   PDF    │   │  Image   │  │
│  │  Parser  │   │  Parser  │   │  Parser  │   │   OCR    │  │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘  │
│       │              │              │              │        │
│       └──────────────┴───────┬──────┴──────────────┘        │
│                              ▼                              │
│                      ┌────────────────┐                     │
│                      │  Preprocessor  │                     │
│                      └───────┬────────┘                     │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     RAG Pipeline                      │  │
│  │   ┌───────────┐     ┌───────────┐     ┌───────────┐   │  │
│  │   │  Vector   │     │ Knowledge │     │  Merchant │   │  │
│  │   │   Store   │     │   Graph   │     │  Database │   │  │
│  │   └─────┬─────┘     └─────┬─────┘     └─────┬─────┘   │  │
│  │         └─────────────────┼─────────────────┘         │  │
│  └───────────────────────────┼───────────────────────────┘  │
│                              ▼                              │
│                  ┌───────────────────────┐                  │
│                  │  Llama 3.1 8B / Qwen  │                  │
│                  │   Instruction-Tuned   │                  │
│                  └───────────┬───────────┘                  │
│                              ▼                              │
│                  ┌───────────────────────┐                  │
│                  │      JSON Output      │                  │
│                  │  + Confidence Score   │                  │
│                  └───────────────────────┘                  │
└─────────────────────────────────────────────────────────────┘
```

---

## Model Selection Analysis

| Model | Size | Speed | Quality | License | Choice |
|-------|------|-------|---------|---------|--------|
| Llama 3.1 8B | 8B | Fast | Excellent | Llama 3.1 Community License | ✓ Primary |
| Qwen2.5 7B | 7B | Fast | Excellent | Apache 2.0 | ✓ Backup |
| Mistral 7B | 7B | Fast | Good | Apache 2.0 | Alternative |
| Phi-3 Medium | 14B | Medium | Excellent | MIT | Future |

### Why Llama 3.1 8B?
1. **Instruction following** - Best in class for its size
2. **Structured output** - Reliable JSON generation
3. **Context length** - 128K tokens (future RAG)
4. **Quantization** - Excellent 4-bit performance
5. **Ecosystem** - Wide support (vLLM, llama.cpp, MLX)

---

## Training Strategy

### Stage 1: Supervised Fine-tuning (SFT)
```
Base: Llama 3.1 8B Instruct
Data: 100K synthetic + 2.4K real
Method: LoRA (rank=16, alpha=32)
Epochs: 3
```
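
For a sense of scale with these settings: LoRA replaces the update to a weight matrix W (d_out × d_in) with the product B·A, where B is d_out × r and A is r × d_in, so each adapted matrix adds r·(d_in + d_out) trainable parameters, and the update is scaled by alpha / r. The arithmetic below assumes a square 4096 × 4096 projection, matching Llama 3.1 8B's hidden size; which modules receive adapters is not specified above, so only a single matrix is counted.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)  # A: rank x d_in, B: d_out x rank


rank, alpha, hidden = 16, 32, 4096
added = lora_params(hidden, hidden, rank)
full = hidden * hidden
print(f"scaling alpha/r = {alpha / rank}")                 # 2.0
print(f"LoRA params per 4096x4096 matrix: {added}")        # 131072
print(f"fraction of the full matrix: {added / full:.2%}")  # 0.78%
```

Multiplying the per-matrix figure by the number of adapted projections and layers gives the total adapter size for whichever target modules are chosen.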

### Stage 2: DPO (Direct Preference Optimization)
```
Create preference pairs:
- Chosen: Correct extraction with confidence
- Rejected: Partial/incorrect extraction
Objective: Improve extraction precision
```
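
The objective behind this stage is the DPO loss. For a prompt with chosen output y_w and rejected output y_l, it is `-log σ(β · [(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))])`, where π is the model being trained and π_ref is the frozen SFT model. The sketch computes it for one pair from sequence log-probabilities; β = 0.1 is a commonly used value but an assumption here, and `dpo_loss` is a hypothetical helper, not a training library's API.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair. Each argument is the summed
    log-probability of the chosen or rejected extraction under the policy
    being trained or under the frozen reference (SFT) model."""
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written to avoid overflow for large |x|
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy matches the reference on both outputs the loss is log 2; it falls below that as the policy shifts probability toward the correct extraction relative to the reference, and rises above it in the opposite case.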

### Stage 3: RLHF (Optional)
```
Reward model based on:
- JSON validity
- Field accuracy
- Merchant identification
- Category correctness
```
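
A hand-written scorer over these four signals could provide training labels for the reward model, or stand in for it in early experiments; note that this is a rule-based function, not a learned reward model. The sketch assumes a gold extraction is available for each message; the weights and the zero reward for invalid JSON are illustrative assumptions, and `WEIGHTS` and `reward` are hypothetical names.

```python
import json

# Illustrative weights that sum to 1.0, so rewards fall in [0, 1].
WEIGHTS = {"json": 0.2, "fields": 0.4, "merchant": 0.2, "category": 0.2}


def reward(raw_output: str, gold: dict) -> float:
    """Score one raw model output against its gold extraction."""
    try:
        pred = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0  # invalid JSON earns nothing
    if not isinstance(pred, dict):
        return 0.0
    score = WEIGHTS["json"]
    # Field accuracy: share of gold fields reproduced exactly.
    matched = sum(pred.get(field) == value for field, value in gold.items())
    score += WEIGHTS["fields"] * matched / len(gold)
    # Merchant identification, ignoring case.
    if str(pred.get("merchant", "")).lower() == str(gold.get("merchant", "")).lower():
        score += WEIGHTS["merchant"]
    # Category correctness.
    if pred.get("category") == gold.get("category"):
        score += WEIGHTS["category"]
    return score
```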

---

## Metrics & Benchmarks

### Extraction Accuracy
- **Amount**: Target 99%+
- **Type (debit/credit)**: Target 98%+
- **Merchant**: Target 90%+
- **Category**: Target 85%+
- **Reference**: Target 95%+
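
Measuring these per-field targets means comparing predictions with gold labels field by field rather than scoring whole outputs. A minimal sketch; the matching rules (amounts compared numerically to within half a paisa, merchants case-insensitively, everything else exactly) are assumptions, and `TARGETS` simply restates the list above.

```python
TARGETS = {"amount": 0.99, "type": 0.98, "merchant": 0.90, "category": 0.85, "reference": 0.95}


def field_match(field, pred, gold):
    """Field-specific comparison: amounts numerically, merchants
    case-insensitively, everything else exactly."""
    if pred is None or gold is None:
        return pred == gold
    if field == "amount":
        try:
            return abs(float(pred) - float(gold)) < 0.005
        except (TypeError, ValueError):
            return False  # unparseable amount counts as wrong
    if field == "merchant":
        return str(pred).strip().lower() == str(gold).strip().lower()
    return pred == gold


def field_accuracy(predictions, golds):
    """Per-field accuracy over parallel lists of predicted and gold dicts."""
    return {
        field: sum(field_match(field, p.get(field), g.get(field))
                   for p, g in zip(predictions, golds)) / len(golds)
        for field in TARGETS
    }


def below_target(accuracy):
    """Fields whose measured accuracy misses the roadmap target."""
    return [field for field, acc in accuracy.items() if acc < TARGETS[field]]
```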

### System Metrics
- Latency: <100ms per extraction
- Throughput: >100 msgs/sec
- Memory: <8GB (quantized)
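
A quick weights-only estimate shows why the memory target assumes quantization: at b bits per parameter, the weights need params × b / 8 bytes. The sketch uses 8.0 billion as a round parameter count for Llama 3.1 8B and ignores the KV cache, activations, and quantization metadata (for example, GGUF block scales), so real usage will be somewhat higher; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bits: int) -> float:
    """Approximate weight storage in GiB for n_params parameters at `bits` each."""
    return n_params * bits / 8 / 2**30


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(8.0e9, bits):5.1f} GiB")
```

This gives roughly 14.9 GiB at 16-bit, 7.5 GiB at 8-bit, and 3.7 GiB at 4-bit. At 8-bit the weights alone are already 8.0e9 bytes, so the <8GB budget is only comfortable at 4-bit.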

---

## Next Steps (Immediate)

1. [ ] Download Llama 3.1 8B Instruct
2. [ ] Create instruction-format training data
3. [ ] Set up LoRA fine-tuning pipeline
4. [ ] Run first training experiment
5. [ ] Benchmark against current Phi-3 model