From Extraction Engine to Intelligent Financial Agent
Current vs Target Comparison
| Dimension | Current State | Target State | Priority |
|---|---|---|---|
| Base Model | Phi-3 Mini (3.8B) | Llama 3.1 8B / Qwen2.5 7B | P0 |
| Training Data | 456 samples | 100K+ distilled samples | P0 |
| Output Format | Token extraction | Instruction-following JSON | P0 |
| Context | None | RAG + Knowledge Graph | P1 |
| Interaction | Single-turn | Multi-turn agent | P1 |
| Input Types | Email only | SMS + Email + PDF + Images | P1 |
| Accuracy | ~70% (estimated) | 95%+ (measured) | P0 |
Phase 1: Foundation (Week 1-2)
1.1 Model Upgrade
Download Llama 3.1 8B Instruct
Download Qwen2.5 7B Instruct (backup)
Benchmark both on finance extraction task
Set up quantization pipeline (4-bit, 8-bit)
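One way to make the benchmark step concrete is to score each candidate model by per-field exact match against hand-labelled messages. The sketch below assumes predictions and gold labels are dictionaries using the field names from the instruction format in section 1.3; the function names `field_accuracy` and `exact_match` are illustrative, not part of any existing codebase.

```python
from typing import Any

# Field names taken from the example output record in section 1.3.
FIELDS = ("amount", "type", "merchant", "category", "date", "reference")


def field_accuracy(predictions: list[dict[str, Any]],
                   gold: list[dict[str, Any]]) -> dict[str, float]:
    """Return per-field exact-match accuracy over paired predictions and labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    scores = {}
    for field in FIELDS:
        correct = sum(1 for p, g in zip(predictions, gold)
                      if p.get(field) == g.get(field))
        scores[field] = correct / len(gold)
    return scores


def exact_match(predictions: list[dict[str, Any]],
                gold: list[dict[str, Any]]) -> float:
    """Return the fraction of messages where every field matches the label."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal in length")
    hits = sum(1 for p, g in zip(predictions, gold)
               if all(p.get(f) == g.get(f) for f in FIELDS))
    return hits / len(gold)
```

Running both models through the same two functions on the same labelled set gives comparable numbers, and the per-field breakdown shows whether a model fails mostly on merchants, dates, or amounts.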
1.2 Training Data Expansion
Generate 100K synthetic samples (DONE ✅)
Distill from GPT-4/Claude for complex cases
Add real data from user (2,419 SMS samples ✅)
Create validation set (10K samples)
Create test set (5K unseen samples)
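To keep the test set genuinely unseen, the split can be derived from a hash of the message text, so that duplicate or near-identical messages always land in the same split. This is a minimal sketch: the 10% and 5% shares are an assumption that approximates 10K and 5K out of roughly 100K samples, and `assign_split` is a hypothetical helper name.

```python
import hashlib


def assign_split(text: str, val_pct: int = 10, test_pct: int = 5) -> str:
    """Deterministically map a message to 'train', 'validation', or 'test'.

    The text is normalised (trimmed, lower-cased) before hashing so that
    trivially different copies of one message cannot leak across splits.
    """
    normalised = text.strip().lower().encode("utf-8")
    digest = hashlib.sha256(normalised).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"
```

Because the assignment depends only on the text, regenerating or extending the dataset later does not move existing samples between splits.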
1.3 Instruction Format
{
  "system": "You are a financial entity extractor...",
  "instruction": "Extract entities from this message",
  "input": "<bank SMS or email>",
  "output": {
    "amount": 2500.00,
    "type": "debit",
    "merchant": "Swiggy",
    "category": "food",
    "date": "2026-01-12",
    "reference": "123456789012"
  }
}
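Distilled and synthetic samples are easier to trust if every `output` object is checked before it enters the training set. The following sketch validates the six fields shown in the record above. The allowed `type` values are an assumption (only "debit" appears in the example), and `validate_output` is an illustrative name.

```python
from datetime import date

REQUIRED = {"amount", "type", "merchant", "category", "date", "reference"}
ALLOWED_TYPES = {"debit", "credit"}  # assumption: "credit" is the only other type


def validate_output(output: dict) -> list[str]:
    """Return a list of problems found in one output record (empty if valid)."""
    errors = []
    missing = REQUIRED - output.keys()
    if missing:
        errors.append("missing fields: " + ", ".join(sorted(missing)))
    if "amount" in output:
        amount = output["amount"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
            errors.append("amount must be a positive number")
    if "type" in output and output["type"] not in ALLOWED_TYPES:
        errors.append("type must be one of: " + ", ".join(sorted(ALLOWED_TYPES)))
    if "date" in output:
        try:
            date.fromisoformat(str(output["date"]))
        except ValueError:
            errors.append("date must be an ISO date (YYYY-MM-DD)")
    return errors
```

Records that return a non-empty list can be dropped or sent back for re-labelling rather than silently teaching the model a malformed schema.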
Phase 2: Multi-Modal Support (Week 3-4)
2.1 Input Types
SMS Parser (DONE ✅)
Email Parser (DONE ✅)
PDF Statement Parser
  Use pdfplumber for text extraction
  Table detection with camelot
  OCR fallback with pytesseract
Image/Screenshot Parser
  OCR with EasyOCR or PaddleOCR
  Vision model for structured extraction
2.2 Bank Statement Processing
PDF Input → Text Extraction → Table Detection →
Row Parsing → Entity Extraction → Transaction List
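The row-parsing stage of this pipeline can be sketched with the standard library alone, once text extraction has produced one string per statement line. The row layout assumed here (`DD/MM/YYYY`, description, amount, then `DR` or `CR`) is hypothetical; real bank statements vary and would each need their own pattern.

```python
import re
from datetime import datetime

# Assumed row layout: "12/01/2026  SWIGGY BANGALORE  2,500.00  DR"
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2})\s+"
    r"(?P<flag>DR|CR)$"
)


def parse_rows(lines: list[str]) -> list[dict]:
    """Turn extracted statement lines into transaction dictionaries.

    Lines that do not match the row pattern (headers, footers, page
    numbers) are skipped rather than treated as errors.
    """
    transactions = []
    for line in lines:
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue
        parsed_date = datetime.strptime(match["date"], "%d/%m/%Y").date()
        transactions.append({
            "date": parsed_date.isoformat(),
            "merchant": match["description"],
            "amount": float(match["amount"].replace(",", "")),
            "type": "debit" if match["flag"] == "DR" else "credit",
        })
    return transactions
```

The output uses the same field names as the extraction format, so parsed statement rows and single-message extractions can share one transaction list.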
User: "How much did I spend on food last month?"
Agent: [Retrieves transactions] → [Filters by category] →
[Aggregates amounts] → "You spent ₹12,450 on food"
User: "Compare with previous month"
Agent: [Uses conversation context] → [Retrieves both months] →
"December: ₹12,450, November: ₹9,800 (+27%)"
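The filter, aggregate, and compare steps in this exchange reduce to a few lines of arithmetic once transactions are available as dictionaries. The sketch below assumes the field names from the extraction format and counts only debits as spending; the function names are illustrative.

```python
from datetime import date


def total_for_month(transactions: list[dict], category: str,
                    year: int, month: int) -> float:
    """Sum debit amounts for one category within one calendar month."""
    total = 0.0
    for t in transactions:
        when = date.fromisoformat(t["date"])
        if (t["category"] == category and t["type"] == "debit"
                and when.year == year and when.month == month):
            total += t["amount"]
    return total


def previous_month(year: int, month: int) -> tuple[int, int]:
    """Return the (year, month) immediately before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def compare_with_previous(transactions: list[dict], category: str,
                          year: int, month: int) -> dict:
    """Compare a month's spending with the month before it."""
    current = total_for_month(transactions, category, year, month)
    prev_year, prev_month = previous_month(year, month)
    previous = total_for_month(transactions, category, prev_year, prev_month)
    # Percentage change is undefined when the earlier month had no spending.
    change = None if previous == 0 else round((current - previous) / previous * 100)
    return {"current": current, "previous": previous, "change_pct": change}
```

With ₹12,450 in December and ₹9,800 in November, the change is (12,450 − 9,800) / 9,800 ≈ 27%, which matches the agent's reply above.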
4.3 Tool Use
Calculator for aggregations
Date parser for time queries
Budget tracker integration
Export to CSV/Excel
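As an illustration of the export tool, the sketch below serialises a transaction list to CSV text with Python's `csv` module. The column order is an assumption, and `transactions_to_csv` is a hypothetical name; Excel export would need a third-party library and is not shown.

```python
import csv
import io

# Assumed column order for the export; missing fields are written as empty cells.
COLUMNS = ["date", "merchant", "category", "type", "amount", "reference"]


def transactions_to_csv(transactions: list[dict]) -> str:
    """Serialise transactions to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops any keys that are not export columns.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for transaction in transactions:
        writer.writerow(transaction)
    return buffer.getvalue()
```

Returning a string keeps the tool independent of where the file ends up, whether that is a download response or a file on disk.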
Phase 5: Production Deployment (Week 9-10)
5.1 Model Optimization
GGUF quantization for llama.cpp
ONNX export for faster inference
vLLM for batch processing
MLX optimization for Apple Silicon
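When choosing among these options, a rough estimate of weight memory at each precision helps decide which targets are feasible. The sketch below computes only the size of the weights (parameters × bits ÷ 8); it ignores the KV cache, activations, and the per-block overhead that real quantization formats add, so actual usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes).

    params_billion * 1e9 weights * bits / 8 bytes each, divided by 1e9,
    simplifies to params_billion * bits / 8.
    """
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("parameter count and bit width must be positive")
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # An 8B model at the precisions named in the quantization plan.
    for bits in (16, 8, 4):
        print(f"8B model at {bits}-bit: ~{weight_memory_gb(8, bits):.1f} GB")
```

By this estimate an 8B model needs about 16 GB of weights at 16-bit, 8 GB at 8-bit, and 4 GB at 4-bit.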
5.2 API Design
# FastAPI endpoints
POST /extract # Single message extraction
POST /extract/batch # Batch extraction
POST /parse/pdf # PDF statement parsing
POST /parse/image # Image OCR + extraction
POST /chat # Multi-turn agent
GET /analytics # Spending analytics
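The logic behind `POST /extract/batch` can be kept framework-agnostic and wired into a FastAPI route later. The sketch below is one possible shape: the batch limit, the status codes, and the response layout are all assumptions, and `extract` stands in for whatever function runs the model on a single message.

```python
from typing import Callable


def handle_batch(messages: list[str],
                 extract: Callable[[str], dict],
                 max_batch: int = 100) -> dict:
    """Run extraction over a batch, isolating per-message failures.

    One malformed message should not fail the whole request, so each
    result carries its own ok/error flag.
    """
    if len(messages) > max_batch:
        return {"status": 413, "error": f"batch exceeds {max_batch} messages"}
    results = []
    for index, message in enumerate(messages):
        try:
            results.append({"index": index, "ok": True, "entities": extract(message)})
        except Exception as exc:  # report the failure and continue with the batch
            results.append({"index": index, "ok": False, "error": str(exc)})
    return {"status": 200, "results": results}
```

Keeping the handler as a plain function makes it testable without starting a server, and the single-message endpoint can reuse the same `extract` callable.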