# FinEE Production Readiness Report
## Current Status vs Production Target
| Aspect | Current Status | Target | Gap |
|--------|---------------|--------|-----|
| **Training Data** | 137,267 samples | 50,000+ | ✅ **EXCEEDED** |
| **Bank Coverage** | 8 banks | 15+ banks | ⚠️ Need 7 more |
| **Document Types** | Email, SMS | Email, PDF, SMS, Images | ⚠️ PDF/image parsers added, not yet tested |
| **Evaluation** | F1=56.8% (regex) | F1 > 95% | ❌ LLM fine-tuning needed |
| **Deployment** | Mac (MLX) | Cloud + Mobile + Edge | ⚠️ Export scripts ready |
| **Users** | 0 external | 10+ active | ❌ Beta testing needed |
## Benchmark Results
### Regex Extractor (Baseline)
| Field | Accuracy | Status |
|-------|----------|--------|
| Amount | 85.8% | ✅ Good |
| Type | 65.0% | ⚠️ Needs improvement |
| Bank | 100% | ✅ Excellent |
| Merchant | 28.3% | ❌ LLM needed |
| Category | 15.8% | ❌ LLM needed |
| **Overall** | **56.8%** | ❌ Below target |
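
To make the baseline numbers easier to interpret, here is a minimal sketch of a regex extractor and an exact-match, per-field accuracy score. The patterns, the `extract` and `field_accuracy` helpers, and the three sample messages are illustrative assumptions; they are not the project's actual extractor or the logic in `scripts/benchmark.py`.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only.
AMOUNT_RE = re.compile(r"(?:Rs\.?|INR|₹)\s*([\d,]+(?:\.\d{1,2})?)", re.IGNORECASE)
DEBIT_RE = re.compile(r"\b(debited|spent|withdrawn|paid)\b", re.IGNORECASE)
CREDIT_RE = re.compile(r"\b(credited|received|deposited)\b", re.IGNORECASE)


def extract(text):
    """Return the fields a simple regex baseline can recover from one message."""
    fields = {"amount": None, "type": None}
    match = AMOUNT_RE.search(text)
    if match:
        # Drop thousands separators before converting to a number.
        fields["amount"] = float(match.group(1).replace(",", ""))
    if DEBIT_RE.search(text):
        fields["type"] = "debit"
    elif CREDIT_RE.search(text):
        fields["type"] = "credit"
    return fields


def field_accuracy(predictions, references):
    """Exact-match accuracy per field over paired prediction/reference dicts."""
    correct = Counter()
    for pred, ref in zip(predictions, references):
        for field, expected in ref.items():
            correct[field] += int(pred.get(field) == expected)
    return {field: correct[field] / len(references) for field in references[0]}


# Made-up messages with hand-written labels.
samples = [
    ("Rs.1,250.00 debited from A/c XX1234 at AMAZON", {"amount": 1250.0, "type": "debit"}),
    ("INR 500 credited to your account", {"amount": 500.0, "type": "credit"}),
    ("Payment of ₹99 to SWIGGY successful", {"amount": 99.0, "type": "debit"}),
]
predictions = [extract(text) for text, _ in samples]
scores = field_accuracy(predictions, [ref for _, ref in samples])
print(scores)
```

In this toy run the amount pattern matches all three messages, while the third message has no debit keyword and so its type is missed. This is the kind of phrasing variation that keyword rules tend to handle poorly.
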
### Expected with LLM Fine-tuning
| Field | With LLM | Target |
|-------|----------|--------|
| Amount | ~98% | 99% |
| Type | ~97% | 98% |
| Bank | 100% | 100% |
| Merchant | ~92% | 95% |
| Category | ~88% | 90% |
| **Overall** | **~95%** | 95% |
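
The two tables above can be combined to show, per field, how many percentage points the fine-tuned model is expected to gain over the regex baseline and how far that still leaves it from the target. The sketch below only does arithmetic on the figures reported here; the `~` estimates are treated as exact values for the calculation, and the variable names are illustrative.

```python
# (baseline %, expected with LLM %, target %) copied from the tables above.
FIELDS = {
    "Amount": (85.8, 98.0, 99.0),
    "Type": (65.0, 97.0, 98.0),
    "Bank": (100.0, 100.0, 100.0),
    "Merchant": (28.3, 92.0, 95.0),
    "Category": (15.8, 88.0, 90.0),
    "Overall": (56.8, 95.0, 95.0),
}

rows = {}
for field, (baseline, expected, target) in FIELDS.items():
    gain = round(expected - baseline, 1)        # expected improvement, in points
    remaining = round(target - expected, 1)     # shortfall against the target
    rows[field] = (gain, remaining)
    print(f"{field:<9} gain=+{gain:>4} pts  remaining gap={remaining} pts")
```

On these figures, Merchant and Category account for the largest expected gains and also the largest remaining gaps.
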
## Priority Actions
### High Priority (This Week)
1. **Fine-tune LLM on 137K dataset**
   - Use `scripts/finetune.py` with MLX or PyTorch
   - Target: Phi-3 or Llama 3.1 8B
   - Expected improvement: about +38 percentage points F1 (from 56.8% to ~95%)
2. **Add remaining banks**
   - BOB, Canara, Union, IDBI, Federal, South Indian, Karur Vysya
   - Update `scripts/data_pipeline/generate_synthetic.py`
3. **Test PDF parsing**
   - Collect sample bank statements
   - Test with `src/finee/pdf_parser.py`
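
For item 2, one plausible way to cover the seven missing banks is template-based generation of labelled messages. The sketch below is an assumption about how that could look; the templates, merchant list, output schema, and `generate_samples` function are hypothetical and do not reflect the actual contents of `scripts/data_pipeline/generate_synthetic.py`.

```python
import random

# The seven banks named in item 2; real alert formats will differ per bank.
NEW_BANKS = ["BOB", "Canara", "Union", "IDBI", "Federal", "South Indian", "Karur Vysya"]

# Hypothetical message templates, paired with the transaction type they encode.
TEMPLATES = [
    ("debit", "{bank}: Rs.{amount:.2f} debited from A/c XX{acct} at {merchant}."),
    ("credit", "{bank}: Rs.{amount:.2f} credited to A/c XX{acct} by {merchant}."),
]
MERCHANTS = ["AMAZON", "SWIGGY", "IRCTC"]


def generate_samples(n, seed=0):
    """Generate n synthetic (input text, labelled output) pairs, reproducibly."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        bank = rng.choice(NEW_BANKS)
        txn_type, template = rng.choice(TEMPLATES)
        amount = round(rng.uniform(10, 50000), 2)
        merchant = rng.choice(MERCHANTS)
        text = template.format(
            bank=bank, amount=amount, acct=rng.randint(1000, 9999), merchant=merchant
        )
        samples.append({
            "input": text,
            "output": {"amount": amount, "type": txn_type, "bank": bank, "merchant": merchant},
        })
    return samples


for sample in generate_samples(3, seed=42):
    print(sample["input"])
```

Seeding the generator keeps the dataset reproducible between runs, which makes benchmark comparisons easier to trust.
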
### Medium Priority (This Month)
4. **Export to ONNX**
   - Run `scripts/export_model.py --format onnx`
   - Test inference speed
5. **Deploy to HF Inference**
   - Push model to Hugging Face
   - Enable Inference API
6. **Get beta users**
   - Share demo: https://huggingface.co/spaces/Ranjit0034/finee-demo
   - Collect feedback
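
For the "Test inference speed" step in item 4, a backend-agnostic timing harness may be enough to compare the exported model against the original. The sketch below assumes only that the model is wrapped in a callable taking one string; `measure_latency` and the stand-in `predict` lambda are hypothetical and would be replaced by the real ONNX or MLX inference call.

```python
import statistics
import time


def measure_latency(predict, inputs, warmup=3, runs=20):
    """Time predict(text) per call and return summary statistics in milliseconds."""
    # Warm-up calls are excluded so one-time setup cost does not skew the results.
    for text in inputs[:warmup]:
        predict(text)

    timings_ms = []
    for _ in range(runs):
        for text in inputs:
            start = time.perf_counter()
            predict(text)
            timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return {
        "calls": len(timings_ms),
        "mean_ms": statistics.fmean(timings_ms),
        "p50_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
    }


# Stand-in for a real model call; swap in the exported model's inference function.
stats = measure_latency(lambda text: text.upper(), ["Rs.100 debited at AMAZON"], runs=50)
print(stats)
```

Reporting p95 alongside the mean is a deliberate choice here, since tail latency usually matters more than the average for an API serving interactive requests.
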
### Low Priority (Next Month)
7. **Mobile deployment (GGUF)**
8. **Multi-turn agent**
9. **Knowledge graph integration**
## Files Added
| File | Description |
|------|-------------|
| `src/finee/rag.py` | RAG engine with 50+ merchants |
| `src/finee/api.py` | FastAPI backend (8 endpoints) |
| `src/finee/ui.py` | Gradio web interface |
| `src/finee/pdf_parser.py` | PDF/Image parsing |
| `scripts/benchmark.py` | Production benchmark suite |
| `scripts/export_model.py` | ONNX/GGUF/CoreML export |
| `tests/test_rag.py` | 33 comprehensive tests |
## Commands
```bash
# Run benchmark
python scripts/benchmark.py --test-file data/instruction/test.jsonl --max-samples 1000
# Fine-tune LLM
python scripts/finetune.py --backend mlx --model microsoft/phi-3-mini-4k-instruct
# Export to ONNX
python scripts/export_model.py models/finetuned --format onnx
# Start API server
python -m finee.api --port 8000
# Launch Gradio UI
python -m finee.ui --port 7860
```
## Links
- **Demo**: https://huggingface.co/spaces/Ranjit0034/finee-demo
- **Dataset**: https://huggingface.co/datasets/Ranjit0034/finee-dataset
- **Model**: https://huggingface.co/Ranjit0034/finee-phi3-4b
- **Code**: https://huggingface.co/Ranjit0034/finance-entity-extractor
---
*Last updated: 2026-01-14*