---
title: Financial QA – RAG vs FT
emoji: 📊
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.48.1
app_file: app/app.py
pinned: false
license: mit
---
# Financial QA System: RAG vs Fine-Tuning

This project implements and compares two approaches for answering questions about Allstate's financial reports:

- **Retrieval-Augmented Generation (RAG)**: Combines hybrid document retrieval with generative language models
- **Fine-Tuned Language Model (FT)**: Direct fine-tuning of a small language model on financial Q&A
## Quick Start

### Model Files
Due to file size limitations, model files are not included in this repository. To use the system:
- The fine-tuned model is hosted on Hugging Face Hub
- The application automatically loads the model directly from Hugging Face when running
- If you want to download the model locally, you can use the Hugging Face CLI:
```bash
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('jayyd/financial-qa-model', local_dir='models/fine_tuned_model')"
```
## Key Features

### RAG System
- **Hybrid Retrieval**:
  - Dense retrieval using Sentence Transformers (all-MiniLM-L6-v2)
  - Sparse retrieval using BM25
  - Score fusion for optimal chunk selection
- **Context-Aware Generation**:
  - Prompts engineered for financial accuracy
  - Dynamic context window management
  - Multi-chunk answer synthesis
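The score-fusion step above can be sketched in a few lines: normalize each retriever's scores to a common range, then take a weighted sum per chunk. A minimal sketch, assuming min-max normalization and an equal weighting; the `fuse_scores` helper and the 0.5 weight are illustrative, not the project's actual API in `utils/retriever.py`:

```python
# Illustrative score fusion for hybrid retrieval: dense (cosine) and
# sparse (BM25) scores live on different scales, so normalize each to
# [0, 1] before combining. Weight alpha=0.5 is an assumed default.

def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(dense_scores, bm25_scores, alpha=0.5):
    """Weighted fusion of per-chunk dense and sparse scores."""
    d = normalize(dense_scores)
    b = normalize(bm25_scores)
    return [alpha * di + (1 - alpha) * bi for di, bi in zip(d, b)]

# Rank three candidate chunks by fused score (highest first)
fused = fuse_scores([0.82, 0.40, 0.65], [3.1, 7.4, 0.2])
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Min-max fusion is one common choice here; reciprocal-rank fusion is a frequent alternative when the two score distributions are hard to calibrate.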
### Fine-Tuned Model

- **Base Model**: DistilGPT2 (small, efficient)
- **Training Data**: 30+ carefully curated financial Q&A pairs
- **Optimization**: Parameter-efficient fine-tuning
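For causal-LM fine-tuning, each Q&A pair is typically flattened into a single training string. A minimal sketch, assuming a simple `Question:`/`Answer:` template; the template and field names are assumptions, not necessarily the exact format used in `utils/fine_tuning.py`:

```python
# Hypothetical rendering of one curated QA pair into a training string
# for DistilGPT2-style causal-LM fine-tuning. Template is an assumption.

def format_qa(pair):
    """Render one Q&A pair as a single prompt + completion string."""
    return f"Question: {pair['question']}\nAnswer: {pair['answer']}"

sample = {"question": "What were the total assets in 2022?",
          "answer": "(answer text from the curated dataset)"}
text = format_qa(sample)
```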
### Guardrails

- **Input Validation**:
  - Financial keyword detection
  - Query complexity analysis
  - Minimum length requirements
- **Output Validation**:
  - Confidence scoring
  - Hallucination detection
  - Answer quality metrics
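The input-validation guardrail can be sketched as a keyword check plus a minimum-length check. The keyword set, threshold, and `validate_query` name below are illustrative assumptions, not the actual implementation in `utils/guardrails.py`:

```python
# Hypothetical input-validation guardrail: reject queries that are too
# short or contain no financial vocabulary. Keywords and the 3-word
# minimum are assumed values for illustration.

FINANCIAL_KEYWORDS = {"revenue", "income", "loss", "assets", "profit",
                      "liabilities", "earnings", "portfolio"}

def validate_query(query, min_words=3):
    """Return (is_valid, reason) for a user query."""
    words = query.lower().split()
    if len(words) < min_words:
        return False, "Query too short"
    if not FINANCIAL_KEYWORDS.intersection(words):
        return False, "No financial keywords detected"
    return True, "OK"
```

A real guardrail would also handle punctuation-attached keywords (e.g. "revenue?") via tokenization rather than a plain `split()`.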
### Evaluation Framework
- Response time tracking
- Confidence scoring
- Chunk relevance metrics
- Answer quality assessment
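The response-time and confidence tracking above amounts to a small evaluation loop. A minimal sketch, assuming an `answer_fn` that returns an `(answer, confidence)` tuple; that signature is an assumption, not the actual interface of `utils/evaluation.py`:

```python
# Illustrative evaluation loop: time each answer and record the model's
# reported confidence. answer_fn(question) -> (answer, confidence) is an
# assumed interface for demonstration.
import time

def evaluate(answer_fn, questions):
    """Run answer_fn over questions, collecting latency and confidence."""
    results = []
    for q in questions:
        start = time.perf_counter()
        answer, confidence = answer_fn(q)
        results.append({
            "question": q,
            "answer": answer,
            "confidence": confidence,
            "response_time": time.perf_counter() - start,
        })
    return results

# Dummy model stands in for the real RAG or fine-tuned pipeline
stats = evaluate(lambda q: ("N/A", 0.9), ["What was total revenue in 2023?"])
```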
## 🔧 Setup Instructions (Run Locally)
### 1. Clone the Repository

```bash
git clone <your-repo-url>
cd financial_qa_rag_ft
```

### 2. Create a Virtual Environment (Optional)

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Download Financial Reports

```bash
python utils/download_reports.py
```

### 5. Extract and Clean Text

Run: `notebooks/01_data_preprocessing.ipynb`

### 6. Generate QA Pairs Automatically

```bash
python utils/generate_qa_pairs.py
```

### 7. Train the Fine-Tuned Model (Optional)

Run: `notebooks/03_fine_tuning.ipynb`

### 8. Run the Streamlit App

```bash
streamlit run app/app.py
```
The Streamlit app automatically loads the fine-tuned model from Hugging Face Hub (`jayyd/financial-qa-model`) when running in "Fine-Tuned" mode; no local model files are needed.
## Project Structure

```text
financial_qa_rag_ft/
├── app/
│   └── app.py                          # Streamlit web interface with real-time metrics
├── data/
│   ├── processed/                      # Cleaned and segmented text files
│   │   ├── Allstate_2022_10K.txt
│   │   └── Allstate_2023_10K.txt
│   └── raw/                            # Original financial reports
│       ├── Allstate_2022_10K.pdf
│       └── Allstate_2023_10K.pdf
├── models/
│   ├── fine_tuned_model/               # DistilGPT2 fine-tuned on financial QA
│   └── rag_model/                      # Saved embeddings and retrieval indices
├── notebooks/
│   ├── 01_data_preprocessing.ipynb     # PDF parsing and text cleaning
│   ├── 02_rag_pipeline.ipynb           # RAG implementation and testing
│   ├── 03_fine_tuning.ipynb            # Model fine-tuning process
│   ├── 04_evaluation.ipynb             # Individual model evaluation
│   └── 05_evaluation_comparison.ipynb  # Comparative analysis
├── qa_pairs/
│   └── qa_dataset.json                 # Curated financial QA pairs
├── utils/
│   ├── chunking.py                     # Smart text segmentation
│   ├── data_preprocessing.py           # PDF processing pipeline
│   ├── evaluation.py                   # Comprehensive metrics
│   ├── fine_tuning.py                  # Training utilities
│   ├── generator.py                    # Answer generation logic
│   ├── guardrails.py                   # Input/output validation
│   └── retriever.py                    # Hybrid search implementation
├── requirements.txt                    # Project dependencies
└── README.md                           # Project documentation
```
## Performance Comparison

### RAG System

- **Strengths**:
  - Higher factual accuracy
  - Better source traceability
  - More robust to unseen questions
- **Metrics**:
  - Average response time: ~0.5s
  - Typical confidence: 0.8-0.95
  - Strong chunk relevance scores
### Fine-Tuned Model

- **Strengths**:
  - Faster inference
  - More natural language
  - Consistent response style
- **Metrics**:
  - Average response time: ~0.4s
  - Typical confidence: 0.75-0.9
  - Good performance on seen patterns
## Example Questions

```text
# High-confidence questions
"What was Allstate's total revenue in 2023?"
"How much was the net loss in 2023?"
"What were the total assets in 2022?"

# Complex analytical questions
"How did revenue change from 2022 to 2023?"
"What factors affected profitability in 2023?"
"Compare the investment portfolio returns between 2022 and 2023"
```
## Technical Requirements

- Python 3.8+
- PyTorch 2.0+
- Transformers 4.31+
- Streamlit 1.24+
- Sentence-Transformers 2.2+
- See `requirements.txt` for the full list
## License

This project is for academic/educational use only. Financial data is sourced from Allstate's public reports.
## Acknowledgments

- Built using Hugging Face Transformers
- Financial data from Allstate's 10-K reports
- Streamlit for the web interface