--- title: Financial QA — RAG vs FT emoji: 📊 colorFrom: indigo colorTo: blue sdk: streamlit sdk_version: 1.48.1 # python_version: 3.1 app_file: app/app.py pinned: false license: mit --- # Financial QA System: RAG vs Fine-Tuning This project implements and compares two approaches for answering questions about Allstate's financial reports: 1. **Retrieval-Augmented Generation (RAG)**: Combines hybrid document retrieval with generative language models 2. **Fine-Tuned Language Model (FT)**: Direct fine-tuning of a small language model on financial Q&A ## Quick Start ### Model Files Due to file size limitations, model files are not included in this repository. To use the system: 1. The fine-tuned model is hosted on [Hugging Face Hub](https://huggingface.co/jayyd/financial-qa-model) 2. The application automatically loads the model directly from Hugging Face when running 3. If you want to download the model locally, you can use the Hugging Face CLI: ``` pip install huggingface_hub python -c "from huggingface_hub import snapshot_download; snapshot_download('jayyd/financial-qa-model', local_dir='models/fine_tuned_model')" ``` ## Key Features ### RAG System - **Hybrid Retrieval**: - Dense retrieval using Sentence Transformers (all-MiniLM-L6-v2) - Sparse retrieval using BM25 - Score fusion for optimal chunk selection - **Context-Aware Generation**: - Prompts engineered for financial accuracy - Dynamic context window management - Multi-chunk answer synthesis ### Fine-Tuned Model - **Base Model**: DistilGPT2 (small, efficient) - **Training Data**: 30+ carefully curated financial Q&A pairs - **Optimization**: Parameter-efficient fine-tuning ### Guardrails - **Input Validation**: - Financial keyword detection - Query complexity analysis - Minimum length requirements - **Output Validation**: - Confidence scoring - Hallucination detection - Answer quality metrics ### Evaluation Framework - Response time tracking - Confidence scoring - Chunk relevance metrics - Answer quality assessment ## 🔧 Setup Instructions (Run Locally) ### 1. Clone the Repository ```bash git clone cd financial_qa_rag_ft ``` ### 2. Create Virtual Environment (optional) ```bash python -m venv venv source venv/bin/activate # or venv\Scripts\activate on Windows ``` ### 3. Install Dependencies ```bash pip install -r requirements.txt ``` ### 4. Download Financial Reports ```bash python utils/download_reports.py ``` ### 5. Extract and Clean Text ```bash Run: notebooks/01_data_preprocessing.ipynb ``` ### 6. Generate QA Pairs Automatically ```bash python utils/generate_qa_pairs.py ``` ### 7. Train Fine-Tuned Model (Optional) ```bash Run: notebooks/03_fine_tuning.ipynb ``` ### 8. Run Streamlit App ```bash streamlit run app/app.py ``` The Streamlit app will automatically load the fine-tuned model from Hugging Face Hub (jayyd/financial-qa-model) when running in "Fine-Tuned" mode. No local model files are needed. --- ## Project Structure ``` financial_qa_rag_ft/ ├── app/ │ └── app.py # Streamlit web interface with real-time metrics ├── data/ │ ├── processed/ # Cleaned and segmented text files │ │ ├── Allstate_2022_10K.txt │ │ └── Allstate_2023_10K.txt │ └── raw/ # Original financial reports │ ├── Allstate_2022_10K.pdf │ └── Allstate_2023_10K.pdf ├── models/ │ ├── fine_tuned_model/ # DistilGPT2 fine-tuned on financial QA │ └── rag_model/ # Saved embeddings and retrieval indices ├── notebooks/ │ ├── 01_data_preprocessing.ipynb # PDF parsing and text cleaning │ ├── 02_rag_pipeline.ipynb # RAG implementation and testing │ ├── 03_fine_tuning.ipynb # Model fine-tuning process │ ├── 04_evaluation.ipynb # Individual model evaluation │ └── 05_evaluation_comparison.ipynb # Comparative analysis ├── qa_pairs/ │ └── qa_dataset.json # Curated financial QA pairs ├── utils/ │ ├── chunking.py # Smart text segmentation │ ├── data_preprocessing.py # PDF processing pipeline │ ├── evaluation.py # Comprehensive metrics │ ├── fine_tuning.py # Training utilities │ ├── generator.py # Answer generation logic │ ├── guardrails.py # Input/output validation │ └── retriever.py # Hybrid search implementation ├── requirements.txt # Project dependencies └── README.md # Project documentation ``` --- ## Performance Comparison ### RAG System - **Strengths**: - Higher factual accuracy - Better source traceability - More robust to unseen questions - **Metrics**: - Average response time: ~0.5s - Typical confidence: 0.8-0.95 - Strong chunk relevance scores ### Fine-tuned Model - **Strengths**: - Faster inference - More natural language - Consistent response style - **Metrics**: - Average response time: ~0.4s - Typical confidence: 0.75-0.9 - Good performance on seen patterns ## Example Questions ```python # High-confidence questions "What was Allstate's total revenue in 2023?" "How much was the net loss in 2023?" "What were the total assets in 2022?" # Complex analytical questions "How did revenue change from 2022 to 2023?" "What factors affected profitability in 2023?" "Compare the investment portfolio returns between 2022 and 2023" ``` ## Technical Requirements - Python 3.8+ - PyTorch 2.0+ - Transformers 4.31+ - Streamlit 1.24+ - Sentence-Transformers 2.2+ - See requirements.txt for full list ## License This project is for academic/educational use only. Financial data sourced from Allstate's public reports. ## Acknowledgments - Built using Hugging Face Transformers - Financial data from Allstate's 10-K reports - Streamlit for the web interface