---
title: Financial QA – RAG vs FT
emoji: 📊
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.48.1
app_file: app/app.py
pinned: false
license: mit
---
# Financial QA System: RAG vs Fine-Tuning

This project implements and compares two approaches for answering questions about Allstate's financial reports:

- **Retrieval-Augmented Generation (RAG)**: Combines hybrid document retrieval with generative language models
- **Fine-Tuned Language Model (FT)**: Direct fine-tuning of a small language model on financial Q&A
## Quick Start

### Model Files
Due to file size limitations, model files are not included in this repository. To use the system:
- The fine-tuned model is hosted on Hugging Face Hub
- The application automatically loads the model directly from Hugging Face when running
- If you want to download the model locally, you can use the Hugging Face CLI:
```bash
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('jayyd/financial-qa-model', local_dir='models/fine_tuned_model')"
```
## Key Features

### RAG System
- **Hybrid Retrieval**:
  - Dense retrieval using Sentence Transformers (all-MiniLM-L6-v2)
  - Sparse retrieval using BM25
  - Score fusion for optimal chunk selection
- **Context-Aware Generation**:
  - Prompts engineered for financial accuracy
  - Dynamic context window management
  - Multi-chunk answer synthesis
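The score-fusion step above can be sketched in a few lines: normalize each retriever's scores to a common range, then take a weighted sum per chunk. A minimal sketch, assuming min-max normalization and an equal weighting; the `fuse_scores` helper and the 0.5 weight are illustrative, not the project's actual API in `utils/retriever.py`:

```python
# Illustrative score fusion for hybrid retrieval: dense (cosine) and
# sparse (BM25) scores live on different scales, so normalize each to
# [0, 1] before combining. Weight alpha=0.5 is an assumed default.

def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(dense_scores, bm25_scores, alpha=0.5):
    """Weighted fusion of per-chunk dense and sparse scores."""
    d = normalize(dense_scores)
    b = normalize(bm25_scores)
    return [alpha * di + (1 - alpha) * bi for di, bi in zip(d, b)]

# Rank three candidate chunks by fused score (highest first)
fused = fuse_scores([0.82, 0.40, 0.65], [3.1, 7.4, 0.2])
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Min-max fusion is one common choice here; reciprocal-rank fusion is a frequent alternative when the two score distributions are hard to calibrate.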
### Fine-Tuned Model

- **Base Model**: DistilGPT2 (small, efficient)
- **Training Data**: 30+ carefully curated financial Q&A pairs
- **Optimization**: Parameter-efficient fine-tuning
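For causal-LM fine-tuning, each Q&A pair is typically flattened into a single training string. A minimal sketch, assuming a simple `Question:`/`Answer:` template; the template and field names are assumptions, not necessarily the exact format used in `utils/fine_tuning.py`:

```python
# Hypothetical rendering of one curated QA pair into a training string
# for DistilGPT2-style causal-LM fine-tuning. Template is an assumption.

def format_qa(pair):
    """Render one Q&A pair as a single prompt + completion string."""
    return f"Question: {pair['question']}\nAnswer: {pair['answer']}"

sample = {"question": "What were the total assets in 2022?",
          "answer": "(answer text from the curated dataset)"}
text = format_qa(sample)
```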
### Guardrails

- **Input Validation**:
  - Financial keyword detection
  - Query complexity analysis
  - Minimum length requirements
- **Output Validation**:
  - Confidence scoring
  - Hallucination detection
  - Answer quality metrics
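The input-validation guardrail can be sketched as a keyword check plus a minimum-length check. The keyword set, threshold, and `validate_query` name below are illustrative assumptions, not the actual implementation in `utils/guardrails.py`:

```python
# Hypothetical input-validation guardrail: reject queries that are too
# short or contain no financial vocabulary. Keywords and the 3-word
# minimum are assumed values for illustration.

FINANCIAL_KEYWORDS = {"revenue", "income", "loss", "assets", "profit",
                      "liabilities", "earnings", "portfolio"}

def validate_query(query, min_words=3):
    """Return (is_valid, reason) for a user query."""
    words = query.lower().split()
    if len(words) < min_words:
        return False, "Query too short"
    if not FINANCIAL_KEYWORDS.intersection(words):
        return False, "No financial keywords detected"
    return True, "OK"
```

A real guardrail would also handle punctuation-attached keywords (e.g. "revenue?") via tokenization rather than a plain `split()`.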
### Evaluation Framework
- Response time tracking
- Confidence scoring
- Chunk relevance metrics
- Answer quality assessment
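The response-time and confidence tracking above amounts to a small evaluation loop. A minimal sketch, assuming an `answer_fn` that returns an `(answer, confidence)` tuple; that signature is an assumption, not the actual interface of `utils/evaluation.py`:

```python
# Illustrative evaluation loop: time each answer and record the model's
# reported confidence. answer_fn(question) -> (answer, confidence) is an
# assumed interface for demonstration.
import time

def evaluate(answer_fn, questions):
    """Run answer_fn over questions, collecting latency and confidence."""
    results = []
    for q in questions:
        start = time.perf_counter()
        answer, confidence = answer_fn(q)
        results.append({
            "question": q,
            "answer": answer,
            "confidence": confidence,
            "response_time": time.perf_counter() - start,
        })
    return results

# Dummy model stands in for the real RAG or fine-tuned pipeline
stats = evaluate(lambda q: ("N/A", 0.9), ["What was total revenue in 2023?"])
```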
## 🔧 Setup Instructions (Run Locally)
### 1. Clone the Repository

```bash
git clone <your-repo-url>
cd financial_qa_rag_ft
```

### 2. Create a Virtual Environment (Optional)

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Download Financial Reports

```bash
python utils/download_reports.py
```

### 5. Extract and Clean Text

Run: `notebooks/01_data_preprocessing.ipynb`

### 6. Generate QA Pairs Automatically

```bash
python utils/generate_qa_pairs.py
```

### 7. Train the Fine-Tuned Model (Optional)

Run: `notebooks/03_fine_tuning.ipynb`

### 8. Run the Streamlit App

```bash
streamlit run app/app.py
```
The Streamlit app automatically loads the fine-tuned model from Hugging Face Hub (`jayyd/financial-qa-model`) when running in "Fine-Tuned" mode; no local model files are needed.
## Project Structure

```text
financial_qa_rag_ft/
├── app/
│   └── app.py                          # Streamlit web interface with real-time metrics
├── data/
│   ├── processed/                      # Cleaned and segmented text files
│   │   ├── Allstate_2022_10K.txt
│   │   └── Allstate_2023_10K.txt
│   └── raw/                            # Original financial reports
│       ├── Allstate_2022_10K.pdf
│       └── Allstate_2023_10K.pdf
├── models/
│   ├── fine_tuned_model/               # DistilGPT2 fine-tuned on financial QA
│   └── rag_model/                      # Saved embeddings and retrieval indices
├── notebooks/
│   ├── 01_data_preprocessing.ipynb     # PDF parsing and text cleaning
│   ├── 02_rag_pipeline.ipynb           # RAG implementation and testing
│   ├── 03_fine_tuning.ipynb            # Model fine-tuning process
│   ├── 04_evaluation.ipynb             # Individual model evaluation
│   └── 05_evaluation_comparison.ipynb  # Comparative analysis
├── qa_pairs/
│   └── qa_dataset.json                 # Curated financial QA pairs
├── utils/
│   ├── chunking.py                     # Smart text segmentation
│   ├── data_preprocessing.py           # PDF processing pipeline
│   ├── evaluation.py                   # Comprehensive metrics
│   ├── fine_tuning.py                  # Training utilities
│   ├── generator.py                    # Answer generation logic
│   ├── guardrails.py                   # Input/output validation
│   └── retriever.py                    # Hybrid search implementation
├── requirements.txt                    # Project dependencies
└── README.md                           # Project documentation
```
## Performance Comparison

### RAG System

- **Strengths**:
  - Higher factual accuracy
  - Better source traceability
  - More robust to unseen questions
- **Metrics**:
  - Average response time: ~0.5s
  - Typical confidence: 0.8-0.95
  - Strong chunk relevance scores
### Fine-Tuned Model

- **Strengths**:
  - Faster inference
  - More natural language
  - Consistent response style
- **Metrics**:
  - Average response time: ~0.4s
  - Typical confidence: 0.75-0.9
  - Good performance on seen patterns
## Example Questions

```text
# High-confidence questions
"What was Allstate's total revenue in 2023?"
"How much was the net loss in 2023?"
"What were the total assets in 2022?"

# Complex analytical questions
"How did revenue change from 2022 to 2023?"
"What factors affected profitability in 2023?"
"Compare the investment portfolio returns between 2022 and 2023"
```
## Technical Requirements

- Python 3.8+
- PyTorch 2.0+
- Transformers 4.31+
- Streamlit 1.24+
- Sentence-Transformers 2.2+
- See `requirements.txt` for the full list
## License

This project is for academic/educational use only. Financial data is sourced from Allstate's public reports.
## Acknowledgments

- Built using Hugging Face Transformers
- Financial data from Allstate's 10-K reports
- Streamlit for the web interface