---
title: Financial Intelligence RAG Pipeline
emoji: 📊
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---
# Financial Intelligence RAG Pipeline
A production-grade Retrieval-Augmented Generation (RAG) system over Apple SEC filings and Morningstar research reports.
## What It Does
Ask natural language questions about Apple's financials and get answers grounded in source documents with full citations.
**Example questions:**
- What were Apple's total net sales in FY2024?
- What are Apple's main risk factors from the 2024 10-K?
- How did the Services segment perform compared to Products?
- What is Apple's gross margin trend over the last 3 years?
## Architecture
```
SEC EDGAR (10-K, 10-Q, 8-K) + Morningstar PDFs
|
Docling Processing (HTML/PDF parser)
|
HybridChunker (tokenizer-aware segmentation)
|
all-MiniLM-L6-v2 Embeddings (384-dim)
|
Qdrant Cloud (1,234 vectors)
|
Two-Stage Retrieval:
Dense ANN (50 candidates) → ms-marco Cross-Encoder Reranking (top 8)
|
Google Gemini 1.5 Flash (streaming generation)
|
4-Layer Guardrails (input / retrieval / output / compliance)
|
Streamlit Chat UI
```
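
The two-stage retrieval step in the diagram can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the function names (`two_stage_retrieve`, `keyword_overlap`), the chunk layout (`{"text", "vector"}`), and the exhaustive cosine search, which stands in for the approximate nearest-neighbour search that Qdrant would perform. The keyword-overlap scorer is only a toy substitute for the ms-marco cross-encoder; it is not the reranker this project uses.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage.

    Stand-in for a cross-encoder, which would score the (query, passage) pair jointly.
    """
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


def two_stage_retrieve(
    query_vec: Sequence[float],
    query_text: str,
    chunks: List[Dict],
    rerank_score: Callable[[str, str], float],
    n_candidates: int = 50,
    top_k: int = 8,
) -> List[Dict]:
    """Select candidates by vector similarity, then reorder them with a pairwise scorer."""
    # Stage 1: cheap similarity search over all chunks (exhaustive here, ANN in a vector DB).
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True
    )[:n_candidates]
    # Stage 2: more expensive scoring applied only to the shortlisted candidates.
    reranked = sorted(
        candidates, key=lambda c: rerank_score(query_text, c["text"]), reverse=True
    )
    return reranked[:top_k]
```

The defaults mirror the numbers in the diagram (50 candidates, top 8). The design point is that the reranker only ever sees the shortlist, so a chunk that the first stage misses cannot be recovered by the second.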
## Tech Stack
| Component | Technology |
|-----------|------------|
| Document Processing | Docling |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| NLI Verifier | cross-encoder/nli-deberta-v3-small |
| Vector Store | Qdrant Cloud |
| LLM | Google Gemini 2.5 Flash |
| RAG Chain | LangChain LCEL |
| UI | Streamlit |
| CI/CD | GitHub Actions → Hugging Face Spaces |
## Evaluation Results
| Metric | Score |
|--------|-------|
| Retrieval Hit Rate | 100% |
| Context Recall | 93.3% |
| Answer Relevancy | 75.5% |
| Faithfulness | 52.7% |
| Aggregate | 55.7% |
## Local Setup
```bash
# 1. Clone and install
git clone <repo-url>
cd <repo-dir>
pip install -r requirements.txt
# 2. Configure environment
cp .env.example .env
# Fill in GOOGLE_API_KEY, QDRANT_URL, QDRANT_API_KEY
# 3. Run the app
streamlit run app.py
```
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `GOOGLE_API_KEY` | Yes (for deployment) | Google AI Studio API key |
| `QDRANT_URL` | Yes (for deployment) | Qdrant Cloud cluster URL |
| `QDRANT_API_KEY` | Yes (for deployment) | Qdrant Cloud API key |

If these variables are not set, the app falls back to a local Ollama model and ChromaDB for development.
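
The fallback described above could be sketched roughly as follows. This is a guess at the selection logic, not the code in `app.py`: the function name `select_backend`, the returned dictionary, and the rule that any single missing variable triggers the local fallback are all assumptions made for illustration.

```python
import os
from typing import Dict, Mapping

# Variables listed in the table above.
REQUIRED_CLOUD_VARS = ("GOOGLE_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


def select_backend(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Pick cloud or local components based on which variables are set.

    Assumption: if any required variable is missing or empty, fall back to local.
    """
    missing = [name for name in REQUIRED_CLOUD_VARS if not env.get(name)]
    if missing:
        # Local development path: Ollama for generation, ChromaDB for vectors.
        return {"llm": "ollama", "vector_store": "chromadb", "missing": missing}
    # Deployment path: Gemini for generation, Qdrant Cloud for vectors.
    return {"llm": "gemini", "vector_store": "qdrant", "missing": []}
```

Calling `select_backend()` with no arguments reads the current process environment; passing a plain dictionary makes the logic easy to test. Returning the list of missing names lets the UI report exactly which variable to fill in.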
## Disclaimer
This application is for informational and educational purposes only. It does not constitute investment advice.