---
title: FinSight AI
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: "3.11"
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- sponsor:modal
- achievement:offgrid
---
# FinSight AI
Finance-domain **Retrieval-Augmented Generation (RAG)** assistant built with **OpenBMB MiniCPM** models. Upload earnings reports, bank statements, and filings, then chat, summarize, run OCR, and extract entities with cited answers.
Inference runs on **Modal** serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No 32B+ models are used: everything fits comfortably under the Build Small / SLM hackathon limits.
---
## What it does
| Tab | Description |
|-----|-------------|
| **Finance QA Chatbot** | Streaming RAG chat with source citations and confidence |
| **Financial Summary** | Executive, financial, or risk-focused summaries |
| **Document OCR** | Structured OCR for scanned PDFs and images |
| **Entity Extraction** | Companies, tickers, dates, and key figures |
| **Upload Documents** | Ingest, list, delete, and scope search to one file |
Search modes: **Hybrid RAG** (semantic + BM25 across all docs) or **Single Document** (chat scoped to one upload).
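The hybrid mode can be pictured as a weighted blend of two score lists. The sketch below is illustrative only and is not taken from the FinSight code: the function names, the min-max normalization, and the choice that the blend weight (compare `HYBRID_ALPHA`) favors the semantic side are all assumptions.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {chunk_id: 0.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float],
    bm25: Dict[str, float],
    alpha: float = 0.6,  # assumed meaning: weight on the semantic score
    top_k: int = 6,
) -> List[Tuple[str, float]]:
    """Blend normalized semantic and BM25 scores and return the top_k chunks."""
    sem, lex = min_max(semantic), min_max(bm25)
    blended = {
        chunk_id: alpha * sem.get(chunk_id, 0.0) + (1 - alpha) * lex.get(chunk_id, 0.0)
        for chunk_id in set(sem) | set(lex)
    }
    # Sort by descending score; break ties by chunk id for stable output.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Normalizing before blending matters because raw BM25 scores are unbounded while cosine similarities are not; without it, one signal would dominate regardless of the weight.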
---
## Architecture
| Component | Model | Where it runs | VRAM |
|-----------|-------|---------------|------|
| **Embeddings** | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| **LLM** | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| **OCR / Vision** | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| **Vector search** | FAISS + BM25 hybrid | Local / HF Space | CPU |
| **UI** | Gradio 6 | `:7860` | CPU |
| **REST API** *(optional)* | FastAPI | `:8000` | CPU |
Models download automatically on first Modal cold start into a persistent volume (`finsight-hf-cache`).
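As a rough sketch of how a local process might reach the deployed workers, the snippet below looks up a function on the Modal app by name and calls it remotely. It assumes a recent Modal client that provides `modal.Function.from_name`; the helper names and the worker function name `embed` in the usage note are hypothetical and may not match what `finsight_modal/app.py` actually exposes.

```python
import os
from typing import Any, Mapping, Optional


def resolve_app_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Modal app name, falling back to the documented default."""
    env = os.environ if env is None else env
    return env.get("MODAL_APP_NAME", "finsight-ai")


def call_worker(function_name: str, *args: Any, **kwargs: Any) -> Any:
    """Invoke a deployed Modal function by name and wait for its result."""
    # Imported lazily so this module still loads where the SDK is absent.
    import modal

    fn = modal.Function.from_name(resolve_app_name(), function_name)
    return fn.remote(*args, **kwargs)
```

A call would then look like `call_worker("embed", ["Q3 revenue grew 12%"])`, where `embed` stands in for whichever function name the deployed app registers. The first call after a period of inactivity may be slow while the container cold-starts.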
---
## Quick Start
### 1. Deploy Modal workers (one-time)
```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```
Smoke test:
```bash
modal run finsight_modal/app.py
```
View deployment: [modal.com/apps](https://modal.com/apps) → **finsight-ai**
### 2. Run locally
```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS / Linux
pip install -r requirements.txt -r backend/requirements.txt
python app.py
```
Open **http://localhost:7860**
Optional REST API:
```bash
cd backend && uvicorn main:app --reload --port 8000
```
Docker:
```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```
---
## Hugging Face Spaces
The Space entry point is `app.py` at the repo root (Gradio SDK).
Add these **Secrets** in Space settings:
| Secret | Description |
|--------|-------------|
| `MODAL_TOKEN_ID` | From `~/.modal.toml` after `modal setup` (starts with `ak-`) |
| `MODAL_TOKEN_SECRET` | Paired secret (starts with `as-`) |
| `MODAL_APP_NAME` | `finsight-ai` (must match deployed Modal app) |
Get tokens locally:
```powershell
# Windows
Get-Content $env:USERPROFILE\.modal.toml
```
Or create new tokens at [modal.com/settings](https://modal.com/settings).
> **Note:** FAISS indexes and uploaded documents persist under `./data/` locally. On HF Spaces, storage is ephemeral unless you attach a persistent volume, so re-upload documents after each restart.
---
## Modal credentials (Docker / CI)
After `modal setup`, credentials live in `~/.modal.toml`:
```toml
[default]
token_id = "ak-..."
token_secret = "as-..."
```
Alternatively, set them as environment variables, which take precedence over the file:
```bash
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
```
See [Modal token docs](https://modal.com/docs/reference/modal.config) for CI and Docker setup.
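In CI or Docker, where `~/.modal.toml` is usually absent, a small pre-flight check can catch missing or swapped tokens before the app starts. This is a minimal sketch; `check_modal_env` is a hypothetical helper, and the only facts it relies on are the variable names and the `ak-` / `as-` prefixes mentioned above.

```python
import os
from typing import List, Mapping, Optional

# Prefixes as described for the token ID and token secret.
EXPECTED_PREFIXES = {"MODAL_TOKEN_ID": "ak-", "MODAL_TOKEN_SECRET": "as-"}


def check_modal_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with '{prefix}'")
    return problems
```

On a developer machine an unset variable is not necessarily an error, since Modal can still read the credentials from `~/.modal.toml`; the check is meant for environments that rely on variables alone.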
---
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + chunk metadata |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite chat sessions |
| `TOP_K` | `6` | Retrieved chunks per query |
| `CHUNK_SIZE` | `512` | Ingestion chunk size (tokens) |
| `CHUNK_OVERLAP` | `64` | Chunk overlap |
| `HYBRID_ALPHA` | `0.6` | Semantic vs BM25 blend (0–1) |
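One way to read these variables with their documented defaults is a small typed settings object. The sketch below is an assumption about how such loading could look, not the project's actual configuration code; the `Settings` class, `load_settings`, and the two sanity checks are illustrative.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the table above."""
    modal_app_name: str = "finsight-ai"
    faiss_data_dir: str = "./data/faiss"
    chat_db_path: str = "./data/chat_sessions.db"
    top_k: int = 6
    chunk_size: int = 512
    chunk_overlap: int = 64
    hybrid_alpha: float = 0.6


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    d = Settings()
    settings = Settings(
        modal_app_name=env.get("MODAL_APP_NAME", d.modal_app_name),
        faiss_data_dir=env.get("FAISS_DATA_DIR", d.faiss_data_dir),
        chat_db_path=env.get("CHAT_DB_PATH", d.chat_db_path),
        top_k=int(env.get("TOP_K", d.top_k)),
        chunk_size=int(env.get("CHUNK_SIZE", d.chunk_size)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", d.chunk_overlap)),
        hybrid_alpha=float(env.get("HYBRID_ALPHA", d.hybrid_alpha)),
    )
    # Sanity checks (assumed constraints, not documented by the project).
    if not 0.0 <= settings.hybrid_alpha <= 1.0:
        raise ValueError("HYBRID_ALPHA must be between 0 and 1")
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on an out-of-range value is a design choice here: a silent fallback would hide a typo in a deployment's environment.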
---
## Model Summary
| Model | Size | Quantization | Source |
|-------|------|--------------|--------|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | [openbmb/MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) |
| MiniCPM-V 4.6 | 1B | fp16 | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |
All OpenBMB models: **Apache 2.0** · Hugging Face Hub
Total stack stays well below the **32B Build Small** parameter limit.
---
## REST API *(optional)*
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF / image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET / POST | Chat session management |
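Since `/api/chat` streams Server-Sent Events, a client has to read the response line by line rather than wait for a single JSON body. The following is a minimal standard-library sketch. The request payload (`{"message": ...}`) is a guess at the schema and should be checked against the FastAPI routes; `iter_sse_data` and `stream_chat` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from raw lines."""
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the data.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line terminates the event.
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Lenient: also emit a final event that lacks a trailing blank line.
        yield "\n".join(buffer)


def stream_chat(message: str, base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST a chat message and yield streamed event payloads as they arrive."""
    body = json.dumps({"message": message}).encode("utf-8")  # assumed schema
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(response)
```

With the optional API running on port 8000, `for chunk in stream_chat("What was Q3 net income?"): print(chunk)` would print each event payload as it arrives, in whatever format the server emits.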
---
## Repository Structure
```text
app.py # HF Space entry (Gradio)
backend/
gradio_ui/ # Tabs, theme, custom CSS
services/ # RAG, ingestion, summarizer
models/ # Modal client wrappers
db/ # FAISS + SQLite
routers/ # FastAPI routes
finsight_modal/
app.py # Modal GPU workers (deploy separately)
data/ # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
```
---
## Hackathon Context
Built for the **Hugging Face Build Small Hackathon** and the **SLM Hackathon** track (Project 09, FinSight Statement Auditor lineage). It uses efficient OpenBMB models with Modal offload, so the UI runs on CPU while GPUs spin up only for inference.
| Badge | How FinSight qualifies |
|-------|------------------------|
| **Build Small** | All models combined ≪ 32B params |
| **Off the Grid** | Document index + FAISS stay on-device; only inference hits Modal |
| **Off-Brand** | Custom FinSight Gradio theme (gold accent, finance-first layout) |
---
## License
Apache-2.0 (application code and OpenBMB model weights)