---
title: FinSight AI
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: "3.11"
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
---

# FinSight AI

Finance-domain **Retrieval-Augmented Generation (RAG)** assistant built with **OpenBMB MiniCPM** models. Upload earnings reports, bank statements, and filings, then chat, summarize, run OCR, and extract entities with cited answers.

Inference runs on **Modal** serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No 32B+ models are used; everything fits comfortably under the Build Small / SLM hackathon limits.

---

## What it does

| Tab | Description |
|-----|-------------|
| **Finance QA Chatbot** | Streaming RAG chat with source citations and confidence |
| **Financial Summary** | Executive, financial, or risk-focused summaries |
| **Document OCR** | Structured OCR for scanned PDFs and images |
| **Entity Extraction** | Companies, tickers, dates, and key figures |
| **Upload Documents** | Ingest, list, delete, and scope search to one file |

Search modes: **Hybrid RAG** (semantic + BM25 across all docs) or **Single Document** (chat scoped to one upload).
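
As a rough illustration of how a hybrid mode can combine the two signals, the sketch below blends per-chunk semantic and BM25 scores with a single weight. It is not taken from the FinSight code: the function names, the min-max normalization, and the convention that the weight applies to the semantic side are all assumptions. The defaults mirror the documented `HYBRID_ALPHA` (`0.6`) and `TOP_K` (`6`) settings.

```python
def min_max(scores):
    """Scale a {chunk_id: score} mapping into the 0-1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every chunk as equally relevant.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def blend_scores(semantic, bm25, alpha=0.6, top_k=6):
    """Rank chunks by alpha * semantic + (1 - alpha) * BM25 (assumed convention)."""
    sem, lex = min_max(semantic), min_max(bm25)
    combined = {
        cid: alpha * sem.get(cid, 0.0) + (1 - alpha) * lex.get(cid, 0.0)
        for cid in set(sem) | set(lex)  # a chunk may appear in only one result list
    }
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical scores: cosine similarities and raw BM25 values.
    print(blend_scores({"a": 0.9, "b": 0.1}, {"b": 10.0, "c": 2.0}))
```

Normalizing both score lists first keeps raw BM25 values, which are unbounded, from swamping cosine similarities regardless of the weight chosen.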

---

## Architecture

| Component | Model | Where it runs | VRAM |
|-----------|-------|---------------|------|
| **Embeddings** | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| **LLM** | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| **OCR / Vision** | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| **Vector search** | FAISS + BM25 hybrid | Local / HF Space | CPU |
| **UI** | Gradio 6 | `:7860` | CPU |
| **REST API** *(optional)* | FastAPI | `:8000` | CPU |

Models download automatically on first Modal cold start into a persistent volume (`finsight-hf-cache`).

---

## Quick Start

### 1. Deploy Modal workers (one-time)

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

Smoke test:

```bash
modal run finsight_modal/app.py
```

View deployment: [modal.com/apps](https://modal.com/apps) → **finsight-ai**

### 2. Run locally

```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS / Linux

pip install -r requirements.txt -r backend/requirements.txt
python app.py
```

Open **http://localhost:7860**

Optional REST API:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

---

## Hugging Face Spaces

The Space entry point is `app.py` at the repo root (Gradio SDK).

Add these **Secrets** in Space settings:

| Secret | Description |
|--------|-------------|
| `MODAL_TOKEN_ID` | From `~/.modal.toml` after `modal setup` (starts with `ak-`) |
| `MODAL_TOKEN_SECRET` | Paired secret (starts with `as-`) |
| `MODAL_APP_NAME` | `finsight-ai` (must match deployed Modal app) |

Get tokens locally:

```powershell
# Windows
Get-Content $env:USERPROFILE\.modal.toml
```

Or create new tokens at [modal.com/settings](https://modal.com/settings).

> **Note:** FAISS indexes and uploaded documents persist under `./data/` locally. On HF Spaces, storage is ephemeral unless you attach a persistent volume, so re-upload docs after restarts.

---

## Modal credentials (Docker / CI)

After `modal setup`, credentials live in `~/.modal.toml`:

```toml
[default]
token_id = "ak-..."
token_secret = "as-..."
```

Set as environment variables (overrides the file):

```bash
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
```
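
In Docker, CI, or a Space there is no `~/.modal.toml` to fall back on, so a small pre-flight check can catch missing or swapped credentials before the first inference call. The sketch below is a hypothetical helper, not part of the repository; it relies only on the variable names and the `ak-` / `as-` prefixes described above.

```python
import os

# Expected prefix for each credential, as described in this README.
REQUIRED = {"MODAL_TOKEN_ID": "ak-", "MODAL_TOKEN_SECRET": "as-"}


def check_modal_env(env=None):
    """Return a list of problems; an empty list means the variables look usable."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in REQUIRED.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix!r}")
    return problems


if __name__ == "__main__":
    for problem in check_modal_env():
        print("warning:", problem)
```

On a developer machine an unset variable is not necessarily an error, since the credentials file is still read there; the check matters most where environment variables are the only source.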

See [Modal token docs](https://modal.com/docs/reference/modal.config) for CI and Docker setup.

---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + chunk metadata |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite chat sessions |
| `TOP_K` | `6` | Retrieved chunks per query |
| `CHUNK_SIZE` | `512` | Ingestion chunk size (tokens) |
| `CHUNK_OVERLAP` | `64` | Chunk overlap |
| `HYBRID_ALPHA` | `0.6` | Semantic vs BM25 blend (0–1) |
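
To make the interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal sliding-window sketch. It is an illustration under assumptions rather than the project's ingestion code: the helper names are hypothetical, and `tokens` stands for whatever token sequence the real tokenizer produces.

```python
import os


def load_chunk_settings(env=None):
    """Read CHUNK_SIZE / CHUNK_OVERLAP, falling back to the documented defaults."""
    env = os.environ if env is None else env
    size = int(env.get("CHUNK_SIZE", "512"))
    overlap = int(env.get("CHUNK_OVERLAP", "64"))
    if not 0 <= overlap < size:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    return size, overlap


def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    step = size - overlap  # each window starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    size, overlap = load_chunk_settings()
    words = ("revenue grew in the third quarter " * 200).split()  # stand-in tokens
    print(len(chunk_tokens(words, size, overlap)), "chunks")
```

With the defaults, consecutive chunks start 448 tokens apart, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.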

---

## Model Summary

| Model | Size | Quantization | Source |
|-------|------|--------------|--------|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | [openbmb/MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) |
| MiniCPM-V 4.6 | 1B | fp16 | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |

All OpenBMB models: **Apache 2.0** · Hugging Face Hub

Total stack stays well below the **32B Build Small** parameter limit.

---

## REST API *(optional)*

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF / image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET / POST | Chat session management |
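
Because `/api/chat` streams over Server-Sent Events, a client has to reassemble events from individual lines. The sketch below handles only the generic SSE wire framing (`data:` fields, blank-line event separators, `:` comment lines); the request body and the content of each event are not documented here, so nothing about them is assumed. The function name is hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is part of the framing.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient choice: emit a trailing event even without a closing blank line.
        yield "\n".join(buffer)


if __name__ == "__main__":
    sample = ["data: Hello\n", "\n", ": ping\n", "data: a\n", "data: b\n", "\n"]
    for payload in iter_sse_data(sample):
        print(repr(payload))
```

The generator accepts any iterable of decoded lines, so it can be fed from whichever HTTP client is used to read the response in streaming mode.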

---

## Repository Structure

```text
app.py                  # HF Space entry (Gradio)
backend/
  gradio_ui/            # Tabs, theme, custom CSS
  services/             # RAG, ingestion, summarizer
  models/               # Modal client wrappers
  db/                   # FAISS + SQLite
  routers/              # FastAPI routes
finsight_modal/
  app.py                # Modal GPU workers (deploy separately)
data/                   # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
```

---

## Hackathon Context

Built for the **Hugging Face Build Small Hackathon** and the **SLM Hackathon** track (Project 09: FinSight Statement Auditor lineage). Uses efficient OpenBMB models with Modal offload so the UI runs on CPU while GPUs spin up only for inference.

| Badge | How FinSight qualifies |
|-------|------------------------|
| **Build Small** | All models combined ≪ 32B params |
| **Off the Grid** | Document index + FAISS stay on-device; only inference hits Modal |
| **Off-Brand** | Custom FinSight Gradio theme (gold accent, finance-first layout) |

---

## License

Apache-2.0 (application code and OpenBMB model weights)