---
title: FinSight AI
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: "3.11"
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
---

# FinSight AI

A finance-domain **Retrieval-Augmented Generation (RAG)** assistant built with **OpenBMB MiniCPM** models. Upload earnings reports, bank statements, and filings, then chat, summarize, run OCR, and extract entities with cited answers.

Inference runs on **Modal** serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No model exceeds 8B parameters, so the stack fits comfortably under the Build Small / SLM hackathon limits.

---

## What it does

| Tab | Description |
|-----|-------------|
| **Finance QA Chatbot** | Streaming RAG chat with source citations and confidence |
| **Financial Summary** | Executive, financial, or risk-focused summaries |
| **Document OCR** | Structured OCR for scanned PDFs and images |
| **Entity Extraction** | Companies, tickers, dates, and key figures |
| **Upload Documents** | Ingest, list, and delete documents, or scope search to one file |

Search modes: **Hybrid RAG** (semantic + BM25 across all docs) or **Single Document** (chat scoped to one upload).
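
As a rough illustration of how a hybrid mode like this can blend the two signals, the sketch below min-max normalizes each score set and mixes them with a weight `alpha`. It assumes `alpha` weights the semantic side, which is one plausible reading of `HYBRID_ALPHA`; the function names and the normalization choice are hypothetical and not taken from the FinSight code.

```python
def min_max(scores):
    """Rescale a {chunk_id: score} mapping to the 0-1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every chunk as an equally good match.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rank(semantic, bm25, alpha=0.6, top_k=6):
    """Blend semantic and BM25 scores and return the top_k (chunk_id, score) pairs.

    Assumption: alpha is the weight of the semantic score and (1 - alpha)
    the weight of the BM25 score. A chunk missing from one result set
    contributes 0 for that signal.
    """
    sem = min_max(semantic)
    lex = min_max(bm25)
    blended = {
        chunk_id: alpha * sem.get(chunk_id, 0.0) + (1 - alpha) * lex.get(chunk_id, 0.0)
        for chunk_id in set(sem) | set(lex)
    }
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    semantic_hits = {"a": 0.9, "b": 0.5, "c": 0.1}   # e.g. cosine similarities
    bm25_hits = {"a": 1.0, "b": 12.0, "d": 3.0}      # raw BM25 scores
    for chunk_id, score in hybrid_rank(semantic_hits, bm25_hits):
        print(chunk_id, round(score, 3))
```

Normalizing first matters because raw BM25 scores are unbounded while similarity scores usually are not, so an unnormalized blend would let one signal dominate.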

---

## Architecture

| Component | Model | Where it runs | VRAM |
|-----------|-------|---------------|------|
| **Embeddings** | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| **LLM** | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| **OCR / Vision** | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| **Vector search** | FAISS + BM25 hybrid | Local / HF Space | CPU |
| **UI** | Gradio 6 | `:7860` | CPU |
| **REST API** *(optional)* | FastAPI | `:8000` | CPU |

Models download automatically on first Modal cold start into a persistent volume (`finsight-hf-cache`).

---

## Quick Start

### 1. Deploy Modal workers (one-time)

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

Smoke test:

```bash
modal run finsight_modal/app.py
```

View deployment: [modal.com/apps](https://modal.com/apps) → **finsight-ai**

### 2. Run locally

```bash
cp .env.example .env
python -m venv .venv
source .venv/bin/activate          # macOS / Linux
# .\.venv\Scripts\Activate.ps1     # Windows (PowerShell)

pip install -r requirements.txt -r backend/requirements.txt
python app.py
```

Open **http://localhost:7860**

Optional REST API:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

---

## Hugging Face Spaces

The Space entry point is `app.py` at the repo root (Gradio SDK).

Add these **Secrets** in Space settings:

| Secret | Description |
|--------|-------------|
| `MODAL_TOKEN_ID` | From `~/.modal.toml` after `modal setup` (starts with `ak-`) |
| `MODAL_TOKEN_SECRET` | Paired secret (starts with `as-`) |
| `MODAL_APP_NAME` | `finsight-ai` (must match deployed Modal app) |

Get tokens locally:

```powershell
# Windows
Get-Content $env:USERPROFILE\.modal.toml
# macOS / Linux (bash): cat ~/.modal.toml
```

Or create new tokens at [modal.com/settings](https://modal.com/settings).

> **Note:** FAISS indexes and uploaded documents persist under `./data/` locally. On HF Spaces, storage is ephemeral unless you attach persistent storage, so re-upload documents after restarts.

---

## Modal credentials (Docker / CI)

After `modal setup`, credentials live in `~/.modal.toml`:

```toml
[default]
token_id = "ak-..."
token_secret = "as-..."
```

Alternatively, set them as environment variables, which override the file:

```bash
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
```

See [Modal token docs](https://modal.com/docs/reference/modal.config) for CI and Docker setup.
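
Before starting the UI in Docker or CI, it can help to check that the variables are actually present. The following is a minimal sketch using only the standard library; `check_modal_env` is a hypothetical helper that is not part of this repo, and the prefix checks simply mirror the `ak-` / `as-` convention described above.

```python
import os

# Expected prefixes, as described in this README.
EXPECTED_PREFIXES = {
    "MODAL_TOKEN_ID": "ak-",
    "MODAL_TOKEN_SECRET": "as-",
}


def check_modal_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} does not start with '{prefix}'")
    return problems


if __name__ == "__main__":
    issues = check_modal_env()
    app_name = os.environ.get("MODAL_APP_NAME", "finsight-ai")
    if issues:
        print("Modal credentials look incomplete:")
        for issue in issues:
            print(" -", issue)
    else:
        print(f"Modal credentials found; targeting app '{app_name}'")
```

The check never prints the token values themselves, which keeps secrets out of CI logs.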

---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + chunk metadata |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite chat sessions |
| `TOP_K` | `6` | Retrieved chunks per query |
| `CHUNK_SIZE` | `512` | Ingestion chunk size (tokens) |
| `CHUNK_OVERLAP` | `64` | Chunk overlap |
| `HYBRID_ALPHA` | `0.6` | Semantic vs BM25 blend (0–1) |
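
To make `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal sketch of a sliding-window chunker that reads both values from the environment. It assumes the overlap is measured in tokens like the chunk size, and it treats tokens as a plain list; `load_chunk_settings` and `chunk_tokens` are hypothetical helper names, and the real ingestion code may differ.

```python
import os


def load_chunk_settings(env=None):
    """Read chunking settings, falling back to the defaults in the table above."""
    env = os.environ if env is None else env
    size = int(env.get("CHUNK_SIZE", "512"))
    overlap = int(env.get("CHUNK_OVERLAP", "64"))
    return size, overlap


def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    size, overlap = load_chunk_settings()
    fake_tokens = list(range(1000))  # stand-in for real tokenizer output
    for i, chunk in enumerate(chunk_tokens(fake_tokens, size, overlap)):
        print(i, chunk[0], chunk[-1], len(chunk))
```

With the defaults, each window starts 448 tokens after the previous one, so a 1,000-token document yields three chunks, the last one shorter than the rest.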

---

## Model Summary

| Model | Size | Quantization | Source |
|-------|------|--------------|--------|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | [openbmb/MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) |
| MiniCPM-V 4.6 | 1B | fp16 | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |

All OpenBMB models are released under **Apache 2.0** on the Hugging Face Hub.

The three models total roughly 9.4B parameters, well below the **32B Build Small** parameter limit.
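
The parameter budget can be checked with a few lines of arithmetic. This sketch takes the nominal sizes from the table above at face value (0.4B is read as 400M) and works in millions to avoid floating-point rounding; the variable names are illustrative only.

```python
# Nominal parameter counts in millions, copied from the Model Summary table.
PARAMS_MILLIONS = {
    "MiniCPM-Embedding": 400,
    "MiniCPM4.1-8B": 8000,
    "MiniCPM-V 4.6": 1000,
}
LIMIT_MILLIONS = 32000  # the 32B Build Small limit

total = sum(PARAMS_MILLIONS.values())
headroom = LIMIT_MILLIONS - total

print(f"total: {total / 1000:.1f}B, headroom: {headroom / 1000:.1f}B")
```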

---

## REST API *(optional)*

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF / image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET / POST | Chat session management |
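
Since `/api/chat` streams Server-Sent Events, a client has to reassemble `data:` lines into events. Below is a minimal standard-library sketch of such a client. The request body shape (`{"message": ...}`) is an assumption, as the actual schema is not documented here, and `iter_sse_data` and `stream_chat` are hypothetical helper names.

```python
import json
import urllib.request


def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # The format allows one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient: flush a trailing event even without a closing blank line.
        yield "\n".join(buffer)


def stream_chat(base_url, payload):
    """POST to /api/chat and yield each event's data as it arrives.

    Assumption: the endpoint accepts a JSON body; adjust `payload` to the
    schema the backend actually expects.
    """
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(line.decode("utf-8") for line in response)


if __name__ == "__main__":
    # Requires the optional REST API to be running on port 8000.
    for data in stream_chat("http://localhost:8000", {"message": "Summarize Q3 revenue"}):
        print(data)
```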

---

## Repository Structure

```text
app.py                  # HF Space entry (Gradio)
backend/
  gradio_ui/            # Tabs, theme, custom CSS
  services/             # RAG, ingestion, summarizer
  models/               # Modal client wrappers
  db/                   # FAISS + SQLite
  routers/              # FastAPI routes
finsight_modal/
  app.py                # Modal GPU workers (deploy separately)
data/                   # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
```

---

## Hackathon Context

Built for the **Hugging Face Build Small Hackathon** and the **SLM Hackathon** track (Project 09, FinSight Statement Auditor lineage). It uses efficient OpenBMB models with Modal offload, so the UI runs on CPU while GPUs spin up only for inference.

| Badge | How FinSight qualifies |
|-------|------------------------|
| **Build Small** | All models combined total about 9.4B params, far below 32B |
| **Off the Grid** | Document index + FAISS stay on-device; only inference hits Modal |
| **Off-Brand** | Custom FinSight Gradio theme (gold accent, finance-first layout) |

---

## License

Apache-2.0 (application code and OpenBMB model weights)