title: FinSight AI
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: '3.11'
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- sponsor:modal
- achievement:offgrid
FinSight AI
Finance-domain Retrieval-Augmented Generation (RAG) assistant built with OpenBMB MiniCPM models. Upload earnings reports, bank statements, and filings, then chat, summarize, run OCR, and extract entities with cited answers.
Inference runs on Modal serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No 32B+ models are used: everything fits comfortably under the Build Small / SLM hackathon limits.
What it does
| Tab | Description |
|---|---|
| Finance QA Chatbot | Streaming RAG chat with source citations and confidence |
| Financial Summary | Executive, financial, or risk-focused summaries |
| Document OCR | Structured OCR for scanned PDFs and images |
| Entity Extraction | Companies, tickers, dates, and key figures |
| Upload Documents | Ingest, list, delete, and scope search to one file |
Search modes: Hybrid RAG (semantic + BM25 across all docs) or Single Document (chat scoped to one upload).
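The Hybrid RAG mode blends a semantic score with a BM25 score per chunk. A minimal sketch of one way such a blend can work is shown here. The function names, the min-max normalization, and the choice that alpha weights the semantic side are assumptions for illustration, not the project's confirmed implementation.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(semantic, bm25, alpha=0.6):
    """Blend two per-chunk score lists and return chunk indices, best first.

    Assumption: alpha weights the semantic scores and (1 - alpha) the
    BM25 scores, after both lists are min-max normalized.
    """
    if len(semantic) != len(bm25):
        raise ValueError("score lists must be the same length")
    if not semantic:
        return []
    sem, lex = min_max(semantic), min_max(bm25)
    blended = [alpha * s + (1 - alpha) * l for s, l in zip(sem, lex)]
    return sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
```

Normalizing first matters because BM25 scores are unbounded while cosine-style similarities are not, so a raw weighted sum would let one side dominate.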
Architecture
| Component | Model | Where it runs | VRAM |
|---|---|---|---|
| Embeddings | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| LLM | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| OCR / Vision | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| Vector search | FAISS + BM25 hybrid | Local / HF Space | CPU |
| UI | Gradio 6 | :7860 | CPU |
| REST API (optional) | FastAPI | :8000 | CPU |
Models download automatically on first Modal cold start into a persistent volume (finsight-hf-cache).
Quick Start
1. Deploy Modal workers (one-time)
pip install modal
modal setup
modal deploy finsight_modal/app.py
Smoke test:
modal run finsight_modal/app.py
View deployment: modal.com/apps → finsight-ai
2. Run locally
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS / Linux
pip install -r requirements.txt -r backend/requirements.txt
python app.py
Optional REST API:
cd backend && uvicorn main:app --reload --port 8000
Docker:
docker compose up gradio -d
# optional API:
docker compose up backend -d
Hugging Face Spaces
The Space entry point is app.py at the repo root (Gradio SDK).
Add these Secrets in Space settings:
| Secret | Description |
|---|---|
| MODAL_TOKEN_ID | From ~/.modal.toml after modal setup (starts with ak-) |
| MODAL_TOKEN_SECRET | Paired secret (starts with as-) |
| MODAL_APP_NAME | finsight-ai (must match deployed Modal app) |
Get tokens locally:
# Windows
Get-Content $env:USERPROFILE\.modal.toml
# macOS / Linux
cat ~/.modal.toml
Or create new tokens at modal.com/settings.
Note: FAISS indexes and uploaded documents persist under ./data/ locally. On HF Spaces, storage is ephemeral unless you attach a persistent volume, so re-upload docs after restarts.
Modal credentials (Docker / CI)
After modal setup, credentials live in ~/.modal.toml:
[default]
token_id = "ak-..."
token_secret = "as-..."
Set as environment variables (overrides the file):
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
See Modal token docs for CI and Docker setup.
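In Docker or CI, where ~/.modal.toml is usually absent, a startup check on the environment variables can catch a missing or swapped token early. The sketch below is illustrative only: the helper name check_modal_env is hypothetical, and the prefix check relies solely on the ak- / as- prefixes stated above.

```python
import os

# Expected prefixes as described in this README.
EXPECTED_PREFIXES = {"MODAL_TOKEN_ID": "ak-", "MODAL_TOKEN_SECRET": "as-"}


def check_modal_env(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: it only inspects environment variables and does
    not read ~/.modal.toml or contact Modal.
    """
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix!r}")
    return problems


if __name__ == "__main__":
    for problem in check_modal_env():
        print("warning:", problem)
```

An empty result does not prove the tokens are valid; it only rules out the most common configuration slips.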
Environment Variables
| Variable | Default | Description |
|---|---|---|
| MODAL_APP_NAME | finsight-ai | Deployed Modal app name |
| FAISS_DATA_DIR | ./data/faiss | FAISS index + chunk metadata |
| CHAT_DB_PATH | ./data/chat_sessions.db | SQLite chat sessions |
| TOP_K | 6 | Retrieved chunks per query |
| CHUNK_SIZE | 512 | Ingestion chunk size (tokens) |
| CHUNK_OVERLAP | 64 | Chunk overlap |
| HYBRID_ALPHA | 0.6 | Semantic vs BM25 blend (0–1) |
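To make the table concrete, here is a rough sketch of loading these variables with their documented defaults and of how CHUNK_SIZE and CHUNK_OVERLAP could interact during ingestion. The Settings class, load_settings, and chunk_tokens are hypothetical names, and the sliding-window strategy is an assumption about how overlap is applied.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above.
    modal_app_name: str = "finsight-ai"
    faiss_data_dir: str = "./data/faiss"
    chat_db_path: str = "./data/chat_sessions.db"
    top_k: int = 6
    chunk_size: int = 512
    chunk_overlap: int = 64
    hybrid_alpha: float = 0.6


def load_settings(env=None):
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    d = Settings()
    settings = Settings(
        modal_app_name=env.get("MODAL_APP_NAME", d.modal_app_name),
        faiss_data_dir=env.get("FAISS_DATA_DIR", d.faiss_data_dir),
        chat_db_path=env.get("CHAT_DB_PATH", d.chat_db_path),
        top_k=int(env.get("TOP_K", d.top_k)),
        chunk_size=int(env.get("CHUNK_SIZE", d.chunk_size)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", d.chunk_overlap)),
        hybrid_alpha=float(env.get("HYBRID_ALPHA", d.hybrid_alpha)),
    )
    if not 0.0 <= settings.hybrid_alpha <= 1.0:
        raise ValueError("HYBRID_ALPHA must be between 0 and 1")
    return settings


def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

With the defaults, each new chunk starts 448 tokens after the previous one (512 minus 64), so a 1,000-token document yields three chunks.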
Model Summary
| Model | Size | Quantization | Source |
|---|---|---|---|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | openbmb/MiniCPM-Embedding |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | openbmb/MiniCPM4.1-8B |
| MiniCPM-V 4.6 | 1B | fp16 | openbmb/MiniCPM-V-4.6 |
All OpenBMB models: Apache 2.0 · Hugging Face Hub
The three models total roughly 9.4B parameters (0.4B + 8B + 1B), well below the 32B Build Small parameter limit.
REST API (optional)
| Endpoint | Method | Description |
|---|---|---|
| /api/chat | POST | SSE streaming RAG chat |
| /api/documents/upload | POST | Upload PDF / image |
| /api/documents/list | GET | List ingested documents |
| /api/summarize | POST | Financial summary |
| /api/ocr | POST | OCR extraction |
| /api/extract-entities | POST | Entity extraction |
| /api/sessions | GET / POST | Chat session management |
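Since /api/chat streams over SSE, a client has to read the response line by line rather than wait for a full body. The following is a sketch using only the standard library. The request body shape ({"question": ...}) and the base URL are assumptions, as the request schema is not documented here; only the port and path come from this README.

```python
import json
import urllib.request


def iter_sse_data(lines):
    """Yield the data payload of each server-sent event from text lines.

    "data:" lines accumulate and a blank line ends the event. Other
    fields (event:, id:, comments) are ignored in this sketch.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Lenient: emit a trailing event that was not closed by a blank line.
        yield "\n".join(buffer)


def stream_chat(question, base_url="http://localhost:8000"):
    """POST a question to /api/chat and yield streamed payloads."""
    # Assumption: the endpoint accepts a JSON body with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        text_lines = (line.decode("utf-8") for line in response)
        yield from iter_sse_data(text_lines)
```

Keeping the parser separate from the HTTP call makes it easy to test against canned lines without a running backend.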
Repository Structure
app.py                 # HF Space entry (Gradio)
backend/
  gradio_ui/           # Tabs, theme, custom CSS
  services/            # RAG, ingestion, summarizer
  models/              # Modal client wrappers
  db/                  # FAISS + SQLite
  routers/             # FastAPI routes
finsight_modal/
  app.py               # Modal GPU workers (deploy separately)
data/                  # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
Hackathon Context
Built for the Hugging Face Build Small Hackathon and the SLM Hackathon track (Project 09, FinSight Statement Auditor lineage). Uses efficient OpenBMB models with Modal offload so the UI runs on CPU while GPUs spin up only for inference.
| Badge | How FinSight qualifies |
|---|---|
| Build Small | All models combined ≪ 32B params |
| Off the Grid | Document index + FAISS stay on-device; only inference hits Modal |
| Off-Brand | Custom FinSight Gradio theme (gold accent, finance-first layout) |
License
Apache-2.0 (application code and OpenBMB model weights)