---
title: FinSight AI
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: "3.11"
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
---

# FinSight AI

Finance-domain **Retrieval-Augmented Generation (RAG)** assistant built with **OpenBMB MiniCPM** models. Upload earnings reports, bank statements, and filings, then chat, summarize, run OCR, and extract entities with cited answers.

Inference runs on **Modal** serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No 32B+ models: everything fits comfortably under the Build Small / SLM hackathon limits.

---

## What it does

| Tab | Description |
|-----|-------------|
| **Finance QA Chatbot** | Streaming RAG chat with source citations and confidence |
| **Financial Summary** | Executive, financial, or risk-focused summaries |
| **Document OCR** | Structured OCR for scanned PDFs and images |
| **Entity Extraction** | Companies, tickers, dates, and key figures |
| **Upload Documents** | Ingest, list, delete, and scope search to one file |

Search modes: **Hybrid RAG** (semantic + BM25 across all docs) or **Single Document** (chat scoped to one upload).

---

## Architecture

| Component | Model | Where it runs | VRAM |
|-----------|-------|---------------|------|
| **Embeddings** | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| **LLM** | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| **OCR / Vision** | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| **Vector search** | FAISS + BM25 hybrid | Local / HF Space | CPU |
| **UI** | Gradio 6 | `:7860` | CPU |
| **REST API** *(optional)* | FastAPI | `:8000` | CPU |

Models download automatically on first Modal cold start into a persistent volume (`finsight-hf-cache`).

---

## Quick Start

### 1. Deploy Modal workers (one-time)

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

Smoke test:

```bash
modal run finsight_modal/app.py
```

View deployment: [modal.com/apps](https://modal.com/apps) → **finsight-ai**

### 2. Run locally

```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS / Linux
pip install -r requirements.txt -r backend/requirements.txt
python app.py
```

Open **http://localhost:7860**

Optional REST API:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

---

## Hugging Face Spaces

The Space entry point is `app.py` at the repo root (Gradio SDK).

Add these **Secrets** in Space settings:

| Secret | Description |
|--------|-------------|
| `MODAL_TOKEN_ID` | From `~/.modal.toml` after `modal setup` (starts with `ak-`) |
| `MODAL_TOKEN_SECRET` | Paired secret (starts with `as-`) |
| `MODAL_APP_NAME` | `finsight-ai` (must match deployed Modal app) |

Get tokens locally:

```powershell
# Windows
Get-Content $env:USERPROFILE\.modal.toml
```

Or create new tokens at [modal.com/settings](https://modal.com/settings).

> **Note:** FAISS indexes and uploaded documents persist under `./data/` locally. On HF Spaces, storage is ephemeral unless you attach a persistent volume, so re-upload docs after restarts.

---

## Modal credentials (Docker / CI)

After `modal setup`, credentials live in `~/.modal.toml`:

```toml
[default]
token_id = "ak-..."
token_secret = "as-..."
```

Set as environment variables (overrides the file):

```bash
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
```

See [Modal token docs](https://modal.com/docs/reference/modal.config) for CI and Docker setup.


---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + chunk metadata |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite chat sessions |
| `TOP_K` | `6` | Retrieved chunks per query |
| `CHUNK_SIZE` | `512` | Ingestion chunk size (tokens) |
| `CHUNK_OVERLAP` | `64` | Chunk overlap |
| `HYBRID_ALPHA` | `0.6` | Semantic vs BM25 blend (0–1) |

---

## Model Summary

| Model | Size | Quantization | Source |
|-------|------|--------------|--------|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | [openbmb/MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) |
| MiniCPM-V 4.6 | 1B | fp16 | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |

All OpenBMB models: **Apache 2.0** · Hugging Face Hub

Total stack stays well below the **32B Build Small** parameter limit.

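
As a quick check of the budget claim, the sizes from the Model Summary table can be summed directly. The snippet is a small sketch: the variable names are illustrative, and the parameter counts are the rounded figures from the table rather than exact values.

```python
# Parameter counts in billions, copied from the Model Summary table.
MODEL_PARAMS_B = {
    "MiniCPM-Embedding": 0.4,
    "MiniCPM4.1-8B": 8.0,
    "MiniCPM-V 4.6": 1.0,
}
BUILD_SMALL_LIMIT_B = 32.0  # Build Small parameter limit, in billions

total_b = sum(MODEL_PARAMS_B.values())
headroom_b = BUILD_SMALL_LIMIT_B - total_b
print(f"total: {total_b:.1f}B, headroom: {headroom_b:.1f}B")
# prints "total: 9.4B, headroom: 22.6B"
```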

---

## REST API *(optional)*

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF / image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET / POST | Chat session management |

---

## Repository Structure

```text
app.py                  # HF Space entry (Gradio)
backend/
  gradio_ui/            # Tabs, theme, custom CSS
  services/             # RAG, ingestion, summarizer
  models/               # Modal client wrappers
  db/                   # FAISS + SQLite
  routers/              # FastAPI routes
finsight_modal/
  app.py                # Modal GPU workers (deploy separately)
data/                   # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
```

---

## Hackathon Context

Built for the **Hugging Face Build Small Hackathon** and the **SLM Hackathon** track (Project 09, FinSight Statement Auditor lineage). Uses efficient OpenBMB models with Modal offload so the UI runs on CPU while GPUs spin up only for inference.

| Badge | How FinSight qualifies |
|-------|------------------------|
| **Build Small** | All models combined ≪ 32B params |
| **Off the Grid** | Document index + FAISS stay on-device; only inference hits Modal |
| **Off-Brand** | Custom FinSight Gradio theme (gold accent, finance-first layout) |

---

## License

Apache-2.0 (application code and OpenBMB model weights)