---
title: FinSight AI
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: "3.11"
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
---

# FinSight AI

Finance-domain **Retrieval-Augmented Generation (RAG)** assistant built with **OpenBMB MiniCPM** models. Upload earnings reports, bank statements, and filings, then chat over them with cited answers, generate summaries, run OCR, and extract entities.

Inference runs on **Modal** serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No model is 32B parameters or larger, so the whole stack fits comfortably within the Build Small / SLM hackathon limits.

---

## What it does

| Tab | Description |
|-----|-------------|
| **Finance QA Chatbot** | Streaming RAG chat with source citations and confidence |
| **Financial Summary** | Executive, financial, or risk-focused summaries |
| **Document OCR** | Structured OCR for scanned PDFs and images |
| **Entity Extraction** | Companies, tickers, dates, and key figures |
| **Upload Documents** | Ingest, list, delete, and scope search to one file |

Search modes: **Hybrid RAG** (semantic + BM25 across all docs) or **Single Document** (chat scoped to one upload).
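
This README does not specify how Hybrid RAG merges the semantic and BM25 results. One common scheme, assumed here, is to min-max normalize each score set to [0, 1] and blend them linearly, with `alpha` standing in for `HYBRID_ALPHA` and `top_k` for `TOP_K`. Filtering candidates to one document ID before ranking approximates Single Document mode. `min_max` and `hybrid_rank` are hypothetical names; this is a sketch of the technique, not FinSight's retrieval code.

```python
from typing import Dict, List, Optional, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so cosine and BM25 values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # a single candidate (or all ties) gets full weight
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float],    # chunk_id -> vector similarity (e.g. from FAISS)
    bm25: Dict[str, float],        # chunk_id -> BM25 score
    chunk_doc: Dict[str, str],     # chunk_id -> source document id
    alpha: float = 0.6,            # assumed meaning of HYBRID_ALPHA: weight on semantic
    top_k: int = 6,                # TOP_K
    doc_id: Optional[str] = None,  # set to scope retrieval to a single upload
) -> List[Tuple[str, float]]:
    """Hypothetical linear fusion of normalized semantic and BM25 scores."""
    if doc_id is not None:
        # Single Document mode: drop chunks from other files before normalizing.
        semantic = {c: s for c, s in semantic.items() if chunk_doc.get(c) == doc_id}
        bm25 = {c: s for c, s in bm25.items() if chunk_doc.get(c) == doc_id}
    sem, lex = min_max(semantic), min_max(bm25)
    fused = {
        c: alpha * sem.get(c, 0.0) + (1.0 - alpha) * lex.get(c, 0.0)
        for c in set(sem) | set(lex)
    }
    # Highest fused score first; ties broken by chunk id for stable output.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Chunks that only one retriever finds still compete, because a missing score counts as 0 rather than excluding the chunk.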

---

## Architecture

| Component | Model | Where it runs | VRAM |
|-----------|-------|---------------|------|
| **Embeddings** | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| **LLM** | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| **OCR / Vision** | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| **Vector search** | FAISS + BM25 hybrid | Local / HF Space | CPU |
| **UI** | Gradio 6 | `:7860` | CPU |
| **REST API** *(optional)* | FastAPI | `:8000` | CPU |

Models download automatically on first Modal cold start into a persistent volume (`finsight-hf-cache`).
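
Most of the VRAM column is model weights. A back-of-envelope check for the two generative models, assuming weights-only memory = parameters × bits per weight ÷ 8, with parameter counts from the Model Summary table and about 4.85 bits per weight for Q4_K_M (an assumption; the true average depends on the model's tensor mix):

```python
from typing import Dict, Tuple


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only memory in GB (1e9 bytes): parameters x bits / 8.

    Excludes KV cache, activations and CUDA context, so real VRAM use is higher.
    """
    return params_billion * bits_per_weight / 8


# Parameter counts from the Model Summary table; bits per weight are assumptions:
# Q4_K_M mixes 4-bit and 6-bit k-quant blocks (~4.85 bits/weight assumed), fp16 is 16.
MODELS: Dict[str, Tuple[float, float]] = {
    "MiniCPM4.1-8B (Q4_K_M GGUF)": (8.0, 4.85),
    "MiniCPM-V 4.6 (fp16)": (1.0, 16.0),
}

for name, (params, bits) in MODELS.items():
    print(f"{name}: ~{weight_memory_gb(params, bits):.2f} GB of weights")
```

Both estimates land close to the ~5 GB and ~2 GB listed above, which leaves headroom for context on a 16 GB T4 and a 24 GB A10G.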

---

## Quick Start

### 1. Deploy Modal workers (one-time)

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

Smoke test:

```bash
modal run finsight_modal/app.py
```

View deployment: [modal.com/apps](https://modal.com/apps) → **finsight-ai**

### 2. Run locally

```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1    # Windows
# source .venv/bin/activate     # macOS / Linux
pip install -r requirements.txt -r backend/requirements.txt
python app.py
```

Open **http://localhost:7860**

Optional REST API:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

---

## Hugging Face Spaces

The Space entry point is `app.py` at the repo root (Gradio SDK).

Add these **Secrets** in Space settings:

| Secret | Description |
|--------|-------------|
| `MODAL_TOKEN_ID` | From `~/.modal.toml` after `modal setup` (starts with `ak-`) |
| `MODAL_TOKEN_SECRET` | Paired secret (starts with `as-`) |
| `MODAL_APP_NAME` | `finsight-ai` (must match deployed Modal app) |

Get tokens locally:

```powershell
# Windows
Get-Content $env:USERPROFILE\.modal.toml
# macOS / Linux: cat ~/.modal.toml
```

Or create new tokens at [modal.com/settings](https://modal.com/settings).

> **Note:** FAISS indexes and uploaded documents persist under `./data/` locally. On HF Spaces, storage is ephemeral unless you attach persistent storage, so re-upload documents after each restart.

---

## Modal credentials (Docker / CI)

After `modal setup`, credentials live in `~/.modal.toml`:

```toml
[default]
token_id = "ak-..."
token_secret = "as-..."
```

Set as environment variables (overrides the file):

```bash
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
```

See [Modal token docs](https://modal.com/docs/reference/modal.config) for CI and Docker setup.

---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + chunk metadata |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite chat sessions |
| `TOP_K` | `6` | Retrieved chunks per query |
| `CHUNK_SIZE` | `512` | Ingestion chunk size (tokens) |
| `CHUNK_OVERLAP` | `64` | Overlap between consecutive chunks (tokens) |
| `HYBRID_ALPHA` | `0.6` | Semantic vs. BM25 blend weight (0–1) |
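
A minimal sketch of reading these variables with the defaults above and applying `CHUNK_SIZE` / `CHUNK_OVERLAP` as a sliding window. `Settings`, `from_env`, and `chunk_tokens` are hypothetical names, and the chunker assumes the text is already tokenized; which tokenizer FinSight uses is not stated here.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the documented defaults."""
    modal_app_name: str = "finsight-ai"
    faiss_data_dir: str = "./data/faiss"
    chat_db_path: str = "./data/chat_sessions.db"
    top_k: int = 6
    chunk_size: int = 512      # tokens
    chunk_overlap: int = 64    # tokens
    hybrid_alpha: float = 0.6  # 0 to 1

    def __post_init__(self) -> None:
        if not 0.0 <= self.hybrid_alpha <= 1.0:
            raise ValueError("HYBRID_ALPHA must be between 0 and 1")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be >= 0 and below CHUNK_SIZE")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read each documented variable, falling back to its default."""
        d = cls()  # instance holding the defaults
        return cls(
            modal_app_name=env.get("MODAL_APP_NAME", d.modal_app_name),
            faiss_data_dir=env.get("FAISS_DATA_DIR", d.faiss_data_dir),
            chat_db_path=env.get("CHAT_DB_PATH", d.chat_db_path),
            top_k=int(env.get("TOP_K", d.top_k)),
            chunk_size=int(env.get("CHUNK_SIZE", d.chunk_size)),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", d.chunk_overlap)),
            hybrid_alpha=float(env.get("HYBRID_ALPHA", d.hybrid_alpha)),
        )


def chunk_tokens(tokens: Sequence[T], size: int, overlap: int) -> List[List[T]]:
    """Split tokens into windows of `size` where neighbours share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):  # this window reached the end
            break
    return chunks
```

With the defaults, each window advances by 512 − 64 = 448 tokens, so text near a boundary appears in two chunks: `cfg = Settings.from_env(); chunk_tokens(tokens, cfg.chunk_size, cfg.chunk_overlap)`.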

---

## Model Summary

| Model | Size | Quantization | Source |
|-------|------|--------------|--------|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | [openbmb/MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) |
| MiniCPM-V 4.6 | 1B | fp16 | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |

All OpenBMB models are licensed under **Apache 2.0** and hosted on the Hugging Face Hub.

The total stack stays well below the **32B Build Small** parameter limit.
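
The budget claim can be checked directly from the table. This sketch assumes the 32B limit applies both to each model and to the combined stack, which is how this README phrases it; `check_budget` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Parameter counts in billions, copied from the Model Summary table.
MODELS_B: Dict[str, float] = {
    "MiniCPM-Embedding": 0.4,
    "MiniCPM4.1-8B": 8.0,
    "MiniCPM-V 4.6": 1.0,
}
LIMIT_B = 32.0  # Build Small parameter limit


def check_budget(models: Dict[str, float], limit: float) -> Tuple[float, float]:
    """Return (total, largest); raise if either reaches the limit."""
    total = sum(models.values())
    largest = max(models.values())
    if largest >= limit or total >= limit:
        raise ValueError(f"stack exceeds {limit}B: total={total}, largest={largest}")
    return total, largest


total, largest = check_budget(MODELS_B, LIMIT_B)
print(f"total {total:.1f}B, largest {largest:.1f}B, limit {LIMIT_B:.0f}B")
```

This prints `total 9.4B, largest 8.0B, limit 32B`.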

---

## REST API *(optional)*

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF / image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET / POST | Chat session management |
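
`/api/chat` streams its answer as Server-Sent Events. Below is a minimal standard-library client sketch. The request body (a `message` field) is a hypothetical schema, and treating each `data:` payload as a text fragment of the answer is also an assumption; check the routes under `backend/routers/` for the real contract. The parser follows the SSE line format: `data:` lines accumulate until a blank line ends the event.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    Consecutive `data:` lines are joined with newlines; a blank line ends the
    event. Other fields (event:, id:, retry:) and ':' comments are ignored, and
    an event left unterminated at end of stream is dropped, as the SSE spec says.
    """
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            buf.append(value[1:] if value.startswith(" ") else value)


def stream_chat(base_url: str, payload: dict) -> Iterator[str]:
    """POST to /api/chat and yield streamed data payloads as they arrive.

    The payload schema is an assumption; check backend/routers/ for the real one.
    """
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # HTTPResponse iterates line by line, as bytes.
        yield from iter_sse_data(line.decode("utf-8") for line in resp)
```

With the API running on port 8000, usage would look like `for piece in stream_chat("http://localhost:8000", {"message": "What was Q3 operating margin?"}): print(piece, end="", flush=True)`, again assuming that request shape.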

---

## Repository Structure

```text
app.py              # HF Space entry (Gradio)
backend/
  gradio_ui/        # Tabs, theme, custom CSS
  services/         # RAG, ingestion, summarizer
  models/           # Modal client wrappers
  db/               # FAISS + SQLite
  routers/          # FastAPI routes
finsight_modal/
  app.py            # Modal GPU workers (deploy separately)
data/               # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
```

---

## Hackathon Context

Built for the **Hugging Face Build Small Hackathon** and the **SLM Hackathon** track (Project 09, in the lineage of FinSight Statement Auditor). Compact OpenBMB models with Modal offload keep the UI on CPU, while GPUs spin up only for inference.

| Badge | How FinSight qualifies |
|-------|------------------------|
| **Build Small** | All models combined ≪ 32B params |
| **Off the Grid** | The FAISS index and document store stay local; only model inference calls Modal |
| **Off-Brand** | Custom FinSight Gradio theme (gold accent, finance-first layout) |

---

## License

Apache-2.0 (application code and OpenBMB model weights)