FinSight AI
FinSight AI is a finance-domain RAG question-answering system. It runs MiniCPM models on Modal serverless GPUs, retrieves with FAISS hybrid search, and exposes a Gradio UI plus an optional FastAPI REST API.
Architecture
| Component | Where it runs |
|---|---|
| Embeddings (MiniCPM-Embedding NF4) | Modal T4 GPU |
| LLM (MiniCPM4.1-8B GGUF) | Modal T4 GPU |
| OCR (MiniCPM-V 4.6) | Modal A10G GPU |
| FAISS | Local persisted index (./data/faiss) |
| Gradio UI | Local :7860 (calls Python services in-process) |
| FastAPI backend | Local :8000 (optional REST API) |
Prerequisites
- Python 3.11+
- Docker & Docker Compose (optional, for containerized Gradio/API)
- Modal account (`pip install modal && modal setup`)
Setup
1. Deploy Modal inference workers
From the project root:
pip install modal
modal setup
modal deploy finsight_modal/app.py
This deploys three GPU classes to Modal:
- `Embedder`: MiniCPM-Embedding NF4 4-bit (~1.6 GB VRAM)
- `LLM`: MiniCPM4.1-8B Q4_K_M GGUF
- `OCR`: MiniCPM-V 4.6
Models are downloaded automatically on first cold start into a persistent Modal Volume.
Smoke test:
modal run finsight_modal/app.py
2. Start local services
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt -r backend/requirements.txt
# Gradio UI (primary); the FAISS index is stored under ./data/faiss
cd backend && python -m gradio_ui.app
Optional REST API for scripts and integrations:
cd backend && uvicorn main:app --reload --port 8000
Or run the Gradio UI, and optionally the API, with Docker:
docker compose up gradio -d
# optional API:
docker compose up backend -d
3. Modal credentials for Docker
Mount your Modal token so containers can call deployed workers:
# After modal setup, token is at ~/.modal.toml
# Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars, or mount the config file
See Modal docs for token setup in CI/Docker.
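Before starting the containers, it can help to confirm that credentials are visible in one of the two places mentioned above. The following is a minimal sketch that assumes only the two environment variables and the `~/.modal.toml` path named in this section. The helper name `modal_credentials_source` is hypothetical and not part of the project.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def modal_credentials_source(
    env: Mapping[str, str] = os.environ,
    config_path: Optional[Path] = None,
) -> Optional[str]:
    """Return "env", "file", or None depending on where credentials are found."""
    # Both variables must be set and non-empty to count as configured.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return "env"
    # Otherwise look for the config file written by `modal setup`.
    path = config_path if config_path is not None else Path.home() / ".modal.toml"
    if path.is_file():
        return "file"
    return None


if __name__ == "__main__":
    source = modal_credentials_source()
    print(f"Modal credentials source: {source or 'not found'}")
```

Running this inside a container shows whether the environment variables or the mounted config file actually reached it. It only checks presence and does not validate the token against Modal.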
Gradio UI features
| Tab | Description |
|---|---|
| QA | Streaming finance chat with sources and confidence |
| Summary | Financial / executive / risk summaries |
| OCR | Structured document OCR with page preview |
| Entities | Company, ticker, and figure extraction |
| Documents | Upload, list, delete, and single-doc selection |
Use Hybrid RAG to search all indexed documents, or Single Document to scope chat to one selected document.
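The difference between the two modes can be illustrated with a small ranking sketch. This README does not say how the project fuses dense and keyword results, so the example below assumes reciprocal rank fusion (RRF), a common choice, and treats Single Document mode as a filter on document ID. The function name, the `(doc_id, chunk_id)` tuple shape, and the constant `k=60` are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Chunk = Tuple[str, str]  # (doc_id, chunk_id), a hypothetical shape


def fuse_rankings(
    dense: List[Chunk],
    keyword: List[Chunk],
    doc_id: Optional[str] = None,
    k: int = 60,
) -> List[Chunk]:
    """Merge two ranked lists with reciprocal rank fusion (RRF).

    If doc_id is given, only chunks from that document are kept,
    which mimics scoping the search to a single document.
    """
    scores: Dict[Chunk, float] = defaultdict(float)
    for ranking in (dense, keyword):
        if doc_id is not None:
            ranking = [chunk for chunk in ranking if chunk[0] == doc_id]
        for rank, chunk in enumerate(ranking, start=1):
            scores[chunk] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to tuple order for determinism.
    return sorted(scores, key=lambda chunk: (-scores[chunk], chunk))


if __name__ == "__main__":
    dense = [("10k", "c1"), ("q3", "c7"), ("10k", "c2")]
    keyword = [("q3", "c7"), ("10k", "c2"), ("10k", "c9")]
    print(fuse_rankings(dense, keyword))                # all documents
    print(fuse_rankings(dense, keyword, doc_id="10k"))  # one document only
```

Chunks that appear in both rankings rise to the top in either mode. With `doc_id` set, chunks from other documents never enter the fused list.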
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + metadata path |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite session store |
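As an illustration of how these variables and defaults fit together, here is a minimal sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual configuration code. The sketch reads the process environment only, so it assumes the values from `.env` have already been exported.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    modal_app_name: str
    faiss_data_dir: str
    chat_db_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three documented variables, falling back to their defaults."""
    return Settings(
        modal_app_name=env.get("MODAL_APP_NAME", "finsight-ai"),
        faiss_data_dir=env.get("FAISS_DATA_DIR", "./data/faiss"),
        chat_db_path=env.get("CHAT_DB_PATH", "./data/chat_sessions.db"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary.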
API Endpoints (optional)
| Endpoint | Method | Description |
|---|---|---|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF/image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET/POST | Chat session management |
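Because `/api/chat` streams server-sent events (SSE), a script that calls it has to read the response line by line. The sketch below uses only the standard library. The request body shape (a JSON object with a `question` field) is an assumption, since this README does not document the payload, and both function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields are ignored in this sketch.
    # An event not terminated by a blank line is dropped, as in the SSE format.


def stream_chat(question: str, base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST a question to /api/chat and yield each event's data payload."""
    # Assumption: the endpoint accepts JSON with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(line.decode("utf-8") for line in response)
```

With the API running on port 8000, `for chunk in stream_chat("..."): print(chunk)` would print each event payload as it arrives, provided the assumed request shape matches the backend.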
License
Apache-2.0 (models from OpenBMB)