FinSight AI

Finance-domain RAG QA system powered by MiniCPM models on Modal (serverless GPU), FAISS hybrid search, Gradio UI, and an optional FastAPI API.

Architecture

Component                            Where it runs
Embeddings (MiniCPM-Embedding NF4)   Modal T4 GPU
LLM (MiniCPM4.1-8B GGUF)             Modal T4 GPU
OCR (MiniCPM-V 4.6)                  Modal A10G GPU
FAISS                                Local persisted index (./data/faiss)
Gradio UI                            Local :7860 (calls Python services in-process)
FastAPI backend                      Local :8000 (optional REST API)

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose (optional, for containerized Gradio/API)
  • Modal account (pip install modal && modal setup)

Setup

1. Deploy Modal inference workers

From the project root:

pip install modal
modal setup
modal deploy finsight_modal/app.py

This deploys three GPU classes to Modal:

  • Embedder — MiniCPM-Embedding NF4 4-bit (~1.6 GB VRAM)
  • LLM — MiniCPM4.1-8B Q4_K_M GGUF
  • OCR — MiniCPM-V 4.6

Models are downloaded automatically on first cold start into a persistent Modal Volume.

Smoke test:

modal run finsight_modal/app.py

2. Start local services

cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS/Linux
pip install -r requirements.txt -r backend/requirements.txt

# Gradio UI (primary); the FAISS index is stored under ./data/faiss
cd backend && python -m gradio_ui.app

Open http://localhost:7860

Optional REST API for scripts and integrations:

cd backend && uvicorn main:app --reload --port 8000

Or run Gradio + API with Docker:

docker compose up -d gradio
# optional API:
docker compose up -d backend

3. Modal credentials for Docker

Mount your Modal token so containers can call deployed workers:

# After modal setup, token is at ~/.modal.toml
# Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars, or mount the config file

See Modal docs for token setup in CI/Docker.
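
Before starting a container, it can help to check which of the two credential sources mentioned above is actually present. The following is a minimal sketch using only the standard library; the helper name modal_credentials_source is hypothetical and not part of this project or of the Modal client.

```python
import os
from pathlib import Path


def modal_credentials_source(env=None, home=None):
    """Return "env", "file", or None depending on where Modal credentials are found.

    Hypothetical helper for illustration; it only inspects the environment
    and the filesystem and never contacts Modal.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Both variables must be set for environment-based authentication.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return "env"
    # Otherwise look for the config file written by `modal setup`.
    if (home / ".modal.toml").is_file():
        return "file"
    return None
```

A return value of None inside the container suggests that neither the environment variables nor the mounted config file reached it.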

Gradio UI features

Tab         Description
QA          Streaming finance chat with sources and confidence
Summary     Financial / executive / risk summaries
OCR         Structured document OCR with page preview
Entities    Company, ticker, and figure extraction
Documents   Upload, list, delete, and single-doc selection

Use Hybrid RAG to search all indexed documents, or Single Document to scope chat to one selected document.
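
The difference between the two modes can be pictured as an optional filter applied to retrieved chunks before they reach the LLM. This is only an illustrative sketch: the Hit type, its fields, and select_hits are hypothetical names, and the project's actual retrieval code may scope the search differently (for example inside the FAISS query itself).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A retrieved chunk (hypothetical shape, for illustration only)."""
    doc_id: str
    chunk: str
    score: float


def select_hits(hits, top_k=3, doc_id=None):
    """Return the top_k hits by score, optionally restricted to one document.

    doc_id=None corresponds to searching all indexed documents;
    passing a doc_id corresponds to Single Document mode.
    """
    if doc_id is not None:
        hits = [h for h in hits if h.doc_id == doc_id]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
```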

Environment Variables

Variable         Default                   Description
MODAL_APP_NAME   finsight-ai               Deployed Modal app name
FAISS_DATA_DIR   ./data/faiss              FAISS index + metadata path
CHAT_DB_PATH     ./data/chat_sessions.db   SQLite session store
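
As a rough sketch of how these variables and their defaults could be read in Python, the snippet below collects them into one object. The Settings class and load_settings function are assumptions made for illustration; the project may load its configuration differently (for example via the .env file copied during setup).

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    modal_app_name: str
    faiss_data_dir: Path
    chat_db_path: Path


def load_settings(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        modal_app_name=env.get("MODAL_APP_NAME", "finsight-ai"),
        faiss_data_dir=Path(env.get("FAISS_DATA_DIR", "./data/faiss")),
        chat_db_path=Path(env.get("CHAT_DB_PATH", "./data/chat_sessions.db")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check without touching the real environment.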

API Endpoints (optional)

Endpoint                Method     Description
/api/chat               POST       SSE streaming RAG chat
/api/documents/upload   POST       Upload PDF/image
/api/documents/list     GET        List ingested documents
/api/summarize          POST       Financial summary
/api/ocr                POST       OCR extraction
/api/extract-entities   POST       Entity extraction
/api/sessions           GET/POST   Chat session management
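
Since /api/chat streams its answer as server-sent events, a script has to read the response line by line rather than wait for a single JSON body. The sketch below uses only the standard library. The request body field "question" is a guess, since the request schema is not documented here, and both function names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, question):
    """Build a POST request for /api/chat.

    The JSON field name "question" is an assumption; check the backend's
    request model for the real schema.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )


def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format drops one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line terminates the event.
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored.
    if buffer:
        # Lenient: also emit a final event that lacks a trailing blank line.
        yield "\n".join(buffer)
```

With the API running on port 8000, the request returned by build_chat_request("http://localhost:8000", "...") can be opened with urllib.request.urlopen, and the decoded response lines passed to iter_sse_data to print each chunk as it arrives.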

License

Apache-2.0 (models from OpenBMB)