# FinSight AI

Finance-domain RAG QA system powered by MiniCPM models on **Modal** (serverless GPU), FAISS hybrid search, **Gradio UI**, and an optional FastAPI API.

## Architecture

| Component | Where it runs |
|-----------|---------------|
| **Embeddings** (MiniCPM-Embedding NF4) | Modal T4 GPU |
| **LLM** (MiniCPM4.1-8B GGUF) | Modal T4 GPU |
| **OCR** (MiniCPM-V 4.6) | Modal A10G GPU |
| **FAISS** | Local persisted index (`./data/faiss`) |
| **Gradio UI** | Local `:7860` (calls Python services in-process) |
| **FastAPI backend** | Local `:8000` (optional REST API) |

## Prerequisites

- Python 3.11+
- Docker & Docker Compose (optional, for containerized Gradio/API)
- [Modal](https://modal.com) account (`pip install modal && modal setup`)

## Setup

### 1. Deploy Modal inference workers

From the project root:

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

This deploys three GPU classes to Modal:

- `Embedder`: MiniCPM-Embedding NF4 4-bit (~1.6 GB VRAM)
- `LLM`: MiniCPM4.1-8B Q4_K_M GGUF
- `OCR`: MiniCPM-V 4.6

Models are downloaded automatically on first cold start into a persistent Modal Volume.

Smoke test:

```bash
modal run finsight_modal/app.py
```

### 2. Start local services

```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS/Linux
pip install -r requirements.txt -r backend/requirements.txt

# Gradio UI (primary; FAISS index stored under ./data/faiss)
cd backend && python -m gradio_ui.app
```

Open **http://localhost:7860**

Optional REST API for scripts and integrations:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Or run Gradio + API with Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

### 3. Modal credentials for Docker

Mount your Modal token so containers can call deployed workers:

```bash
# After modal setup, token is at ~/.modal.toml
# Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars, or mount the config file
```

See [Modal docs](https://modal.com/docs/guide/secrets) for token setup in CI/Docker.

## Gradio UI features

| Tab | Description |
|-----|-------------|
| **QA** | Streaming finance chat with sources and confidence |
| **Summary** | Financial / executive / risk summaries |
| **OCR** | Structured document OCR with page preview |
| **Entities** | Company, ticker, and figure extraction |
| **Documents** | Upload, list, delete, and single-doc selection |

Use **Hybrid RAG** to search all indexed documents, or **Single Document** to scope chat to one selected document.

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + metadata path |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite session store |

## API Endpoints (optional)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF/image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET/POST | Chat session management |

## License

Apache-2.0 (models from OpenBMB)
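
## Example: resolving the environment variables

The three variables in the Environment Variables table can be resolved with a small helper. The sketch below is illustrative only and is not taken from the FinSight codebase: `Settings`, `DEFAULTS`, and `load_settings` are hypothetical names, and treating a blank value as "unset" is an assumption. Only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the Environment Variables table.
DEFAULTS = {
    "MODAL_APP_NAME": "finsight-ai",
    "FAISS_DATA_DIR": "./data/faiss",
    "CHAT_DB_PATH": "./data/chat_sessions.db",
}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented settings."""

    modal_app_name: str
    faiss_data_dir: str
    chat_db_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (os.environ by default), using DEFAULTS as fallback."""
    source = os.environ if env is None else env

    def get(name: str) -> str:
        # Assumption: a missing or whitespace-only value means "use the default".
        value = source.get(name, "").strip()
        return value or DEFAULTS[name]

    return Settings(
        modal_app_name=get("MODAL_APP_NAME"),
        faiss_data_dir=get("FAISS_DATA_DIR"),
        chat_db_path=get("CHAT_DB_PATH"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing an explicit mapping, for example `load_settings({"FAISS_DATA_DIR": "/tmp/idx"})`, overrides one value and leaves the other two at their documented defaults.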
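
## Example: reading an SSE stream

The API Endpoints table lists `/api/chat` as an SSE streaming endpoint. A script consuming it needs to split the response into events. The sketch below assumes the endpoint emits standard Server-Sent Events (`data:` lines terminated by a blank line); the actual event fields and the request payload are not documented in this README, so `iter_sse_data` is a hypothetical helper that handles only the generic wire format.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    `lines` is any iterable of decoded text lines, with or without
    trailing newline characters.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive ping.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # Exactly one leading space after the colon is dropped.
            value = value[1:]
        if field == "data":
            buffer.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event that is not closed by a blank line is discarded.


if __name__ == "__main__":
    demo = ["data: Revenue rose", "", ": ping", "data: line one", "data: line two", ""]
    for payload in iter_sse_data(demo):
        print(repr(payload))
```

The function takes plain text lines rather than an HTTP response, so it can be fed from whichever HTTP client a script already uses and tested without a running backend.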