# FinSight AI

Finance-domain RAG QA system powered by MiniCPM models on **Modal** (serverless GPU), FAISS hybrid search, a **Gradio UI**, and an optional FastAPI REST API.

## Architecture

| Component | Where it runs |
|-----------|---------------|
| **Embeddings** (MiniCPM-Embedding NF4) | Modal T4 GPU |
| **LLM** (MiniCPM4.1-8B GGUF) | Modal T4 GPU |
| **OCR** (MiniCPM-V 4.6) | Modal A10G GPU |
| **FAISS** | Local persisted index (`./data/faiss`) |
| **Gradio UI** | Local `:7860` (calls Python services in-process) |
| **FastAPI backend** | Local `:8000` (optional REST API) |

## Prerequisites

- Python 3.11+
- Docker & Docker Compose (optional, for containerized Gradio/API)
- [Modal](https://modal.com) account (`pip install modal && modal setup`)

## Setup

### 1. Deploy Modal inference workers

From the project root:

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

This deploys three GPU classes to Modal:

- `Embedder`: MiniCPM-Embedding NF4 4-bit (~1.6 GB VRAM)
- `LLM`: MiniCPM4.1-8B Q4_K_M GGUF
- `OCR`: MiniCPM-V 4.6

Models are downloaded automatically on first cold start into a persistent Modal Volume.

Smoke test:

```bash
modal run finsight_modal/app.py
```

### 2. Start local services

```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt -r backend/requirements.txt
# Gradio UI (primary; FAISS index stored under ./data/faiss)
cd backend && python -m gradio_ui.app
```

Open **http://localhost:7860** in a browser.

Optional REST API for scripts and integrations:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Or run the Gradio UI and the API with Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

### 3. Modal credentials for Docker

Mount your Modal token so containers can call deployed workers:

```bash
# After modal setup, token is at ~/.modal.toml
# Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars, or mount the config file
```

See [Modal docs](https://modal.com/docs/guide/secrets) for token setup in CI/Docker.

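
Before starting a container, it can help to confirm that one of the two credential sources above is actually visible to the process. The following is a minimal sketch, not part of the project: the function name `modal_credentials_source` and its return values are hypothetical, and it only checks for presence, not validity, of the token.

```python
import os
from pathlib import Path


def modal_credentials_source(env=None, config_path=None):
    """Return "env", "file", or None depending on where Modal credentials are found.

    Hypothetical helper: checks the MODAL_TOKEN_ID / MODAL_TOKEN_SECRET
    environment variables first, then the ~/.modal.toml config file.
    """
    env = os.environ if env is None else env
    config_path = Path.home() / ".modal.toml" if config_path is None else Path(config_path)

    # Both variables must be set and non-empty to count as a usable token
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return "env"
    # Otherwise fall back to a mounted (or locally created) config file
    if config_path.is_file():
        return "file"
    return None
```

Calling `modal_credentials_source()` inside the container returns `None` when neither source is available, which is a cheaper failure signal than waiting for the first remote call to be rejected.
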
## Gradio UI features

| Tab | Description |
|-----|-------------|
| **QA** | Streaming finance chat with sources and confidence |
| **Summary** | Financial / executive / risk summaries |
| **OCR** | Structured document OCR with page preview |
| **Entities** | Company, ticker, and figure extraction |
| **Documents** | Upload, list, delete, and single-doc selection |

Use **Hybrid RAG** to search all indexed documents, or **Single Document** to scope chat to one selected document.

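
This README does not say how the hybrid search merges its result lists, so the sketch below is purely illustrative. It assumes two ranked lists of `(doc_id, chunk_id)` hits (for example one from the FAISS vector index and one from a keyword search) and combines them with reciprocal rank fusion, a common technique that may differ from what the project does. The function name `fuse_rankings`, the hit format, and the `doc_id` filter used to mimic **Single Document** mode are all assumptions.

```python
from collections import defaultdict


def fuse_rankings(rankings, k=60, doc_id=None):
    """Merge ranked lists of (doc_id, chunk_id) hits with reciprocal rank fusion.

    rankings: list of ranked lists, best hit first.
    k: smoothing constant; larger values flatten the gap between ranks.
    doc_id: when given, keep only hits from that document before ranking.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        if doc_id is not None:
            # Scope the search to one document, re-ranking the surviving hits
            ranking = [hit for hit in ranking if hit[0] == doc_id]
        for rank, hit in enumerate(ranking, start=1):
            scores[hit] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by hit identity for determinism
    return sorted(scores, key=lambda hit: (-scores[hit], hit))


dense = [("a", 1), ("b", 1), ("a", 2)]
keyword = [("b", 1), ("c", 1), ("a", 1)]
all_docs = fuse_rankings([dense, keyword])
single_doc = fuse_rankings([dense, keyword], doc_id="a")
```

Here `all_docs` ranks chunks that appear high in both lists first, while `single_doc` contains only chunks from document `"a"`.
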
## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + metadata path |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite session store |

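
As an illustration of how the variables in the table above and their defaults fit together, here is a small sketch that reads them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names and not necessarily how the backend loads its configuration; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    modal_app_name: str
    faiss_data_dir: Path
    chat_db_path: Path


def load_settings(env=None):
    """Read the three documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        modal_app_name=env.get("MODAL_APP_NAME", "finsight-ai"),
        faiss_data_dir=Path(env.get("FAISS_DATA_DIR", "./data/faiss")),
        chat_db_path=Path(env.get("CHAT_DB_PATH", "./data/chat_sessions.db")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation, for example `load_settings({"FAISS_DATA_DIR": "/mnt/index"})`.
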
## API Endpoints (optional)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF/image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET/POST | Chat session management |

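
Because `/api/chat` streams its answer as Server-Sent Events, a script has to read the response incrementally rather than wait for a single JSON body. The sketch below uses only the standard library. The request schema is not documented here, so the `question` field in the payload is a hypothetical placeholder, as are the function names; the parser is also lenient in that it emits a final event even if the stream ends without a closing blank line.

```python
import json
import urllib.request


def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event from an iterable of text lines.

    Consecutive `data:` lines are joined with newlines and a blank line ends
    the event. Other fields (event:, id:, comments) are ignored.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the payload
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:  # stream closed without a trailing blank line
        yield "\n".join(buffer)


def stream_chat(question, base_url="http://localhost:8000"):
    """POST a question to /api/chat and yield raw event payloads as they arrive."""
    body = json.dumps({"question": question}).encode("utf-8")  # hypothetical field name
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response object iterates line by line as bytes
        text_lines = (chunk.decode("utf-8") for chunk in response)
        yield from iter_sse_data(text_lines)
```

With the API running on port 8000, `for payload in stream_chat("..."): print(payload)` would print each event as it arrives, assuming the endpoint accepts a payload of this shape.
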
## License

Apache-2.0 (models from OpenBMB)