# FinSight AI
Finance-domain RAG QA system powered by MiniCPM models on **Modal** (serverless GPU), FAISS hybrid search, **Gradio UI**, and an optional FastAPI API.
## Architecture
| Component | Where it runs |
|-----------|---------------|
| **Embeddings** (MiniCPM-Embedding NF4) | Modal T4 GPU |
| **LLM** (MiniCPM4.1-8B GGUF) | Modal T4 GPU |
| **OCR** (MiniCPM-V 4.6) | Modal A10G GPU |
| **FAISS** | Local persisted index (`./data/faiss`) |
| **Gradio UI** | Local `:7860` (calls Python services in-process) |
| **FastAPI backend** | Local `:8000` (optional REST API) |
## Prerequisites
- Python 3.11+
- Docker & Docker Compose (optional, for containerized Gradio/API)
- [Modal](https://modal.com) account (`pip install modal && modal setup`)
## Setup
### 1. Deploy Modal inference workers
From the project root:
```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```
This deploys three GPU classes to Modal:
- `Embedder`: MiniCPM-Embedding NF4 4-bit (~1.6 GB VRAM)
- `LLM`: MiniCPM4.1-8B Q4_K_M GGUF
- `OCR`: MiniCPM-V 4.6
Models are downloaded automatically on first cold start into a persistent Modal Volume.
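Once deployed, the classes can be looked up by name from any Python process that has Modal credentials. The following is a minimal sketch, not the project's actual client code: `get_worker` is a hypothetical helper, and the `embed` method shown in the trailing comment is an assumed name (check `finsight_modal/app.py` for the real methods). It uses `modal.Cls.from_name`, which requires a reasonably recent Modal client.

```python
import os
from typing import Optional


def get_worker(class_name: str, app_name: Optional[str] = None):
    """Return a handle to a deployed Modal class such as "Embedder", "LLM", or "OCR".

    `get_worker` is a hypothetical helper, not part of FinSight AI.
    """
    # Imported lazily so this module still loads where Modal is not installed.
    import modal

    # Fall back to MODAL_APP_NAME, then to the documented default app name.
    app_name = app_name or os.environ.get("MODAL_APP_NAME", "finsight-ai")
    cls = modal.Cls.from_name(app_name, class_name)
    return cls()  # instantiate the remote class handle


# Example call (the method name `embed` is an assumption):
#   embedder = get_worker("Embedder")
#   vectors = embedder.embed.remote(["What was Q3 revenue?"])
```

The first call after a period of inactivity may be slow, since a cold start has to load the model from the Modal Volume.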
Smoke test:
```bash
modal run finsight_modal/app.py
```
### 2. Start local services
```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt -r backend/requirements.txt
# Gradio UI (primary); the FAISS index is stored under ./data/faiss
cd backend && python -m gradio_ui.app
```
Open **http://localhost:7860**
Optional REST API for scripts and integrations:
```bash
cd backend && uvicorn main:app --reload --port 8000
```
Or run the Gradio UI and the optional API with Docker Compose:
```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```
### 3. Modal credentials for Docker
Mount your Modal token so containers can call deployed workers:
```bash
# After modal setup, token is at ~/.modal.toml
# Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars, or mount the config file
```
See [Modal docs](https://modal.com/docs/guide/secrets) for token setup in CI/Docker.
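A quick way to confirm that a container can see credentials is a small startup check. This is a sketch under stated assumptions: `modal_credentials_source` is a hypothetical helper, and it only tests for the two environment variables and the config file named above. It does not validate the token against Modal.

```python
import os
from pathlib import Path


def modal_credentials_source(env=None, config_path=None):
    """Report where Modal credentials would come from: "env", "config", or None.

    Hypothetical helper; it checks presence only, not token validity.
    """
    env = os.environ if env is None else env
    if config_path is None:
        config_path = Path.home() / ".modal.toml"
    else:
        config_path = Path(config_path)

    # Both variables must be set (and non-empty) for the env route to count.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return "env"
    # Otherwise look for a mounted config file.
    if config_path.is_file():
        return "config"
    return None
```

Calling `modal_credentials_source()` at container start and failing fast when it returns `None` gives a clearer error than a failed remote call later on.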
## Gradio UI features
| Tab | Description |
|-----|-------------|
| **QA** | Streaming finance chat with sources and confidence |
| **Summary** | Financial / executive / risk summaries |
| **OCR** | Structured document OCR with page preview |
| **Entities** | Company, ticker, and figure extraction |
| **Documents** | Upload, list, delete, and single-doc selection |
Use **Hybrid RAG** to search all indexed documents, or **Single Document** to scope chat to one selected document.
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + metadata path |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite session store |
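For reference, the variables above could be read with their documented defaults along these lines. This is an illustrative sketch only: the `Settings` dataclass and `load_settings` function are hypothetical names and may not match how the backend actually loads its configuration.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    modal_app_name: str
    faiss_data_dir: Path
    chat_db_path: Path


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        modal_app_name=env.get("MODAL_APP_NAME", "finsight-ai"),
        faiss_data_dir=Path(env.get("FAISS_DATA_DIR", "./data/faiss")),
        chat_db_path=Path(env.get("CHAT_DB_PATH", "./data/chat_sessions.db")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a test.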
## API Endpoints (optional)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF/image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET/POST | Chat session management |
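Because `/api/chat` streams Server-Sent Events, a client has to read the response line by line rather than wait for a single JSON body. The sketch below uses only the standard library. The request schema is not documented here, so the `"question"` field is an assumption, as are the helper names `iter_sse_data` and `stream_chat`.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of raw lines."""
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the payload.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line terminates the event; multi-line data is joined with "\n".
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (":...") and other fields (event:, id:) are ignored here.
    if buffer:
        # Lenient: flush a trailing event even without a final blank line.
        yield "\n".join(buffer)


def stream_chat(question: str, base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST a question to /api/chat and yield SSE data payloads as they arrive."""
    # Assumption: the endpoint accepts a JSON body with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response object iterates over raw lines as bytes.
        yield from iter_sse_data(response)
```

With the backend running, `for chunk in stream_chat("Summarize Q3 revenue"): print(chunk)` would print each payload as it arrives, provided the assumed request body matches the real schema.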
## License
Apache-2.0 (models from OpenBMB)