# FinSight AI

A finance-domain RAG question-answering system built on MiniCPM models served from **Modal** (serverless GPUs), FAISS hybrid search, a **Gradio UI**, and an optional FastAPI REST API.

## Architecture

| Component | Where it runs |
|-----------|---------------|
| **Embeddings** (MiniCPM-Embedding NF4) | Modal T4 GPU |
| **LLM** (MiniCPM4.1-8B GGUF) | Modal T4 GPU |
| **OCR** (MiniCPM-V 4.6) | Modal A10G GPU |
| **FAISS** | Local persisted index (`./data/faiss`) |
| **Gradio UI** | Local `:7860` (calls Python services in-process) |
| **FastAPI backend** | Local `:8000` (optional REST API) |

## Prerequisites

- Python 3.11+
- Docker & Docker Compose (optional, for containerized Gradio/API)
- [Modal](https://modal.com) account (`pip install modal && modal setup`)

## Setup

### 1. Deploy Modal inference workers

From the project root:

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

This deploys three GPU classes to Modal:
- `Embedder` — MiniCPM-Embedding NF4 4-bit (~1.6 GB VRAM)
- `LLM` — MiniCPM4.1-8B Q4_K_M GGUF
- `OCR` — MiniCPM-V 4.6

Models are downloaded automatically on first cold start into a persistent Modal Volume.

Smoke test:

```bash
modal run finsight_modal/app.py
```
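
Once deployed, the workers can be looked up by name from any Python process that has Modal credentials. The following is a minimal sketch, not the project's own client code: it assumes the app name `finsight-ai` (the `MODAL_APP_NAME` default) and the three class names listed above. `get_worker` is a hypothetical helper, and the method name `embed` in the usage comment is an assumption, so check `finsight_modal/app.py` for the real method names.

```python
WORKER_CLASSES = ("Embedder", "LLM", "OCR")  # class names from the deploy list above


def get_worker(class_name: str, app_name: str = "finsight-ai"):
    """Return an instance handle for a deployed Modal worker class."""
    if class_name not in WORKER_CLASSES:
        raise ValueError(
            f"unknown worker {class_name!r}; expected one of {WORKER_CLASSES}"
        )
    import modal  # imported lazily so the name check works without the SDK

    worker_cls = modal.Cls.from_name(app_name, class_name)
    return worker_cls()


# Usage (the method name `embed` is an assumption, not taken from the project):
# vectors = get_worker("Embedder").embed.remote(["What was Q3 revenue?"])
```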

### 2. Start local services

```bash
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows (PowerShell)
# source .venv/bin/activate    # macOS/Linux
pip install -r requirements.txt -r backend/requirements.txt

# Gradio UI (primary); the FAISS index is stored under ./data/faiss
cd backend && python -m gradio_ui.app
```

Open **http://localhost:7860** in a browser.

Optional REST API for scripts and integrations:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Alternatively, run the Gradio UI (and optionally the API) with Docker Compose:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

### 3. Modal credentials for Docker

Containers need a Modal token to call the deployed workers. After `modal setup`, the token is stored in `~/.modal.toml`. Either set the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables for the containers, or mount the config file into them.

See [Modal docs](https://modal.com/docs/guide/secrets) for token setup in CI/Docker.
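
Before starting the containers, it can help to check which of the two credential sources is actually present. The sketch below is an illustration only: `modal_credentials_source` is a hypothetical helper, and it merely checks that the two environment variables or the `~/.modal.toml` file exist, not that the token is valid.

```python
import os
from pathlib import Path


def modal_credentials_source(env=None, home=None):
    """Return 'env', 'file', or None depending on where a Modal token is found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Both variables are needed; one without the other is not a usable token.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return "env"
    if (home / ".modal.toml").is_file():
        return "file"
    return None


if __name__ == "__main__":
    print(modal_credentials_source() or "no Modal credentials found")
```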

## Gradio UI features

| Tab | Description |
|-----|-------------|
| **QA** | Streaming finance chat with sources and confidence |
| **Summary** | Financial / executive / risk summaries |
| **OCR** | Structured document OCR with page preview |
| **Entities** | Company, ticker, and figure extraction |
| **Documents** | Upload, list, delete, and single-doc selection |

Use **Hybrid RAG** to search all indexed documents, or **Single Document** to scope chat to one selected document.
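
This README does not describe how the hybrid ranking is computed. One common way to combine a dense (vector) ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below illustrates that technique, plus how a Single Document filter could narrow the results. It is not the project's retrieval code, and every function name, chunk ID, and document ID in it is hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def scope_to_document(chunk_ids, chunk_to_doc, doc_id=None):
    """Keep only chunks from doc_id; doc_id=None means all documents."""
    if doc_id is None:
        return list(chunk_ids)
    return [cid for cid in chunk_ids if chunk_to_doc[cid] == doc_id]


dense = ["c1", "c2", "c3"]    # e.g. order returned by a vector index
keyword = ["c3", "c1", "c4"]  # e.g. order returned by a keyword scorer
chunk_to_doc = {"c1": "10-K", "c2": "10-K", "c3": "10-Q", "c4": "10-Q"}

fused = rrf_fuse([dense, keyword])
print(fused)                                           # ['c1', 'c3', 'c2', 'c4']
print(scope_to_document(fused, chunk_to_doc, "10-Q"))  # ['c3', 'c4']
```

Chunks that rank well in both lists (`c1`, `c3`) rise above chunks that appear in only one, which is the point of fusing the two rankings.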

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + metadata path |
| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite session store |
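
A minimal sketch of how these variables could be read in Python, using the defaults from the table. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    modal_app_name: str
    faiss_data_dir: str
    chat_db_path: str


def load_settings(env=None) -> Settings:
    """Read the three variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        modal_app_name=env.get("MODAL_APP_NAME", "finsight-ai"),
        faiss_data_dir=env.get("FAISS_DATA_DIR", "./data/faiss"),
        chat_db_path=env.get("CHAT_DB_PATH", "./data/chat_sessions.db"),
    )


if __name__ == "__main__":
    print(load_settings())
```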

## API Endpoints (optional)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/chat` | POST | SSE streaming RAG chat |
| `/api/documents/upload` | POST | Upload PDF/image |
| `/api/documents/list` | GET | List ingested documents |
| `/api/summarize` | POST | Financial summary |
| `/api/ocr` | POST | OCR extraction |
| `/api/extract-entities` | POST | Entity extraction |
| `/api/sessions` | GET/POST | Chat session management |
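
Because `/api/chat` streams Server-Sent Events, a client has to read the response line by line rather than wait for a single JSON body. The sketch below uses only the standard library and makes several assumptions: the request body field `question` is hypothetical (the real schema is defined in the backend), the API is assumed to run on `localhost:8000`, and for simplicity each `data:` line is yielded on its own instead of being joined per event.

```python
import json
import urllib.request


def iter_sse_data(lines):
    """Yield the value of each `data:` field from an iterable of SSE text lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips at most one leading space after the colon.
            yield value[1:] if value.startswith(" ") else value


def stream_chat(question, base_url="http://localhost:8000"):
    """POST a question to /api/chat and yield streamed data payloads."""
    body = json.dumps({"question": question}).encode("utf-8")  # field name is an assumption
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(raw.decode("utf-8") for raw in response)


# Usage (requires the optional API to be running):
# for chunk in stream_chat("Summarize the liquidity risks."):
#     print(chunk, end="", flush=True)
```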

## License

Apache-2.0 (models from OpenBMB)