Aniket2003333333

Update README.md

942389c verified 14 days ago

	---
	title: FinSight AI
	emoji: 📊
	colorFrom: blue
	colorTo: green
	sdk: gradio
	app_file: app.py
	python_version: "3.11"
	pinned: false
	tags:
	- track:backyard
	- sponsor:openbmb
	- sponsor:modal
	- achievement:offgrid
	---

	# FinSight AI

	Finance-domain Retrieval-Augmented Generation (RAG) assistant built with OpenBMB MiniCPM models. Upload earnings reports, bank statements, and filings — then chat, summarize, run OCR, and extract entities with cited answers.

	Inference runs on Modal serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No model exceeds 8B parameters, so the whole stack stays well under the 32B-parameter Build Small / SLM hackathon limit.

	---

	## What it does

	| Tab | Description |
	|-----|-------------|
	| Finance QA Chatbot | Streaming RAG chat with source citations and confidence |
	| Financial Summary | Executive, financial, or risk-focused summaries |
	| Document OCR | Structured OCR for scanned PDFs and images |
	| Entity Extraction | Companies, tickers, dates, and key figures |
	| Upload Documents | Ingest, list, delete, and scope search to one file |

	Search modes: Hybrid RAG (semantic + BM25 across all docs) or Single Document (chat scoped to one upload).
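The Hybrid RAG mode can be pictured as a weighted blend of two rankings. The sketch below is an illustration only, not the project's actual retrieval code: `min_max` and `hybrid_rank` are hypothetical names, the min-max normalization is an assumed choice, and it assumes `alpha` weights the semantic side (the README does not state the direction). The defaults mirror `HYBRID_ALPHA` and `TOP_K` from the Environment Variables table.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(semantic: dict[str, float], bm25: dict[str, float],
                alpha: float = 0.6, top_k: int = 6) -> list[tuple[str, float]]:
    """Blend normalized semantic and BM25 scores per chunk id.

    Assumption: alpha is the weight on the semantic score and (1 - alpha)
    the weight on BM25. A chunk missing from one ranking scores 0 there.
    """
    sem, lex = min_max(semantic), min_max(bm25)
    blended = {
        chunk_id: alpha * sem.get(chunk_id, 0.0) + (1 - alpha) * lex.get(chunk_id, 0.0)
        for chunk_id in set(sem) | set(lex)
    }
    # Highest score first; ties broken by chunk id for a stable order.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up scores for four chunks, purely for illustration.
semantic_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
bm25_scores = {"a": 2.0, "b": 12.0, "d": 7.0}
ranked = hybrid_rank(semantic_scores, bm25_scores)
print([chunk_id for chunk_id, _ in ranked])
```

Under this scheme, Single Document mode would amount to filtering the candidate chunks to one upload before blending.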

	---

	## Architecture

	| Component | Model | Where it runs | VRAM |
	|-----------|-------|---------------|------|
	| Embeddings | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
	| LLM | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
	| OCR / Vision | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
	| Vector search | FAISS + BM25 hybrid | Local / HF Space | CPU |
	| UI | Gradio 6 | `:7860` | CPU |
	| REST API (optional) | FastAPI | `:8000` | CPU |

	Models download automatically on first Modal cold start into a persistent volume (`finsight-hf-cache`).
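As a rough sanity check on the VRAM column, weight storage can be estimated from parameter count and bits per weight. This is a back-of-the-envelope sketch: `weight_gb` is a hypothetical helper, the ~4.85 bits per weight for Q4_K_M is an assumed average, and runtime overhead (KV cache, activations, CUDA context) is not included.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), excluding runtime overhead."""
    return params_billion * bits_per_weight / 8


# Parameter counts are taken from the Model Summary table.
# Assumption: Q4_K_M averages about 4.85 bits per weight.
llm_gb = weight_gb(8, 4.85)     # MiniCPM4.1-8B, Q4_K_M GGUF
vision_gb = weight_gb(1, 16)    # MiniCPM-V 4.6, fp16

print(f"LLM weights:    ~{llm_gb:.2f} GB")
print(f"Vision weights: ~{vision_gb:.2f} GB")
```

Both estimates sit close to the ~5 GB and ~2 GB figures in the table, which suggests those figures are dominated by weights.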

	---

	## Quick Start

	### 1. Deploy Modal workers (one-time)

	```bash
	pip install modal
	modal setup
	modal deploy finsight_modal/app.py
	```

	Smoke test:

	```bash
	modal run finsight_modal/app.py
	```

	View deployment: [modal.com/apps](https://modal.com/apps) → finsight-ai

	### 2. Run locally

	```bash
	cp .env.example .env
	python -m venv .venv
	.\.venv\Scripts\Activate.ps1 # Windows
	# source .venv/bin/activate # macOS / Linux

	pip install -r requirements.txt -r backend/requirements.txt
	python app.py
	```

	Open http://localhost:7860

	Optional REST API:

	```bash
	cd backend && uvicorn main:app --reload --port 8000
	```

	Docker:

	```bash
	docker compose up gradio -d
	# optional API:
	docker compose up backend -d
	```

	---

	## Hugging Face Spaces

	The Space entry point is `app.py` at the repo root (Gradio SDK).

	Add these Secrets in Space settings:

	| Secret | Description |
	|--------|-------------|
	| `MODAL_TOKEN_ID` | From `~/.modal.toml` after `modal setup` (starts with `ak-`) |
	| `MODAL_TOKEN_SECRET` | Paired secret (starts with `as-`) |
	| `MODAL_APP_NAME` | `finsight-ai` (must match deployed Modal app) |

	Get tokens locally:

	```powershell
	# Windows
	Get-Content $env:USERPROFILE\.modal.toml
	```

	Or create new tokens at [modal.com/settings](https://modal.com/settings).
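A swapped or missing secret is easy to catch before the app makes its first Modal call. The check below is a minimal sketch based only on the prefixes listed in the table above; `check_modal_secrets` is a hypothetical helper, not part of this repository or of the Modal SDK.

```python
import os

# Prefixes as documented in the Secrets table.
EXPECTED_PREFIXES = {"MODAL_TOKEN_ID": "ak-", "MODAL_TOKEN_SECRET": "as-"}


def check_modal_secrets(env=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means the tokens look sane."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix!r}")
    return problems


if __name__ == "__main__":
    for problem in check_modal_secrets():
        print("WARNING:", problem)
```

The function accepts any mapping, so it can be tested with a plain dict instead of the real environment.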

	> Note: FAISS indexes and uploaded documents persist under `./data/` locally. On HF Spaces, storage is ephemeral unless you attach persistent storage, so re-upload documents after a restart.

	---

	## Modal credentials (Docker / CI)

	After `modal setup`, credentials live in `~/.modal.toml`:

	```toml
	[default]
	token_id = "ak-..."
	token_secret = "as-..."
	```

	Alternatively, set them as environment variables, which take precedence over the file:

	```bash
	export MODAL_TOKEN_ID="ak-..."
	export MODAL_TOKEN_SECRET="as-..."
	export MODAL_APP_NAME="finsight-ai"
	```

	See [Modal token docs](https://modal.com/docs/reference/modal.config) for CI and Docker setup.

	---

	## Environment Variables

	| Variable | Default | Description |
	|----------|---------|-------------|
	| `MODAL_APP_NAME` | `finsight-ai` | Deployed Modal app name |
	| `FAISS_DATA_DIR` | `./data/faiss` | FAISS index + chunk metadata |
	| `CHAT_DB_PATH` | `./data/chat_sessions.db` | SQLite chat sessions |
	| `TOP_K` | `6` | Retrieved chunks per query |
	| `CHUNK_SIZE` | `512` | Ingestion chunk size (tokens) |
	| `CHUNK_OVERLAP` | `64` | Chunk overlap |
	| `HYBRID_ALPHA` | `0.6` | Semantic vs BM25 blend (0–1) |
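To show how these variables might fit together, here is a minimal sketch of a settings loader plus the sliding-window chunking that `CHUNK_SIZE` and `CHUNK_OVERLAP` imply. The `Settings` class and `chunk_tokens` function are hypothetical and not taken from the repository; the defaults come from the table above, and the tokenizer is left abstract (any list of tokens works).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    modal_app_name: str = "finsight-ai"
    faiss_data_dir: str = "./data/faiss"
    chat_db_path: str = "./data/chat_sessions.db"
    top_k: int = 6
    chunk_size: int = 512
    chunk_overlap: int = 64
    hybrid_alpha: float = 0.6

    @classmethod
    def from_env(cls, env=None) -> "Settings":
        """Read each variable from the environment, falling back to the documented default."""
        env = os.environ if env is None else env
        settings = cls(
            modal_app_name=env.get("MODAL_APP_NAME", cls.modal_app_name),
            faiss_data_dir=env.get("FAISS_DATA_DIR", cls.faiss_data_dir),
            chat_db_path=env.get("CHAT_DB_PATH", cls.chat_db_path),
            top_k=int(env.get("TOP_K", cls.top_k)),
            chunk_size=int(env.get("CHUNK_SIZE", cls.chunk_size)),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", cls.chunk_overlap)),
            hybrid_alpha=float(env.get("HYBRID_ALPHA", cls.hybrid_alpha)),
        )
        if not 0 <= settings.chunk_overlap < settings.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
        if not 0.0 <= settings.hybrid_alpha <= 1.0:
            raise ValueError("HYBRID_ALPHA must be between 0 and 1")
        return settings


def chunk_tokens(tokens: list, size: int, overlap: int) -> list[list]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if not tokens:
        return []
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]


settings = Settings.from_env({})
chunks = chunk_tokens(list(range(1000)), settings.chunk_size, settings.chunk_overlap)
print(len(chunks), [len(chunk) for chunk in chunks])
```

With the defaults, each window starts 448 tokens after the previous one, so consecutive chunks share 64 tokens.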

	---

	## Model Summary

	| Model | Size | Quantization | Source |
	|-------|------|--------------|--------|
	| MiniCPM-Embedding | 0.4B | 4-bit NF4 (BnB) | [openbmb/MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) |
	| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) |
	| MiniCPM-V 4.6 | 1B | fp16 | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |

	All OpenBMB models are released under Apache 2.0 and hosted on the Hugging Face Hub.

	Total stack stays well below the 32B Build Small parameter limit.

	---

	## REST API (optional)

	| Endpoint | Method | Description |
	|----------|--------|-------------|
	| `/api/chat` | POST | SSE streaming RAG chat |
	| `/api/documents/upload` | POST | Upload PDF / image |
	| `/api/documents/list` | GET | List ingested documents |
	| `/api/summarize` | POST | Financial summary |
	| `/api/ocr` | POST | OCR extraction |
	| `/api/extract-entities` | POST | Entity extraction |
	| `/api/sessions` | GET / POST | Chat session management |
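Since `/api/chat` streams Server-Sent Events, a client needs to split the response into events. The parser below is a minimal sketch of the standard SSE framing (`data:` lines terminated by a blank line); `iter_sse_data` is a hypothetical helper. The request body and the JSON shape of each event are not documented here, so the `token` field in the sample is purely an assumption.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event from an iterable of text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is part of the framing.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event without a terminating blank line is dropped, as in the SSE spec.


# Sample stream; the {"token": ...} payload shape is an assumption for illustration.
sample_stream = [
    'data: {"token": "Rev"}\n',
    "\n",
    ": ping\n",
    'data: {"token": "enue"}\n',
    "\n",
]
answer = "".join(json.loads(payload)["token"] for payload in iter_sse_data(sample_stream))
print(answer)
```

In practice the `lines` argument would be the decoded line iterator of a streaming HTTP response from the POST request.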

	---

	## Repository Structure

	```text
	app.py                  # HF Space entry (Gradio)
	backend/
	  gradio_ui/            # Tabs, theme, custom CSS
	  services/             # RAG, ingestion, summarizer
	  models/               # Modal client wrappers
	  db/                   # FAISS + SQLite
	  routers/              # FastAPI routes
	finsight_modal/
	  app.py                # Modal GPU workers (deploy separately)
	data/                   # FAISS index + uploads (gitignored)
	requirements.txt
	docker-compose.yml
	```

	---

	## Hackathon Context

	Built for the Hugging Face Build Small Hackathon and the SLM Hackathon track (Project 09 — FinSight Statement Auditor lineage). Uses efficient OpenBMB models with Modal offload so the UI runs on CPU while GPUs spin up only for inference.

	| Badge | How FinSight qualifies |
	|-------|------------------------|
	| Build Small | All models combined ≪ 32B params |
	| Off the Grid | Document index + FAISS stay on-device; only inference hits Modal |
	| Off-Brand | Custom FinSight Gradio theme (gold accent, finance-first layout) |

	---

	## License

	Apache-2.0 (application code and OpenBMB model weights)