Aniket2003333333

Update README.md

942389c verified 14 days ago

metadata

title: FinSight AI
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: '3.11'
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid

FinSight AI

Finance-domain Retrieval-Augmented Generation (RAG) assistant built with OpenBMB MiniCPM models. Upload earnings reports, bank statements, and filings, then chat over them with cited answers, generate summaries, run OCR, and extract entities.

Inference runs on Modal serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No model reaches 32B parameters (the largest is 8B), so the stack fits comfortably under the Build Small / SLM hackathon limits.

What it does

Tab	Description
Finance QA Chatbot	Streaming RAG chat with source citations and confidence
Financial Summary	Executive, financial, or risk-focused summaries
Document OCR	Structured OCR for scanned PDFs and images
Entity Extraction	Companies, tickers, dates, and key figures
Upload Documents	Ingest, list, delete, and scope search to one file

Search modes: Hybrid RAG (semantic + BM25 across all docs) or Single Document (chat scoped to one upload).
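The Hybrid RAG mode blends a semantic score with a BM25 score. The sketch below shows one common way to do that, assuming both score sets are min-max normalized first and that `alpha` weights the semantic side (matching the `HYBRID_ALPHA` default of 0.6). The function names and the normalization choice are illustrative assumptions, not the project's actual code, and it assumes higher scores mean better matches.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 1.0 everywhere."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float],
    bm25: Dict[str, float],
    alpha: float = 0.6,  # assumed: weight on the semantic score
    top_k: int = 6,
) -> List[Tuple[str, float]]:
    """Blend normalized semantic and BM25 scores, then return the best top_k chunks."""
    sem = min_max(semantic)
    lex = min_max(bm25)
    chunk_ids = set(sem) | set(lex)
    # A chunk missing from one retriever contributes 0.0 on that side.
    blended = {
        cid: alpha * sem.get(cid, 0.0) + (1.0 - alpha) * lex.get(cid, 0.0)
        for cid in chunk_ids
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(blended.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

With `alpha=1.0` this reduces to pure semantic ranking and with `alpha=0.0` to pure BM25, which makes the knob easy to sanity-check on a handful of queries.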

Architecture

Component	Model	Where it runs	VRAM
Embeddings	MiniCPM-Embedding (4-bit NF4)	Modal T4	~1.6 GB
LLM	MiniCPM4.1-8B (Q4_K_M GGUF)	Modal T4	~5 GB
OCR / Vision	MiniCPM-V 4.6	Modal A10G	~2 GB
Vector search	FAISS + BM25 hybrid	Local / HF Space	CPU
UI	Gradio 6	`:7860`	CPU
REST API (optional)	FastAPI	`:8000`	CPU

Models download automatically on first Modal cold start into a persistent volume (finsight-hf-cache).

Quick Start

1. Deploy Modal workers (one-time)

pip install modal
modal setup
modal deploy finsight_modal/app.py

Smoke test:

modal run finsight_modal/app.py

View deployment: modal.com/apps → finsight-ai

2. Run locally

cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS / Linux

pip install -r requirements.txt -r backend/requirements.txt
python app.py

Open http://localhost:7860

Optional REST API:

cd backend && uvicorn main:app --reload --port 8000

Docker:

docker compose up gradio -d
# optional API:
docker compose up backend -d

Hugging Face Spaces

The Space entry point is app.py at the repo root (Gradio SDK).

Add these Secrets in Space settings:

Secret	Description
`MODAL_TOKEN_ID`	From `~/.modal.toml` after `modal setup` (starts with `ak-`)
`MODAL_TOKEN_SECRET`	Paired secret (starts with `as-`)
`MODAL_APP_NAME`	`finsight-ai` (must match deployed Modal app)

Get tokens locally:

# Windows
Get-Content $env:USERPROFILE\.modal.toml
# macOS / Linux
cat ~/.modal.toml

Or create new tokens at modal.com/settings.

Note: Locally, FAISS indexes and uploaded documents persist under ./data/. On HF Spaces, storage is ephemeral unless you attach persistent storage, so re-upload documents after a restart.

Modal credentials (Docker / CI)

After modal setup, credentials live in ~/.modal.toml:

[default]
token_id = "ak-..."
token_secret = "as-..."

Alternatively, set them as environment variables (these override the file):

export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
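Before starting the app in Docker or CI, it can help to confirm the token variables are present and look plausible. The following is a minimal sketch that only checks the `ak-` / `as-` prefixes mentioned above; `check_modal_env` is a hypothetical helper, not part of this repository or of the Modal SDK, and it does not verify that the tokens are actually valid.

```python
from typing import List, Mapping

# Prefixes as described in this README; everything else here is illustrative.
EXPECTED_PREFIXES = {
    "MODAL_TOKEN_ID": "ak-",
    "MODAL_TOKEN_SECRET": "as-",
}


def check_modal_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the variables look fine."""
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            # Report only the expected prefix; never echo the secret itself.
            problems.append(f"{name} should start with '{prefix}'")
    return problems
```

Calling `check_modal_env(os.environ)` at startup and logging the returned messages gives a clearer failure than a rejected request on the first inference call. Note that an unset variable is only a problem when no `~/.modal.toml` is available.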

See Modal token docs for CI and Docker setup.

Environment Variables

Variable	Default	Description
`MODAL_APP_NAME`	`finsight-ai`	Deployed Modal app name
`FAISS_DATA_DIR`	`./data/faiss`	FAISS index + chunk metadata
`CHAT_DB_PATH`	`./data/chat_sessions.db`	SQLite chat sessions
`TOP_K`	`6`	Retrieved chunks per query
`CHUNK_SIZE`	`512`	Ingestion chunk size (tokens)
`CHUNK_OVERLAP`	`64`	Chunk overlap
`HYBRID_ALPHA`	`0.6`	Semantic vs BM25 blend (0–1)
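As an illustration of how these variables might be consumed, the sketch below loads them with the defaults from the table and shows how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact in a sliding-window chunker. `Settings`, `load_settings`, and `chunk_spans` are hypothetical names, and the windowing scheme is an assumption about the ingestion step rather than the project's confirmed behavior.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above.
    modal_app_name: str = "finsight-ai"
    faiss_data_dir: str = "./data/faiss"
    chat_db_path: str = "./data/chat_sessions.db"
    top_k: int = 6
    chunk_size: int = 512
    chunk_overlap: int = 64
    hybrid_alpha: float = 0.6


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    d = Settings()
    s = Settings(
        modal_app_name=env.get("MODAL_APP_NAME", d.modal_app_name),
        faiss_data_dir=env.get("FAISS_DATA_DIR", d.faiss_data_dir),
        chat_db_path=env.get("CHAT_DB_PATH", d.chat_db_path),
        top_k=int(env.get("TOP_K", d.top_k)),
        chunk_size=int(env.get("CHUNK_SIZE", d.chunk_size)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", d.chunk_overlap)),
        hybrid_alpha=float(env.get("HYBRID_ALPHA", d.hybrid_alpha)),
    )
    if not 0.0 <= s.hybrid_alpha <= 1.0:
        raise ValueError("HYBRID_ALPHA must be between 0 and 1")
    if not 0 <= s.chunk_overlap < s.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be non-negative and smaller than CHUNK_SIZE")
    return s


def chunk_spans(n_tokens: int, size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets for overlapping windows over n_tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts this many tokens after the previous one
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans
```

With the defaults, each new chunk starts 448 tokens after the previous one, so a 1,000-token document yields three chunks and the last 64 tokens of one chunk reappear at the start of the next.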

Model Summary

Model	Size	Quantization	Source
MiniCPM-Embedding	2.4B	4-bit NF4 (BnB)	openbmb/MiniCPM-Embedding
MiniCPM4.1-8B	8B	Q4_K_M GGUF	openbmb/MiniCPM4.1-8B
MiniCPM-V 4.6	1B	fp16	openbmb/MiniCPM-V-4.6

All OpenBMB models: Apache 2.0 · Hugging Face Hub

Total stack stays well below the 32B Build Small parameter limit.

REST API (optional)

Endpoint	Method	Description
`/api/chat`	POST	SSE streaming RAG chat
`/api/documents/upload`	POST	Upload PDF / image
`/api/documents/list`	GET	List ingested documents
`/api/summarize`	POST	Financial summary
`/api/ocr`	POST	OCR extraction
`/api/extract-entities`	POST	Entity extraction
`/api/sessions`	GET / POST	Chat session management
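Since `/api/chat` streams its response as Server-Sent Events, a client needs to reassemble `data:` lines into events. The sketch below is a small parser for that wire format, independent of any HTTP library. `iter_sse_data` is a hypothetical helper, and the request body and event payload schema of `/api/chat` are not documented here, so they are deliberately left out.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from an iterable of SSE text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event not terminated by a blank line is discarded, as the SSE format specifies.
```

Feed it the decoded lines of the streaming response from whichever HTTP client you use; multi-line `data:` fields are joined with newlines into a single payload.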

Repository Structure

app.py                  # HF Space entry (Gradio)
backend/
  gradio_ui/            # Tabs, theme, custom CSS
  services/             # RAG, ingestion, summarizer
  models/               # Modal client wrappers
  db/                   # FAISS + SQLite
  routers/              # FastAPI routes
finsight_modal/
  app.py                # Modal GPU workers (deploy separately)
data/                   # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml

Hackathon Context

Built for the Hugging Face Build Small Hackathon and the SLM Hackathon track (Project 09 — FinSight Statement Auditor lineage). Uses efficient OpenBMB models with Modal offload so the UI runs on CPU while GPUs spin up only for inference.

Badge	How FinSight qualifies
Build Small	All models combined ≪ 32B params
Off the Grid	Document index + FAISS stay on-device; only inference hits Modal
Off-Brand	Custom FinSight Gradio theme (gold accent, finance-first layout)

License

Apache-2.0 (application code and OpenBMB model weights)