---
title: FinSight AI
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
python_version: '3.11'
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - sponsor:modal
  - achievement:offgrid
---

FinSight AI

Finance-domain Retrieval-Augmented Generation (RAG) assistant built with OpenBMB MiniCPM models. Upload earnings reports, bank statements, and filings, then chat, summarize, run OCR, and extract entities with cited answers.

Inference runs on Modal serverless GPUs; the Gradio UI, FAISS vector index, and document store stay local (or on Hugging Face Spaces). No 32B+ models: everything fits comfortably under the Build Small / SLM hackathon limits.


What it does

| Tab | Description |
|---|---|
| Finance QA Chatbot | Streaming RAG chat with source citations and confidence |
| Financial Summary | Executive, financial, or risk-focused summaries |
| Document OCR | Structured OCR for scanned PDFs and images |
| Entity Extraction | Companies, tickers, dates, and key figures |
| Upload Documents | Ingest, list, delete, and scope search to one file |

Search modes: Hybrid RAG (semantic + BM25 across all docs) or Single Document (chat scoped to one upload).


Architecture

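| Component | Model | Where it runs | VRAM (approx.) |
|---|---|---|---|
| Embeddings | MiniCPM-Embedding (4-bit NF4) | Modal T4 | ~1.6 GB |
| LLM | MiniCPM4.1-8B (Q4_K_M GGUF) | Modal T4 | ~5 GB |
| OCR / Vision | MiniCPM-V 4.6 | Modal A10G | ~2 GB |
| Vector search | FAISS + BM25 hybrid | Local / HF Space | None (CPU) |
| UI | Gradio 6 | Local / HF Space, port 7860 | None (CPU) |
| REST API (optional) | FastAPI | Local, port 8000 | None (CPU) |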

Models download automatically on first Modal cold start into a persistent volume (finsight-hf-cache).
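As a rough sanity check on the VRAM column, weight memory is about parameters × bits per weight / 8 bytes. A minimal sketch, assuming roughly 4.85 bits per weight on average for Q4_K_M (an assumed figure; the exact value varies by model) and 16 bits for fp16, and ignoring KV cache, activations, and CUDA context overhead:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# MiniCPM4.1-8B at Q4_K_M, assuming ~4.85 bits per weight on average.
llm_gb = weight_memory_gb(8e9, 4.85)
# MiniCPM-V 4.6 at fp16 (16 bits per weight), using the 1B size from the Model Summary.
vision_gb = weight_memory_gb(1e9, 16)

print(f"LLM weights:    ~{llm_gb:.2f} GB")
print(f"Vision weights: ~{vision_gb:.2f} GB")
```

Both estimates line up with the approximate figures in the table; actual peak VRAM also depends on context length and runtime overhead.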


Quick Start

1. Deploy Modal workers (one-time)

```bash
pip install modal
modal setup
modal deploy finsight_modal/app.py
```

Smoke test:

```bash
modal run finsight_modal/app.py
```

View the deployment at modal.com/apps → finsight-ai.

2. Run locally

```shell
cp .env.example .env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS / Linux

pip install -r requirements.txt -r backend/requirements.txt
python app.py
```

Open http://localhost:7860

Optional REST API:

```bash
cd backend && uvicorn main:app --reload --port 8000
```

Docker:

```bash
docker compose up gradio -d
# optional API:
docker compose up backend -d
```

Hugging Face Spaces

The Space entry point is app.py at the repo root (Gradio SDK).

Add these Secrets in Space settings:

| Secret | Description |
|---|---|
| MODAL_TOKEN_ID | From ~/.modal.toml after modal setup (starts with ak-) |
| MODAL_TOKEN_SECRET | Paired secret (starts with as-) |
| MODAL_APP_NAME | finsight-ai (must match the deployed Modal app) |

To read your existing tokens locally:

```shell
# Windows (PowerShell)
Get-Content $env:USERPROFILE\.modal.toml
# macOS / Linux
cat ~/.modal.toml
```

Or create new tokens at modal.com/settings.

Note: FAISS indexes and uploaded documents persist under ./data/ locally. On HF Spaces, storage is ephemeral unless you attach persistent storage, so re-upload documents after each restart.


Modal credentials (Docker / CI)

After modal setup, credentials live in ~/.modal.toml:

```toml
[default]
token_id = "ak-..."
token_secret = "as-..."
```

Alternatively, set them as environment variables, which take precedence over the file:

```bash
export MODAL_TOKEN_ID="ak-..."
export MODAL_TOKEN_SECRET="as-..."
export MODAL_APP_NAME="finsight-ai"
```

See Modal token docs for CI and Docker setup.


Environment Variables

| Variable | Default | Description |
|---|---|---|
| MODAL_APP_NAME | finsight-ai | Deployed Modal app name |
| FAISS_DATA_DIR | ./data/faiss | FAISS index + chunk metadata |
| CHAT_DB_PATH | ./data/chat_sessions.db | SQLite chat sessions |
| TOP_K | 6 | Retrieved chunks per query |
| CHUNK_SIZE | 512 | Ingestion chunk size (tokens) |
| CHUNK_OVERLAP | 64 | Overlap between consecutive chunks (tokens) |
| HYBRID_ALPHA | 0.6 | Blend weight between semantic and BM25 scores (0–1) |

Model Summary

| Model | Parameters | Quantization | Source |
|---|---|---|---|
| MiniCPM-Embedding | 0.4B | 4-bit NF4 (bitsandbytes) | openbmb/MiniCPM-Embedding |
| MiniCPM4.1-8B | 8B | Q4_K_M GGUF | openbmb/MiniCPM4.1-8B |
| MiniCPM-V 4.6 | 1B | fp16 | openbmb/MiniCPM-V-4.6 |

All OpenBMB models are Apache 2.0 licensed and hosted on the Hugging Face Hub.

Total stack stays well below the 32B Build Small parameter limit.
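As a worked check, summing the parameter counts listed in the Model Summary table:

```latex
0.4\,\text{B} + 8\,\text{B} + 1\,\text{B} = 9.4\,\text{B} \ll 32\,\text{B}
```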


REST API (optional)

| Endpoint | Method | Description |
|---|---|---|
| /api/chat | POST | SSE streaming RAG chat |
| /api/documents/upload | POST | Upload PDF / image |
| /api/documents/list | GET | List ingested documents |
| /api/summarize | POST | Financial summary |
| /api/ocr | POST | OCR extraction |
| /api/extract-entities | POST | Entity extraction |
| /api/sessions | GET / POST | Chat session management |
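The /api/chat endpoint streams Server-Sent Events. A minimal standard-library client sketch: iter_sse_data parses the text/event-stream framing, and stream_chat posts a question. The request body ({"message": ...}) and both helper names are hypothetical; check the FastAPI routers in backend/routers/ or FastAPI's auto-generated /docs page for the real request model.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield each SSE event's payload (consecutive 'data:' lines joined by newlines)."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # Strip a single leading space after the colon, as the SSE format specifies.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ':' comment lines are ignored here.
    # Per the SSE spec, an unterminated trailing event is discarded.


def stream_chat(question: str, base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST a question to /api/chat and yield streamed event payloads.

    The JSON body shape is an assumption, not the project's documented schema.
    """
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps({"message": question}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating an HTTP response yields raw byte lines.
        yield from iter_sse_data(line.decode("utf-8") for line in resp)
```

With the backend running, for chunk in stream_chat("What was Q3 revenue?"): print(chunk) would print each streamed payload as it arrives.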

Repository Structure

```text
app.py                  # HF Space entry (Gradio)
backend/
  gradio_ui/            # Tabs, theme, custom CSS
  services/             # RAG, ingestion, summarizer
  models/               # Modal client wrappers
  db/                   # FAISS + SQLite
  routers/              # FastAPI routes
finsight_modal/
  app.py                # Modal GPU workers (deploy separately)
data/                   # FAISS index + uploads (gitignored)
requirements.txt
docker-compose.yml
```

Hackathon Context

Built for the Hugging Face Build Small Hackathon and the SLM Hackathon track (Project 09: FinSight Statement Auditor lineage). Uses efficient OpenBMB models with Modal offload so the UI runs on CPU while GPUs spin up only for inference.

| Badge | How FinSight qualifies |
|---|---|
| Build Small | All models combined ≪ 32B parameters |
| Off the Grid | Document index + FAISS stay on-device; only inference hits Modal |
| Off-Brand | Custom FinSight Gradio theme (gold accent, finance-first layout) |

License

Apache-2.0 (application code and OpenBMB model weights)