title: Blum AI Financial Intelligence
emoji: 📈
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
short_description: Open-source AI market intelligence case study.
tags:
  - financial-analysis
  - finance
  - stock-market
  - ai
  - fastapi
  - nextjs
  - postgresql
  - sentiment-analysis
  - time-series
  - data-visualization
pinned: false

Blum AI Financial Intelligence

Blum is an open-source technical case study in AI financial intelligence. It is designed to analyze equities and ETFs, filter watchlist candidates, explain market narratives, build transparent signals and validate how those signals behaved historically.

This is not a consumer trading app and not a simple dashboard. The project is a full-stack platform that demonstrates how specialized AI modules, quantitative finance features, semantic news analysis and explainable research workflows can be assembled into a credible market intelligence system.

Architecture

Layer	Stack
Frontend	Next.js, React, Plotly, dark financial intelligence UI
Backend	FastAPI, Pydantic, APScheduler live services
Database	PostgreSQL, SQLAlchemy, Alembic
Market data	yfinance, Yahoo Chart API and Stooq provider chain
News ingestion	RSS feeds, public web-search RSS, deduplication, ticker linking
AI sentiment	FinBERT primary, VADER baseline
Semantic layer	sentence-transformers embeddings, semantic search, theme discovery
Reasoning	lightweight Qwen-compatible LLM used as an evidence-only explanation layer
Time-series intelligence	statistical fallback compatible with future Chronos, TimesFM or PatchTST adapters
Deployment	Hugging Face Docker Space
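
The market-data row above describes a provider chain. A minimal sketch of that idea follows, assuming each provider is a plain callable that returns daily bars or raises. The `Bar`, `FetchResult` and `fetch_with_fallback` names are hypothetical and not taken from the Blum codebase; the stub providers stand in for yfinance, the Yahoo Chart API and Stooq. When every provider fails, the result is reported as missing rather than filled in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Bar:
    """One daily OHLCV row."""
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int


# Hypothetical provider interface: return bars for a ticker, or raise on failure.
Provider = Callable[[str], Sequence[Bar]]


@dataclass(frozen=True)
class FetchResult:
    ticker: str
    provider: Optional[str]              # provider that answered, or None
    bars: Tuple[Bar, ...]                # empty when every provider failed
    errors: Tuple[Tuple[str, str], ...]  # (provider name, message) diagnostics


def fetch_with_fallback(
    ticker: str, providers: Sequence[Tuple[str, Provider]]
) -> FetchResult:
    """Try providers in order; never synthesize data when all of them fail."""
    errors: List[Tuple[str, str]] = []
    for name, provider in providers:
        try:
            bars = tuple(provider(ticker))
        except Exception as exc:  # rate limits, timeouts, parse errors
            errors.append((name, str(exc)))
            continue
        if bars:
            return FetchResult(ticker, name, bars, tuple(errors))
        errors.append((name, "empty response"))
    return FetchResult(ticker, None, (), tuple(errors))


# Stub providers for demonstration; the bar below is a test fixture, not market data.
def rate_limited(ticker: str) -> Sequence[Bar]:
    raise TimeoutError("rate limited")


def stub_ok(ticker: str) -> Sequence[Bar]:
    return [Bar("2024-01-02", 10.0, 11.0, 9.5, 10.5, 1000)]


result = fetch_with_fallback("TEST", [("yfinance", rate_limited), ("stooq", stub_ok)])
missing = fetch_with_fallback("TEST", [("yfinance", rate_limited)])
```

Keeping the per-provider errors in the result is one way to feed the source diagnostics that the dashboard is described as showing.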

AI Model Routing

Blum does not use one generic AI model for everything.

FinBERT: financial sentiment for headlines, article summaries and company-linked news.
VADER: baseline comparator and fallback.
sentence-transformers: embeddings for semantic search, narrative clustering, recurring themes and links between assets, sectors and macro trends.
Qwen-compatible lightweight LLM: structured explanations from retrieved evidence only.
Statistical time-series module: anomalies, volatility regimes and scenario bands, ready for Chronos, TimesFM or PatchTST integration.
Rule-based quantitative engine: scoring, ranking, risk controls and classifications.
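
The FinBERT-primary, VADER-fallback routing described above could be sketched as follows. This is an illustration under assumptions: both scorers are injected as plain callables, so the example runs without loading any model, and the function name and result fields are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical scorer interface: text in, {"label": ..., "score": ...} out.
Scorer = Callable[[str], Dict[str, object]]


def score_sentiment(text: str, primary: Scorer, fallback: Scorer) -> Dict[str, object]:
    """Use the primary scorer; fall back and record why if it fails."""
    try:
        result = dict(primary(text))
        result["model"] = "finbert"
        result["fallback_reason"] = None
    except Exception as exc:  # e.g. model not loaded, out of memory
        result = dict(fallback(text))
        result["model"] = "vader"
        result["fallback_reason"] = str(exc)
    return result


# Stubs standing in for the real models.
def unavailable_primary(text: str) -> Dict[str, object]:
    raise RuntimeError("model not loaded")


def stub_fallback(text: str) -> Dict[str, object]:
    return {"label": "neutral", "score": 0.0}


routed = score_sentiment("placeholder headline", unavailable_primary, stub_fallback)
```

Tagging each result with the model that produced it keeps the baseline comparison honest: a VADER score is never presented as a FinBERT score.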

Data Workflow

Seed the asset universe with stocks, ETFs, sectors, countries, industries and descriptions.
Download OHLCV price history from yfinance, Yahoo Chart API and Stooq public daily data, using maximum available history when requested.
Store prices in PostgreSQL.
Start the live pipeline on application boot.
Fetch public RSS news plus dynamic public web-search RSS queries for assets and financial themes.
Deduplicate articles.
Link articles to tickers and sectors.
Run FinBERT sentiment and VADER baseline.
Generate embeddings for semantic retrieval.
Compute technical indicators and time-series anomalies.
Generate signal snapshots with a Blum Intelligence Score.
Produce AI explanations using only retrieved evidence.
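
The deduplication and ticker-linking steps in the workflow above can be illustrated with a small standard-library sketch. The matching rules here (strip `utm_` tracking parameters, fingerprint the normalized title, match ticker aliases on word boundaries) are assumptions chosen for the example, not the project's documented logic.

```python
import hashlib
import re
from typing import Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop utm_* parameters and the fragment."""
    parts = urlsplit(url.strip())
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")]
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), urlencode(query), "")
    )


def title_fingerprint(title: str) -> str:
    """Hash the title after lowercasing and removing punctuation."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def deduplicate(articles: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first article for each normalized URL or title fingerprint."""
    seen_urls, seen_titles, unique = set(), set(), []
    for article in articles:
        url_key = normalize_url(article["url"])
        title_key = title_fingerprint(article["title"])
        if url_key in seen_urls or title_key in seen_titles:
            continue
        seen_urls.add(url_key)
        seen_titles.add(title_key)
        unique.append(article)
    return unique


def link_tickers(text: str, aliases: Dict[str, List[str]]) -> List[str]:
    """Return tickers whose aliases appear as whole words in the text."""
    lowered = text.lower()
    return sorted(
        ticker
        for ticker, names in aliases.items()
        if any(re.search(rf"\b{re.escape(name.lower())}\b", lowered) for name in names)
    )


# Placeholder fixtures, not real articles.
fixtures = [
    {"url": "https://Example.com/a/?utm_source=rss#top", "title": "Placeholder headline one"},
    {"url": "https://example.com/a", "title": "Different title"},               # same URL
    {"url": "https://example.org/b", "title": "placeholder HEADLINE one!"},     # same title
    {"url": "https://example.org/c", "title": "Placeholder headline two"},
]
unique_articles = deduplicate(fixtures)
linked = link_tickers(
    "Placeholder text mentioning Nvidia and AMD",
    {"NVDA": ["Nvidia"], "AMD": ["AMD"], "INTC": ["Intel"]},
)
```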

Live Runtime

When the FastAPI application starts, APScheduler launches a background intelligence worker:

startup_pipeline: news ingestion, historical price collection, signal generation and ETF trend update.
news_refresh: public news refresh every 10 minutes by default.
market_refresh: recent OHLCV refresh and signal regeneration every 45 minutes by default.

The dashboard polls live JSON endpoints every 30 seconds and shows worker state, latest public news, sentiment distribution, source/model diagnostics and signal readiness. No generated headlines, generated prices or fabricated sentiment are shown.

Every equity and ETF surface includes an explicit market snapshot when real OHLCV data is available: last price, currency, date, provider, volume and 1D/5D/1M performance. If public providers have not returned usable prices yet, the UI shows a real-data pending state instead of a fabricated value.
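
A market snapshot of this kind can be derived from a list of daily closes. The sketch below is illustrative: the field names, the `pending` status value and the choice of 21 sessions for 1M are assumptions. The important behavior is that an empty price history yields a pending state, never an invented number.

```python
from typing import Dict, Optional, Sequence

# Lookbacks in trading sessions; treating 1M as 21 sessions is an assumption.
LOOKBACKS: Dict[str, int] = {"1D": 1, "5D": 5, "1M": 21}


def pct_change(closes: Sequence[float], sessions: int) -> Optional[float]:
    """Percent change of the last close versus the close `sessions` bars earlier."""
    if len(closes) <= sessions or closes[-1 - sessions] == 0:
        return None  # not enough history: report nothing rather than guess
    return (closes[-1] / closes[-1 - sessions] - 1.0) * 100.0


def build_snapshot(
    ticker: str,
    closes: Sequence[float],
    volume: Optional[int] = None,
    currency: Optional[str] = None,
    as_of: Optional[str] = None,
    provider: Optional[str] = None,
) -> Dict[str, object]:
    """Build a snapshot dict, or a pending marker when no prices are available."""
    if not closes:
        return {"ticker": ticker, "status": "pending"}
    return {
        "ticker": ticker,
        "status": "ok",
        "last_price": closes[-1],
        "currency": currency,
        "date": as_of,
        "provider": provider,
        "volume": volume,
        "performance": {label: pct_change(closes, n) for label, n in LOOKBACKS.items()},
    }


# Fixture closes for demonstration only, not market data.
fixture_closes = [100.0, 102.0, 101.0, 103.0, 104.0, 110.0]
snapshot = build_snapshot(
    "TEST", fixture_closes, volume=1000, currency="USD", as_of="2024-01-09", provider="stub"
)
pending = build_snapshot("TEST", [])
```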

Signal Methodology

The signal engine combines:

momentum: 1D, 5D, 1M, 3M, 6M, YTD and relative strength;
trend quality: SMA/EMA structure, slopes, ADX, persistence and drawdown;
volatility and risk: historical volatility, ATR, beta, downside volatility, gaps and volume spikes;
technical indicators: RSI, MACD, Bollinger Bands, support and resistance;
news and sentiment: FinBERT sentiment, VADER baseline, 7D/30D sentiment trend and news intensity;
semantic themes: recurring narratives such as AI, rates, earnings, guidance, geopolitics, M&A, regulation, supply chain and innovation;
ETF intelligence: ETF momentum, thematic confirmation and rotation;
anomaly detection: price, volume, news and narrative divergences.
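
Several of the inputs listed above are standard calculations. As an illustration, here is a standard-library sketch of three of them: a simple moving average, annualized historical volatility from log returns, and maximum drawdown. The 252-session year is a common convention and an assumption here; the project may use different windows or estimators.

```python
import math
from statistics import stdev
from typing import List, Optional, Sequence


def sma(closes: Sequence[float], window: int) -> Optional[float]:
    """Simple moving average of the last `window` closes."""
    if window <= 0 or len(closes) < window:
        return None
    return sum(closes[-window:]) / window


def log_returns(closes: Sequence[float]) -> List[float]:
    """Log returns between consecutive closes."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def annualized_volatility(
    closes: Sequence[float], periods_per_year: int = 252
) -> Optional[float]:
    """Sample standard deviation of log returns, scaled to a year."""
    returns = log_returns(closes)
    if len(returns) < 2:
        return None
    return stdev(returns) * math.sqrt(periods_per_year)


def max_drawdown(closes: Sequence[float]) -> Optional[float]:
    """Largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    if not closes:
        return None
    peak, worst = closes[0], 0.0
    for price in closes:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1.0)
    return worst


# Fixture series for demonstration only.
drawdown = max_drawdown([100.0, 120.0, 90.0, 110.0])
average = sma([1.0, 2.0, 3.0, 4.0], 2)
flat_vol = annualized_volatility([100.0, 100.0, 100.0])
```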

The final score is called the Blum Intelligence Score. It produces explainable classifications:

Strong Watch
Watch
Neutral
Avoid / Too Risky
Contrarian Setup
Narrative Breakout
Technical Breakout
Sentiment Divergence
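
The README does not publish the scoring formula, so the following is only a sketch of how a transparent weighted score and a rule-based classification could fit together. The weights, thresholds and component names are hypothetical, and only four of the eight classifications are covered; the pattern-based ones (breakouts, divergences, contrarian setups) would need additional rules.

```python
from typing import Dict, Tuple

# Hypothetical weights for illustration; components are expected in [0, 1],
# where a higher "risk" value means lower risk.
WEIGHTS: Dict[str, float] = {
    "momentum": 0.25,
    "trend": 0.20,
    "risk": 0.20,
    "technical": 0.15,
    "sentiment": 0.10,
    "themes": 0.10,
}


def intelligence_score(components: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return a 0-100 score plus each component's contribution in points."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    total = sum(WEIGHTS.values())
    contributions = {
        name: 100.0 * WEIGHTS[name] * components[name] / total for name in WEIGHTS
    }
    return sum(contributions.values()), contributions


def classify(score: float, risk_component: float) -> str:
    """Map a score to a label; the thresholds are illustrative assumptions."""
    if risk_component < 0.2:  # risk control overrides a high score
        return "Avoid / Too Risky"
    if score >= 75.0:
        return "Strong Watch"
    if score >= 60.0:
        return "Watch"
    return "Neutral"


mid_score, mid_parts = intelligence_score({name: 0.5 for name in WEIGHTS})
top_score, _ = intelligence_score({name: 1.0 for name in WEIGHTS})
```

Returning the per-component contributions alongside the total is what makes a score like this explainable: each label can be traced back to the inputs that drove it.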

API Endpoints

FastAPI exposes clean JSON endpoints:

GET /assets
GET /assets/{ticker}
POST /market/update
POST /news/update
GET /news/live
GET /sentiment/market
POST /signals/run
POST /pipeline/run
GET /pipeline/status
GET /signals/top
GET /signals/{ticker}
GET /sentiment/{ticker}
POST /semantic-search
GET /related-news?ticker=NVDA
GET /themes
GET /etf-trends
GET /dashboard/overview
GET /ai/explain/{ticker}
POST /backtest/{ticker}

Interactive API docs are available at /docs.
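
An endpoint such as POST /semantic-search typically ranks stored embeddings by similarity to a query embedding. The sketch below shows that ranking step with cosine similarity in plain Python. It assumes the vectors were already produced elsewhere (in Blum's case by sentence-transformers); the tiny two-dimensional vectors are placeholders, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query_vec: Sequence[float], documents: Dict[str, Sequence[float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return the top_k (document id, similarity) pairs, best match first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings for demonstration only.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = semantic_search([1.0, 0.0], docs, top_k=2)
```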

GET /ai/explain/{ticker} is auto-hydrating: if no signal snapshot exists yet, the backend first tries to fetch real public prices on demand, ingest ticker-specific news and generate a signal, then returns an explanation. If verified data is still insufficient, it returns an Insufficient Evidence explanation with provider diagnostics instead of fabricating a signal.
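
The control flow of that endpoint can be sketched independently of FastAPI. In this illustration the snapshot store, the hydration step and the explainer are injected callables with hypothetical names, and the response fields are assumptions; only the order of operations follows the description above.

```python
from typing import Callable, Dict, Optional

Snapshot = Dict[str, object]


def explain_ticker(
    ticker: str,
    load_snapshot: Callable[[str], Optional[Snapshot]],
    hydrate: Callable[[str], Dict[str, str]],
    explain: Callable[[Snapshot], str],
) -> Dict[str, object]:
    """Hydrate on demand, then explain; never fabricate a signal."""
    diagnostics: Optional[Dict[str, str]] = None
    snapshot = load_snapshot(ticker)
    if snapshot is None:
        diagnostics = hydrate(ticker)  # prices, news and signal generation
        snapshot = load_snapshot(ticker)
    if snapshot is None:
        return {
            "ticker": ticker,
            "status": "insufficient_evidence",
            "explanation": "Insufficient Evidence",
            "diagnostics": diagnostics,
        }
    return {
        "ticker": ticker,
        "status": "ok",
        "explanation": explain(snapshot),
        "diagnostics": diagnostics,
    }


# In-memory stubs for demonstration only.
store: Dict[str, Snapshot] = {}


def load(ticker: str) -> Optional[Snapshot]:
    return store.get(ticker)


def hydrate_ok(ticker: str) -> Dict[str, str]:
    store[ticker] = {"score": 50.0}
    return {"prices": "ok"}


def hydrate_failed(ticker: str) -> Dict[str, str]:
    return {"prices": "rate limited"}


def stub_explain(snapshot: Snapshot) -> str:
    return f"score={snapshot['score']}"


failed = explain_ticker("AAA", load, hydrate_failed, stub_explain)
hydrated = explain_ticker("BBB", load, hydrate_ok, stub_explain)
cached = explain_ticker("BBB", load, hydrate_failed, stub_explain)
```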

Frontend Pages

Case Study Home
Intelligence Dashboard
Asset Detail
ETF Radar
Theme Explorer
Signal Lab
Backtest
Methodology

The UI is intentionally dense, dark and technical: Bloomberg-style information density, Linear/Vercel-style cleanliness, TradingView-style chart clarity and OpenBB-style open-source posture.

Local Setup

cd hf-blum-mvp
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
npm --prefix frontend install
npm --prefix frontend run build
export DATABASE_URL=postgresql+psycopg2://postgres:postgres@127.0.0.1:5432/blum
PYTHONPATH=backend uvicorn app.main:app --host 0.0.0.0 --port 7860

Docker

cd hf-blum-mvp
docker build -t blum-ai-financial-intelligence .
docker run --rm -p 7860:7860 blum-ai-financial-intelligence

If DATABASE_URL is not set, the Docker demo starts an embedded PostgreSQL instance inside the container. For production-like use, provide an external PostgreSQL database:

docker run --rm -p 7860:7860 \
  -e DATABASE_URL=postgresql+psycopg2://user:password@host:5432/blum \
  blum-ai-financial-intelligence

Hugging Face Spaces Deployment

Use a Docker Space. Upload the repository with:

Dockerfile
requirements.txt
backend/
frontend/
scripts/
package.json
README.md

The Space serves the FastAPI backend and the exported Next.js frontend on port 7860.

Backtesting and Validation

Backtesting is included for research validation only. It reports historical hit rate, average forward return over 5D/20D/60D, max adverse excursion, max favorable excursion and false positives. It does not predict or guarantee future returns.
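
The reported metrics can be computed from a close series and the indices where a signal fired. The sketch below makes several simplifying assumptions: a hit is any positive forward return, a false positive is any signal that is not a hit, and excursions are measured on closes rather than intraday highs and lows. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def forward_return(closes: Sequence[float], index: int, horizon: int) -> Optional[float]:
    """Return over `horizon` bars after the signal, or None without enough data."""
    if index + horizon >= len(closes):
        return None
    return closes[index + horizon] / closes[index] - 1.0


def excursions(
    closes: Sequence[float], index: int, horizon: int
) -> Optional[Tuple[float, float]]:
    """Max adverse (<= 0) and max favorable (>= 0) excursion within the horizon."""
    window = closes[index + 1 : index + horizon + 1]
    if len(window) < horizon:
        return None
    entry = closes[index]
    mae = min(min(window) / entry - 1.0, 0.0)
    mfe = max(max(window) / entry - 1.0, 0.0)
    return mae, mfe


def summarize(
    closes: Sequence[float], signal_indices: Sequence[int], horizon: int
) -> Optional[Dict[str, float]]:
    """Hit rate, average forward return and false positives for one horizon."""
    returns = []
    for i in signal_indices:
        r = forward_return(closes, i, horizon)
        if r is not None:  # skip signals too recent to evaluate
            returns.append(r)
    if not returns:
        return None
    hits = sum(1 for r in returns if r > 0)
    return {
        "signals": len(returns),
        "hit_rate": hits / len(returns),
        "avg_forward_return": sum(returns) / len(returns),
        "false_positives": len(returns) - hits,
    }


# Fixture series for demonstration only, not market data.
series = [100.0, 110.0, 105.0, 95.0, 100.0, 120.0]
summary = summarize(series, [0, 1, 4], horizon=2)
first_excursion = excursions(series, 0, 2)
second_excursion = excursions(series, 1, 2)
```

Signals too close to the end of the series are excluded rather than scored, which avoids counting unfinished trades as hits or misses.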

Limitations

Public RSS, Google News RSS search, Yahoo and Stooq are demo-grade public data sources, not licensed institutional feeds.
The system does not generate synthetic prices. If public providers fail or rate-limit, the affected assets are reported as missing instead of being filled with fake data.
FinBERT, embeddings and LLM model loading depend on runtime memory and Hugging Face model availability.
The reasoning layer must not invent data; it is constrained to retrieved evidence.
Signal classifications are research triage outputs, not investment recommendations.
PostgreSQL is the database layer; the Docker demo can start an embedded PostgreSQL instance for convenience on Hugging Face Spaces.

Financial Disclaimer

This project is for educational, research and technical case-study purposes only. It does not constitute financial advice, investment advice, a recommendation, a trading signal, portfolio guidance or an offer to buy or sell any security. Always perform independent research and consult qualified professionals before making financial decisions.

Roadmap

The execution roadmap is tracked in ROADMAP.md. It covers Docker Space stabilization, data ingestion reliability, AI model productionization, semantic intelligence, signal engine upgrades, ETF intelligence, backtesting, frontend UX, provider architecture, testing and open-source polish.

Engineering Standards

Development standards are tracked in ENGINEERING_STANDARDS.md. The project explicitly rejects placeholders, fabricated data and synthetic market-data fallbacks. Every shipped increment should be evidence-bound, efficient, explainable and verified.

Contributing

Contributions should preserve the project philosophy: transparent evidence, modular models, explainable scoring, no fabricated data and no investment recommendations.