	---
	title: Blum AI Financial Intelligence
	emoji: 📈
	colorFrom: yellow
	colorTo: gray
	sdk: docker
	app_port: 7860
	short_description: Open-source AI market intelligence case study.
	tags: [financial-analysis, finance, stock-market, ai, fastapi, nextjs, postgresql, sentiment-analysis, time-series, data-visualization]
	pinned: false
	---

	# Blum AI Financial Intelligence

	Blum is an open-source technical case study for AI financial intelligence. It is designed to analyze equities and ETFs, filter watchlist candidates, explain market narratives, build transparent signals and validate signal behavior historically.

	This is not a consumer trading app and not a simple dashboard. The project is a full-stack platform that demonstrates how specialized AI modules, quantitative finance features, semantic news analysis and explainable research workflows can be assembled into a credible market intelligence system.

	## Architecture

	| Layer | Stack |
	| --- | --- |
	| Frontend | Next.js, React, Plotly, dark financial intelligence UI |
	| Backend | FastAPI, Pydantic, APScheduler live services |
	| Database | PostgreSQL, SQLAlchemy, Alembic |
	| Market data | yfinance, Yahoo Chart API and Stooq provider chain |
	| News ingestion | RSS feeds, public web-search RSS, deduplication, ticker linking |
	| AI sentiment | FinBERT primary, VADER baseline |
	| Semantic layer | sentence-transformers embeddings, semantic search, theme discovery |
	| Reasoning | lightweight Qwen-compatible LLM evidence-only explanation layer |
	| Time-series intelligence | statistical fallback compatible with future Chronos, TimesFM or PatchTST adapters |
	| Deployment | Hugging Face Docker Space |

	## AI Model Routing

	Blum does not use one generic AI model for everything. Each task is routed to a specialized component:

	- FinBERT: financial sentiment for headlines, article summaries and company-linked news.
	- VADER: baseline comparator and fallback.
	- sentence-transformers: embeddings for semantic search, narrative clustering, recurring themes and links between assets, sectors and macro trends.
	- Qwen-compatible lightweight LLM: structured explanations from retrieved evidence only.
	- Statistical time-series module: anomalies, volatility regimes and scenario bands, ready for Chronos, TimesFM or PatchTST integration.
	- Rule-based quantitative engine: scoring, ranking, risk controls and classifications.
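
The FinBERT-primary, VADER-fallback routing described above can be sketched as follows. This is a minimal illustration, not Blum's actual code: `route_sentiment` and the `Scorer` interface are hypothetical names, and the scorers are passed in as plain callables so the sketch does not depend on any model being loaded.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interface: a scorer maps text to a (label, confidence) pair.
Scorer = Callable[[str], Tuple[str, float]]


def route_sentiment(text: str, primary: Optional[Scorer], fallback: Scorer) -> dict:
    """Score text with the primary model; use the fallback if it is missing or fails."""
    if primary is not None:
        try:
            label, confidence = primary(text)
            return {"label": label, "confidence": confidence, "model": "primary"}
        except Exception:
            # Model loading or inference failed (for example, out of memory),
            # so fall through to the baseline scorer.
            pass
    label, confidence = fallback(text)
    return {"label": label, "confidence": confidence, "model": "fallback"}
```

Recording which model produced each score keeps the result traceable, which fits the source/model diagnostics the dashboard is described as showing.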

	## Data Workflow

	1. Seed the asset universe with stocks, ETFs, sectors, countries, industries and descriptions.
	2. Download OHLCV price history from yfinance, Yahoo Chart API and Stooq public daily data, using maximum available history when requested.
	3. Store prices in PostgreSQL.
	4. Start the live pipeline on application boot.
	5. Fetch public RSS news plus dynamic public web-search RSS queries for assets and financial themes.
	6. Deduplicate articles.
	7. Link articles to tickers and sectors.
	8. Run FinBERT sentiment and VADER baseline.
	9. Generate embeddings for semantic retrieval.
	10. Compute technical indicators and time-series anomalies.
	11. Generate signal snapshots with a Blum Intelligence Score.
	12. Produce AI explanations using only retrieved evidence.
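
Step 6 (deduplication) could look roughly like the sketch below. It assumes articles are dictionaries with `title` and `url` keys and that stripping the URL query string is an acceptable way to ignore tracking parameters; both are illustrative assumptions, and `article_key` and `deduplicate` are hypothetical names.

```python
import hashlib
import re


def article_key(title: str, url: str) -> str:
    """Build a stable fingerprint from a normalized title and URL."""
    # Collapse whitespace and lowercase the title so trivial variations match.
    norm_title = re.sub(r"\s+", " ", title).strip().lower()
    # Assumption: the query string only carries tracking parameters.
    norm_url = url.split("?", 1)[0].rstrip("/")
    return hashlib.sha256(f"{norm_title}|{norm_url}".encode("utf-8")).hexdigest()


def deduplicate(articles: list) -> list:
    """Keep the first occurrence of each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for article in articles:
        key = article_key(article["title"], article["url"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```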

	## Live Runtime

	When the FastAPI application starts, APScheduler launches a background intelligence worker:

	- `startup_pipeline`: news ingestion, historical price collection, signal generation and ETF trend update.
	- `news_refresh`: public news refresh every 10 minutes by default.
	- `market_refresh`: recent OHLCV refresh and signal regeneration every 45 minutes by default.
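
The two recurring jobs above could be registered along these lines. This is a hedged sketch: `register_jobs` and `DEFAULT_INTERVALS_MINUTES` are hypothetical names, and the function only assumes a scheduler object whose `add_job(func, "interval", minutes=..., id=..., replace_existing=...)` call matches APScheduler's. `startup_pipeline` is left out because it is described as the boot-time job rather than a recurring one.

```python
# Default intervals taken from the job list above.
DEFAULT_INTERVALS_MINUTES = {"news_refresh": 10, "market_refresh": 45}


def register_jobs(scheduler, jobs: dict, intervals: dict = None) -> list:
    """Register one interval job per entry and return the registered job ids."""
    if intervals is None:
        intervals = DEFAULT_INTERVALS_MINUTES
    registered = []
    for job_id, func in jobs.items():
        scheduler.add_job(
            func,
            "interval",
            minutes=intervals[job_id],
            id=job_id,
            # Avoid duplicate jobs if registration runs more than once.
            replace_existing=True,
        )
        registered.append(job_id)
    return registered
```

With APScheduler installed, a `BackgroundScheduler` instance can be passed as `scheduler`, followed by a call to its `start()` method.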

	The dashboard polls live JSON endpoints every 30 seconds and shows worker state, latest public news, sentiment distribution, source/model diagnostics and signal readiness. No generated headlines, generated prices or fabricated sentiment are shown.

	Every equity and ETF view includes an explicit market snapshot when real OHLCV data is available: last price, currency, date, provider, volume and 1D/5D/1M performance. If public providers have not returned usable prices yet, the UI shows a real-data pending state instead of a fabricated value.

	## Signal Methodology

	The signal engine combines:

	- momentum: 1D, 5D, 1M, 3M, 6M, YTD and relative strength;
	- trend quality: SMA/EMA structure, slopes, ADX, persistence and drawdown;
	- volatility and risk: historical volatility, ATR, beta, downside volatility, gaps and volume spikes;
	- technical indicators: RSI, MACD, Bollinger Bands, support and resistance;
	- news and sentiment: FinBERT sentiment, VADER baseline, 7D/30D sentiment trend and news intensity;
	- semantic themes: recurring narratives such as AI, rates, earnings, guidance, geopolitics, M&A, regulation, supply chain and innovation;
	- ETF intelligence: ETF momentum, thematic confirmation and rotation;
	- anomaly detection: price, volume, news and narrative divergences.
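
Two of the inputs listed above, period momentum and historical volatility, can be computed from daily closes as sketched below. The function names, the 20-session window and the 252-session year are illustrative assumptions, as is treating 5D as 5 trading sessions and 1M as roughly 21.

```python
import math
import statistics


def period_return(closes: list, days: int):
    """Simple return over the last `days` sessions, or None if history is too short."""
    if len(closes) <= days:
        return None
    return closes[-1] / closes[-1 - days] - 1.0


def historical_volatility(closes: list, window: int = 20, periods_per_year: int = 252):
    """Annualized standard deviation of daily log returns over the last `window` sessions."""
    if len(closes) < window + 1:
        return None
    recent = closes[-(window + 1):]
    log_returns = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)
```

Returning `None` for short histories, rather than a default number, mirrors the project's rule of reporting missing data instead of filling it in.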

	The final score is called the Blum Intelligence Score. It produces explainable classifications:

	- Strong Watch
	- Watch
	- Neutral
	- Avoid / Too Risky
	- Contrarian Setup
	- Narrative Breakout
	- Technical Breakout
	- Sentiment Divergence
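
To illustrate how a weighted score can stay explainable, here is a minimal sketch. The weights, thresholds and component names are HYPOTHETICAL and are not Blum's actual values; only the four threshold-style labels are covered, since the pattern-based labels (for example Narrative Breakout) would need additional rules.

```python
# HYPOTHETICAL weights for illustration only; each component is scored in [0, 100].
# "risk_control" is assumed to be higher when risk is lower.
WEIGHTS = {"momentum": 0.30, "trend": 0.25, "sentiment": 0.25, "risk_control": 0.20}


def score_breakdown(components: dict) -> dict:
    """Per-component contribution to the final score, kept for explainability."""
    return {name: WEIGHTS[name] * components[name] for name in WEIGHTS}


def intelligence_score(components: dict) -> float:
    """Weighted sum of the component scores."""
    return sum(score_breakdown(components).values())


def classify(score: float, risk_control: float) -> str:
    """Map a score to a label; a very weak risk component overrides the score."""
    # HYPOTHETICAL thresholds.
    if risk_control < 20:
        return "Avoid / Too Risky"
    if score >= 75:
        return "Strong Watch"
    if score >= 60:
        return "Watch"
    return "Neutral"
```

Exposing the breakdown alongside the total lets a reader see which component drove a classification.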

	## API Endpoints

	The FastAPI backend exposes the following JSON endpoints:

	- `GET /assets`
	- `GET /assets/{ticker}`
	- `POST /market/update`
	- `POST /news/update`
	- `GET /news/live`
	- `GET /sentiment/market`
	- `POST /signals/run`
	- `POST /pipeline/run`
	- `GET /pipeline/status`
	- `GET /signals/top`
	- `GET /signals/{ticker}`
	- `GET /sentiment/{ticker}`
	- `POST /semantic-search`
	- `GET /related-news?ticker=NVDA`
	- `GET /themes`
	- `GET /etf-trends`
	- `GET /dashboard/overview`
	- `GET /ai/explain/{ticker}`
	- `POST /backtest/{ticker}`

	Interactive API docs are available at `/docs`.

	`GET /ai/explain/{ticker}` is auto-hydrating: if no signal snapshot exists yet, the backend attempts on-demand real public price hydration, ticker-specific news ingestion and signal generation before returning an explanation. If verified data is still insufficient, it returns an `Insufficient Evidence` explanation with provider diagnostics instead of fabricating a signal.
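
A client call to this endpoint might look like the sketch below, using only the Python standard library. The helper names are hypothetical, the response is assumed to be a JSON object (its fields are not documented here), and the generous default timeout is an assumption made because on-demand hydration may take a while.

```python
import json
import urllib.request
from urllib.parse import quote


def explain_url(base_url: str, ticker: str) -> str:
    """Build the URL for GET /ai/explain/{ticker}."""
    return f"{base_url.rstrip('/')}/ai/explain/{quote(ticker, safe='')}"


def fetch_explanation(base_url: str, ticker: str, timeout: float = 60.0) -> dict:
    """Call the endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(explain_url(base_url, ticker), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a local instance, `fetch_explanation("http://localhost:7860", "NVDA")` would request `/ai/explain/NVDA`.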

	## Frontend Pages

	- Case Study Home
	- Intelligence Dashboard
	- Asset Detail
	- ETF Radar
	- Theme Explorer
	- Signal Lab
	- Backtest
	- Methodology

	The UI is intentionally dense, dark and technical: Bloomberg-style information density, Linear/Vercel-style cleanliness, TradingView-style chart clarity and OpenBB-style open-source posture.

	## Local Setup

	```bash
	cd hf-blum-mvp
	python -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	npm --prefix frontend install
	npm --prefix frontend run build
	export DATABASE_URL=postgresql+psycopg2://postgres:postgres@127.0.0.1:5432/blum
	PYTHONPATH=backend uvicorn app.main:app --host 0.0.0.0 --port 7860
	```

	## Docker

	```bash
	cd hf-blum-mvp
	docker build -t blum-ai-financial-intelligence .
	docker run --rm -p 7860:7860 blum-ai-financial-intelligence
	```

	If `DATABASE_URL` is not set, the Docker demo starts an embedded PostgreSQL instance inside the container. For production-like use, provide an external PostgreSQL database:

	```bash
	docker run --rm -p 7860:7860 \
	  -e DATABASE_URL=postgresql+psycopg2://user:password@host:5432/blum \
	  blum-ai-financial-intelligence
	```

	## Hugging Face Spaces Deployment

	Create a Docker Space and upload the repository, including:

	- `Dockerfile`
	- `requirements.txt`
	- `backend/`
	- `frontend/`
	- `scripts/`
	- `package.json`
	- `README.md`

	The Space serves the FastAPI backend and the exported Next.js frontend on port `7860`.

	## Backtesting and Validation

	Backtesting is included for research validation only. It reports historical hit rate, average forward return over 5D/20D/60D, max adverse excursion, max favorable excursion and false positives. It does not predict or guarantee future returns.
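
The reported metrics can be sketched from a list of daily closes and the indices at which a signal fired. This is an illustration under stated assumptions, not the project's backtester: a "hit" is taken to mean a positive forward return, excursions are measured on closes only, and all function names are hypothetical.

```python
def forward_return(closes: list, entry: int, horizon: int):
    """Return from the entry close to the close `horizon` sessions later, or None."""
    exit_index = entry + horizon
    if exit_index >= len(closes):
        return None
    return closes[exit_index] / closes[entry] - 1.0


def excursions(closes: list, entry: int, horizon: int):
    """Max adverse and max favorable excursion (close-based) within the horizon."""
    window = closes[entry + 1 : entry + horizon + 1]
    if not window:
        return None
    base = closes[entry]
    mae = min(0.0, min(window) / base - 1.0)
    mfe = max(0.0, max(window) / base - 1.0)
    return mae, mfe


def summarize(closes: list, signal_indices: list, horizon: int) -> dict:
    """Hit rate and average forward return over signals with enough future data."""
    returns = [
        r for r in (forward_return(closes, i, horizon) for i in signal_indices)
        if r is not None
    ]
    if not returns:
        return {"signals": 0, "hit_rate": None, "avg_forward_return": None}
    hits = sum(1 for r in returns if r > 0)
    return {
        "signals": len(returns),
        "hit_rate": hits / len(returns),
        "avg_forward_return": sum(returns) / len(returns),
    }
```

Under the same assumption, the false-positive share is `1 - hit_rate`. Signals too close to the end of the series are skipped rather than scored on partial data.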

	## Limitations

	- Public RSS, Google News RSS search, Yahoo and Stooq are demo-grade public data sources, not licensed institutional feeds.
	- The system does not generate synthetic prices. If public providers fail or rate-limit, the affected assets are reported as missing instead of being filled with fake data.
	- FinBERT, embeddings and LLM model loading depend on runtime memory and Hugging Face model availability.
	- The reasoning layer must not invent data; it is constrained to retrieved evidence.
	- Signal classifications are research triage outputs, not investment recommendations.
	- PostgreSQL is the database layer; the Docker demo can start an embedded PostgreSQL instance for convenience on Hugging Face Spaces.

	## Financial Disclaimer

	This project is for educational, research and technical case-study purposes only. It does not constitute financial advice, investment advice, a recommendation, a trading signal, portfolio guidance or an offer to buy or sell any security. Always perform independent research and consult qualified professionals before making financial decisions.

	## Roadmap

	The execution roadmap is tracked in [`ROADMAP.md`](ROADMAP.md). It covers Docker Space stabilization, data ingestion reliability, AI model productionization, semantic intelligence, signal engine upgrades, ETF intelligence, backtesting, frontend UX, provider architecture, testing and open-source polish.

	## Engineering Standards

	Development standards are tracked in [`ENGINEERING_STANDARDS.md`](ENGINEERING_STANDARDS.md). The project explicitly rejects placeholders, fabricated data and synthetic market-data fallbacks. Every shipped increment should be evidence-bound, efficient, explainable and verified.

	## Contributing

	Contributions should preserve the project philosophy: transparent evidence, modular models, explainable scoring, no fabricated data and no investment recommendations.