	# Blum AI Financial Intelligence Roadmap

	This roadmap is the execution plan for turning Blum into a credible open-source AI financial intelligence platform. Each phase should ship with working code, API coverage, UI coverage, documentation and explicit limitations.

	## Current Increment - Live Intelligence Runtime

	Shipped in the current architecture:

	- FastAPI startup launches a background APScheduler worker.
	- Startup pipeline ingests public news first, then historical OHLCV, signals and ETF trends.
	- Public news ingestion combines publisher RSS feeds, thematic Google News RSS queries and asset-specific public web-search RSS queries.
	- `/news/live`, `/sentiment/market` and `/pipeline/status` expose live news, market sentiment and worker state.
	- Dashboard polls live endpoints every 30 seconds and surfaces news tape, sentiment mix, source/model state and signal readiness.
	- Price pipeline remains real-data-only through yfinance, Yahoo Chart API and Stooq; no synthetic OHLCV fallback is allowed.
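
The startup ordering described above (news first, then OHLCV, signals and ETF trends) could be sketched as a small stage runner whose result resembles what `/pipeline/status` reports. This is a minimal sketch, not the shipped worker: `run_startup_pipeline`, the stage names and the status strings are hypothetical, and continuing after a failed stage is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, callable) pair; the names used here are illustrative.
Stage = Tuple[str, Callable[[], None]]


def run_startup_pipeline(stages: List[Stage]) -> Dict[str, str]:
    """Run stages in the given order and return a per-stage status map."""
    status = {name: "pending" for name, _ in stages}
    for name, run in stages:
        try:
            run()
        except Exception as exc:
            # Assumption: one failed stage is recorded and the rest still run.
            status[name] = f"failed: {exc}"
        else:
            status[name] = "ok"
    return status
```

Passing the stages as an ordered list keeps the "news first" rule visible in one place instead of spreading it across scheduler jobs.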

	## Phase 0 - Stabilize Docker Space Deployment

	Goal: make the Hugging Face Docker Space build and boot reliably.

	Deliverables:

	- Confirm Docker build succeeds on Hugging Face Spaces.
	- Verify FastAPI serves the exported Next.js frontend on port `7860`.
	- Verify embedded PostgreSQL starts when `DATABASE_URL` is not provided.
	- Verify that `/health`, `/docs`, `/assets` and `/dashboard/overview` respond successfully.
	- Add lightweight demo mode if model dependencies exceed Space resources.
	- Add clear build/runtime troubleshooting notes.

	Exit criteria:

	- Public Space loads without manual intervention.
	- API docs are reachable.
	- Seed asset universe is visible.
	- No Gradio remnants remain in runtime metadata.
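
The endpoint checks listed in the deliverables could be automated with a small script. The sketch below uses only the standard library; `check_endpoints` and `http_status` are hypothetical helper names, and treating HTTP 200 as the only passing status is an assumption.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Paths taken from the Phase 0 deliverables.
PHASE0_PATHS = ("/health", "/docs", "/assets", "/dashboard/overview")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as exc:
        return exc.code
    except (URLError, OSError):
        return 0


def check_endpoints(
    base_url: str,
    paths: Iterable[str] = PHASE0_PATHS,
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each path to True when it answers with HTTP 200 (assumed pass condition)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in paths}
```

Against a locally running container this might be called as `check_endpoints("http://localhost:7860")`. The `fetch` parameter is injectable so the check itself can be tested without a network.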

	## Phase 1 - Data Ingestion Reliability

	Goal: make prices, assets and news ingestion reliable enough for a serious demo.

	Deliverables:

	- Harden the yfinance provider with retries, partial-failure handling and provider status reporting.
	- Keep real-data-only behavior: no synthetic OHLCV fallback in production or demo paths.
	- Maintain a public provider chain for prices: yfinance, Yahoo Chart API and Stooq.
	- Add incremental OHLCV updates instead of replacing recent rows blindly.
	- Improve RSS source health diagnostics.
	- Add article text extraction fallback with `newspaper3k` and BeautifulSoup.
	- Add stronger duplicate detection across titles, URLs and canonical keys.
	- Add source reliability and stale-data scoring.
	- Add ingestion audit logs.
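
One way to approach duplicate detection across titles and URLs is to normalize both before comparing. The sketch below is an illustration under assumptions: the helper names, the list of tracking parameters and the "same title or same URL" rule are not taken from the Blum codebase.

```python
import re
from typing import Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; a real list would likely be longer.
TRACKING_PARAMS = {"fbclid", "gclid"}


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, drop tracking parameters, fragment and trailing slash."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_PARAMS
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), "")
    )


def normalized_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(articles: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first article for each normalized title and each canonical URL."""
    seen_titles, seen_urls = set(), set()
    unique = []
    for article in articles:
        title_key = normalized_title(article["title"])
        url_key = canonical_url(article["url"])
        if title_key in seen_titles or url_key in seen_urls:
            continue
        seen_titles.add(title_key)
        seen_urls.add(url_key)
        unique.append(article)
    return unique
```

Exact matching on normalized keys will miss reworded headlines; near-duplicate detection would need a similarity measure on top of this.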

	Exit criteria:

	- `/market/update` reports per-ticker success/failure.
	- `/news/update` reports source-level diagnostics.
	- Duplicate articles are materially reduced.
	- Failed feeds do not break the pipeline.
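
The per-ticker reporting and the real-data-only rule can be illustrated together. In this sketch each provider is a plain `(name, fetch)` pair. When every provider fails, the ticker is reported as failed and no rows are invented. `TickerResult`, `fetch_with_chain` and `update_market` are hypothetical names, and backoff between retries is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Rows = List[dict]
Provider = Tuple[str, Callable[[str], Rows]]  # (provider name, fetch function)


@dataclass
class TickerResult:
    ticker: str
    ok: bool = False
    provider: Optional[str] = None
    rows: int = 0
    errors: List[str] = field(default_factory=list)


def fetch_with_chain(ticker: str, providers: Sequence[Provider], retries: int = 2) -> TickerResult:
    """Try each provider in order, retrying on exceptions, and record what happened."""
    result = TickerResult(ticker=ticker)
    for name, fetch in providers:
        for attempt in range(1, retries + 1):
            try:
                rows = fetch(ticker)
            except Exception as exc:
                result.errors.append(f"{name} attempt {attempt}: {exc}")
                continue
            if rows:
                result.ok, result.provider, result.rows = True, name, len(rows)
                return result
            # Assumption: an empty response will not improve on retry.
            result.errors.append(f"{name} attempt {attempt}: empty response")
            break
    # No provider returned real data: report the failure, never fabricate rows.
    return result


def update_market(tickers: Sequence[str], providers: Sequence[Provider]) -> Dict[str, TickerResult]:
    """Return one result per ticker so partial failures stay visible."""
    return {ticker: fetch_with_chain(ticker, providers) for ticker in tickers}
```

A response for `/market/update` could then be built by serializing each `TickerResult`, which keeps a failing ticker from hiding the successes of the others.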

	## Phase 2 - AI Model Productionization

	Goal: make AI model usage explicit, measurable and robust under limited compute.

	Deliverables:

	- Add model availability endpoint.
	- Add lazy loading and memory-aware fallback for FinBERT, embeddings and LLM.
	- Persist model run metadata for every sentiment and AI insight.
	- Add prompt templates with strict evidence-only formatting.
	- Add deterministic fallback explanations for low-resource mode.
	- Add batch sentiment processing for article updates.
	- Add configurable model names through Space variables.

	Exit criteria:

	- `/ai/explain/{ticker}` returns structured JSON with models used.
	- FinBERT is the primary sentiment engine when available.
	- VADER is clearly labeled as baseline/fallback.
	- No explanation claims facts absent from retrieved evidence.
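
Lazy loading with a labeled fallback might look like the sketch below. The loaders are passed in as callables so the example stays independent of any model library. `LazySentimentEngine`, the `role` field and the set of exceptions that trigger the fallback are assumptions, not the project's actual design.

```python
from typing import Callable, Optional

Scorer = Callable[[str], float]
Loader = Callable[[], Scorer]


class LazySentimentEngine:
    """Load the primary model on first use; fall back if it cannot be loaded."""

    def __init__(self, primary: str, load_primary: Loader,
                 fallback: str, load_fallback: Loader) -> None:
        self._primary, self._load_primary = primary, load_primary
        self._fallback, self._load_fallback = fallback, load_fallback
        self._scorer: Optional[Scorer] = None
        self.model = ""
        self.role = ""

    def _ensure_loaded(self) -> None:
        if self._scorer is not None:
            return  # already loaded; loaders run at most once
        try:
            self._scorer = self._load_primary()
            self.model, self.role = self._primary, "primary"
        except (ImportError, MemoryError, OSError):
            # Assumed failure modes: missing dependency, low memory, missing weights.
            self._scorer = self._load_fallback()
            self.model, self.role = self._fallback, "fallback"

    def score(self, text: str) -> dict:
        """Return the score together with the model name and its role."""
        self._ensure_loaded()
        assert self._scorer is not None
        return {"score": self._scorer(text), "model": self.model, "role": self.role}
```

Returning the model name and role with every score is one way to keep a VADER result from being mistaken for a FinBERT result downstream.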

	## Phase 3 - Semantic Intelligence Layer

	Goal: turn news embeddings into useful narrative intelligence, not just search.

	Deliverables:

	- Persist FAISS indexes by namespace or rebuild them efficiently at startup.
	- Add semantic cluster snapshots in `ThemeCluster`.
	- Link themes to assets, sectors, ETFs and macro drivers.
	- Add theme trend over time.
	- Add narrative intensity, recurrence and polarization metrics.
	- Add `/themes/{label}` detail endpoint.
	- Add UI for theme drill-down and related assets.

	Exit criteria:

	- Theme Explorer shows real cluster metadata.
	- Semantic search returns relevant articles with similarity scores.
	- Each high-scoring signal can cite related semantic themes.
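
To make "relevant articles with similarity scores" concrete, here is a brute-force cosine-similarity search in plain Python. It stands in for the FAISS index purely for illustration; the function names and the dictionary-based index are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (article_id, similarity) pairs, most similar first."""
    scored = [(article_id, cosine(query, vector)) for article_id, vector in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

A vector index returns the same kind of ranked list without scanning every article, which matters once the corpus grows.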

	## Phase 4 - Signal Engine Upgrade

	Goal: make the Blum Intelligence Score more defensible and auditable.

	Deliverables:

	- Split score modules into separate files: momentum, trend, risk, technicals, sentiment, semantics, ETF and anomaly.
	- Add score versioning.
	- Store full score inputs and normalized factors.
	- Add factor weights in config.
	- Add confidence score distinct from signal score.
	- Add signal lifecycle states: new, confirmed, faded, invalidated.
	- Add price/sentiment divergence logic with explicit thresholds.

	Exit criteria:

	- Every score is reproducible from stored inputs.
	- UI can show why each score changed.
	- `/signals/{ticker}` includes score version and factor weights.
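
Reproducibility from stored inputs can be sketched as a record that carries its own version, weights and normalized factors, plus a replay function. The weights, factor names and version string below are illustrative placeholders, not Blum's actual configuration.

```python
from dataclasses import dataclass
from typing import Dict

SCORE_VERSION = "example-v1"  # hypothetical version label
FACTOR_WEIGHTS = {"momentum": 0.4, "sentiment": 0.35, "etf": 0.25}  # illustrative only


@dataclass(frozen=True)
class ScoreRecord:
    ticker: str
    version: str
    weights: Dict[str, float]
    factors: Dict[str, float]
    score: float


def compute_score(ticker: str, factors: Dict[str, float],
                  weights: Dict[str, float] = FACTOR_WEIGHTS,
                  version: str = SCORE_VERSION) -> ScoreRecord:
    """Weighted average of normalized factors, stored with everything needed to replay it."""
    missing = set(weights) - set(factors)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    total_weight = sum(weights.values())
    score = sum(weights[name] * factors[name] for name in weights) / total_weight
    used = {name: factors[name] for name in weights}
    return ScoreRecord(ticker, version, dict(weights), used, round(score, 6))


def replay(record: ScoreRecord) -> ScoreRecord:
    """Recompute a score from its stored inputs; should equal the original record."""
    return compute_score(record.ticker, record.factors, record.weights, record.version)
```

Because the record stores the weights that were in force, a later weight change in config does not silently alter how an old score is explained.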

	## Phase 5 - ETF And Sector Intelligence

	Goal: make ETF confirmation a first-class intelligence layer.

	Deliverables:

	- Map stocks to confirming ETFs and sector proxies.
	- Add stock/ETF correlation analysis.
	- Add ETF rotation rankings by sector and theme.
	- Add ETF confirmation score into every asset signal.
	- Add sector heatmap and rotation charts.
	- Add ETF vs benchmark comparison views.

	Exit criteria:

	- ETF Radar identifies rotation leaders.
	- Asset Detail shows confirming or contradicting ETFs.
	- Signal explanations reference ETF confirmation when available.
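
Stock/ETF correlation analysis is commonly computed on returns rather than raw prices. The sketch below computes the Pearson correlation of simple period returns. The function names are hypothetical, and how a correlation value maps to "confirming" or "contradicting" is left open because the roadmap does not define it.

```python
import math
from typing import List, Sequence


def simple_returns(closes: Sequence[float]) -> List[float]:
    """Period-over-period simple returns from a series of closing prices."""
    return [(later - earlier) / earlier for earlier, later in zip(closes, closes[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series has no variance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def stock_etf_correlation(stock_closes: Sequence[float], etf_closes: Sequence[float]) -> float:
    """Correlation of returns for two price series aligned on the same dates."""
    return pearson(simple_returns(stock_closes), simple_returns(etf_closes))
```

Both series are assumed to be aligned on the same trading dates before the call; misaligned inputs would produce a meaningless figure.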

	## Phase 6 - Backtesting And Validation

	Goal: validate historical signal behavior without implying prediction or advice.

	Deliverables:

	- Add walk-forward validation by score threshold and classification.
	- Add benchmark-relative forward returns.
	- Add false positive analysis by signal type.
	- Add max adverse/favorable excursion distributions.
	- Add result storage by run configuration.
	- Add backtest charts: equity curve, drawdown, forward return distribution.
	- Add methodology caveats in UI and API.

	Exit criteria:

	- Backtest page can compare classifications historically.
	- API reports hit rate, average forward return and benchmark-relative behavior.
	- Disclaimer is visible on every validation output.
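
Hit rate and benchmark-relative forward return can be computed from aligned close series and the indexes at which signals fired. This is a minimal sketch under assumptions: a "hit" is defined here as a positive excess return over the benchmark, and the function names, field names and disclaimer wording are illustrative.

```python
from typing import Dict, Optional, Sequence

DISCLAIMER = "Historical analysis only; not a prediction or investment advice."


def forward_return(closes: Sequence[float], index: int, horizon: int) -> Optional[float]:
    """Return over `horizon` bars after `index`, or None if the window runs off the data."""
    end = index + horizon
    if index < 0 or end >= len(closes):
        return None
    return closes[end] / closes[index] - 1.0


def evaluate_signals(asset: Sequence[float], benchmark: Sequence[float],
                     signal_indexes: Sequence[int], horizon: int) -> Dict[str, object]:
    """Summarize excess forward returns for signals with a complete forward window."""
    excess = []
    for index in signal_indexes:
        asset_ret = forward_return(asset, index, horizon)
        bench_ret = forward_return(benchmark, index, horizon)
        if asset_ret is None or bench_ret is None:
            continue  # skip signals too close to the end of the data
        excess.append(asset_ret - bench_ret)
    if not excess:
        return {"signals": 0, "hit_rate": None, "avg_excess_return": None,
                "disclaimer": DISCLAIMER}
    hits = sum(1 for value in excess if value > 0)  # assumed hit definition
    return {"signals": len(excess), "hit_rate": hits / len(excess),
            "avg_excess_return": sum(excess) / len(excess), "disclaimer": DISCLAIMER}
```

Attaching the disclaimer to the result itself, rather than to the page that renders it, keeps the caveat on every validation output.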

	## Phase 7 - Frontend Intelligence UX

	Goal: elevate the UI from technical demo to portfolio-grade case study.

	Deliverables:

	- Add skeleton loading and refined error states on every page.
	- Add richer Plotly charts: RSI, MACD, Bollinger Bands, sentiment timeline, score history and correlation heatmap.
	- Add sortable/filterable tables.
	- Add advanced Signal Lab filters.
	- Add exportable signal results.
	- Add responsive refinements for tablet/mobile.
	- Add real screenshots generated from a running Space build to README.

	Exit criteria:

	- Dashboard immediately communicates what to watch and why.
	- Asset Detail explains price, sentiment, risk, news and ETF confirmation in one flow.
	- UI remains dense but readable.

	## Phase 8 - Provider Architecture

	Goal: make the platform extensible beyond yfinance and RSS.

	Deliverables:

	- Define provider interfaces for market data, news, filings, transcripts, estimates and ownership.
	- Add provider registry.
	- Add mock provider for tests.
	- Add optional adapters for filings and public company IR pages.
	- Prepare connectors for future licensed data sources.

	Exit criteria:

	- New providers can be added without changing signal code.
	- Provider status appears in API and UI.
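
A provider interface with a registry might be sketched as follows. `MarketDataProvider`, `ProviderRegistry`, `MockMarketDataProvider` and the `fetch_ohlcv` method are hypothetical names. The mock is intended for tests only, in line with the rule against synthetic market data in production and demo paths.

```python
from typing import Dict, List, Protocol


class MarketDataProvider(Protocol):
    """Structural interface: any object with these members qualifies."""

    name: str

    def fetch_ohlcv(self, ticker: str) -> List[dict]:
        ...


class ProviderRegistry:
    """Holds providers by name so signal code never imports a concrete provider."""

    def __init__(self) -> None:
        self._providers: Dict[str, MarketDataProvider] = {}

    def register(self, provider: MarketDataProvider) -> None:
        if provider.name in self._providers:
            raise ValueError(f"provider already registered: {provider.name}")
        self._providers[provider.name] = provider

    def get(self, name: str) -> MarketDataProvider:
        return self._providers[name]

    def names(self) -> List[str]:
        return list(self._providers)


class MockMarketDataProvider:
    """Test-only provider that returns rows supplied by the test itself."""

    name = "mock"

    def __init__(self, rows_by_ticker: Dict[str, List[dict]]) -> None:
        self._rows = rows_by_ticker

    def fetch_ohlcv(self, ticker: str) -> List[dict]:
        return list(self._rows.get(ticker, []))
```

Using a `Protocol` means an adapter does not have to inherit from a base class; it only has to expose the same members.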

	## Phase 9 - Testing, Observability And Quality

	Goal: make the codebase credible as an open-source engineering project.

	Deliverables:

	- Add backend unit tests for indicators, scoring, ingestion and API.
	- Add frontend component tests where practical.
	- Add smoke tests for Docker startup.
	- Add structured logging.
	- Add error telemetry fields in API responses.
	- Add CI-ready commands in README.

	Exit criteria:

	- Core scoring and ingestion logic are covered by tests.
	- Docker smoke test catches startup regressions.
	- Contributors can run checks locally.
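
Structured logging can be added with the standard `logging` module alone. The sketch below emits one JSON object per log line. The `JsonFormatter` class, the `context` key passed through `extra` and the chosen fields are assumptions for illustration.

```python
import io
import json
import logging
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured fields passed as extra={"context": {...}}.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("blum.example", buffer)
    log.info("news update finished", extra={"context": {"source": "rss", "articles": 3}})
    print(buffer.getvalue().strip())
```

Writing to an injected stream makes the formatter easy to assert on in unit tests, which fits the phase's goal of covering ingestion logic.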

	## Phase 10 - Open Source Polish

	Goal: make the repository compelling as a public case study.

	Deliverables:

	- Add architecture diagram.
	- Add screenshots.
	- Add contribution guide.
	- Add issue templates.
	- Add model/data limitation section.
	- Add changelog.
	- Add clear demo setup instructions for low-resource and full-AI modes.

	Exit criteria:

	- A reviewer can understand the project, run it, inspect the architecture and contribute without needing private context.

	## Working Rule

	Every roadmap step should ship as a complete increment:

	- backend logic;
	- API endpoint or schema update when needed;
	- frontend visibility when user-facing;
	- documentation update;
	- verification command;
	- explicit limitation or disclaimer when relevant.

	All implementation work must follow [`ENGINEERING_STANDARDS.md`](ENGINEERING_STANDARDS.md): no placeholders, no fabricated data, no synthetic market-data fallback, evidence-bound AI, efficient provider calls and explicit verification.