Engineering Standards

Blum must be developed as a serious open-source technical case study. Every shipped increment should be designed for correctness, efficiency, transparency and maintainability.

Non-Negotiable Rules

Do not ship placeholders as working functionality.
Do not fabricate data.
Do not generate synthetic market prices, synthetic news, synthetic sentiment, synthetic backtest results or synthetic AI evidence.
If a data source fails, report the failure clearly instead of filling the gap with fake values.
If a feature cannot be fully implemented in the current increment, mark it as unavailable, document the limitation and do not present it as complete.
Keep code, comments, UI copy and documentation in English.
Treat every score as research triage, not financial advice or a trading recommendation.

Efficiency Standard

Every implementation should avoid redundant network calls, database round trips and recomputation, as far as the current architecture allows.

Required practices:

Prefer batch operations over per-row network or database calls.
Use incremental updates where possible.
Avoid recomputing indicators, embeddings or model outputs when persisted results are fresh.
Keep provider calls bounded, retry-aware and observable.
Keep AI models lazy-loaded and task-specific.
Cache expensive model or vector operations only when the cached result keeps its source evidence and model metadata, and the invalidation rule is explicit.
Avoid blocking frontend rendering on long-running ingestion or model tasks.
Keep API responses structured, compact and explicit.
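
The freshness and batching practices above can be sketched as follows. This is a minimal illustration, not Blum's actual code: the helper names (is_fresh, symbols_to_refresh, chunked), the 24-hour window and the batch size are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def is_fresh(computed_at: Optional[datetime], max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True when a persisted result is recent enough to reuse."""
    if computed_at is None:
        # Never computed: there is nothing to reuse.
        return False
    now = now or datetime.now(timezone.utc)
    return now - computed_at <= max_age


def symbols_to_refresh(last_computed: Dict[str, Optional[datetime]],
                       max_age: timedelta,
                       now: Optional[datetime] = None) -> List[str]:
    """Select only the symbols whose persisted results are stale or absent."""
    return [symbol for symbol, ts in last_computed.items()
            if not is_fresh(ts, max_age, now)]


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield fixed-size batches so one call can cover many rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A caller would compute symbols_to_refresh once per run and then issue one provider or database call per batch from chunked, instead of one call per symbol.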

Data Integrity Standard

Every stored data row must record where it came from and when it was produced.

Required practices:

Persist provider names for OHLCV data.
Persist model names for sentiment, embeddings and AI explanations.
Persist timestamps for ingestion, scoring and insight generation.
Distinguish missing data from zero values.
Distinguish provider failure from no matching data.
Never silently downgrade from real data to synthetic data.
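
One way to keep these distinctions explicit is to carry them in the types. The sketch below is an assumption for illustration only: FetchStatus, Bar, FetchResult and describe are hypothetical names, not part of an existing Blum schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Tuple


class FetchStatus(Enum):
    OK = "ok"                          # provider answered with data
    NO_DATA = "no_data"                # provider answered, nothing matched
    PROVIDER_ERROR = "provider_error"  # provider call itself failed


@dataclass(frozen=True)
class Bar:
    symbol: str
    ts: datetime
    close: float
    volume: Optional[float]  # None means "not reported"; 0.0 is a real zero
    provider: str            # provenance is stored on every row
    ingested_at: datetime


@dataclass(frozen=True)
class FetchResult:
    provider: str
    status: FetchStatus
    bars: Tuple[Bar, ...] = ()
    error: Optional[str] = None


def describe(result: FetchResult) -> str:
    """Render a result without hiding failures behind empty data."""
    if result.status is FetchStatus.PROVIDER_ERROR:
        return f"{result.provider}: provider failed ({result.error})"
    if result.status is FetchStatus.NO_DATA:
        return f"{result.provider}: no matching data"
    return f"{result.provider}: {len(result.bars)} bar(s)"
```

Because a failure and an empty answer are different enum members, downstream code cannot confuse the two without doing so deliberately, and no branch substitutes generated values.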

AI Standard

AI modules must be specialized and evidence-bound.

Required practices:

Use FinBERT or equivalent financial NLP for financial sentiment when available.
Keep VADER as a baseline or fallback, not the primary engine when FinBERT is available.
Use sentence-transformers for semantic retrieval and clustering.
Use the LLM only for structured explanation from retrieved evidence.
The LLM must not invent facts, prices, events, forward returns, recommendations or catalysts.
Store model metadata with each AI output.
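
Lazy loading and per-output metadata can be combined in a small registry. The sketch below assumes hypothetical names (register_loader, get_model, AIOutput, run_task) and treats a model as any callable; it deliberately avoids calling a real NLP library, so the loader for FinBERT or a sentence-transformers model would be supplied by the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from functools import lru_cache
from typing import Any, Callable, Dict

# Maps a task name to a zero-argument loader; nothing is loaded at import time.
_LOADERS: Dict[str, Callable[[], Any]] = {}


def register_loader(task: str, loader: Callable[[], Any]) -> None:
    """Register how to build the model for one task."""
    _LOADERS[task] = loader


@lru_cache(maxsize=None)
def get_model(task: str) -> Any:
    """Load the model on first use and reuse it afterwards."""
    if task not in _LOADERS:
        raise KeyError(f"no model registered for task {task!r}")
    return _LOADERS[task]()


@dataclass(frozen=True)
class AIOutput:
    task: str
    model_name: str       # stored with every output for auditability
    value: Any
    created_at: datetime


def run_task(task: str, model_name: str, text: str) -> AIOutput:
    """Run one task-specific model and attach its metadata to the result."""
    model = get_model(task)
    return AIOutput(task, model_name, model(text), datetime.now(timezone.utc))
```

The design choice here is that each task has its own loader, so a request that only needs sentiment never pays the cost of loading an embedding model or an LLM.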

Signal Standard

Signals must be explainable and auditable.

Required practices:

Store factor inputs and normalized score components.
Version score logic when weights or formulas change.
Separate signal score, confidence, risk and classification.
Explain why an asset surfaced.
Explain what confirms the signal.
Explain what contradicts the signal.
Explain what to monitor next.
Report missing evidence as part of the decision.
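
A stored signal record that follows these practices might look like the sketch below. Everything specific in it is an assumption for illustration: the factor names, the weights, the version string and the choice to renormalize over available factors are hypothetical, not Blum's actual score logic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional

SCORE_VERSION = "example-v1"  # bump whenever weights or formulas change
WEIGHTS: Dict[str, float] = {"momentum": 0.5, "sentiment": 0.3, "volume": 0.2}


@dataclass(frozen=True)
class SignalRecord:
    symbol: str
    score_version: str
    components: Dict[str, Optional[float]]  # normalized inputs, None = missing
    missing: List[str]                      # evidence that was not available
    coverage: float                         # share of total weight with evidence
    score: Optional[float]
    confidence: Optional[float]             # kept separate from the score
    risk: Optional[float]
    classification: Optional[str]
    scored_at: datetime


def build_signal(symbol: str, components: Dict[str, Optional[float]],
                 confidence: Optional[float] = None,
                 risk: Optional[float] = None,
                 classification: Optional[str] = None) -> SignalRecord:
    """Score from available factors only; a missing factor is never read as 0."""
    stored = {name: components.get(name) for name in WEIGHTS}
    missing = [name for name, value in stored.items() if value is None]
    coverage = sum(WEIGHTS[n] for n, v in stored.items() if v is not None)
    weighted = sum(WEIGHTS[n] * v for n, v in stored.items() if v is not None)
    score = weighted / coverage if coverage > 0 else None
    return SignalRecord(symbol, SCORE_VERSION, stored, missing, coverage, score,
                        confidence, risk, classification,
                        datetime.now(timezone.utc))
```

Persisting components, missing and score_version together lets a reviewer reproduce a score later and see which evidence was absent when it was computed.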

UI Standard

The frontend must behave like a financial intelligence platform, not a decorative landing page.

Required practices:

Prioritize dense but readable information.
Show what to watch and why.
Make loading, empty and error states explicit.
Use charts and tables to support decisions, not as decoration.
Keep dark professional styling, clear hierarchy and responsive layouts.
Avoid consumer-style visual filler.

Verification Standard

Every meaningful increment should include verification.

Minimum checks:

Python syntax or unit checks for backend changes.
Type and build checks for frontend changes when dependencies are available.
API smoke checks for changed endpoints when runtime is available.
Documentation update when behavior, architecture or limitations change.
Clear statement of what was and was not verified.
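
As a minimal sketch of the first check and of the final reporting requirement, the helper below compiles each backend file without executing it and records a status per file. The function name check_syntax and the status strings are assumptions for this example; it uses only the built-in compile function.

```python
from pathlib import Path
from typing import Dict, Iterable


def check_syntax(paths: Iterable[Path]) -> Dict[str, str]:
    """Return a per-file status: passed, failed, or not verified."""
    report: Dict[str, str] = {}
    for path in paths:
        try:
            source = path.read_text(encoding="utf-8")
        except OSError as exc:
            # Unreadable files are reported, never counted as passing.
            report[str(path)] = f"not verified: {exc.__class__.__name__}"
            continue
        try:
            # compile() parses the source but does not run it.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            report[str(path)] = f"failed: syntax error at line {exc.lineno}"
        else:
            report[str(path)] = "passed"
    return report
```

The returned mapping can be pasted into a change description so that files that were not verified stay visible next to those that passed.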