# Engineering Standards

Blum must be developed as a serious open-source technical case study. Every shipped increment should be designed for correctness, efficiency, transparency and maintainability.

## Non-Negotiable Rules

- Do not ship placeholders as working functionality.
- Do not fabricate data.
- Do not generate synthetic market prices, news, sentiment, backtest results or AI evidence.
- If a data source fails, report the failure clearly instead of filling the gap with fake values.
- If a feature cannot be fully implemented in the current increment, mark it as unavailable, document the limitation and do not present it as complete.
- Keep code, comments, UI copy and documentation in English.
- Treat every score as research triage, not financial advice or a trading recommendation.

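The rule about failing data sources can be made concrete with a small result type. This is a sketch under assumptions, not Blum's ingestion code: `SeriesResult` and `fetch_closes` are hypothetical names, and the provider is modeled as a plain callable that returns a `date -> close` mapping. A provider exception becomes an explicit error with no values attached, and any requested day the provider did not return stays `None` and is listed in `missing` instead of being forward-filled or interpolated.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class SeriesResult:
    """Outcome of one provider fetch (hypothetical type): failures and gaps stay explicit."""
    provider: str
    closes: Dict[date, Optional[float]] = field(default_factory=dict)
    missing: List[date] = field(default_factory=list)
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def fetch_closes(provider: str,
                 fetch: Callable[[List[date]], Dict[date, float]],
                 days: List[date]) -> SeriesResult:
    """Fetch closing prices for `days`; never substitutes values for failures or gaps."""
    try:
        raw = fetch(days)
    except Exception as exc:
        # Report the failure itself: no cached, default or generated prices are returned.
        return SeriesResult(provider=provider, error=f"{type(exc).__name__}: {exc}")

    closes: Dict[date, Optional[float]] = {}
    missing: List[date] = []
    for day in days:
        value = raw.get(day)
        closes[day] = value  # None when the provider returned nothing for this day
        if value is None:
            # No forward-fill or interpolation: the gap remains visible to callers.
            missing.append(day)
    return SeriesResult(provider=provider, closes=closes, missing=missing)


if __name__ == "__main__":
    # Hypothetical provider name and a stand-in fetch function that skips one day.
    days = [date(2024, 1, 2), date(2024, 1, 3)]
    result = fetch_closes("example-provider", lambda ds: {date(2024, 1, 2): 101.5}, days)
    print(result.ok, result.missing)
```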
## Efficiency Standard

Every implementation should aim for the best practical efficiency available within the current architecture.

Required practices:

- Prefer batch operations over per-row network or database calls.
- Use incremental updates where possible.
- Avoid recomputing indicators, embeddings or model outputs when persisted results are fresh.
- Keep provider calls bounded, retry-aware and observable.
- Keep AI models lazy-loaded and task-specific.
- Cache expensive model or vector operations only when the cache is evidence-preserving and invalidation is clear.
- Avoid blocking frontend rendering on long-running ingestion or model tasks.
- Keep API responses structured, compact and explicit.

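A minimal sketch of a bounded, retry-aware and observable provider call, assuming each provider request is wrapped as a zero-argument callable and that per-request timeouts are configured on the underlying HTTP client. `call_with_retries`, the logger name and the default limits are hypothetical choices for illustration. The attempt count is capped, backoff is exponential with a ceiling and jitter, only error types listed in `retry_on` are retried (anything else propagates on the first attempt), every attempt is logged with its duration, and the final failure is re-raised so the caller reports it rather than substituting values. The `sleep` parameter exists so tests can skip real waiting.

```python
import logging
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("blum.providers")  # hypothetical logger name


def call_with_retries(
    call: Callable[[], T],
    *,
    name: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 4.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call` at most `max_attempts` times, retrying only transient error types."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        started = time.monotonic()
        try:
            result = call()  # errors not in `retry_on` propagate immediately
        except retry_on as exc:
            elapsed = time.monotonic() - started
            log.warning("%s attempt %d/%d failed after %.2fs: %r",
                        name, attempt, max_attempts, elapsed, exc)
            if attempt == max_attempts:
                raise  # bounded: give up and let the caller report the failure
            # Capped exponential backoff with jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
        else:
            log.info("%s succeeded on attempt %d in %.2fs",
                     name, attempt, time.monotonic() - started)
            return result
    raise AssertionError("unreachable: the loop always returns or raises")
```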
## Data Integrity Standard

Every stored data row must preserve its source.

Required practices:

- Persist provider names for OHLCV data.
- Persist model names for sentiment, embeddings and AI explanations.
- Persist timestamps for ingestion, scoring and insight generation.
- Distinguish missing data from zero values.
- Distinguish provider failure from no matching data.
- Never silently downgrade from real data to synthetic data.

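One way to encode these practices in a row type, sketched under assumptions: `OhlcvRow`, `FetchStatus`, `classify_fetch`, `reported_volumes` and `make_row` are hypothetical names, not an existing Blum schema. Provenance (`provider`, `ingested_at`) is a required field, `None` means a value was not reported while `0` is a real reading (for example, zero traded volume), and a provider error is a different status from a successful query that matched nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class FetchStatus(Enum):
    OK = "ok"                          # provider answered with matching rows
    NO_DATA = "no_data"                # provider answered, nothing matched the query
    PROVIDER_ERROR = "provider_error"  # provider could not answer at all


@dataclass(frozen=True)
class OhlcvRow:
    symbol: str
    ts: datetime              # bar time
    provider: str             # required: every row records where it came from
    ingested_at: datetime     # ingestion time, separate from the bar time `ts`
    close: Optional[float]    # None = not reported by the provider
    volume: Optional[int]     # 0 = no shares traded; None = not reported


def make_row(symbol: str, ts: datetime, provider: str,
             close: Optional[float], volume: Optional[int]) -> OhlcvRow:
    """Build a row and stamp its ingestion time in UTC."""
    return OhlcvRow(symbol, ts, provider, datetime.now(timezone.utc), close, volume)


def classify_fetch(error: Optional[Exception], rows: List[OhlcvRow]) -> FetchStatus:
    """Keep 'the provider failed' distinct from 'the provider had nothing'."""
    if error is not None:
        return FetchStatus.PROVIDER_ERROR
    return FetchStatus.OK if rows else FetchStatus.NO_DATA


def reported_volumes(rows: List[OhlcvRow]) -> List[int]:
    """Volumes actually reported: zeros are kept, missing values are excluded, not zeroed."""
    return [r.volume for r in rows if r.volume is not None]
```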
## AI Standard

AI modules must be specialized and evidence-bound.

Required practices:

- Use FinBERT or an equivalent financial NLP model for financial sentiment when available.
- Keep VADER as a baseline or fallback, not the primary engine when FinBERT is available.
- Use sentence-transformers for semantic retrieval and clustering.
- Use the LLM only for structured explanations grounded in retrieved evidence.
- The LLM must not invent facts, prices, events, forward returns, recommendations or catalysts.
- Store model metadata with each AI output.

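The fallback and metadata rules can be sketched as a small engine that loads the primary financial model lazily and stamps every output with the model that produced it. This is an illustration under assumptions: `SentimentEngine` and `SentimentOutput` are hypothetical, and the scorers are plain callables so the example needs no ML dependencies; in practice `load_primary` might build a FinBERT pipeline and `baseline` might wrap VADER. The primary model is loaded on first use, a load failure (`ImportError` or `OSError`) switches to the baseline and is remembered so it is not retried on every call, and `is_fallback` records the downgrade instead of hiding it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

# A scorer maps text to (label, score). Real scorers might wrap FinBERT or VADER.
Scorer = Callable[[str], Tuple[str, float]]


@dataclass(frozen=True)
class SentimentOutput:
    text_id: str
    label: str
    score: float
    model_name: str      # model metadata stored with every output
    is_fallback: bool    # True when the baseline was used instead of the primary model
    produced_at: datetime


class SentimentEngine:
    """Primary financial model, loaded lazily; baseline only when the primary cannot load."""

    def __init__(self, primary_name: str, load_primary: Callable[[], Scorer],
                 baseline_name: str, baseline: Scorer) -> None:
        self._primary_name = primary_name
        self._load_primary = load_primary
        self._baseline_name = baseline_name
        self._baseline = baseline
        self._primary: Optional[Scorer] = None
        self._primary_failed = False

    def _primary_scorer(self) -> Optional[Scorer]:
        # Load on first use only; remember a failed load instead of retrying per call.
        if self._primary is None and not self._primary_failed:
            try:
                self._primary = self._load_primary()
            except (ImportError, OSError):
                self._primary_failed = True
        return self._primary

    def score(self, text_id: str, text: str) -> SentimentOutput:
        primary = self._primary_scorer()
        if primary is not None:
            label, value = primary(text)
            name, fallback = self._primary_name, False
        else:
            label, value = self._baseline(text)
            name, fallback = self._baseline_name, True
        return SentimentOutput(text_id, label, value, name, fallback,
                               datetime.now(timezone.utc))
```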
## Signal Standard

Signals must be explainable and auditable.

Required practices:

- Store factor inputs and normalized score components.
- Version score logic when weights or formulas change.
- Separate signal score, confidence, risk and classification.
- Explain why an asset surfaced.
- Explain what confirms the signal.
- Explain what contradicts the signal.
- Explain what to monitor next.
- Report missing evidence as part of the decision.

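A sketch of an auditable score record, assuming factor inputs are already normalized to `[0, 1]`. The factor names, weights, `SCORE_VERSION` string and `score_signal` function are hypothetical placeholders, not Blum's scoring model. The record keeps the inputs, the weighted components, the formula version and the factors that were missing. Missing factors are excluded rather than imputed, the score is averaged over the weight actually backed by evidence, and `confidence` reports that covered weight as a share of the total, so a high score built on thin evidence is visibly low-confidence. Risk and classification would be further separate fields and are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SCORE_VERSION = "example-v1"  # hypothetical: bump whenever weights or formulas change
# Illustrative weights only; not Blum's actual factors.
WEIGHTS: Dict[str, float] = {"momentum": 0.4, "sentiment": 0.3, "volume_trend": 0.3}


@dataclass(frozen=True)
class SignalRecord:
    symbol: str
    score_version: str
    inputs: Dict[str, Optional[float]]  # normalized factor inputs in [0, 1]; None = missing
    components: Dict[str, float]        # weight * input for each factor with evidence
    score: Optional[float]              # None when no factor has evidence
    confidence: float                   # share of total weight backed by evidence
    missing: List[str]                  # reported explicitly as part of the output


def score_signal(symbol: str, inputs: Dict[str, Optional[float]]) -> SignalRecord:
    components: Dict[str, float] = {}
    missing: List[str] = []
    for factor, weight in WEIGHTS.items():
        value = inputs.get(factor)
        if value is None:
            missing.append(factor)  # excluded, never imputed
        else:
            components[factor] = weight * value
    covered = sum(WEIGHTS[f] for f in components)
    total = sum(WEIGHTS.values())
    score = sum(components.values()) / covered if covered > 0 else None
    return SignalRecord(
        symbol=symbol,
        score_version=SCORE_VERSION,
        inputs={f: inputs.get(f) for f in WEIGHTS},
        components=components,
        score=score,
        confidence=covered / total,
        missing=missing,
    )
```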
## UI Standard

The frontend must behave like a financial intelligence platform, not a decorative landing page.

Required practices:

- Prioritize dense but readable information.
- Show what to watch and why.
- Make loading, empty and error states explicit.
- Use charts and tables to support decisions, not decoration.
- Keep dark professional styling, clear hierarchy and responsive layouts.
- Avoid consumer-style visual filler.

## Verification Standard

Every meaningful increment should include verification.

Minimum checks:

- Python syntax or unit checks for backend changes.
- Type/build checks for frontend changes when dependencies are available.
- API smoke checks for changed endpoints when a runtime is available.
- Documentation update when behavior, architecture or limitations change.
- Clear statement of what was and was not verified.
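
The smoke-check and reporting items can be combined in a small standard-library script. This sketch assumes JSON endpoints under a local base URL; `smoke_check`, `SmokeResult`, `report` and any endpoint paths are hypothetical. Each endpoint ends as `passed`, `failed` (an HTTP error status or a non-JSON body) or `not_verified` (the server could not be reached), so an unavailable runtime is reported as unverified instead of being counted as a pass. The HTTP getter is injectable so the logic can be tested without a server; against a running API, usage might look like `print(report(smoke_check("http://localhost:8000", ["/health"])))`.

```python
import json
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple

Getter = Callable[[str, float], Tuple[int, bytes]]


@dataclass(frozen=True)
class SmokeResult:
    path: str
    outcome: str  # "passed", "failed" or "not_verified"
    detail: str


def http_get(url: str, timeout: float) -> Tuple[int, bytes]:
    """Default getter; urlopen raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def smoke_check(base_url: str, paths: List[str], get: Getter = http_get,
                timeout: float = 5.0) -> List[SmokeResult]:
    results: List[SmokeResult] = []
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            status, body = get(url, timeout)
        except urllib.error.HTTPError as exc:  # server answered with an error status
            results.append(SmokeResult(path, "failed", f"HTTP {exc.code}"))
            continue
        except OSError as exc:  # URLError, refused connection, timeout: runtime not reached
            results.append(SmokeResult(path, "not_verified", f"unreachable: {exc}"))
            continue
        if status != 200:
            results.append(SmokeResult(path, "failed", f"HTTP {status}"))
            continue
        try:
            json.loads(body)
        except ValueError:  # includes JSONDecodeError and undecodable bytes
            results.append(SmokeResult(path, "failed", "response is not JSON"))
            continue
        results.append(SmokeResult(path, "passed", "HTTP 200, JSON body"))
    return results


def report(results: List[SmokeResult]) -> str:
    """Plain statement of what was and was not verified."""
    return "\n".join(f"{r.outcome:<13} {r.path}  ({r.detail})" for r in results)
```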