Linvest21 SHFT Platform
This is the dry-run-first implementation of the Linvest21 Self-Healing Fine-Tuning Platform.
Default behavior is safe:
- no live provider calls unless --live is explicitly passed;
- model-profile policy using fingpt as the default current FinGPT bootstrap profile, with qwen3_32b available as a clean Apache-2.0 foundation profile;
- explicit train_provider and infer_provider routing;
- heartbeat and audit JSONL events;
- per-iteration evidence and improvement reports;
- secret scanning and provider environment validation.
Run locally from this directory:
python -m n21.cli validate-config --provider hf_managed
python -m n21.cli select-model --task finance_qa --env dev
python -m n21.cli train --train-provider hf_managed
python -m n21.cli eval --run-id <run_id>
python -m n21.cli deploy --run-id <run_id> --infer-provider hf_managed --env stage
Model selection can be controlled by CLI or environment:
python -m n21.cli select-model --task finance_qa --env dev --model-profile fingpt
python -m n21.cli select-model --task finance_qa --env dev --model-profile qwen3_32b
SHFT_MODEL_PROFILE=qwen3_32b python -m n21.cli train --train-provider hf_managed
Runtime Boundaries
SHFT platform state and generated implementation products are separate:
impl_codex/self_healing_finetuning platform code and configuration
impl_codex/shft_workspace run evidence, registries, evals, logs, and verification reports
impl_codex/implementation_products versioned runnable model products
Current SHFT runs must write evidence under impl_codex/shft_workspace/runs/<run_id>. The generated implementation is an output of SHFT, not an input dependency for the SHFT run.
Current exports write versioned products under impl_codex/implementation_products/<model_id>. Active SHFT code does not use the retired unversioned implementation tree.
Controlled Super-Agent Flexibility
The 18 target Linvest21 FinGPT submodels are controlled by:
configs/super_agent_matrix.json
The matrix is:
3 asset classes: equity, fixed_income, multi_asset
6 roles: chief_investment_officer, client_portfolio_manager, performance_manager, portfolio_manager, researcher, risk_manager
The model ID template is:
linvest21_fingpt_<asset_class>_<role>_<version>
The versioned implementation product directory template is:
impl_codex/implementation_products/linvest21_fingpt_<asset_class>_<role>_<version>
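As a minimal sketch of how the template expands, the Python below builds all 18 model IDs and their product directories. It hardcodes the asset classes and roles listed above for illustration only; the platform reads them from configs/super_agent_matrix.json, and the helper names model_id, product_dir, and all_model_ids are hypothetical.

```python
import itertools

# Illustrative copy of the matrix; the source of truth is configs/super_agent_matrix.json.
ASSET_CLASSES = ["equity", "fixed_income", "multi_asset"]
ROLES = [
    "chief_investment_officer",
    "client_portfolio_manager",
    "performance_manager",
    "portfolio_manager",
    "researcher",
    "risk_manager",
]


def model_id(asset_class: str, role: str, version: str) -> str:
    """Apply the linvest21_fingpt_<asset_class>_<role>_<version> template."""
    if asset_class not in ASSET_CLASSES or role not in ROLES:
        raise ValueError(f"unknown asset/role pair: {asset_class}/{role}")
    return f"linvest21_fingpt_{asset_class}_{role}_{version}"


def product_dir(asset_class: str, role: str, version: str) -> str:
    """Versioned implementation product directory for one super-agent."""
    return f"impl_codex/implementation_products/{model_id(asset_class, role, version)}"


def all_model_ids(version: str) -> list[str]:
    """All 3 x 6 = 18 super-agent model IDs for one version."""
    return [model_id(a, r, version) for a, r in itertools.product(ASSET_CLASSES, ROLES)]
```

Rejecting unknown pairs up front mirrors the intent of keeping every model ID inside the configured matrix.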
Use the generic batch wrapper to keep operator commands simple while still parameterizing the asset class, role, and version:
impl_codex\scripts\run_linvest21fingpt_super_agent_to_implementation.bat equity researcher all v1_000
For an interactive single-entry dispatcher that lists all 18 super-agents and routes to the wrapper above, use the menu driver:
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat REM interactive
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --list
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --index 5
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_000 --mode all
Current e2e Equity Researcher certification run:
set HF_TOKEN=<your-huggingface-token>
set SHFT_SUBMIT_HF_JOB=true
set SHFT_RUN_OWNER_EMAIL=<your-review-email>
set SHFT_HUMAN_REVIEW_EMAIL=true
set SHFT_HUMAN_REVIEW_TIMEOUT_SECONDS=1800
REM Optional SMTP/IMAP when actual email delivery/reply is desired.
set SHFT_EMAIL_DELIVERY=auto
set SHFT_SMTP_HOST=<smtp-host>
set SHFT_SMTP_PORT=587
set SHFT_SMTP_FROM=<your-review-email>
set SHFT_SMTP_USERNAME=<your-review-email>
set SHFT_SMTP_PASSWORD=<smtp-or-app-password>
set SHFT_IMAP_HOST=<imap-host>
set SHFT_IMAP_PORT=993
set SHFT_IMAP_USERNAME=<your-review-email>
set SHFT_IMAP_PASSWORD=<imap-or-app-password>
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all-until-certified --continue-best
Watch stdout/stderr for [SHFT heartbeat], [SHFT hf-status], [SHFT hf-log], [SHFT HUMAN REVIEW EMAIL], [SHFT HUMAN REVIEW ASK], and [SHFT VITAL MODEL QUALITY]. If email delivery is not configured, approve by writing the printed eval\human_spot_check_response.json file with decision=approve, reviewed_samples>=10, and critical_failures=0.
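If you approve by hand, a sketch like the following writes the response file with the three required fields. The field names and thresholds come from the paragraph above; the helper name is hypothetical, and the real file may accept or require additional fields.

```python
import json
from pathlib import Path


def write_spot_check_approval(path: Path, reviewed_samples: int, critical_failures: int = 0) -> dict:
    """Write a manual approval for the SHFT human spot check (hypothetical helper)."""
    # Refuse to write an approval that the documented thresholds would reject.
    if reviewed_samples < 10:
        raise ValueError("approval requires reviewed_samples >= 10")
    if critical_failures != 0:
        raise ValueError("approval requires critical_failures == 0")
    payload = {
        "decision": "approve",
        "reviewed_samples": reviewed_samples,
        "critical_failures": critical_failures,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload
```

Pass the exact eval\human_spot_check_response.json path that SHFT prints for the run.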
Model Profiles And Base-Model Switching
The base model is selected through impl_codex/self_healing_finetuning/configs/model_profiles.json. Profile resolution is implemented in model_policy/profiles.py, then applied into the model-selection, launch, Hugging Face provider, trainer, and paired-proof layers.
Current supported profiles:
| Profile | Model candidate | Base model | Start behavior | Proof baseline | Licensing posture | Best use |
|---|---|---|---|---|---|---|
| fingpt | linvest21/linvest21_fingpt_v1_000 | meta-llama/Meta-Llama-3-8B | bootstrap from approved Linvest21 FinGPT adapter | baseline adapter | Meta Llama 3 community license plus FinGPT adapter terms; commercial review required | Current finance-specialized bootstrap and continuity path |
| qwen3_32b | Qwen/Qwen3-32B | Qwen/Qwen3-32B | fresh QLoRA adapter from the base model | raw base model, no baseline adapter | Apache-2.0; commercial use allowed | Cleaner open commercial foundation candidate |
Examples:
set SHFT_MODEL_PROFILE=fingpt
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all-until-certified --continue-best
set SHFT_MODEL_PROFILE=qwen3_32b
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all-until-certified --finetune-start-policy bootstrap
SHFT_MODEL_CANDIDATE and SHFT_BASE_MODEL_ID remain available as explicit emergency overrides, but SHFT_MODEL_PROFILE is the clean operator interface. To add a future model, add a profile entry with model_candidate, base_model_id, license metadata, provider overrides, and adapter_bootstrap semantics instead of hardcoding IDs in scripts.
The trainer currently uses LoRA target modules compatible with Llama/Qwen decoder blocks: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj. If a future architecture does not expose these module names, that profile needs a profile-specific target-module override before live training.
By default, each new super-agent fine-tune starts from the approved Linvest21 FinGPT bootstrap adapter:
Meta-Llama-3-8B base
+ linvest21/linvest21_fingpt_v1_000 bootstrap adapter
+ role-specific adapter being trained
This is controlled by --finetune-start-policy:
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all --finetune-start-policy bootstrap
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all --finetune-start-policy continue-best
bootstrap is the default. It starts from linvest21/linvest21_fingpt_v1_000 and avoids compounding bias from failed prior role adapters. continue-best is opt-in. It reads impl_codex/shft_workspace/best_runs/<release_id>.json and continues only from the best measured checkpoint for that exact asset/role/version when one exists. If no best-run record exists yet, continue-best records source=no_best_recorded_fallback_bootstrap and starts from the approved bootstrap adapter instead of entering a source-recovery retry loop.
For qwen3_32b, bootstrap means fresh QLoRA from Qwen/Qwen3-32B, because that profile intentionally has no bootstrap adapter. continue-best still uses the best measured checkpoint for the release when present.
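The start-policy decision above can be sketched as follows. This is an illustration, not the platform's implementation: the "checkpoint" key in the best-run record and the "bootstrap_policy" source label are hypothetical, while best_measured_checkpoint and no_best_recorded_fallback_bootstrap are the source labels SHFT records.

```python
import json
from pathlib import Path

# Profile -> bootstrap adapter; qwen3_32b intentionally has none (fresh QLoRA from the base).
BOOTSTRAP_ADAPTERS = {
    "fingpt": "linvest21/linvest21_fingpt_v1_000",
    "qwen3_32b": None,
}


def resolve_start(policy: str, profile: str, release_id: str, workspace: Path) -> dict:
    """Pick the fine-tune starting point for one release (illustrative sketch)."""
    bootstrap = BOOTSTRAP_ADAPTERS[profile]
    if policy == "bootstrap":
        return {"source": "bootstrap_policy", "adapter": bootstrap}  # hypothetical label
    if policy != "continue-best":
        raise ValueError(f"unknown finetune start policy: {policy}")
    record = workspace / "best_runs" / f"{release_id}.json"
    if record.is_file():
        best = json.loads(record.read_text(encoding="utf-8"))
        # "checkpoint" is a hypothetical key for the best measured checkpoint.
        return {"source": "best_measured_checkpoint", "adapter": best["checkpoint"]}
    # No best-run record yet: start from the approved bootstrap instead of a retry loop.
    return {"source": "no_best_recorded_fallback_bootstrap", "adapter": bootstrap}
```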
2026-05-24 continuation audit:
- Equity has recorded best-run checkpoints for all six v1_001 roles, so --continue-best resolves to source=best_measured_checkpoint for Equity.
- Fixed Income and Multi Asset currently have no recorded best-run checkpoint for any of their six v1_001 roles, so --continue-best resolves to source=no_best_recorded_fallback_bootstrap for those first measured runs.
- This fallback is first-run behavior, not certification. Each role still needs live HF training, fetch-proof, paired-eval proof, model-quality gate success, a model card, and promotion evidence before certification.
Live provider checks are bounded so six parallel runs do not look stalled while waiting on external tooling. Hugging Face CLI calls use SHFT_HF_CLI_TIMEOUT_SECONDS with a default of 120 seconds, and repository secret scanning skips generated SHFT workspace/product directories and files larger than 1 MB.
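A bounded external call can be sketched with the standard library's subprocess timeout. Only the SHFT_HF_CLI_TIMEOUT_SECONDS variable and its 120-second default come from the text above; the helper names are hypothetical, and how the platform reports a timeout is not shown here.

```python
import os
import subprocess


def hf_cli_timeout_seconds() -> float:
    """Read SHFT_HF_CLI_TIMEOUT_SECONDS, defaulting to 120 seconds."""
    raw = os.environ.get("SHFT_HF_CLI_TIMEOUT_SECONDS", "").strip()
    return float(raw) if raw else 120.0


def run_bounded(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run an external CLI; raises subprocess.TimeoutExpired instead of hanging."""
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=hf_cli_timeout_seconds(),
        check=False,
    )
```

For example, run_bounded(["huggingface-cli", "whoami"]) gives up after the configured limit rather than leaving a parallel run waiting silently.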
For parallel role launches, use:
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat --asset fixed_income --role researcher --role risk_manager --version v1_001 --mode status
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat --asset multi_asset --roles researcher,portfolio_manager,risk_manager --version v1_001 --mode all
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat --asset equity --version v1_001 --mode all-until-certified --continue-best
The parallel launcher defaults to the six equity roles, titles each window as SHFT <asset_class> <role> <version>, and cascades the windows so all runs remain visible. It only dispatches to the existing menu driver; all validation, paid-job guardrails, quality gates, and output paths remain centralized there.
For the full 18-agent matrix, use the all-asset launcher. It defaults to status mode so it is safe for coverage checks:
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat --mode status --dry-run
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat --asset fixed_income --mode all-until-certified --continue-best
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat --mode all-until-certified --continue-best
For implementation evidence, audit every (asset_class, role) against the Equity Researcher package contract:
impl_codex\scripts\audit_super_agent_implementation_parity.bat v1_001 nofail
The audit writes JSON and Markdown evidence under impl_codex\shft_workspace\verification and refreshes super_agent_implementation_parity_latest.json and .md. The audit checks runnable packaging and identity consistency only; model-quality certification still requires paired proof and a passing eval/model_quality_gate.json.
For model-quality parity across the full 18-agent matrix, use the live certification loop. Start with a safe no-submit check:
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat --mode status --dry-run
Then set live-job credentials and submit only when budget and access are intentional:
set HF_TOKEN=your_huggingface_token
set SHFT_SUBMIT_HF_JOB=true
set LINVEST21_API_TOKEN=your_local_api_token
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat --mode all-until-certified --continue-best
One asset class or one role can be certified independently:
impl_codex\scripts\run_linvest21fingpt_all_super_agents.bat --asset fixed_income --mode all-until-certified --continue-best
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset fixed_income --role researcher --version v1_001 --mode all-until-certified --continue-best
The proof artifacts for each role are:
impl_codex\shft_workspace\runs\<run_id>\eval\paired_eval_report.json
impl_codex\shft_workspace\runs\<run_id>\eval\model_quality_gate.json
The role is quality-proven only when model_quality_gate.json has both ok=true and eligible_for_promotion=true. A matching implementation package or a declining training loss is not enough.
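A fail-closed reading of that rule might look like the sketch below. It assumes only the gate file path and the two field names stated above; treating a missing file or a non-boolean value as "not proven" is a deliberately conservative choice in this sketch, not documented platform behavior.

```python
import json
from pathlib import Path


def is_quality_proven(run_dir: Path) -> bool:
    """True only when eval/model_quality_gate.json has ok=true AND eligible_for_promotion=true."""
    gate_path = run_dir / "eval" / "model_quality_gate.json"
    if not gate_path.is_file():
        return False  # fail closed: no gate file means no proof
    gate = json.loads(gate_path.read_text(encoding="utf-8"))
    # Require literal JSON true, not merely truthy values such as "true" or 1.
    return gate.get("ok") is True and gate.get("eligible_for_promotion") is True
```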
SHFT-IQ Score
The hard certification rule remains eval/model_quality_gate.json with ok=true and eligible_for_promotion=true. SHFT-IQ is the recommended weighted operator score for comparing candidate intelligence across runs; it should not override a failed hard gate.
Recommended SHFT-IQ composition:
| Factor | Weight | Measured by |
|---|---|---|
| Paired task accuracy | 35% | paired_eval_report.json candidate aggregate and task scores |
| Critical reasoning and safety | 20% | candidate critical pass rate and critical-pass delta |
| Baseline-relative improvement | 15% | pairwise win rate, aggregate delta, and pairwise loss rate |
| Generalization and overfit control | 10% | train/eval gap, late eval-loss regression, selected checkpoint, overfit flags |
| Corpus and repair coverage | 10% | dataset manifest, train/valid/test retention, repair coverage categories |
| Model-as-judge quality | 5% | model_judge_report.json mean score and rubric pass rate |
| Human spot-check quality | 5% | human_spot_check_report.json approval, reviewed samples, critical failures |
Suggested interpretation:
SHFT-IQ < 70 internal learning signal only
SHFT-IQ 70-79 research candidate; not production-facing without exception
SHFT-IQ >= 80 production candidate only if the hard model-quality gate also passes
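Because eval/shft_iq_report.json does not exist yet, the following is only a sketch of the recommended weighting. It assumes each factor has already been normalized to a 0-100 score (the normalization is not specified here), and the factor keys and the "blocked by hard gate" label are hypothetical.

```python
# Weights from the recommended composition table; they sum to 1.0.
WEIGHTS = {
    "paired_task_accuracy": 0.35,
    "critical_reasoning_and_safety": 0.20,
    "baseline_relative_improvement": 0.15,
    "generalization_and_overfit_control": 0.10,
    "corpus_and_repair_coverage": 0.10,
    "model_as_judge_quality": 0.05,
    "human_spot_check_quality": 0.05,
}


def shft_iq(factor_scores: dict[str, float]) -> float:
    """Weighted sum of factor scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    return sum(weight * factor_scores[name] for name, weight in WEIGHTS.items())


def interpret(score: float, hard_gate_passed: bool) -> str:
    """Map a score to the suggested bands; the hard gate is never overridden."""
    if score >= 80:
        return "production candidate" if hard_gate_passed else "blocked by hard gate"
    if score >= 70:
        return "research candidate"
    return "internal learning signal"
```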
The existing platform already measures the underlying factors during the self-healing loop. A future eval/shft_iq_report.json should persist the weighted scalar and the factor breakdown, while promotion continues to fail closed on the explicit gate checks in configs/thresholds/model_quality.yaml.
After paired proof has been fetched, rank the failure modes across all 18 roles before generating the next repair wave:
cd impl_codex\self_healing_finetuning
python -m n21.cli rank-paired-eval-defects
The ranker writes:
impl_codex\shft_workspace\verification\paired_eval_defect_ranking_latest.json
impl_codex\shft_workspace\verification\paired_eval_defect_ranking_latest.md
It uses the seven repair taxonomy buckets: numeric reasoning, fact/inference separation, role discipline, risk/tradeoff framing, hallucination or unsupported claim, weak source grounding, and overfit or memorized answer style. If a role has no eval\paired_predictions.jsonl yet, the ranker records proof_missing for that role instead of fabricating defects.
After ranking, build the defect-led repair files for the next training wave without launching training:
cd impl_codex\self_healing_finetuning
python -m n21.cli build-all-role-defect-repair
The generator reuses the paired-proof predictions and ranking taxonomy. It writes one role-specific JSONL under data\learning\<asset_class>\<role>\targeted_paired_proof_repair_<timestamp>.hf_finetune.jsonl, plus:
impl_codex\shft_workspace\verification\all_18_defect_repair_manifest_latest.json
impl_codex\shft_workspace\verification\all_18_defect_repair_manifest_latest.md
The coverage gate fails closed unless all 18 roles have non-empty repair files, valid Hugging Face chat-message JSONL records, and coverage for each role's top measured defects. This is the intended bridge from proof failure to the next SHFT training wave; it does not promote or train any role by itself.
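A minimal per-file check for the chat-message JSONL requirement could look like this. It assumes the common Hugging Face chat layout of one {"messages": [{"role": ..., "content": ...}]} object per line; requiring the final message to come from the assistant is an extra assumption of this sketch, and the platform's real validator may check more or differently.

```python
import json
from pathlib import Path

VALID_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the file passes this check."""
    problems = []
    rows = 0
    with path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            rows += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {line_no}: missing non-empty 'messages' list")
                continue
            for msg in messages:
                if (
                    not isinstance(msg, dict)
                    or msg.get("role") not in VALID_ROLES
                    or not isinstance(msg.get("content"), str)
                ):
                    problems.append(f"line {line_no}: malformed message {msg!r}")
                    break
            else:
                # Runs only when every message was well formed.
                if messages[-1]["role"] != "assistant":
                    problems.append(f"line {line_no}: last message is not from the assistant")
    if rows == 0:
        problems.append("file has no records")  # empty repair files fail closed
    return problems
```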
The anti-overfit refactor and all-role TODO review is tracked in:
impl_codex\self_healing_finetuning\docs\refector_for_reduce_over_fit_v2_all_roles_review.md
The formal A-to-Z operating publication for the full self-healing and self-improving fine-tuning loop across all 18 asset/role pairs, and its companion platform review, are:
impl_codex\self_healing_finetuning\docs\shft_a_to_z_all_roles_publication_v1.md
impl_codex\self_healing_finetuning\docs\shft_platform_review_all_18_roles_20260527.md
The A-to-Z publication is the current operator contract for source intake, training selection, live HF training, artifact sync, paired proof, quality gates, defect ranking, defect-led repair data, promotion rules, final stats, and known remaining flaws. The platform review reconciles the contract against the current scripts, thresholds, human approval path, transparent logs, proper exits, and all 18 role/model IDs.
Use that spec before launching another broad training wave; it separates package parity from quality parity and lists the checkpointing, metrics, holdout, and failure-ledger changes still needed.
Current 2026-05-27 all-role repair state:
impl_codex\shft_workspace\verification\paired_eval_defect_ranking_latest.md
impl_codex\shft_workspace\verification\all_18_defect_repair_manifest_latest.md
impl_codex\shft_workspace\verification\all_18_defect_repair_validation_latest.json
The latest all-role repair manifest shows roles_ok=18, roles_failed=0, output_file_count=18, and total_repair_rows=2095. This means the defect-led repair bridge is ready for the next selected-training build across all 18 roles. It does not mean any model is promoted or production-certified.
Recommended next move: pause long enough to complete trainer-side anti-overfit hardening before spending another all-role training wave. The highest-value work is structured trainer metrics, checkpoint-level validation, selected-checkpoint export, and stronger holdout proof. After that, launch the next wave using the generated role-specific defect repair files.
The parallel launcher uses one Windows Terminal window with one cmd.exe tab per role when wt.exe is available, and falls back to separate visible Command Prompt windows otherwise. The menu reads configs/super_agent_matrix.json, validates that data/learning/<asset_class>/<role> exists (exiting with code 2 and a clear message if it does not), resolves the unique versioned model ID (linvest21_fingpt_<asset_class>_<role>_<version>), and then delegates to the per-agent wrapper, which packages the portable runtime (a chat console plus a token-protected JSON API secured by LINVEST21_API_TOKEN) under impl_codex/implementation_products/<model_id>.
Live submit modes are intentionally guarded:
HF_TOKEN must be present.
SHFT_SUBMIT_HF_JOB must equal true.
Use --mode status or --list for safe inspection without job submission.
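The two guards can be expressed as a single predicate. This sketch treats status mode as never submitting and compares SHFT_SUBMIT_HF_JOB with the exact lowercase string true; both details are assumptions about how the wrapper interprets the rule, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def live_submit_allowed(mode: str, env: Optional[Mapping[str, str]] = None) -> Tuple[bool, str]:
    """Return (allowed, reason) for a paid Hugging Face job submission."""
    env = os.environ if env is None else env
    if mode == "status":
        # Inspection mode never submits a job.
        return False, "status mode: inspection only"
    if not env.get("HF_TOKEN"):
        return False, "HF_TOKEN must be present"
    # Exact, case-sensitive comparison is an assumption of this sketch.
    if env.get("SHFT_SUBMIT_HF_JOB") != "true":
        return False, "SHFT_SUBMIT_HF_JOB must equal true"
    return True, "live submission allowed"
```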
Validate the configured matrix, learning-data directories, and any generated implementation folders with:
impl_codex\scripts\validate_super_agent_matrix.bat
The bootstrap model linvest21/linvest21_fingpt_v1_000 is the lineage and starting point. The exported runtime identity must be the super-agent model ID, and it must appear in release_manifest.json, runtime/chat_config.json, chat output, /health, /v1/models, and /v1/chat/completions.
Production Inference Architecture
For Linvest21 production inference, use base model plus adapters as the default architecture:
shared approved base model
+ certified adapter for each asset_class/role/version
This is superior for the current 18-super-agent platform because the agents should share the same approved financial base capability while each adapter captures the role-specific investment behavior. It reduces storage, supports faster per-role retraining, gives each adapter separate training/eval/promotion evidence, and allows one role to roll back without touching the rest.
Publish each certified super-agent as an independent adapter model repo or artifact, for example:
linvest21_fingpt_equity_researcher_v1_001
linvest21_fingpt_equity_risk_manager_v1_001
linvest21_fingpt_fixed_income_risk_manager_v1_001
Merged or quantized full-model artifacts are optional deployment builds. Use them when a role needs offline serving, GGUF/local desktop deployment, different base-model lineage, or simpler single-model loading. They should not replace the certified adapter as the governance source of truth.
Step 0 Public-Source Intake Gate
Default policy:
Download public material automatically, but train only on material that passes the configured source-policy gate.
The policy and catalog are controlled by configuration, not hardcoded operator choices:
configs/data/source_policy.yaml
configs/data/public_source_catalog.json
configs/data/reasoning_frames.json
The 2026-05-19 to 2026-05-24 equity researcher and portfolio-manager logs showed a platform-level input failure: repeated public-source breakout cycles produced trainable_new_source_count: 0, while duckduckgo_html discovery repeatedly returned timeout, 403, or no-candidate results. SHFT now treats that as a recovery problem, not as a reason to keep sleeping and retrying the same query loop. If a blocked_after_breakout run has zero trainable new sources, the latest live-discovery retry has candidate_count=0, and the candidate did not improve the protected best checkpoint, continuous-status moves convergence to NEEDS_REASONING_DATA and writes no_candidate_retry_exhausted=true.
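The stop condition above reduces to a conjunction of four facts. The sketch below encodes it with a hypothetical BreakoutEvidence record whose field names mirror the evidence terms in the paragraph; the "CONTINUE" label for the non-exhausted case is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BreakoutEvidence:
    status: str                        # e.g. "blocked_after_breakout"
    trainable_new_source_count: int
    latest_retry_candidate_count: int  # candidate_count of the latest live-discovery retry
    improved_protected_best: bool


def convergence_decision(ev: BreakoutEvidence) -> dict:
    """Stop the sleep/retry loop once all four exhaustion signals hold."""
    exhausted = (
        ev.status == "blocked_after_breakout"
        and ev.trainable_new_source_count == 0
        and ev.latest_retry_candidate_count == 0
        and not ev.improved_protected_best
    )
    if exhausted:
        return {"convergence": "NEEDS_REASONING_DATA", "no_candidate_retry_exhausted": True}
    return {"convergence": "CONTINUE", "no_candidate_retry_exhausted": False}  # hypothetical label
```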
Source discovery is API-first for SEC-backed equity material. configs/data/source_policy.yaml sets live_discovery.provider: sec_api_first and live_discovery.duckduckgo_fallback_enabled: false, and data_pipeline/live_source_discovery.py uses the SEC submissions JSON endpoint without generic-search fallback unless that fallback is explicitly enabled. The external API reference is the SEC EDGAR API documentation: https://www.sec.gov/edgar/sec-api-documentation, which documents https://data.sec.gov/submissions/CIK##########.json as the submissions-history endpoint and states that data.sec.gov provides JSON APIs for EDGAR data. This transport change does not relax license policy, source-quality certification, train/verify splitting, or model promotion thresholds.
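For reference, the submissions endpoint takes a CIK zero-padded to ten digits. The sketch below uses only the Python standard library; the helper names are hypothetical and it is not data_pipeline/live_source_discovery.py. SEC's fair-access guidance asks automated clients to send a descriptive User-Agent, so the caller must supply one.

```python
import json
import urllib.request

SEC_SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_url(cik) -> str:
    """Zero-pad the CIK (int or numeric string) to the 10 digits the endpoint expects."""
    return SEC_SUBMISSIONS_URL.format(cik=int(cik))


def fetch_submissions(cik, user_agent: str, timeout: float = 30.0) -> dict:
    """Fetch the submissions-history JSON for one filer (network call)."""
    request = urllib.request.Request(submissions_url(cik), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, submissions_url(320193) yields https://data.sec.gov/submissions/CIK0000320193.json.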
Stall breakout also has a second recovery leg. When source discovery downloads normalized verification-eligible material but no direct training source, orchestrator/stall_breakout.py calls generate_grounded_reasoning_examples_from_intake_records, writes synthetic_<asset>_<role>_critical_reasoning.hf_finetune.jsonl into the role corpus, and certifies every generated row before marking the run ready for the next training attempt. The breakout plan records reasoning_generation and intake.generated_reasoning_count so this is visible in evidence rather than hidden as a silent fallback.
If recovery still reaches a paid-retraining guard, n21.cli continuous-status --enforce-convergence calls orchestrator/human_owner_decision.py. The owner ask is sent to SHFT_RUN_OWNER_EMAIL (default david.d.lin@linvest21.com) and printed to stdout/stderr. The first valid human instruction wins, whether it arrives from stdin, runs/<run_id>/human_owner_response.json, human_owner_decisions/responses/<run_id>.json, or an IMAP email reply whose subject contains the run id; valid instructions are continue and exit. Delivery uses SHFT_EMAIL_DELIVERY=auto by default: SMTP is used when SHFT_SMTP_HOST or SHCG SMTP settings are configured, otherwise local Outlook COM is attempted, otherwise the request is written to impl_codex/shft_workspace/human_owner_decisions/outbox for audit. Use SHFT_EMAIL_DELIVERY=smtp, outlook, or outbox to force a path. SHFT falls back to SHCG_ALERT_SMTP_PASSWORD and SHCG_INBOUND_IMAP_PASSWORD; when those exist without explicit host/user values it defaults to Gmail SMTP/IMAP for david.d.lin@linvest21.com.
Run public-source intake for one super-agent:
cd impl_codex\self_healing_finetuning
python -m n21.cli intake-public-sources --asset-class equity --role risk_manager
Run training-data validation and optional quarantine recommendation:
cd impl_codex\self_healing_finetuning
python -m n21.cli validate-training-data --source ..\..\data\learning\equity\risk_manager --output-dir ..\shft_workspace\runs\<run_id>\data_validation --backup-dir ..\shft_workspace\runs\<run_id>\data_validation\quarantine_backup --apply-quarantine
The intake command writes raw downloads, approved copies, and review-required copies under:
data/learning_intake/<asset_class>/<role>
Only source-policy-approved material is promoted into:
data/learning/<asset_class>/<role>
Downloaded files are never trained directly unless they are already curated JSONL. They first pass through a normalization layer:
raw download -> normalized .normalized.json -> training/validation eligibility -> Step 0b JSONL
Default format policy:
- pdf: normalize, train, and validate.
- jsonl: normalize/pass through, train, and validate.
- clean direct html: normalize and train, but do not use for validation by default.
- html_index: download and normalize for review, but do not train by default.
- txt and md: normalize and train, but do not validate by default.
- unknown: do not normalize, train, or validate.
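The default policy is a small lookup table. This sketch mirrors the list above in Python; the FormatPolicy type and policy_for helper are hypothetical, and the authoritative values live in configs/data/source_policy.yaml. Here "html" stands for clean direct HTML, and any format not listed falls through to the unknown policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatPolicy:
    normalize: bool
    train: bool
    validate: bool


# Defaults as described above; the real values come from configs/data/source_policy.yaml.
DEFAULT_FORMAT_POLICY = {
    "pdf": FormatPolicy(normalize=True, train=True, validate=True),
    "jsonl": FormatPolicy(normalize=True, train=True, validate=True),  # normalize or pass through
    "html": FormatPolicy(normalize=True, train=True, validate=False),
    "html_index": FormatPolicy(normalize=True, train=False, validate=False),  # review only
    "txt": FormatPolicy(normalize=True, train=True, validate=False),
    "md": FormatPolicy(normalize=True, train=True, validate=False),
}
UNKNOWN = FormatPolicy(normalize=False, train=False, validate=False)


def policy_for(source_format: str) -> FormatPolicy:
    """Look up the policy for a format, failing closed to UNKNOWN."""
    return DEFAULT_FORMAT_POLICY.get(source_format.lower(), UNKNOWN)
```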
Transparency artifacts are written as source_intake_manifest.json, license_manifest.json, training_data_validation_report.json, conflict_report.json, and quarantine_manifest.json. This keeps automatic discovery/download flexible while keeping training strict and auditable.
Step 0b Role-Grounded Reasoning Data
The quality gate rewards explicit scenario analysis, red-flag decisions, pass/fail labels, and rationales. Public filings and investor bulletins are useful grounding material, but the logs proved they are usually verification material rather than directly trainable critical-reasoning material. Step 0b now generates grounded, role-specific reasoning examples from the existing role corpus and then sends each generated example through the existing local source-quality certifier.
Generate role-grounded reasoning data:
cd impl_codex\self_healing_finetuning
python -m n21.cli generate-reasoning-data --asset-class equity --role researcher --max-records 1500
python -m n21.cli generate-reasoning-data --asset-class equity --role portfolio_manager --max-records 1500
The generator writes:
data/learning/<asset_class>/<role>/synthetic_<asset_class>_<role>_critical_reasoning.hf_finetune.jsonl
data/learning/<asset_class>/<role>/synthetic_<asset_class>_<role>_critical_reasoning.hf_finetune.jsonl.manifest.json
Role behavior is controlled by:
configs/data/reasoning_frames.json
data_pipeline/reasoning_data_generation.py
Training selection enforces inclusion of the role-grounded reasoning file when it exists:
data_pipeline/learning_pdf_to_jsonl.py
selected_training_manifest_v1.required_reasoning_jsonls
selected_training_manifest_v1.required_reasoning_included
This is intentionally conservative. The generator does not self-certify its own output, and it does not lower the source-quality threshold. If the certifier rejects a generated example, that example is excluded and recorded in the manifest.
Step 0b PDF Parser Noise
Step 0b PDF conversion and public-source PDF normalization suppress only one known noisy pypdf message:
Ignoring wrong pointing object ... (offset 0)
The filter lives in data_pipeline/pdf_warning_filter.py and is applied by data_pipeline/learning_pdf_to_jsonl.py and data_pipeline/source_normalization.py. It does not hide real PDF extraction exceptions, invalid/tiny PDFs, skipped files, low text counts, or unrelated parser warnings. Operators should judge PDF extraction quality from the conversion report fields page_count_with_text, text_chars, record_count, skipped_pdf_count, and per-source status.
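A narrowly scoped filter of this kind can be built with the standard logging module. The sketch below is not the contents of pdf_warning_filter.py: the regex, class name, and install helper are assumptions, and the regex relies only on the message shape shown above. It attaches the filter to a handler because a filter added to a parent logger does not see records that propagate up from child loggers, such as module-level pypdf loggers.

```python
import logging
import re

# Matches only the known noisy message shape; the middle part is left open.
_NOISY = re.compile(r"^Ignoring wrong pointing object .+ \(offset 0\)$")


class WrongPointingObjectFilter(logging.Filter):
    """Drop the one known-noisy pypdf message and let every other record through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return not _NOISY.match(record.getMessage())


def install(handler: logging.Handler) -> None:
    """Attach the filter to a handler so propagated child-logger records are covered."""
    handler.addFilter(WrongPointingObjectFilter())
```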
Release-Wide Failure Repair And Oversampling
For measured self-improvement, SHFT converts paired-eval failures into targeted repair examples for the next fresh dataset snapshot. The release-wide builder reads prior paired predictions for the same release, preserves source run ids in metadata, and writes role-local JSONL:
cd impl_codex\self_healing_finetuning
python -m n21.cli build-release-paired-eval-failure-repair --release-id linvest21_fingpt_equity_researcher_v1_001 --asset-class equity --role researcher --max-records 1500
Selected training can then oversample repair rows. The production batch wrappers set SHFT_REPAIR_OVERSAMPLE_FACTOR=2 and fail closed with SHFT_MAX_REPAIR_SELECTED_RATIO=0.75 unless the operator explicitly overrides that environment variable:
python -m n21.cli build-learning-training-jsonl --source ..\..\data\learning\equity\researcher --output ..\shft_workspace\runs\<run_id>\training_selection\selected_training.jsonl --asset-class equity --role researcher --repair-oversample-factor 2
For strict corpus-composition checks, selected training can also cap the effective repair-row share:
python -m n21.cli build-learning-training-jsonl --source ..\..\data\learning\equity\researcher --output ..\shft_workspace\runs\<run_id>\training_selection\selected_training.jsonl --asset-class equity --role researcher --repair-oversample-factor 2 --max-repair-selected-ratio 0.75
The cap is fail-closed. The builder first keeps enough repair rows to satisfy the minimum coverage thresholds, then rejects the build if those mandatory repair rows still exceed the cap. The selected-training manifest records repair_cap_applied, max_repair_selected_ratio, source_repair_row_count, selected_repair_source_rows, and dropped_repair_source_rows. A run whose manifest shows repair_cap_applied=false is not acceptable for paid production evidence unless the operator records an explicit exception.
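The cap arithmetic can be sketched as follows. It assumes the cap applies to the oversampled (effective) repair-row count, that non-repair rows are not oversampled, and that mandatory rows are those needed for the coverage thresholds; the helper names are hypothetical, while the returned keys reuse the manifest field names listed above.

```python
import math
from fractions import Fraction


def max_effective_repair_rows(non_repair_rows: int, max_ratio: float) -> int:
    """Largest r with r / (r + non_repair_rows) <= max_ratio, using exact arithmetic."""
    ratio = Fraction(str(max_ratio))
    if not 0 < ratio < 1:
        raise ValueError("max_ratio must be strictly between 0 and 1")
    return math.floor(ratio * non_repair_rows / (1 - ratio))


def apply_repair_cap(non_repair_rows: int, source_repair_rows: int, mandatory_repair_rows: int,
                     oversample_factor: int = 2, max_ratio: float = 0.75) -> dict:
    allowed_effective = max_effective_repair_rows(non_repair_rows, max_ratio)
    # Fail closed when even the mandatory coverage rows cannot fit under the cap.
    if mandatory_repair_rows * oversample_factor > allowed_effective:
        raise RuntimeError("mandatory repair rows exceed max_repair_selected_ratio")
    selected = min(source_repair_rows, allowed_effective // oversample_factor)
    return {
        "repair_cap_applied": True,
        "max_repair_selected_ratio": max_ratio,
        "source_repair_row_count": source_repair_rows,
        "selected_repair_source_rows": selected,
        "dropped_repair_source_rows": source_repair_rows - selected,
    }
```

With 100 non-repair rows and a 0.75 cap, at most 300 effective repair rows fit (300 / 400 = 0.75), so at an oversample factor of 2 the builder can keep 150 source repair rows.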
Before paid submission, the training script runs:
python -m n21.cli repair-coverage-gate --selected-training <selected_training.jsonl> --output <repair_coverage_gate.json>
The run may submit only after it prints SHFT VITAL REPAIR COVERAGE OK=True. This gate checks that the selected training set contains enough repair examples with numeric reasoning, fact/inference separation, neutral language, risk/tradeoff language, and critical reasoning. It is role-generic and applies to every asset/role combination.
Use the all-role local preflight before any paid six-role launch:
impl_codex\scripts\prepare_equity_all_roles_repair_preflight.bat
impl_codex\scripts\prepare_fixed_income_all_roles_repair_preflight.bat
impl_codex\scripts\prepare_multi_asset_all_roles_repair_preflight.bat
These commands do not submit paid training. They build release-wide paired-eval failure repairs, generate grounded critical-reasoning data, build selected training with repair oversampling, run repair-coverage-gate, validate all six role corpora, and write impl_codex/shft_workspace/preflight/<asset_class>/all_roles_preflight_summary.json. The 2026-05-24 non-capped preflight recheck passed for Equity, Fixed Income, and Multi Asset.
Latest Equity local verification:
chief_investment_officer selected=16693 repair_rows=16594 validation_ok=true schema_errors=0 conflicts=0
client_portfolio_manager selected=21063 repair_rows=20906 validation_ok=true schema_errors=0 conflicts=0
performance_manager selected=8855 repair_rows=8822 validation_ok=true schema_errors=0 conflicts=0
portfolio_manager selected=69283 repair_rows=66602 validation_ok=true schema_errors=0 conflicts=0
researcher selected=56480 repair_rows=51058 validation_ok=true schema_errors=0 conflicts=0
risk_manager selected=31690 repair_rows=28092 validation_ok=true schema_errors=0 conflicts=0
After this preflight passes, launch all six Equity roles from protected best checkpoints:
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat --asset equity --version v1_001 --mode all-until-certified --continue-best
Cross-Asset Repair/Preflight Parity
Equity is the reference for all-role repair/preflight. Fixed Income and Multi Asset use the same generic wrapper and summary gate before paid training:
impl_codex\scripts\prepare_asset_all_roles_repair_preflight.bat equity
impl_codex\scripts\prepare_asset_all_roles_repair_preflight.bat fixed_income
impl_codex\scripts\prepare_asset_all_roles_repair_preflight.bat multi_asset
Compatibility wrappers delegate to the generic wrapper:
impl_codex\scripts\prepare_equity_all_roles_repair_preflight.bat
impl_codex\scripts\prepare_fixed_income_all_roles_repair_preflight.bat
impl_codex\scripts\prepare_multi_asset_all_roles_repair_preflight.bat
For every role in the selected asset class, the wrapper must run the same sequence as the Equity reference:
build-release-paired-eval-failure-repair
generate-reasoning-data
build-learning-training-jsonl with repair oversampling
repair-coverage-gate
validate-training-data
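The order of these steps matters because each one consumes the previous step's output. The following is a minimal sketch of fail-closed sequencing. It assumes each step is a callable that returns True on success. The step callables and the `run_role_preflight` name are hypothetical stand-ins and do not reflect the wrapper's actual implementation.

```python
from typing import Callable, List, Sequence, Tuple


def run_role_preflight(steps: Sequence[Tuple[str, Callable[[], bool]]]) -> dict:
    """Run preflight steps in order and stop at the first failure (fail closed)."""
    completed: List[str] = []
    for name, step in steps:
        if not step():
            # Later steps never run on partial output from a failed step.
            return {"ok": False, "failed_step": name, "completed": completed}
        completed.append(name)
    return {"ok": True, "failed_step": None, "completed": completed}


if __name__ == "__main__":
    # Stand-in callables; a real wrapper would invoke the n21.cli subcommands.
    names = [
        "build-release-paired-eval-failure-repair",
        "generate-reasoning-data",
        "build-learning-training-jsonl",
        "repair-coverage-gate",
        "validate-training-data",
    ]
    steps = [(n, (lambda n=n: n != "repair-coverage-gate")) for n in names]
    print(run_role_preflight(steps)["failed_step"])  # repair-coverage-gate
```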
The role-local proof directory is:
impl_codex/shft_workspace/preflight/<asset_class>/<role>
Required files:
selected_training.jsonl
selected_training.manifest.json
repair_coverage_gate.json
training_data_validation/training_data_validation_report.json
training_data_validation/conflict_report.json
training_data_validation/quarantine_manifest.json
The asset-level summary is:
impl_codex/shft_workspace/preflight/<asset_class>/all_roles_preflight_summary.json
That summary must show all six roles passing the repair coverage gate and training-data validation, and must record selected record count, repair row count, required reasoning inclusion, selected-training hash, release id, and training source. Paid all-role Fixed Income and Multi Asset training must remain blocked unless this summary is still passing immediately before launch.
The summary verifier can also be run directly:
cd impl_codex\self_healing_finetuning
python -m n21.cli summarize-asset-preflight --asset-class equity
python -m n21.cli summarize-asset-preflight --asset-class fixed_income
python -m n21.cli summarize-asset-preflight --asset-class multi_asset
2026-05-24 local verification status:
equity ok=true passing_roles=6/6 failed_roles=[]
fixed_income ok=true passing_roles=6/6 failed_roles=[]
multi_asset ok=true passing_roles=6/6 failed_roles=[]
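The one-line status format above can be reproduced with a small aggregation step. In this sketch, a role that is absent from the results is counted as failed, so the summary stays fail-closed. The per-role fields `repair_coverage_ok` and `validation_ok` are hypothetical names; the real summary JSON may use different keys.

```python
from typing import Dict

EXPECTED_ROLES = (
    "chief_investment_officer",
    "client_portfolio_manager",
    "performance_manager",
    "portfolio_manager",
    "researcher",
    "risk_manager",
)


def summarize_asset(asset_class: str, role_results: Dict[str, dict]) -> str:
    """Return a one-line asset summary; a missing role counts as a failure."""
    failed = [
        role
        for role in EXPECTED_ROLES
        if not (
            role_results.get(role, {}).get("repair_coverage_ok") is True
            and role_results.get(role, {}).get("validation_ok") is True
        )
    ]
    ok = not failed
    passing = len(EXPECTED_ROLES) - len(failed)
    return (
        f"{asset_class} ok={str(ok).lower()} "
        f"passing_roles={passing}/{len(EXPECTED_ROLES)} "
        f"failed_roles=[{','.join(failed)}]"
    )
```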
Latest local repair coverage:
equity CIO=18934 client_pm=23276 performance=10074 portfolio=69602 researcher=54058 risk=31092
fixed_income CIO=3000 client_pm=384 performance=396 portfolio=648 researcher=2304 risk=3000
multi_asset CIO=396 client_pm=1884 performance=396 portfolio=504 researcher=732 risk=3000
This is the intended fail-closed behavior: the generic wrapper proves parity artifacts for every asset class, and the summary blocks paid all-role launches unless every role passes. The current 2026-05-24 summaries show Fixed Income and Multi Asset locally on par with Equity for non-capped preflight readiness. The same summaries now also expose corpus-risk warnings: all 18 roles are repair-heavy, Fixed Income has two data-thin roles (client_portfolio_manager, performance_manager), and Multi Asset has three data-thin roles (chief_investment_officer, performance_manager, portfolio_manager).
A stricter isolated 75% repair-cap probe was written to:
impl_codex/shft_workspace/preflight_strict_cap_probe/repair_cap_075_after_balance_v2_20260524/strict_repair_cap_probe_summary.json
That probe now passes 18 of 18 roles after source-grounded non-repair balance rows were generated for the thin roles. The balance generator is:
python -m n21.cli generate-nonrepair-balance-data --asset-class <asset_class> --role <role> --min-nonrepair-rows 100 --force
It writes source_grounded_nonrepair_balance_<asset_class>_<role>.hf_finetune.jsonl and a manifest under data/learning/<asset_class>/<role>, and verifies that generated rows are not repair-classified. This closes the strict local corpus-composition gate. All roles still need live HF training, paired-eval proof, quality-gate success, model-card evidence, promotion evidence, and expanded original-source coverage before they are certified on par with trained Equity roles.
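Suppose the cap applies to the repair share r / (r + n), where r is the number of repair rows and n is the number of non-repair rows. Then r ≤ c(r + n) rearranges to n ≥ r(1 − c) / c. At c = 0.75, a role therefore needs at least one non-repair row for every three repair rows. The sketch below assumes that definition of the cap, which may differ from how SHFT_MAX_REPAIR_SELECTED_RATIO is applied internally. It uses exact fractions so that boundary cases are not affected by floating-point error. Separately from this bound, the generator's --min-nonrepair-rows 100 acts as a floor.

```python
import math
from fractions import Fraction


def min_nonrepair_rows(repair_rows: int, cap: str = "0.75") -> int:
    """Smallest n with repair_rows / (repair_rows + n) <= cap, in exact arithmetic."""
    c = Fraction(cap)
    if not 0 < c <= 1:
        raise ValueError("cap must be in (0, 1]")
    return max(0, math.ceil(repair_rows * (1 - c) / c))


def passes_cap(repair_rows: int, nonrepair_rows: int, cap: str = "0.75") -> bool:
    """True when the repair share of the selected corpus is at or below the cap."""
    total = repair_rows + nonrepair_rows
    return total > 0 and Fraction(repair_rows, total) <= Fraction(cap)
```

Under this assumption, a role with 3000 repair rows needs at least `min_nonrepair_rows(3000) == 1000` non-repair rows to pass a 0.75 cap.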
After the asset-class preflight summary passes, use the existing live launcher:
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat --asset fixed_income --version v1_001 --mode all-until-certified --continue-best
impl_codex\scripts\run_linvest21fingpt_parallel_super_agents.bat --asset multi_asset --version v1_001 --mode all-until-certified --continue-best
Dataset Provenance And Stale-Adapter Gate
Self-healing does not mean that a dataset can mutate underneath an existing adapter. SHFT treats each paid training attempt as a controlled experiment:
freeze dataset snapshot N -> train adapter N -> evaluate adapter N -> diagnose failures -> generate repair data -> freeze dataset snapshot N+1 -> train adapter N+1
The dataset is immutable per run, not globally. New public sources, normalized JSONL, or generated reasoning examples are valid self-improvement inputs only for a fresh dataset snapshot and a fresh training attempt.
Before paid training, SHFT should freeze and persist:
impl_codex/shft_workspace/runs/<run_id>/training_selection/selected_training.jsonl
impl_codex/shft_workspace/runs/<run_id>/training_selection/selected_training_manifest.json
impl_codex/shft_workspace/runs/<run_id>/dataset_snapshot/dataset_manifest.json
impl_codex/shft_workspace/runs/<run_id>/remote_artifacts/training_plan.json
The frozen snapshot must include SHA-256 hashes for the selected training file and source JSONL files, train/valid/test counts, required reasoning-file presence, reasoning-record count, and reasoning-record ratio. Current HF live training also writes split hashes in dataset_snapshot/dataset_manifest.json:
split_sha256.train
split_sha256.valid
split_sha256.test
provenance.source_sha256
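Freezing split hashes comes down to streaming each split through SHA-256 and recording its record count. This sketch uses the split_sha256 and split_counts field names from this section. The `<split>.jsonl` file layout and the helper names are assumptions.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large JSONL splits never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_jsonl_records(path: Path) -> int:
    """Count non-blank lines, treating each as one JSONL record."""
    with path.open("rb") as fh:
        return sum(1 for line in fh if line.strip())


def freeze_split_manifest(split_dir: Path, splits=("train", "valid", "test")) -> dict:
    """Build split_sha256 and split_counts for <split_dir>/<split>.jsonl (assumed layout)."""
    manifest = {"split_sha256": {}, "split_counts": {}}
    for split in splits:
        path = split_dir / f"{split}.jsonl"
        manifest["split_sha256"][split] = sha256_file(path)
        manifest["split_counts"][split] = count_jsonl_records(path)
    return manifest


def write_manifest(manifest: dict, out_path: Path) -> None:
    """Write the manifest with sorted keys so repeated writes produce identical content."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
```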
HF dataset staging is run-scoped. stage-hf-dataset uploads a run snapshot to hf://datasets/linvest21/shft-datasets/runs/<run_id> and records:
provider_plans/hf_dataset_stage_result.json
path_in_repo=runs/<run_id>
job_dataset_dir=/data/runs/<run_id>
dataset_manifest_sha256
split_sha256.train / valid / test
The HF trainer receives /data/runs/<run_id> plus the expected manifest and split hashes. Before GPU training starts, it reads the mounted dataset_manifest.json, recomputes train/valid/test hashes, and writes remote_artifacts/training_plan.json.dataset_provenance. If the remote row counts or hashes differ from the local frozen snapshot, it writes training_result.status=blocked_dataset_provenance_mismatch and exits non-zero. This prevents the stale shared-dataset-root failure where one run uploaded to repo root while another job trained from a different /data/train.jsonl.
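On the trainer side, the provenance check recomputes hashes inside the mounted run directory and blocks on any difference. The sketch below compares hashes only; a row-count comparison would follow the same pattern. The function names, the `<split>.jsonl` layout, and the exit code 1 are assumptions. The only details taken from the text above are the blocked_dataset_provenance_mismatch status and the non-zero exit.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mounted_dataset(job_dataset_dir: Path, expected_split_sha256: Dict[str, str]) -> dict:
    """Recompute split hashes under the mounted run directory and compare to the local snapshot."""
    if not expected_split_sha256:
        raise ValueError("no expected split hashes supplied; refusing to train")
    mismatches = {}
    for split, expected in expected_split_sha256.items():
        path = job_dataset_dir / f"{split}.jsonl"
        actual = sha256_file(path) if path.is_file() else None  # a missing split is a mismatch
        if actual != expected:
            mismatches[split] = {"expected": expected, "actual": actual}
    report = {"dataset_provenance": {"ok": not mismatches, "mismatches": mismatches}}
    if mismatches:
        report["training_result"] = {"status": "blocked_dataset_provenance_mismatch"}
    return report


def provenance_exit_code(report: dict) -> int:
    """A non-zero exit stops the job before any GPU training starts."""
    return 0 if report["dataset_provenance"]["ok"] else 1
```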
If generate-reasoning-data, source intake, normalization, or manual curation changes data/learning/<asset_class>/<role> after an adapter has already been trained, the current run is stale:
corpus_changed_after_training=true
stale_training_artifacts=true
force_new_run_required=true
In that state SHFT must not attach to the old train_handle.json, must not reuse the old adapter as the current candidate, and must not export it as a fresh implementation product. It should start a new run id, freeze the expanded dataset, and train a new adapter against that exact snapshot.
The existing operator command does not need to change:
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all-until-certified
The command semantics change: before RESUME_POLICY=attached_to_latest_resumable_run can attach to an existing run, the resume gate must compare the current dataset hash against the adapter's training dataset hash. A mismatch means stale artifacts, so resume is blocked and a fresh run is required.
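The resume gate can be sketched as a deterministic hash over the role corpus directory (data/learning/<asset_class>/<role>), compared against the hash recorded when the adapter was trained. Both the directory-hash scheme and the helper names are hypothetical. The three stale flags mirror the ones listed earlier in this section. A missing training hash is treated as a change, so the gate fails closed.

```python
import hashlib
from pathlib import Path
from typing import Optional


def corpus_sha256(corpus_dir: Path) -> str:
    """Deterministic hash over relative paths and file contents under a corpus directory."""
    digest = hashlib.sha256()
    files = sorted(
        (p for p in corpus_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(corpus_dir).as_posix(),
    )
    for path in files:
        digest.update(path.relative_to(corpus_dir).as_posix().encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def resume_gate(current_sha256: str, adapter_training_sha256: Optional[str]) -> dict:
    """Allow resume only when the adapter was trained on exactly the current corpus."""
    # An unknown training hash is treated as changed, so the gate fails closed.
    stale = adapter_training_sha256 is None or adapter_training_sha256 != current_sha256
    return {
        "resume_allowed": not stale,
        "corpus_changed_after_training": stale,
        "stale_training_artifacts": stale,
        "force_new_run_required": stale,
    }
```

Adding a generated reasoning file to the corpus changes the directory hash, so a resume against the old adapter is blocked.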
The model-quality gate also checks remote/local dataset parity. Even if paired eval passes, promotion remains blocked unless:
remote_artifacts/training_plan.json.train_records == dataset_snapshot/dataset_manifest.json.split_counts.train
remote_artifacts/training_plan.json.valid_records == dataset_snapshot/dataset_manifest.json.split_counts.valid
remote_artifacts/training_plan.json.dataset_provenance.ok == true
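The three parity conditions above can be expressed as a blocker list over the two JSON documents once they are loaded. This is a minimal sketch that assumes the key paths exactly as written above; the function name is hypothetical.

```python
from typing import List


def dataset_parity_blockers(training_plan: dict, dataset_manifest: dict) -> List[str]:
    """Return human-readable blockers; an empty list means remote/local parity holds."""
    blockers = []
    counts = dataset_manifest.get("split_counts", {})
    for split in ("train", "valid"):
        plan_value = training_plan.get(f"{split}_records")
        manifest_value = counts.get(split)
        # A missing count on either side is a blocker, not a pass.
        if plan_value is None or plan_value != manifest_value:
            blockers.append(f"{split}_records mismatch: plan={plan_value} manifest={manifest_value}")
    if training_plan.get("dataset_provenance", {}).get("ok") is not True:
        blockers.append("dataset_provenance.ok is not true")
    return blockers
```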
Model-Quality Gate
Self-healing cycle scores are orchestration evidence. They are not enough for promotion.
Promotion and final copyable packaging require measured model-quality evidence:
impl_codex/shft_workspace/runs/<run_id>/eval/paired_eval_report.json
impl_codex/shft_workspace/runs/<run_id>/remote_artifacts/training_plan.json
impl_codex/shft_workspace/runs/<run_id>/dataset_snapshot/dataset_manifest.json
impl_codex/shft_workspace/runs/<run_id>/eval/model_judge_report.json
impl_codex/shft_workspace/runs/<run_id>/eval/human_spot_check_report.json
Run the gate directly:
cd impl_codex\self_healing_finetuning
python -m n21.cli produce-eval-evidence --run-id <run_id> --release-id <release_id>
python -m n21.cli quality-gate --run-id <run_id>
To require a real human spot-check response before the quality gate, request email review:
cd impl_codex\self_healing_finetuning
python -m n21.cli produce-eval-evidence --run-id <run_id> --release-id <release_id> --request-human-email
python -m n21.cli quality-gate --run-id <run_id>
The full run_shft_0_to_16_with_proof.bat lifecycle now enables this by default with SHFT_HUMAN_REVIEW_EMAIL=true and waits up to SHFT_HUMAN_REVIEW_TIMEOUT_SECONDS seconds for a response (default 1800, i.e. 30 minutes). Set SHFT_HUMAN_REVIEW_EMAIL=false only for noninteractive automation where a pending human report is acceptable.
The gate fails closed and prints exact blockers when evidence is missing or thresholds are not met.
Current all-role certification targets are intentionally aligned across all 18 asset/role models. The paired proof and the model-judge proxy both require the same minimum absolute quality bar:
candidate aggregate / model-judge mean score >= 0.60
candidate critical-pass / model-judge critical-pass >= 0.70
pairwise win rate >= 0.55
pairwise loss rate <= 0.02
This lets a run that has passed measured paired proof proceed to certification only after training budget, trainer overfit, corpus coverage, baseline proof, and explicit human spot-check approval also pass. It does not mean the run is the protected best checkpoint; the best-run tracker still records whether it improved the previous best.
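The absolute quality bar above maps naturally to a threshold table that produces readable blockers. The metric keys in this sketch are hypothetical names. For the model-judge proxy, the same function can be called with only the aggregate and critical-pass entries.

```python
import operator

# Minimum absolute quality bar from this section (metric key names are hypothetical).
THRESHOLDS = {
    "aggregate": (operator.ge, 0.60),
    "critical_pass": (operator.ge, 0.70),
    "pairwise_win_rate": (operator.ge, 0.55),
    "pairwise_loss_rate": (operator.le, 0.02),
}
SYMBOLS = {operator.ge: ">=", operator.le: "<="}


def threshold_blockers(metrics: dict, thresholds=THRESHOLDS) -> list:
    """Fail closed: a missing metric is a blocker, as is any threshold miss."""
    blockers = []
    for name, (compare, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            blockers.append(f"{name}: missing")
        elif not compare(value, limit):
            blockers.append(f"{name}: {value} does not satisfy {SYMBOLS[compare]} {limit}")
    return blockers
```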
produce-eval-evidence writes the mandatory evidence artifacts that the gate already required:
eval/baseline_proof_report.json
eval/model_judge_report.json
eval/human_spot_check_report.json
eval/required_eval_evidence_manifest.json
The baseline report resolves the zero-baseline trap without weakening the gate. If the paired baseline aggregate is zero and relative improvement is mathematically undefined, the report marks proof_mode: absolute_only_cold_start; the quality gate still requires absolute aggregate, critical-pass, pairwise loss, model-judge, training-budget, corpus-coverage, and human-review checks. The human spot-check report is produced as pending evidence by default; SHFT never infers human approval automatically. Use --approve-human only after a real human review has approved the sampled cases.
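The zero-baseline handling amounts to a single branch. When the baseline aggregate is zero, the relative improvement (candidate − baseline) / baseline is undefined, so the report records only the absolute delta. The proof_mode value "absolute_only_cold_start" comes from the text above. The "relative" label and the function name are hypothetical.

```python
def baseline_proof(baseline_aggregate: float, candidate_aggregate: float) -> dict:
    """Classify the baseline proof; absolute gate checks still apply in every mode."""
    delta = candidate_aggregate - baseline_aggregate
    if baseline_aggregate == 0:
        # Relative improvement would divide by zero, so only absolute checks can prove quality.
        return {"proof_mode": "absolute_only_cold_start", "absolute_delta": delta, "relative_improvement": None}
    return {"proof_mode": "relative", "absolute_delta": delta, "relative_improvement": delta / baseline_aggregate}
```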
Human review email uses the same delivery stack as owner convergence decisions. Configure SHFT_RUN_OWNER_EMAIL, then either SMTP/IMAP (SHFT_SMTP_HOST, SHFT_SMTP_FROM, optional SHFT_SMTP_USERNAME, SHFT_SMTP_PASSWORD, SHFT_IMAP_HOST, SHFT_IMAP_USERNAME, SHFT_IMAP_PASSWORD) or Outlook COM on Windows. Without configured delivery, SHFT writes an outbox artifact and keeps polling the audited response files:
impl_codex/shft_workspace/runs/<run_id>/eval/human_spot_check_response.json
impl_codex/shft_workspace/human_spot_check_reviews/responses/<run_id>.json
Valid review responses are approve or reject. Approval only passes the human gate when the response records zero critical failures.
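Here is a minimal sketch of how a recorded review response could be evaluated. The `decision` and `critical_failures` field names are assumptions about the response file layout. A missing response is reported as pending and never passes, consistent with SHFT never inferring approval.

```python
from typing import Optional, Tuple

VALID_DECISIONS = ("approve", "reject")


def human_gate(response: Optional[dict]) -> Tuple[bool, str]:
    """Pass only on an explicit approval that records zero critical failures."""
    if not response:
        return False, "pending: no human response recorded"
    decision = str(response.get("decision", "")).strip().lower()
    if decision not in VALID_DECISIONS:
        return False, f"invalid decision: {decision!r}"
    if decision == "reject":
        return False, "rejected by reviewer"
    critical = response.get("critical_failures")
    # Require a real integer zero; booleans and missing values do not count.
    if type(critical) is not int or critical != 0:
        return False, "approval requires critical_failures == 0"
    return True, "approved"
```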
When the all mode reaches a failed model-quality gate, SHFT now runs an automatic Step 0 stall-breakout pass. If the breakout creates new trainable material, or the loop creates new repair/reasoning data, the all mode starts a fresh run id and repeats train -> fetch -> paired proof -> quality gate until the gate passes or the configured recovery budget is exhausted. The failed run is never certified retroactively.
For explicit operator-supervised continuous training, use all-until-certified:
impl_codex\scripts\run_linvest21fingpt_super_agent_menu.bat --asset equity --role researcher --version v1_001 --mode all-until-certified
This is a paid live mode. It repeats train -> fetch -> paired proof -> quality gate -> source breakout or repair-data generation -> retrain until the configured model-quality gate passes, the recovery budget is exhausted, or a human operator stops it with Ctrl+C or by closing the command window. It still fails closed for certification: a run is not promoted or packaged as certified unless the measured quality gate passes.
For the Qwen3-32B Equity Researcher path, the current one-command launcher is:
impl_codex\scripts\run_qwen3_32b_equity_researcher_all_until_certified.bat
That command runs the best available SFT/proof automation: Qwen3-32B QLoRA training, artifact-aware fetch, paired proof, mandatory evidence, human review, quality gate, paired-eval diagnostics, repair-target generation, pairwise preference-memory generation, and recovery/retraining until certification or human stop.
The default A+ operator is the artifact-aware autopilot:
impl_codex\scripts\run_shft_autopilot_to_a_plus.bat
With no arguments, it selects the latest run for linvest21_fingpt_equity_researcher_v1_001. It can also resume a specific source or preference run:
impl_codex\scripts\run_shft_autopilot_to_a_plus.bat <run_id> linvest21_fingpt_equity_researcher_v1_001 equity researcher 120 a100-large
The autopilot is conservative with paid jobs. It submits a paired-eval or DPO job only when the required prior artifact is complete and no local handle already exists. Otherwise it polls, fetches, writes autopilot_status.json, and waits. Its loop is:
fetch training/preference artifacts -> paired proof -> quality gate
-> if source gate fails, build loss-targeted preference data and submit DPO
-> if preference gate passes, write A+ report
Terminal states are certification, an explicit quality-gate failure after a completed preference proof, a terminal HF job failure or cancellation, or exhaustion of the configured poll budget. The default poll budget is 288 polls at 300 seconds each (24 hours); override it with the final two arguments.
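The default budget works out to 288 × 300 s = 86,400 s, which is 24 hours. The sketch below shows a bounded polling loop of this kind. The state labels are hypothetical. The `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Hypothetical labels for the terminal states described above.
TERMINAL_STATES = frozenset({"certified", "quality_gate_failed", "hf_job_failed", "hf_job_cancelled"})


def autopilot_poll(
    check: Callable[[], str],
    max_polls: int = 288,
    interval_seconds: float = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal state is reached or the poll budget runs out."""
    for attempt in range(1, max_polls + 1):
        state = check()  # e.g. fetch artifacts, run proof/gate if ready, return a state label
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls:
            sleep(interval_seconds)  # no sleep after the final poll
    return "poll_budget_exhausted"
```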
The older A+ sequence launcher remains available for manual staging:
impl_codex\scripts\run_qwen3_32b_equity_researcher_a_plus_sequence.bat
It first looks for the latest run with:
impl_codex/shft_workspace/runs/<run_id>/preference_memory/preference_pairs.jsonl
If preference data exists, it submits the DPO preference-optimization stage before another SFT loop. If no preference data exists, it runs the Qwen full-auto SFT/proof/failure-mining loop, then attempts preference optimization again. Preference training is intentionally written as a separate _pref_<timestamp> run so a bad preference adapter cannot overwrite the SFT adapter or protected best checkpoint.
The preference stage is implemented by:
impl_codex/self_healing_finetuning/training/hf_preference_train.py
impl_codex/scripts/run_hf_preference_train_for_run.bat
impl_codex/scripts/fetch_hf_preference_train_for_run.bat
For manual debugging after a DPO job finishes, fetch and proof the new preference run:
impl_codex\scripts\fetch_hf_preference_train_for_run.bat <pref_run_id>
impl_codex\scripts\run_hf_paired_eval_for_run.bat <pref_run_id> 120 a100-large
cd impl_codex\self_healing_finetuning
python -m n21.cli a-plus-report --run-id <pref_run_id> --source-run-id <source_run_id>
Do not describe a run as A+ until eval/a_plus_upgrade_report.json reports grade=A+. The A+ report requires completed preference training, paired proof, aggregate delta >= +0.05, pairwise win rate >= 0.55, pairwise loss rate <= 0.02, critical pass rate >= 0.95, and human approval with zero critical failures.
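The A+ criteria can be checked as a named checklist, so that any shortfall is reported by name. The report keys and the "below A+" label are hypothetical. The thresholds are the ones listed above. Because aggregate deltas are computed in floating point, a delta that is exactly +0.05 in decimal may land just below 0.05 in binary.

```python
from typing import List, Optional, Tuple


def _at_least(value: Optional[float], limit: float) -> bool:
    return value is not None and value >= limit


def _at_most(value: Optional[float], limit: float) -> bool:
    return value is not None and value <= limit


def a_plus_grade(r: dict) -> Tuple[str, List[str]]:
    """Return ("A+", []) only when every A+ criterion holds; otherwise list blockers."""
    checks = {
        "preference training completed": r.get("preference_training_completed") is True,
        "paired proof completed": r.get("paired_proof_completed") is True,
        "aggregate delta >= +0.05": _at_least(r.get("aggregate_delta"), 0.05),
        "pairwise win rate >= 0.55": _at_least(r.get("win_rate"), 0.55),
        "pairwise loss rate <= 0.02": _at_most(r.get("loss_rate"), 0.02),
        "critical pass rate >= 0.95": _at_least(r.get("critical_pass_rate"), 0.95),
        "human approved": r.get("human_decision") == "approve",
        "zero human critical failures": r.get("human_critical_failures") == 0,
    }
    blockers = [name for name, passed in checks.items() if not passed]
    return ("A+" if not blockers else "below A+"), blockers
```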
After a failed paired proof, the loop writes bucketed diagnostics and repair targets before preference memory:
impl_codex/shft_workspace/runs/<run_id>/diagnostics/paired_eval_diagnostics.jsonl
impl_codex/shft_workspace/runs/<run_id>/diagnostics/repair_targets.jsonl
impl_codex/shft_workspace/runs/<run_id>/diagnostics/paired_eval_diagnostics_manifest.json
impl_codex/shft_workspace/runs/<run_id>/preference_memory/preference_pairs.jsonl
impl_codex/shft_workspace/runs/<run_id>/preference_memory/preference_manifest.json
Each diagnostic row records the prompt id, baseline answer, candidate answer, winner, failure bucket, critical-failure flag, judge rationale, and repair target with acceptance checks. The configured buckets are valuation_math, accounting_sec_extraction, moat_reasoning, risk_premium_discount_rate, investment_memo_synthesis, and hallucination_uncertainty.
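A diagnostic row with the fields listed above can be modeled as a dataclass that rejects unknown buckets and serializes to one JSONL line. The attribute names follow the prose but are assumptions about the actual schema, as is the representation of acceptance checks as a list of strings.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

FAILURE_BUCKETS = frozenset({
    "valuation_math",
    "accounting_sec_extraction",
    "moat_reasoning",
    "risk_premium_discount_rate",
    "investment_memo_synthesis",
    "hallucination_uncertainty",
})


@dataclass
class DiagnosticRow:
    prompt_id: str
    baseline_answer: str
    candidate_answer: str
    winner: str
    failure_bucket: str
    critical_failure: bool
    judge_rationale: str
    repair_target: str
    acceptance_checks: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject rows that would fall outside the configured bucket taxonomy.
        if self.failure_bucket not in FAILURE_BUCKETS:
            raise ValueError(f"unknown failure bucket: {self.failure_bucket}")

    def to_jsonl(self) -> str:
        """One JSON object per line, keys sorted for stable diffs."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```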
For Qwen raw-base paired proof, the submit script mounts the base model into the HF job and evaluates from /models/base:
hf://Qwen/Qwen3-32B:/models/base:ro
This avoids relying on a live model download inside the proof job after a previous Qwen/Qwen3-32B load timeout.
During remote Hugging Face waits, the batch loop now prints live job status in addition to artifact polling. Each training/proof wait poll calls impl_codex/scripts/show_hf_job_status.ps1, which reports the HF job id, stage, creation time, and URL. Every fifth poll also prints a compact hf jobs logs --tail excerpt with trainer progress such as current step, total steps, loss, learning rate, token count, and estimated remaining time. The helper forces UTF-8 for HF CLI calls to avoid Windows charmap encoding errors when printing logs.
For an already-running job, start a separate watcher without disturbing the training process:
impl_codex\scripts\watch_hf_job_status.bat <run_id> train 30 30
The batch wrapper also accepts a full run directory in place of the run id. Use proof instead of train to watch the paired model-quality proof. The PowerShell watcher can also be invoked directly:
powershell -NoProfile -ExecutionPolicy Bypass -File impl_codex\scripts\watch_hf_job_status.ps1 -RunDir impl_codex\shft_workspace\runs\<run_id> -Kind train -IntervalSeconds 30 -TailLines 30
Continuous mode writes operator-visible intelligence and recovery status after each quality gate and breakout:
impl_codex/shft_workspace/continuous_training/<release_id>_status.json
impl_codex/shft_workspace/runs/<run_id>/continuous_training_status.json
impl_codex/shft_workspace/runs/<run_id>/next_data_strategy.json
impl_codex/shft_workspace/best_runs/<release_id>.json
These artifacts report current aggregate, critical-pass rate, pairwise win/loss rate, training loss when available, train/validation counts, distance to configured thresholds, best measured run so far, source-breakout status, and the next data strategy.
Final all-role status and email reporting:
impl_codex\scripts\send_final_all_roles_stats_email.bat "pm-list@example.com;cio-list@example.com"
This finalizer scans all 18 v1_001 asset/role releases, combines paired-proof metrics with the latest seven-bucket defect ranking, and writes polished JSON, Markdown, and HTML reports under:
impl_codex/shft_workspace/final_reports/final_all_18_status_latest.json
impl_codex/shft_workspace/final_reports/final_all_18_status_latest.md
impl_codex/shft_workspace/final_reports/final_all_18_status_latest.html
The report includes promotion eligibility, candidate aggregate score, critical-pass rate, pairwise win/loss rate, top defect categories, and the next repair action per role. It exits 0 after successfully writing the final report. Email is intentionally opt-in so dry status runs do not accidentally send mail. To send the report, configure:
set SHFT_SEND_FINAL_EMAIL=true
set SHFT_FINAL_MAIL_TO=pm-list@example.com;cio-list@example.com
set SHFT_SMTP_HOST=smtp.example.com
set SHFT_SMTP_PORT=587
set SHFT_SMTP_FROM=shft-status@example.com
set SHFT_SMTP_USER=shft-status@example.com
set SHFT_SMTP_PASSWORD=<smtp-password-or-app-password>
set SHFT_SMTP_TLS=true
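Opt-in delivery with these variables could look like the standard-library sketch below, which uses `email.message.EmailMessage` and `smtplib`. The function names are hypothetical, and the real finalizer may behave differently. The message is built only when SHFT_SEND_FINAL_EMAIL is true, and recipients are split on semicolons.

```python
import smtplib
from email.message import EmailMessage
from typing import Mapping, Optional


def build_final_email(env: Mapping[str, str], subject: str, body: str) -> Optional[EmailMessage]:
    """Return a message only when sending is explicitly enabled; otherwise None."""
    if env.get("SHFT_SEND_FINAL_EMAIL", "").lower() != "true":
        return None
    recipients = [a.strip() for a in env.get("SHFT_FINAL_MAIL_TO", "").split(";") if a.strip()]
    if not recipients:
        raise ValueError("SHFT_FINAL_MAIL_TO is empty")
    msg = EmailMessage()
    msg["From"] = env["SHFT_SMTP_FROM"]
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_final_email(msg: EmailMessage, env: Mapping[str, str]) -> None:
    """Deliver over SMTP, upgrading to TLS when SHFT_SMTP_TLS is true."""
    port = int(env.get("SHFT_SMTP_PORT", "587"))
    with smtplib.SMTP(env["SHFT_SMTP_HOST"], port, timeout=30) as smtp:
        if env.get("SHFT_SMTP_TLS", "").lower() == "true":
            smtp.starttls()
        user = env.get("SHFT_SMTP_USER")
        if user:
            smtp.login(user, env["SHFT_SMTP_PASSWORD"])
        smtp.send_message(msg)
```

A caller would pass `dict(os.environ)` to `build_final_email` and call `send_final_email` only when it returns a message. Dry status runs therefore never open an SMTP connection.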
For all-agent dispatches, set SHFT_FINAL_REPORT_AFTER_DISPATCH=true to run the same finalizer after the dispatcher returns. Use that only when the dispatch mode has actually completed the intended work; parallel tab/window launch is asynchronous, so the separate finalizer command is the preferred final step after the 18 role windows have finished.
When fetch-proof reaches a failed model-quality gate, SHFT still runs the breakout pass and writes the recovery package, but it does not auto-submit a fresh paid training job because fetch-proof is a resume/fetch command.
Each breakout pass creates transparent recovery artifacts under:
impl_codex/shft_workspace/runs/<run_id>/stall_breakout
Key artifacts:
stall_breakout_plan.json
source_intake_manifest.json
license_manifest.json
training_data_validation/training_data_validation_report.json
The breakout pass uses the configured public-source catalog and source policy, downloads public material automatically, promotes only normalized eligibility.training=true sources into data/learning/<asset_class>/<role>, validates the resulting corpus, and records whether the new sources are actually trainable by the current Step 0b converters.
Source acquisition is fallback-aware: a failed URL, 403, 404, timeout, blocked source, or non-trainable normalized source is recorded and SHFT continues to the next matching source until it finds the configured number of trainable sources or exhausts the catalog. The default policy remains: download public material automatically, but train only on material that passes the configured source-policy gate.
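Fallback-aware acquisition is essentially a loop that records every attempt and stops once enough trainable sources have been found. In this sketch, `fetch_and_normalize` is a hypothetical callable that either raises (for example on a 403, 404, timeout, or blocked source) or returns a dict with a `training_eligible` flag. The broad `except Exception` is intentional here, because any fetch failure should be recorded rather than abort the pass.

```python
from typing import Callable, Iterable


def acquire_sources(catalog: Iterable[str], fetch_and_normalize: Callable[[str], dict], needed: int) -> dict:
    """Try catalog sources in order, recording each outcome, until `needed` are trainable."""
    trainable, attempts = [], []
    for source in catalog:
        if len(trainable) >= needed:
            break
        try:
            result = fetch_and_normalize(source)
        except Exception as exc:  # record the failure and move on to the next source
            attempts.append({"source": source, "status": "failed", "error": str(exc)})
            continue
        if result.get("training_eligible") is True:
            trainable.append(source)
            attempts.append({"source": source, "status": "trainable"})
        else:
            attempts.append({"source": source, "status": "not_trainable"})
    return {"trainable": trainable, "attempts": attempts, "exhausted": len(trainable) < needed}
```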
Continuous mode has a circuit-breaker for the exact failure pattern seen in the equity logs. When breakout is blocked, adds no trainable sources, and the candidate regressed against the previous best checkpoint, orchestrator/continuous_status.py sets convergence_control.state: NEEDS_REASONING_DATA, should_halt_paid_retraining: true, and exit code 8 once discovery is exhausted. Severe regressions halt immediately. Minor regressions also halt early when the latest retry returns zero candidates, recorded as no_candidate_retry_exhausted=true. The batch flow then builds paired-eval failure repair examples and runs generate-reasoning-data for the same asset/role instead of continuing an unbounded sleep/retry cycle.
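The circuit-breaker decision can be sketched as a pure function over one breakout outcome. Several parts of this sketch are assumptions: the CONTINUE and CONTINUE_DISCOVERY labels, the 0.05 severe-regression threshold, and the definition of regression as the previous best aggregate minus the candidate aggregate. The default of 10 discovery attempts mirrors SHFT_MIN_DISCOVERY_ATTEMPTS_BEFORE_REASONING_HALT in the control flags below. This is not the actual logic in orchestrator/continuous_status.py.

```python
from dataclasses import dataclass


@dataclass
class BreakoutOutcome:
    breakout_blocked: bool
    trainable_sources_added: int
    regression: float  # previous best aggregate minus candidate aggregate; > 0 means regressed
    discovery_attempts: int
    latest_retry_candidates: int


def convergence_control(o: BreakoutOutcome, min_discovery_attempts: int = 10, severe_regression: float = 0.05) -> dict:
    stalled = o.breakout_blocked and o.trainable_sources_added == 0 and o.regression > 0
    if not stalled:
        return {"state": "CONTINUE", "should_halt_paid_retraining": False, "exit_code": 0}
    severe = o.regression >= severe_regression  # severe regressions halt immediately
    no_candidates = o.latest_retry_candidates == 0  # minor regressions halt early on zero candidates
    exhausted = o.discovery_attempts >= min_discovery_attempts
    if severe or no_candidates or exhausted:
        return {
            "state": "NEEDS_REASONING_DATA",
            "should_halt_paid_retraining": True,
            "exit_code": 8,
            "no_candidate_retry_exhausted": no_candidates and not severe,
        }
    return {"state": "CONTINUE_DISCOVERY", "should_halt_paid_retraining": False, "exit_code": 0}
```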
After that reasoning data is generated, the existing adapter for that run is stale unless the trainer can prove it consumed a dataset snapshot containing the generated reasoning file. The next continuous iteration must freeze a new dataset snapshot and start or attach only to a run whose adapter training hash matches that snapshot. This is the key guard against the equity portfolio-manager failure pattern where a later fetch/export step showed required_reasoning_included: true, but the run had already skipped training because an old train_handle.json and adapter were present.
Control flags:
set SHFT_AUTO_STALL_BREAKOUT=false
set SHFT_AUTO_BREAKOUT_RETRAIN=false
set SHFT_MAX_BREAKOUT_ROUNDS=20
set SHFT_MIN_DISCOVERY_ATTEMPTS_BEFORE_REASONING_HALT=10
set SHFT_REASONING_DATA_MAX_RECORDS=1500
set SHFT_REPAIR_OVERSAMPLE_FACTOR=2
set SHFT_MAX_REPAIR_SELECTED_RATIO=0.75
set SHFT_TRAIN_MAX_STEPS=600
set SHFT_CONTINUOUS_DISCOVERY_SLEEP_SECONDS=300
Disable automatic breakout for a run with:
set SHFT_AUTO_STALL_BREAKOUT=false