| Task List |
| --- |
| Fix validation splitting by token identity |
| Replace class-only split logic in train.py (line 235) |
| Group cached context samples by source_token or token_address |
| Ensure one token can only exist in train or val, never both |
| Keep class balance as a secondary constraint, not the primary identity rule |
| Stop using the current validation set as a decision-grade signal |
| Treat old val curves/checkpoints as contaminated |
| Re-evaluate only after token-grouped split is in place |
| Audit cache metadata and make token identity explicit |
| Ensure every cached sample has stable token identity fields |
| Required minimum: source_token, class_id |
| Prefer also storing lightweight cache-planning metadata for later analysis |
| Redesign cache generation around fixed budgets |
| Define total cache budget first |
| Allocate exact sample counts per token class before writing files |
| Do not let raw source distribution decide cache composition |
| Remove the destructive dependence on token-class-map filtering alone |
| Token class should guide budget allocation |
| It should not be the only logic that determines whether a cached sample is useful |
| Add cache-time context-level balancing |
| After sampling a candidate context, evaluate realized future labels for that context |
| Use realized context outcome to decide whether to keep or skip it |
| Do this before saving to disk |
| Start with binary polarity, not movement-threshold balancing |
| Positive if max valid horizon return > 0 |
| Negative otherwise |
| Use this only as cache-selection metadata first |
| Make polarity quotas class-conditional |
| For stronger classes, target positive/negative ratios |
| For garbage classes, do not force positives |
| Keep class 0 mostly natural/negative |
| Keep T_cutoff random during cache generation |
| Do not freeze a single deterministic cutoff per token |
| Determinism should be in the planning/budget logic, not in removing context diversity |
| Add exact acceptance accounting during cache build |
| Track how many samples have already been accepted per class |
| Track polarity counts per class |
| Stop accepting once quotas are filled |
| Avoid cache waste from duplicate low-value contexts |
| Add retry/attempt limits per token |
| If a token cannot satisfy the desired quota type, stop oversampling it |
| Move on to other tokens instead of filling disk with junk |
| Keep label derivation in the data pipeline, not in training logic |
| Loader should produce final labels and masks |
| Collator should only stack/batch them |
| Model should only consume them |
| Reduce or remove train-time class reweighting after cache is fixed |
| Revisit WeightedRandomSampler |
| Revisit class_loss_weights |
| If cache is balanced upstream, training should not need heavy rescue weighting |
| Revisit movement head only after split and cache are fixed |
| Keep it auxiliary |
| Do not let movement-label threshold debates block the more important data fixes |
| Later simplify naming/threshold assumptions if needed |
| Add cache audit tooling |
| Report counts by class_id |
| Report counts by class × polarity |
| Report unique tokens by class |
| Report acceptance/rejection reasons |
| Report train/val token overlap check |
| Add validation integrity checks |
| Assert zero token overlap between train and val |
| Print per-class token counts, not just sample counts |
| Print per-class sample counts too |
| Rebuild cache after the new policy is implemented |
| Old cache is shaped by the wrong distribution |
| Old validation split is not trustworthy |
| New training should start from the rebuilt corpus |
| Retrain and re-baseline from scratch |
| New split |
| New cache |
| Minimal train-time rescue weighting |
| Recompare backbone behavior only after that |
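
The token-grouped split and the zero-overlap check from the task list can be sketched as follows. This is a minimal illustration, not the logic in train.py: it assumes each cached sample is a dict carrying `source_token` and `class_id`, and that a token maps to exactly one class. The function names `token_grouped_split` and `assert_no_token_overlap` and the `val_fraction` parameter are hypothetical.

```python
import random
from collections import defaultdict


def token_grouped_split(samples, val_fraction=0.2, seed=0):
    """Split samples so each source_token lands in train or val, never both.

    Token identity is the primary rule. Class balance is secondary: tokens
    are bucketed by class_id and roughly val_fraction of the tokens in each
    class go to validation.
    """
    # Map every token to its class (assumption: one class_id per token).
    token_class = {}
    for s in samples:
        prev = token_class.setdefault(s["source_token"], s["class_id"])
        if prev != s["class_id"]:
            raise ValueError(f"token {s['source_token']} has multiple classes")

    tokens_by_class = defaultdict(list)
    for tok, cls in token_class.items():
        tokens_by_class[cls].append(tok)

    rng = random.Random(seed)
    val_tokens = set()
    for cls in sorted(tokens_by_class):
        toks = sorted(tokens_by_class[cls])  # sort first so the seed is reproducible
        rng.shuffle(toks)
        # A class with a single token keeps that token in train.
        n_val = max(1, round(len(toks) * val_fraction)) if len(toks) > 1 else 0
        val_tokens.update(toks[:n_val])

    train = [s for s in samples if s["source_token"] not in val_tokens]
    val = [s for s in samples if s["source_token"] in val_tokens]
    return train, val


def assert_no_token_overlap(train, val):
    """Fail loudly if any token appears on both sides of the split."""
    overlap = {s["source_token"] for s in train} & {s["source_token"] for s in val}
    if overlap:
        raise AssertionError(f"{len(overlap)} tokens leak across train/val")
```

Splitting the token list rather than the sample list is what prevents leakage: every context sampled from a token follows that token to one side. If scikit-learn is already a dependency, its group-aware splitters are an alternative to hand-rolled logic.
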
| Recommended implementation order |
| --- |
| 1. Token-grouped validation split |
| 2. Validation overlap checks |
| 3. Cache metadata cleanup |
| 4. Exact class quotas in cache generation |
| 5. Class-conditional polarity quotas |
| 6. Cache audit reports |
| 7. Remove/reduce train-time weighting |
| 8. Rebuild cache |
| 9. Retrain |
| 10. Reassess movement head |
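
The quota steps in this order (exact class quotas, class-conditional polarity quotas, acceptance accounting, per-token attempt limits) could look roughly like the sketch below. Everything named here is hypothetical: `QuotaTracker`, `build_cache`, the quota dict layout, and the `sample_context` callable, which stands in for drawing a context at a random T_cutoff and returning its per-horizon returns. The sketch also reads "max valid horizon return" as the maximum return across valid horizons, which is only one possible interpretation.

```python
from collections import Counter


def polarity(horizon_returns):
    """Binary polarity; None marks an invalid horizon (assumed convention)."""
    valid = [r for r in horizon_returns if r is not None]
    return "pos" if valid and max(valid) > 0 else "neg"


class QuotaTracker:
    """Tracks accepted samples per (class, bucket) against fixed quotas.

    quotas maps class_id to {"pos": n, "neg": m} for polarity-balanced
    classes, or {"any": n} for classes left at their natural polarity mix.
    """

    def __init__(self, quotas):
        self.quotas = quotas
        self.accepted = Counter()         # (class_id, bucket) -> count
        self.polarity_counts = Counter()  # (class_id, polarity) -> count
        self.rejected = Counter()         # reason -> count

    def class_full(self, class_id):
        quota = self.quotas.get(class_id, {})
        return all(self.accepted[(class_id, b)] >= n for b, n in quota.items())

    def all_full(self):
        return all(self.class_full(c) for c in self.quotas)

    def try_accept(self, class_id, pol):
        quota = self.quotas.get(class_id)
        if quota is None:
            self.rejected["unknown_class"] += 1
            return False
        bucket = pol if pol in quota else "any"
        if self.accepted[(class_id, bucket)] >= quota.get(bucket, 0):
            self.rejected["quota_full"] += 1
            return False
        self.accepted[(class_id, bucket)] += 1
        self.polarity_counts[(class_id, pol)] += 1
        return True


def build_cache(token_classes, sample_context, tracker, max_attempts_per_token=5):
    """Decide keep/skip per sampled context before anything is written to disk."""
    kept = []
    for token, class_id in token_classes.items():
        for _ in range(max_attempts_per_token):
            if tracker.class_full(class_id):
                break  # nothing left to fill for this class; move on
            pol = polarity(sample_context(token))
            if tracker.try_accept(class_id, pol):
                kept.append(
                    {"source_token": token, "class_id": class_id, "polarity": pol}
                )
        if tracker.all_full():
            break
    return kept
```

The attempt limit bounds how long a token that cannot produce the needed polarity is resampled, and the `rejected` counter gives the audit report its acceptance/rejection reasons. A garbage class would use an `{"any": n}` quota so positives are never forced.
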