
DiffMask Eval Harness

  • Model: models/irishcore-diffmask-135m-v1-rc6b-focusv10-e012-b48w0

Result

The deployment-aligned harness is the clean single-pass token-span path.

  • experiments/irish_core_span_raw_only/benchmark_multitask.py and scripts/eval_dllm_release.py --inference-mode clean_single_pass produce identical scores on all four checked suites.
  • The older diffusion-style eval path (diffusion_last_pass) is not deployment-aligned. It lowers scores on fresh_holdout (-0.0625) and multilingual_ppsn (-0.0308). It matches on uat_exact and differs only marginally on irish_core (+0.0004).

Comparison

| Dataset | benchmark_multitask | eval_clean_single_pass | eval_diffusion_last_pass |
|---|---:|---:|---:|
| fresh_holdout | 0.7170 | 0.7170 | 0.6545 |
| uat_exact | 0.9032 | 0.9032 | 0.9032 |
| irish_core | 0.9733 | 0.9733 | 0.9737 |
| multilingual_ppsn | 0.9274 | 0.9274 | 0.8966 |
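
The reconciliation above boils down to two checks:

  • Does clean_single_pass reproduce benchmark_multitask exactly on every suite?
  • How far does diffusion_last_pass move each suite?

A minimal Python sketch of both checks follows. It uses only the scores in the table. The dict layout and the `reconcile` helper are hypothetical illustrations, not part of benchmark_multitask.py or eval_dllm_release.py.

```python
# Scores copied from the comparison table. This dict layout is a hypothetical
# illustration, not an output format of the release scripts.
SCORES = {
    "fresh_holdout":     {"benchmark_multitask": 0.7170, "clean_single_pass": 0.7170, "diffusion_last_pass": 0.6545},
    "uat_exact":         {"benchmark_multitask": 0.9032, "clean_single_pass": 0.9032, "diffusion_last_pass": 0.9032},
    "irish_core":        {"benchmark_multitask": 0.9733, "clean_single_pass": 0.9733, "diffusion_last_pass": 0.9737},
    "multilingual_ppsn": {"benchmark_multitask": 0.9274, "clean_single_pass": 0.9274, "diffusion_last_pass": 0.8966},
}


def reconcile(scores, reference="benchmark_multitask",
              candidate="clean_single_pass", other="diffusion_last_pass"):
    """Hypothetical helper.

    Returns the suites where `candidate` differs from `reference`, and the
    per-suite delta of `other` relative to `candidate`, rounded to 4 places.
    """
    # Exact comparison is intended: the two deployment-aligned paths should agree exactly.
    mismatched = [suite for suite, row in scores.items() if row[reference] != row[candidate]]
    deltas = {suite: round(row[other] - row[candidate], 4) for suite, row in scores.items()}
    return mismatched, deltas


mismatched, deltas = reconcile(SCORES)
print("mismatched suites:", mismatched)
for suite, delta in deltas.items():
    print(f"{suite:18s} {delta:+.4f}")
```

An empty `mismatched` list is the "match exactly" condition. The negative deltas pick out the suites that diffusion_last_pass depresses.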

Conclusion

  • Use clean_single_pass for release gating and model comparison.
  • Keep diffusion_last_pass only as a training diagnostic if needed.