CrispASR regression-test fixtures
Reference-dump archives for the per-backend regression suite in
CrispStrobe/CrispASR.
See tests/regression/README.md in that repo for the full design.
Each <backend>/<sample-stem>/ref.gguf is the output of
tools/dump_reference.py --backend <backend> against the
corresponding source model, capturing the encoder output and a few
intermediate stages (per-layer activations for some backends). The
C++ runtime's crispasr-diff <backend> <gguf> <ref> <wav> reads
exactly this format and reports per-stage cosine similarity vs the
captured tensors.
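The per-stage comparison described above can be sketched in a few lines. This is only an illustration of the metric, not the code in crispasr-diff (which is C++); the function names, the stage names, and the 0.999 threshold are assumptions made for the example, and tensors are treated as flat lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened tensors of equal length."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero tensor matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def compare_stages(reference, candidate, threshold=0.999):
    """Map each reference stage name to (similarity, passed).

    `reference` and `candidate` are dicts of stage name -> list of floats.
    The threshold is a hypothetical pass mark, not a value from CrispASR.
    """
    report = {}
    for stage, ref_values in reference.items():
        sim = cosine_similarity(ref_values, candidate[stage])
        report[stage] = (sim, sim >= threshold)
    return report
```

Calling `compare_stages({"encoder_out": ref}, {"encoder_out": out})` with a hypothetical stage name yields one similarity score per captured stage, which is the shape of report the paragraph above describes.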
Pinning to revision SHAs from this repo (in
tests/regression/manifest.json's fixtures.revision field)
prevents drift between commits: re-dumping with a different
torch / NeMo / transformers version will not silently shift what
CI tests against.
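As a rough illustration of the pinning idea, the sketch below reads the revision from a manifest and rejects anything that is not a full 40-character commit SHA, so a moving ref such as a branch name cannot slip in. The manifest layout (a top-level `fixtures` object with a `revision` key) is inferred from the field name above and may not match the real file; `pinned_revision` and `fixture_path` are hypothetical helpers, not part of CrispASR.

```python
import json
import re

# A full git commit SHA: exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_revision(manifest_text):
    """Return fixtures.revision from the manifest, insisting on a full SHA.

    Assumes a layout like {"fixtures": {"revision": "<sha>"}}.
    """
    manifest = json.loads(manifest_text)
    revision = manifest["fixtures"]["revision"]
    if not FULL_SHA.fullmatch(revision):
        raise ValueError(f"fixtures.revision is not a full commit SHA: {revision!r}")
    return revision


def fixture_path(backend, sample_stem):
    """Path of a reference dump inside this repo: <backend>/<sample-stem>/ref.gguf."""
    return f"{backend}/{sample_stem}/ref.gguf"
```

The returned revision and path could then be handed to whatever download step CI uses, for example the `revision` argument of `huggingface_hub.hf_hub_download`.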
Why a separate HF repo
- The reference dumps are ~1 MB to ~50 MB each. Committing them into the CrispASR repo would balloon the working-tree size needlessly.
- Pinning to a fixtures-repo commit SHA is the cleanest way to decouple "we want to test against the same numbers as before" from the CrispASR git history.
- HF revision pinning protects against the "re-upload silently changes what users download" failure mode that motivated the regression suite in the first place.
Current fixtures
| Backend | Sample | Source model | Sample SHA |
|---|---|---|---|
| parakeet-tdt-0.6b-ja | samples/ja/reazon_baseball_14s.wav | nvidia/parakeet-tdt_ctc-0.6b-ja @ 44edb27e | initial |
To add a new fixture, see tests/regression/README.md in CrispASR.
License
Apache-2.0 to match CrispASR. The reference dumps are derived from the source models' weights at inference time; consult the source models' own licenses for any redistribution constraints.