Enable KenLM domain-LM shallow fusion at decode time
#1
by chirag18 - opened
Three changes that together enable shallow-fusion decoding with a domain LM trained on 731K in-domain radiology reports (~111M words):
- server.py: _build_decoder() loads /app/radiology.bin if present and passes it to build_ctcdecoder with alpha=0.5, beta=1.5. Falls back to non-LM beam search if missing. /health reports kenlm_loaded.
- Dockerfile: fetches radiology.bin (240 MB) from chirag18/radiology-stt-assets at build time.
- requirements.txt: adds pypi-kenlm.
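The fallback described for server.py could look roughly like the sketch below. It is not the code from this PR: the helper names `decoder_kwargs` and `build_decoder` are hypothetical, and the only details taken from the description are the path `/app/radiology.bin`, the call to `build_ctcdecoder`, and `alpha=0.5`, `beta=1.5`. It assumes `build_ctcdecoder` comes from pyctcdecode.

```python
import os

LM_PATH = "/app/radiology.bin"  # path named in the PR description
ALPHA = 0.5                     # LM weight from the PR description
BETA = 1.5                      # word-insertion bonus from the PR description


def decoder_kwargs(lm_path: str = LM_PATH) -> dict:
    """Hypothetical helper: LM arguments if the file exists, else none."""
    if os.path.isfile(lm_path):
        return {"kenlm_model_path": lm_path, "alpha": ALPHA, "beta": BETA}
    # No LM file: build_ctcdecoder then falls back to plain beam search.
    return {}


def build_decoder(labels, lm_path: str = LM_PATH):
    """Hypothetical stand-in for _build_decoder(); returns (decoder, kenlm_loaded)."""
    # Assumption: the decoder library is pyctcdecode. Imported lazily so
    # decoder_kwargs() can be exercised without it installed.
    from pyctcdecode import build_ctcdecoder

    kwargs = decoder_kwargs(lm_path)
    return build_ctcdecoder(labels, **kwargs), bool(kwargs)
```

The boolean returned alongside the decoder is one way to feed the `kenlm_loaded` field that /health reports.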
Offline sanity check on the LM passed: it correctly favors 'borderline cardiomegaly' over the common ASR misrecognition 'remaining cardiomegaly' (-1.64 vs -2.21 log-prob/word). The LM should help on word-choice errors that hotwords alone can't fix.
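To give a feel for what those per-word scores mean at decode time, here is a simplified sketch of the usual shallow-fusion combination: acoustic log-prob plus alpha times LM log-prob plus beta per word. It is illustrative only. `fused_score` is a hypothetical function, the per-word figures are assumed to be averaged over the two words of each phrase, and any log-base conversion the decoder applies internally is ignored.

```python
def fused_score(acoustic_logp: float, lm_logp_per_word: float, n_words: int,
                alpha: float = 0.5, beta: float = 1.5) -> float:
    """Simplified shallow-fusion score for one hypothesis (illustrative only)."""
    lm_logp = lm_logp_per_word * n_words  # assumption: average taken over n_words
    return acoustic_logp + alpha * lm_logp + beta * n_words


# Equal acoustic scores, so only the LM term separates the two phrases.
borderline = fused_score(0.0, -1.64, n_words=2)
remaining = fused_score(0.0, -2.21, n_words=2)
margin = borderline - remaining  # LM advantage for 'borderline cardiomegaly'
print(round(margin, 2))
```

Under these assumptions the acoustic model would have to prefer 'remaining' by more than this margin for the misrecognition to survive. The beta term cancels because both phrases have the same word count.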
deepakkaura changed pull request status to merged