---
datasets:
- slprl/TinyStress-15K
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
# WhiStress Model
This is the official model checkpoint for [***WhiStress***](https://arxiv.org/abs/2505.19103), introduced in our paper:
**WhiStress: Enriching Transcriptions with Sentence Stress Detection** (Interspeech 2025).
- Project Page: [pages.cs.huji.ac.il/adiyoss-lab/whistress](https://pages.cs.huji.ac.il/adiyoss-lab/whistress)
- Code: [github.com/slp-rl/WhiStress](https://github.com/slp-rl/WhiStress)
- Dataset: [slprl/TinyStress-15K](https://huggingface.co/datasets/slprl/TinyStress-15K)
---
## Overview
**WhiStress** extends OpenAI's [Whisper](https://huggingface.co/openai/whisper-small.en) ASR model with a decoder-based classifier that predicts **token-level sentence stress**. This lets the model not only transcribe speech but also detect which words are emphasized.
This checkpoint is based on the `whisper-small.en` variant and adds two stress-specific modules:
- `additional_decoder_block.pt`
- `classifier.pt`
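To make "token-level" concrete, the sketch below shows the general shape of a per-token binary classification head: each decoder hidden state is mapped to two logits (not stressed / stressed) and the larger one wins. It is a toy illustration in plain Python under our own assumptions (a single linear layer, a hypothetical function name `classify_tokens`); it does not load or reproduce `classifier.pt`, whose actual architecture is defined in the WhiStress codebase.

```python
from typing import List, Sequence


def classify_tokens(
    hidden_states: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[int]:
    """Toy per-token stress head (hypothetical, not the released classifier.pt).

    hidden_states: one vector of size d per decoded token.
    weight: 2 x d matrix (row 0 = "not stressed", row 1 = "stressed").
    bias: 2 values, one per class.
    Returns a 0/1 label for every token (1 = stressed).
    """
    labels = []
    for h in hidden_states:
        # Linear layer: one logit per class.
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]
        # Pick the class with the highest logit (argmax).
        labels.append(max(range(len(logits)), key=lambda k: logits[k]))
    return labels


# Three tokens with 2-dimensional hidden states and an identity "classifier".
states = [[0.2, 0.1], [0.9, -0.3], [-0.5, 0.8]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(classify_tokens(states, W, b))  # [0, 0, 1]
```

In the real model the hidden states come from Whisper's decoder (passed through the additional decoder block), so the stress label is produced alongside each decoded token rather than in a separate pass.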
---
## How to Use
You can use the weights in your own pipeline by cloning our codebase and loading the components:
```bash
git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt
```
Then, either download the weights manually from this Hugging Face repo or use our script:
```bash
python download_weights.py
```
The weights should be placed in the following directory structure:
```
whistress/
└── weights/
    ├── additional_decoder_block.pt
    ├── classifier.pt
    └── metadata.json
```
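Before running inference, it can help to confirm that the files landed where the codebase expects them. The following is a small sketch using only the standard library; the helper name `missing_weight_files` is hypothetical and not part of the WhiStress package, and the expected file names are taken from the tree above.

```python
from pathlib import Path
from typing import List

# File names from the expected layout. The default root "whistress" is an
# assumption: point it at the package directory inside your WhiStress clone.
EXPECTED_FILES = ("additional_decoder_block.pt", "classifier.pt", "metadata.json")


def missing_weight_files(root: str = "whistress") -> List[str]:
    """Return the expected weight files that are absent under <root>/weights/."""
    weights_dir = Path(root) / "weights"
    return [name for name in EXPECTED_FILES if not (weights_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All weight files found.")
```

An empty list means the layout matches; otherwise re-run `python download_weights.py` or copy the listed files from this repo.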
---
## Inference Example
```python
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

# `sample` is assumed to be an already-loaded example whose 'audio' entry
# holds the sampling rate and the waveform as an np.ndarray.
pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample['audio'],  # (sr, np.ndarray)
    transcription=None,  # None: predict both transcription and stress from audio; pass a transcription to predict stress only.
    return_pairs=False  # set to True if you want a list of (word, binary_label) pairs.
)
print(pred_transcription)  # e.g., "I didn't say she stole my money."
print(pred_stresses)  # e.g., ['my']
```
Each prediction includes:
- `transcription`: full text output
- `emphasis_indices`: list of stressed token indices
- `emphasized_tokens`: list of corresponding words
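The fields listed above can be derived from per-word binary labels. As an illustration, the sketch below starts from `(word, binary_label)` pairs, the shape the inference example's comment describes for `return_pairs=True`, and recovers the transcription, the stressed indices and the stressed words. The helpers `pairs_from_labels` and `summarize_stress` are hypothetical, and indices here count whitespace-separated words; WhiStress's own token-to-word alignment and punctuation handling may differ.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, int]


def pairs_from_labels(transcription: str, labels: Sequence[int]) -> List[Pair]:
    """Zip whitespace-separated words with 0/1 stress labels (hypothetical helper)."""
    words = transcription.split()
    if len(words) != len(labels):
        raise ValueError(f"{len(words)} words but {len(labels)} labels")
    return list(zip(words, labels))


def summarize_stress(pairs: Sequence[Pair]) -> Dict[str, object]:
    """Collect the transcription, stressed word indices and stressed words."""
    return {
        "transcription": " ".join(word for word, _ in pairs),
        "emphasis_indices": [i for i, (_, label) in enumerate(pairs) if label == 1],
        "emphasized_tokens": [word for word, label in pairs if label == 1],
    }


pairs = pairs_from_labels("I didn't say she stole my money.", [0, 0, 0, 0, 0, 1, 0])
print(summarize_stress(pairs)["emphasized_tokens"])  # ['my']
```

Keeping the pairs as the single source of truth avoids the indices and the word list drifting out of sync.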
---
## Notes
The model is intended for research purposes only.
## Citation
If you use our model, please cite our work:
```bibtex
@misc{yosha2025whistress,
title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection},
author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
year={2025},
eprint={2505.19103},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.19103},
}
```