---
datasets:
- slprl/TinyStress-15K
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
|
|
|
|
|
# WhiStress Model |
|
|
|
|
|
This is the official model checkpoint for [***WhiStress***](https://arxiv.org/abs/2505.19103), introduced in our paper:
|
|
**WhiStress: Enriching Transcriptions with Sentence Stress Detection** (Interspeech 2025). |
|
|
|
|
|
- Project Page: [pages.cs.huji.ac.il/adiyoss-lab/whistress](https://pages.cs.huji.ac.il/adiyoss-lab/whistress)


- Code: [github.com/slp-rl/WhiStress](https://github.com/slp-rl/WhiStress)


- Dataset: [slprl/TinyStress-15K](https://huggingface.co/datasets/slprl/TinyStress-15K)
|
|
|
|
|
--- |
|
|
|
|
|
## Overview |
|
|
|
|
|
**WhiStress** extends OpenAI's [Whisper](https://huggingface.co/openai/whisper-small.en) ASR model with a decoder-based classifier that predicts **token-level sentence stress**. This allows models not only to transcribe speech but also to detect which words are emphasized. |
|
|
|
|
|
This checkpoint is based on the `whisper-small.en` variant and adds two stress-specific modules: |
|
|
- `additional_decoder_block.pt` |
|
|
- `classifier.pt` |
|
|
|
|
|
--- |
|
|
|
|
|
## How to Use
|
|
|
|
|
You can use the weights in your own pipeline by cloning our codebase and loading the components: |
|
|
|
|
|
```bash
git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt
```
|
|
|
|
|
Then, either download the weights manually from this Hugging Face repo or use our script: |
|
|
|
|
|
```bash
python download_weights.py
```
|
|
|
|
|
The weights should be placed in the following directory structure: |
|
|
|
|
|
```
whistress/
├── weights/
│   ├── additional_decoder_block.pt
│   ├── classifier.pt
│   └── metadata.json
```
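After downloading, it is easy to verify that all three files landed where the loader expects them. The helper below is a minimal sketch (`missing_weight_files` is a hypothetical name, not part of the WhiStress codebase):

```python
from pathlib import Path

def missing_weight_files(weights_dir: str) -> list[str]:
    """Return the names of expected weight files absent from weights_dir."""
    expected = ["additional_decoder_block.pt", "classifier.pt", "metadata.json"]
    root = Path(weights_dir)
    return [name for name in expected if not (root / name).exists()]

# After a successful download this should print an empty list.
print(missing_weight_files("whistress/weights"))
```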
|
|
|
|
|
--- |
|
|
|
|
|
## Inference Example
|
|
|
|
|
```python
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample['audio'],   # (sampling_rate, np.ndarray) pair
    transcription=None,      # None: predict both transcription and stress from audio; pass a transcription to predict stress only
    return_pairs=False       # set to True for a list of (word, binary_label) pairs
)
print(pred_transcription)  # e.g., "I didn't say she stole my money."
print(pred_stresses)       # e.g., ['my']
```
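The `audio` argument expects a `(sampling_rate, np.ndarray)` pair. If you load examples with the Hugging Face `datasets` library, where a decoded audio column is a dict with `array` and `sampling_rate` keys, a small adapter can bridge the two formats. The helper below is a hypothetical sketch, not part of the WhiStress API:

```python
import numpy as np

def to_whistress_audio(hf_audio: dict) -> tuple:
    """Convert a Hugging Face `datasets` audio dict to the
    (sampling_rate, np.ndarray) pair expected by predict()."""
    return int(hf_audio["sampling_rate"]), np.asarray(hf_audio["array"], dtype=np.float32)

# Example with a synthetic one-second 440 Hz sine wave at 16 kHz,
# standing in for a row of a dataset such as slprl/TinyStress-15K:
sr = 16000
waveform = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
sample = {"audio": {"array": waveform, "sampling_rate": sr}}
audio = to_whistress_audio(sample["audio"])
```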
|
|
|
|
|
Each prediction includes: |
|
|
- `transcription`: full text output |
|
|
- `emphasis_indices`: list of stressed token indices |
|
|
- `emphasized_tokens`: list of corresponding words |
|
|
|
|
|
--- |
|
|
|
|
|
## Notes |
|
|
|
|
|
The model is intended for research purposes only. |
|
|
|
|
|
## Citation
|
|
If you use our model, please cite our work: |
|
|
|
|
|
```bibtex
@misc{yosha2025whistress,
  title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection},
  author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
  year={2025},
  eprint={2505.19103},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19103},
}
```