---
datasets:
- slprl/TinyStress-15K
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# WhiStress Model

This is the official model checkpoint for [***WhiStress***](https://arxiv.org/abs/2505.19103), introduced in our paper:  
**WhiStress: Enriching Transcriptions with Sentence Stress Detection** (Interspeech 2025).

- 🔗 Project Page: [pages.cs.huji.ac.il/adiyoss-lab/whistress](https://pages.cs.huji.ac.il/adiyoss-lab/whistress)  
- 📚 Code: [github.com/slp-rl/WhiStress](https://github.com/slp-rl/WhiStress)  
- 📦 Dataset: [slprl/TinyStress-15K](https://huggingface.co/datasets/slprl/TinyStress-15K)

---

## Overview

**WhiStress** extends OpenAI's [Whisper](https://huggingface.co/openai/whisper-small.en) ASR model with a decoder-based classifier that predicts **token-level sentence stress**. This allows models not only to transcribe speech but also to detect which words are emphasized.

This checkpoint is based on the `whisper-small.en` variant and adds two stress-specific modules:
- `additional_decoder_block.pt`
- `classifier.pt`

---

## 🔧 How to Use

You can use the weights in your own pipeline by cloning our codebase and loading the components:

```bash
git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt
```

Then, either download the weights manually from this Hugging Face repo or use our script:

```bash
python download_weights.py
```

The weights should be placed in the following directory structure:

```
whistress/
├── weights/
│   ├── additional_decoder_block.pt
│   ├── classifier.pt
│   └── metadata.json
```
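
To sanity-check the layout programmatically before loading, here is a minimal sketch (the `weights_present` helper is hypothetical, not part of the repo):

```python
from pathlib import Path

# Files the loading code expects, per the layout above.
EXPECTED = ["additional_decoder_block.pt", "classifier.pt", "metadata.json"]

def weights_present(root: str = "whistress/weights") -> list:
    """Return the names of expected weight files missing under `root`."""
    weights_dir = Path(root)
    return [name for name in EXPECTED if not (weights_dir / name).exists()]

missing = weights_present()
if missing:
    print(f"Missing weight files: {missing} -- run download_weights.py first.")
```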

---

## 🗣️ Inference Example

```python
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample["audio"],  # (sampling_rate, np.ndarray) tuple
    transcription=None,  # None: predict both transcription and stress from audio; pass a transcription to predict stress only
    return_pairs=False,  # set to True to get a list of (word, binary_label) pairs
)
print(pred_transcription)  # e.g., "I didn't say she stole my money."
print(pred_stresses)  # e.g., ['my']
```
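
The `audio` argument is a `(sampling_rate, waveform)` tuple. A minimal sketch of building one (the 1-second 440 Hz sine wave is a stand-in; in practice, load a mono 16 kHz waveform from a file, e.g. with `soundfile` or `librosa`, or take it from a dataset row):

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper-based models expect 16 kHz mono audio

# Hypothetical 1-second waveform standing in for real speech.
t = np.linspace(0.0, 1.0, SAMPLE_RATE, endpoint=False)
waveform = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

sample = {"audio": (SAMPLE_RATE, waveform)}
print(sample["audio"][0], sample["audio"][1].shape, sample["audio"][1].dtype)
```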

Each prediction includes:
- `transcription`: full text output
- `emphasis_indices`: list of stressed token indices
- `emphasized_tokens`: list of corresponding words
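
With `return_pairs=True`, post-processing is straightforward. A sketch assuming the documented `(word, binary_label)` shape (the `pairs` value below is illustrative, not real model output):

```python
# Illustrative return_pairs=True output: one (word, 0/1) pair per word.
pairs = [("I", 0), ("didn't", 0), ("say", 0), ("she", 0),
         ("stole", 0), ("my", 1), ("money.", 0)]

# Recover the stressed words from the binary labels.
stressed_words = [word for word, label in pairs if label == 1]
print(stressed_words)  # ['my']
```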

---

## Notes

The model is intended for research purposes only.

## 📜 Citation
If you use our model, please cite our work:

```bibtex
@misc{yosha2025whistress,
    title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection}, 
    author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
    year={2025},
    eprint={2505.19103},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.19103}, 
}
```