---
datasets:
- slprl/TinyStress-15K
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
|
|
|
|
|
# WhiStress Model |
|
|
|
|
|
This is the official model checkpoint for [***WhiStress***](https://arxiv.org/abs/2505.19103), introduced in our paper:
|
|
**WhiStress: Enriching Transcriptions with Sentence Stress Detection** (Interspeech 2025). |
|
|
|
|
|
- Project Page: [pages.cs.huji.ac.il/adiyoss-lab/whistress](https://pages.cs.huji.ac.il/adiyoss-lab/whistress)


- Code: [github.com/slp-rl/WhiStress](https://github.com/slp-rl/WhiStress)


- Dataset: [slprl/TinyStress-15K](https://huggingface.co/datasets/slprl/TinyStress-15K)
|
|
|
|
|
--- |
|
|
|
|
|
## Overview |
|
|
|
|
|
**WhiStress** extends OpenAI's [Whisper](https://huggingface.co/openai/whisper-small.en) ASR model with a decoder-based classifier that predicts **token-level sentence stress**. This allows models not only to transcribe speech but also to detect which words are emphasized. |
|
|
|
|
|
This checkpoint is based on the `whisper-small.en` variant and adds two stress-specific modules: |
|
|
- `additional_decoder_block.pt` |
|
|
- `classifier.pt` |
|
|
|
|
|
--- |
|
|
|
|
|
## How to Use
|
|
|
|
|
You can use the weights in your own pipeline by cloning our codebase and loading the components: |
|
|
|
|
|
```bash
git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt
```
|
|
|
|
|
Then, either download the weights manually from this Hugging Face repo or use our script: |
|
|
|
|
|
```bash
python download_weights.py
```
|
|
|
|
|
The weights should be placed in the following directory structure: |
|
|
|
|
|
```
whistress/
├── weights/
│   ├── additional_decoder_block.pt
│   ├── classifier.pt
│   └── metadata.json
```
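After downloading, it is easy to verify that all three files landed where the loader expects them. The helper below is a minimal sketch (`missing_weight_files` is a hypothetical name, not part of the WhiStress codebase):

```python
from pathlib import Path

def missing_weight_files(weights_dir: str) -> list[str]:
    """Return the names of expected weight files absent from weights_dir."""
    expected = ["additional_decoder_block.pt", "classifier.pt", "metadata.json"]
    root = Path(weights_dir)
    return [name for name in expected if not (root / name).exists()]

# After a successful download this should print an empty list.
print(missing_weight_files("whistress/weights"))
```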
|
|
|
|
|
--- |
|
|
|
|
|
## Inference Example
|
|
|
|
|
```python
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample['audio'],   # (sampling_rate, np.ndarray) pair
    transcription=None,      # None: predict both transcription and stress from audio; pass a transcription to predict stress only
    return_pairs=False       # set to True for a list of (word, binary_label) pairs
)
print(pred_transcription)  # e.g., "I didn't say she stole my money."
print(pred_stresses)       # e.g., ['my']
```
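The `audio` argument expects a `(sampling_rate, np.ndarray)` pair. If you load examples with the Hugging Face `datasets` library, where a decoded audio column is a dict with `array` and `sampling_rate` keys, a small adapter can bridge the two formats. The helper below is a hypothetical sketch, not part of the WhiStress API:

```python
import numpy as np

def to_whistress_audio(hf_audio: dict) -> tuple:
    """Convert a Hugging Face `datasets` audio dict to the
    (sampling_rate, np.ndarray) pair expected by predict()."""
    return int(hf_audio["sampling_rate"]), np.asarray(hf_audio["array"], dtype=np.float32)

# Example with a synthetic one-second 440 Hz sine wave at 16 kHz,
# standing in for a row of a dataset such as slprl/TinyStress-15K:
sr = 16000
waveform = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
sample = {"audio": {"array": waveform, "sampling_rate": sr}}
audio = to_whistress_audio(sample["audio"])
```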
|
|
|
|
|
Each prediction includes: |
|
|
- `transcription`: full text output |
|
|
- `emphasis_indices`: list of stressed token indices |
|
|
- `emphasized_tokens`: list of corresponding words |
|
|
|
|
|
--- |
|
|
|
|
|
## Notes |
|
|
|
|
|
The model is intended for research purposes only. |
|
|
|
|
|
## Citation
|
|
If you use our model, please cite our work: |
|
|
|
|
|
```bibtex
@misc{yosha2025whistress,
  title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection},
  author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
  year={2025},
  eprint={2505.19103},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19103},
}
```