ICTNLP/StreamUni-Phi4


	---
	license: apache-2.0
	datasets:
	- ICTNLP/StreamUni
	base_model:
	- microsoft/Phi-4-multimodal-instruct
	pipeline_tag: audio-text-to-text
	library_name: adapter-transformers
	---

	# The model for the paper '[StreamUni: Achieving Streaming Speech Translation with a Unified Large Speech-Language Model](https://arxiv.org/abs/2507.07803v1)'

	## Usage

	Please refer to the [GitHub page](https://github.com/ictnlp/StreamUni) for usage instructions.

	### Requirements

	The Phi-4 family has been integrated into version `4.48.2` of `transformers`. You can check the installed `transformers` version with `pip list | grep transformers`.
	We suggest running with Python 3.10.
	Example versions of the required packages:
	```
	flash_attn==2.7.4.post1
	torch==2.6.0
	transformers==4.48.2
	accelerate==1.3.0
	soundfile==0.13.1
	pillow==11.1.0
	scipy==1.15.2
	torchvision==0.21.0
	backoff==2.2.1
	peft==0.13.2
	```
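
As a cross-platform alternative to `pip list | grep`, the following is a minimal sketch that compares the installed packages against the versions listed above. The helper names `installed_version` and `check_environment` are hypothetical and are not part of the StreamUni repository. The sketch assumes an exact version match is wanted, and it ignores local build suffixes such as `+cu124` that some `torch` wheels carry.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirements list above.
PINNED = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "soundfile": "0.13.1",
    "pillow": "11.1.0",
    "scipy": "1.15.2",
    "torchvision": "0.21.0",
    "backoff": "2.2.1",
    "peft": "0.13.2",
}


def installed_version(package):
    """Return the installed version without any local suffix, or None if absent."""
    try:
        # Drop a local build tag such as "+cu124" (assumption: it is irrelevant here).
        return version(package).split("+")[0]
    except PackageNotFoundError:
        return None


def check_environment(pinned=PINNED):
    """Map each package that differs from its listed version to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_environment().items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `check_environment()` means every listed package is installed at the listed version. Other versions may also work, so a mismatch is a hint to investigate, not necessarily an error.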

	## Training Datasets

	- https://huggingface.co/datasets/ICTNLP/StreamUni

	## GitHub Page

	- https://github.com/ictnlp/StreamUni