---
license: apache-2.0
datasets:
- ICTNLP/StreamUni
base_model:
- microsoft/Phi-4-multimodal-instruct
pipeline_tag: audio-text-to-text
library_name: adapter-transformers
---

# The model for the paper '[StreamUni: Achieving Streaming Speech Translation with a Unified Large Speech-Language Model](https://arxiv.org/abs/2507.07803v1)'

## Usage

Please refer to the [GitHub repository](https://github.com/ictnlp/StreamUni) for usage instructions.

### Requirements

The Phi-4 family has been integrated into `transformers` as of version `4.48.2`. You can check the installed `transformers` version with `pip list | grep transformers`.

We suggest running with Python 3.10.

An example set of required packages:

```
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
```
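
To compare an existing environment against the pins above, a small check like the following may help. It is only a sketch: the `PINNED` dictionary and the `check_pins` helper are illustrative names that are not part of StreamUni, and it assumes each package is registered under the distribution name shown in the list. Local build suffixes such as `+cu124` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the package list above (illustrative dictionary name).
PINNED = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "soundfile": "0.13.1",
    "pillow": "11.1.0",
    "scipy": "1.15.2",
    "torchvision": "0.21.0",
    "backoff": "2.2.1",
    "peft": "0.13.2",
}


def check_pins(pinned, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build suffix such as "+cu124" before comparing.
            installed = get_version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version. Other versions may also work, so a mismatch is a hint to investigate, not necessarily an error.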

## Training Datasets

- https://huggingface.co/datasets/ICTNLP/StreamUni

## GitHub Repository

- https://github.com/ictnlp/StreamUni