---
license: apache-2.0
pipeline_tag: audio-text-to-text
language:
- en
- zh
base_model:
- Yi3852/MuFun-Base
---

An instruct version of the MuFun model proposed in [Advancing the Foundation Model for Music Understanding](https://arxiv.org/abs/2508.01178).

## Usage

Some audio processing packages, such as mutagen and torchaudio, need to be installed.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'Yi3852/MuFun-Instruct'

# load the tokenizer (slow tokenizer) and the model in bfloat16;
# trust_remote_code=True lets transformers run the model code shipped in the repo
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)
device = 'cuda'
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True, torch_dtype="bfloat16")
model.to(device)

# single audio
# during inference the audio (converted to a sequence of embeddings) will be placed in the position of