Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Paper: arXiv:2309.17352
How to use slseanwu/beats-conformer-bart-audio-captioner with Transformers:
# Load model directly
# Assumption: BeatsConformerBartSeq2SeqForCaptioning is provided by the authors' code
# rather than by the stock transformers package, so this import needs that code available.
from transformers import AutoTokenizer, BeatsConformerBartSeq2SeqForCaptioning

repo_id = "slseanwu/beats-conformer-bart-audio-captioner"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = BeatsConformerBartSeq2SeqForCaptioning.from_pretrained(repo_id)

This repo contains the config & pretrained weights of the model described in the following paper:
To use this model, please refer to our code published at:
If you find our model useful, please consider citing our paper. Thanks!
@inproceedings{wu2024improving,
title={Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation},
author={Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, Fran{\c{c}}ois and Le Roux, Jonathan and Watanabe, Shinji},
booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024}
}
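
Since this repo holds only the config and pretrained weights, a first step before running the authors' code is usually to fetch those files locally. The following is a minimal sketch of one way to do that, assuming the `huggingface_hub` package is installed and the repo is publicly accessible. The helper names `fetch_repo_files` and `list_files` are hypothetical and not part of the authors' code; how the downloaded files are then passed to the model is defined by that code, not by this sketch.

```python
import os

REPO_ID = "slseanwu/beats-conformer-bart-audio-captioner"


def list_files(root: str) -> list[str]:
    """Return the sorted relative paths of all files under `root`."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full_path, root))
    return sorted(paths)


def fetch_repo_files(repo_id: str = REPO_ID) -> list[str]:
    """Download a snapshot of the Hub repo and list the files it contains.

    Hypothetical helper for illustration; requires network access.
    """
    # Imported lazily so the rest of the module works without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local directory holding the repo files.
    local_dir = snapshot_download(repo_id=repo_id)
    return list_files(local_dir)


if __name__ == "__main__":
    for path in fetch_repo_files():
        print(path)
```

Running the script prints the files found in the downloaded snapshot, which makes it easy to check which config and checkpoint files are present before pointing the authors' code at them.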