language: ja
license: apache-2.0
datasets: reazon-research/reazonspeech
pipeline_tag: feature-extraction
inference: false
tags:
  - speech

Japanese data2vec Audio Base

This is a mirror of japanese-data2vec-audio-base, originally released by rinna Co., Ltd. The original model is licensed under the Apache License 2.0. This mirror follows the same license terms. All copyrights remain with the original authors.


Overview

This is a Japanese data2vec Audio Base model trained by rinna Co., Ltd.


How to use the model

import soundfile as sf
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_name = "yky-h/japanese-data2vec-audio-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Placeholder path: replace with your own 16 kHz audio file.
audio_file = "path/to/audio.wav"
raw_speech_16kHz, sr = sf.read(audio_file)
inputs = feature_extractor(
    raw_speech_16kHz,
    return_tensors="pt",
    sampling_rate=sr,
)
# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    outputs = model(**inputs)

print(f"Input:  {inputs.input_values.size()}")  # [1, #samples]
print(f"Output: {outputs.last_hidden_state.size()}")  # [1, #frames, 768]
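
The `#frames` dimension printed above depends on the length of the input. As a rough sketch, it can be estimated without running the model, assuming this checkpoint keeps the default convolutional feature encoder of `Data2VecAudioConfig` in transformers (kernel sizes `(10, 3, 3, 3, 3, 2, 2)` and strides `(5, 2, 2, 2, 2, 2, 2)`). That assumption is not stated in this card, and `estimate_num_frames` is a hypothetical helper name, so check `model.config.conv_kernel` and `model.config.conv_stride` before relying on it.

```python
def estimate_num_frames(
    num_samples,
    kernels=(10, 3, 3, 3, 3, 2, 2),   # assumed default conv_kernel
    strides=(5, 2, 2, 2, 2, 2, 2),    # assumed default conv_stride
):
    """Estimate the number of output frames for a waveform of num_samples.

    Each unpadded 1-D convolution maps a length L to (L - kernel) // stride + 1.
    """
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio under the assumed configuration.
print(estimate_num_frames(16_000))
```

Under these assumptions the encoder advances by 320 samples per frame, which is about 20 ms at 16 kHz.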

A fairseq checkpoint file is also available here.
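
The usage snippet above passes the array returned by `sf.read` straight to the feature extractor. `sf.read` returns a 2-D array of shape `(frames, channels)` for multi-channel files, so a small check before feature extraction can help. The following is a minimal sketch, assuming the model expects 16 kHz mono input (as the variable name `raw_speech_16kHz` suggests); `prepare_waveform` and `EXPECTED_SR` are hypothetical names, not part of the original release.

```python
import numpy as np

EXPECTED_SR = 16_000  # assumption: the model expects 16 kHz input


def prepare_waveform(waveform, sr, expected_sr=EXPECTED_SR):
    """Return a mono float32 waveform, or raise if the sampling rate differs."""
    if sr != expected_sr:
        raise ValueError(
            f"Expected {expected_sr} Hz audio but got {sr} Hz; resample first."
        )
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 2:
        # soundfile layout is (frames, channels); average the channels.
        waveform = waveform.mean(axis=1)
    return waveform


# Example with a synthetic stereo signal instead of a real file.
stereo = np.stack([np.ones(8), np.zeros(8)], axis=1)
mono = prepare_waveform(stereo, 16_000)
print(mono.shape)
```

In the usage snippet, this would be applied as `raw_speech_16kHz = prepare_waveform(raw_speech_16kHz, sr)` before calling `feature_extractor`.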


How to cite

@misc{rinna-japanese-data2vec-audio-base,
    title = {rinna/japanese-data2vec-audio-base},
    author = {Hono, Yukiya and Mitsui, Kentaro and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-data2vec-audio-base}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@inproceedings{baevski2022data2vec,
    title={Data2vec: A general framework for self-supervised learning in speech, vision and language},
    author={Baevski, Alexei and Hsu, Wei-Ning and Xu, Qiantong and Babu, Arun and Gu, Jiatao and Auli, Michael},
    booktitle={International Conference on Machine Learning},
    year={2022},
    pages={1298--1312},
    doi={10.48550/arXiv.2202.03555}
}

License

The Apache 2.0 license