mpasila/faster-whisper-Visual-novel-transcriptor

Automatic Speech Recognition


mpasila committed on Feb 23, 2025

Commit e596d36 · verified · 1 Parent(s): 5f60848

Create README.md

Files changed (1)

README.md +89 -0

README.md ADDED

	@@ -0,0 +1,89 @@

+---
+library_name: transformers
+datasets:
+- reazon-research/reazonspeech
+- joujiboi/japanese-anime-speech
+language:
+- ja
+- en
+metrics:
+- cer
+pipeline_tag: automatic-speech-recognition
+---
+This is a faster-whisper/ct2 conversion of the original model:
+[spow12/Visual-novel-transcriptor](https://huggingface.co/spow12/Visual-novel-transcriptor)
+# Model Card for Visual-novel-transcriptor
+Fine-tuned ASR model based on [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2).
+This model aims to transcribe Japanese audio, especially audio from visual novels.
+# WaifuModel Collections
+- [TTS](https://huggingface.co/spow12/visual_novel_tts)
+- [Chat](https://huggingface.co/spow12/ChatWaifu_v1.2.1)
+- [ASR](https://huggingface.co/spow12/Visual-novel-transcriptor)
+# Unified Demo
+[WaifuAssistant](https://github.com/yw0nam/WaifuAssistant)
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** spow12 (yw_nam)
+- **Shared by:** spow12 (yw_nam)
+- **Model type:** Seq2Seq
+- **Language(s) (NLP):** Japanese
+- **Finetuned from model:** [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
+## Uses
+```python
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+import librosa
+
+wav_path = "path/to/audio.wav"  # placeholder: replace with your own audio file
+
+# Load the processor and model, forcing Japanese transcription
+processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
+model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
+model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")
+
+# Resample the audio to 16 kHz and extract input features
+data, _ = librosa.load(wav_path, sr=16000)
+input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
+
+# Generate token IDs and decode them to text
+predicted_ids = model.generate(input_features)
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+print(transcription[0])
+```
+## Bias, Risks, and Limitations
+This model was trained on Japanese datasets that include visual novels containing NSFW content.
+## Use & Credit
+This model is currently available for non-commercial use only. Also, since I'm not well versed in licensing, I hope you use it responsibly.
+By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and anime fans).
+## Citation
+```bibtex
+@misc {Visual-novel-transcriptor,
+    author       = { YoungWoo Nam },
+    title        = { Visual-novel-transcriptor },
+    year         = 2024,
+    url          = { https://huggingface.co/spow12/Visual-novel-transcriptor },
+    publisher    = { Hugging Face }
+}
+```
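The snippet in the Uses section loads the original checkpoint through transformers. Since this repository holds a faster-whisper/ct2 conversion, a minimal sketch of loading it with the `faster-whisper` package might look like the following. The repository ID is taken from this page's header, and the `device` and `compute_type` defaults, the audio path, and the `format_segments` and `transcribe_file` helper names are illustrative assumptions, not part of the original card.

```python
from typing import Iterable, List


def format_segments(segments: Iterable) -> List[str]:
    """Render segments (objects with .start, .end, .text) as timestamped lines."""
    return [f"[{s.start:.2f}s -> {s.end:.2f}s] {s.text.strip()}" for s in segments]


def transcribe_file(
    wav_path: str,
    # Assumed repository ID, taken from the page header of this commit.
    model_id: str = "mpasila/faster-whisper-Visual-novel-transcriptor",
    device: str = "cuda",            # assumption: a CUDA GPU is available
    compute_type: str = "float16",   # assumption: adjust to your hardware
) -> List[str]:
    """Transcribe a Japanese audio file and return timestamped lines."""
    # Imported lazily so format_segments works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_id, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(wav_path, language="ja", task="transcribe")
    return format_segments(segments)
```

Calling `transcribe_file("path/to/audio.wav")` with a real audio path would download the converted model on first use and return one line per segment. On a machine without a GPU, passing `device="cpu"` and `compute_type="int8"` is a common alternative.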