# GigaAM-v3 Transformers
Local GigaAM-v3 implementation in Hugging Face Transformers format.

This project provides a standard Transformers interface:

- `AutoModel.from_pretrained(...)`
- `pipeline(task="automatic-speech-recognition", ...)`
- custom `AutoConfig` / `AutoFeatureExtractor` / `AutoTokenizer` via `trust_remote_code=True`
## Branches
This repository contains multiple branches with different ASR model architectures:
- `main` (this branch) - RNN-T End-to-End (`rnnt_e2e`)
- `rnnt` - RNN-T model
- `rnnt_e2e` - RNN-T End-to-End model
- `ctc` - CTC model
- `ctc_e2e` - CTC End-to-End model
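If the repository is hosted on the Hugging Face Hub, a branch can be selected with the standard `revision` argument of `from_pretrained`. A minimal sketch, assuming the branch names above map to git revisions of the hosted repo (the repo id below is a placeholder):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "your-org/GigaAM-v3-transformers",  # placeholder repo id
    revision="ctc",                     # branch name from the list above
    trust_remote_code=True,
)
```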
## What's Inside (main branch)
- Conformer encoder
- RNN-T End-to-End head (decoder + joint)
- greedy decoding for `rnnt_e2e`
- SentencePiece tokenizer
- custom ASR pipeline (`GigaAMPipeline`)
**Important:** this model does not provide `model.transcribe(...)` in this repository. The recommended inference path is the pipeline, or a direct `model(...)` call followed by decoding.
## Installation
Minimal:

```bash
pip install torch transformers sentencepiece
```
Recommended versions:

```
torch==2.8.0
transformers==4.57.1
```
## Quick Start
### 1) Via pipeline (recommended)
```python
from transformers import pipeline

asr = pipeline(
    task="automatic-speech-recognition",
    model="./GigaAM-v3-transformers",
    trust_remote_code=True,
    device=-1,  # CPU; for CUDA use 0
)

result = asr("audio.wav")
print(result["text"])
```
Long audio (chunked):

```python
result = asr("long_audio.wav", chunk_length_s=30)
print(result["text"])
```
Long audio with overlap:

```python
result = asr("long_audio.wav", chunk_length_s=30, stride_length_s=5)
print(result["text"])
```
### 2) Direct model usage
```python
import torch
import torchaudio
from transformers import AutoModel, AutoFeatureExtractor, AutoTokenizer

model = AutoModel.from_pretrained(
    "./GigaAM-v3-transformers",
    trust_remote_code=True,
)
feature_extractor = AutoFeatureExtractor.from_pretrained(
    "./GigaAM-v3-transformers",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "./GigaAM-v3-transformers",
    trust_remote_code=True,
)

# Load audio and downmix to mono
wav, sr = torchaudio.load("audio.wav")
wav = wav.mean(dim=0).numpy()

features = feature_extractor(
    wav,
    sampling_rate=sr,
    return_tensors="pt",
)

# Encode, then run greedy RNN-T decoding over the encoder outputs
with torch.no_grad():
    outputs = model(**features)
    token_ids = model.model.decoding.decode(
        model.model.head,
        outputs.encoded,
        outputs.encoded_lengths,
    )[0]

text = tokenizer.decode(token_ids)
print(text)
```
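Feature extractors are trained for a fixed sampling rate, so resample the audio if your file uses a different one. A minimal sketch with `torchaudio`, assuming a 16 kHz default (check `feature_extractor.sampling_rate` for the actual value):

```python
import torchaudio

wav, sr = torchaudio.load("audio.wav")
wav = wav.mean(dim=0)  # downmix to mono

target_sr = getattr(feature_extractor, "sampling_rate", 16000)
if sr != target_sr:
    # Resample to the rate the feature extractor expects
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=target_sr)

features = feature_extractor(wav.numpy(), sampling_rate=target_sr, return_tensors="pt")
```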
## Current Limitations

- Pipeline does not support `return_timestamps`.
- Pipeline supports `chunk_length_s` and `stride_length_s` (overlap is merged in postprocessing).
- Only the `rnnt_e2e` ASR mode is supported in this branch.
## License
MIT