---
license: other
license_name: svector
license_link: LICENSE
pipeline_tag: automatic-speech-recognition
tags:
- SVECTOR
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
---
# SPTK-2
**SPTK-2** is an open multilingual automatic speech recognition (ASR) model developed by **SVECTOR**.
After revision, it supports 96 languages and offers improved accuracy, timestamp precision, and energy efficiency compared to previous models.
Read the paper: [SPTK: A Framework for Universal Multilingual ASR (2025)](https://huggingface.co/SVECTOR-CORPORATION/SPTK-2/SPTK.pdf)
---
## Example Usage
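```python
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("SVECTOR-CORPORATION/SPTK-2")

# Load audio: torchaudio returns a (channels, samples) tensor and the file's sample rate
audio, sr = torchaudio.load("your_audio_file.mp3")
# Resample if the file's rate differs from the one the feature extractor expects
# (assumes the processor exposes a feature_extractor, as speech seq2seq processors usually do)
target_sr = processor.feature_extractor.sampling_rate
if sr != target_sr:
    audio = torchaudio.functional.resample(audio, sr, target_sr)
# Preprocess the first channel into model inputs
inputs = processor(audio[0].numpy(), sampling_rate=target_sr, return_tensors="pt")

# Generate transcription; unpacking avoids hard-coding the input key name
with torch.no_grad():
    predicted_ids = model.generate(**inputs)

# Decode output
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```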
---
## Model Details
- Model type: Encoder-decoder
- Architecture: E-Branchformer + Sparse MoE decoder
- Languages: 96
- Supports transcription, translation, timestamps
- Released: April 2025
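
The list above mentions timestamp support. As a rough sketch only: assuming the checkpoint loads in the `transformers` automatic-speech-recognition pipeline and honors its `return_timestamps` option (this card does not confirm either), the resulting chunks could be rendered as readable lines. The helper names `format_time`, `format_chunks`, and `transcribe_with_timestamps` are hypothetical and not part of SPTK-2.

```python
from typing import Iterable, List, Optional


def format_time(seconds: Optional[float]) -> str:
    """Render seconds as HH:MM:SS.mmm; a missing (None) timestamp becomes a placeholder."""
    if seconds is None:
        return "??:??:??.???"
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_chunks(chunks: Iterable[dict]) -> List[str]:
    """Turn pipeline-style chunks into '[start -> end] text' lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_time(start)} -> {format_time(end)}] {text}")
    return lines


def transcribe_with_timestamps(
    path: str, model_id: str = "SVECTOR-CORPORATION/SPTK-2"
) -> List[str]:
    """Assumption: the model works with the ASR pipeline and return_timestamps."""
    # Imported lazily so the formatting helpers work without transformers installed
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(path, return_timestamps=True)
    return format_chunks(result["chunks"])
```

If those assumptions hold, `transcribe_with_timestamps("your_audio_file.mp3")` would return one line per chunk. The formatter can also be tried on hand-written data: `format_chunks([{"timestamp": (0.0, 2.5), "text": " Hello world."}])` returns `["[00:00:00.000 -> 00:00:02.500] Hello world."]`.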
---
## License
This model is licensed under the **SVECTOR Proprietary License**.
For research or commercial use, please contact [licence@svector.co.in](mailto:licence@svector.co.in).
---
## Related
- [SVECTOR Official Website](https://www.svector.co.in)