---
base_model: ogulcanakca/whisper-small-tr
base_model_relation: quantized
language:
  - tr
license: cc0-1.0
tags:
  - ctranslate2
  - faster-whisper
  - int8
  - automatic-speech-recognition
  - robust-speech
  - quantization
  - whisper
spaces:
  - ogulcanakca/turkish-asr-demo
widget:
  - src: >-
      https://drive.google.com/file/d/1fgb1yOwXSba17HtLMIbcLh79j9nMtQzH/view?usp=sharing
    example_title: Sample Speech (Common Voice)
  - text: Use the Space demo above to test the model.
    example_title: Information
model-index:
  - name: faster-whisper-small-tr
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        metrics:
          - name: Real Time Factor (Speedup)
            type: rtf
            value: 19.21
---

# Faster Whisper Small Turkish (INT8 Quantized)

This model is an INT8-quantized CTranslate2 conversion of the robust Turkish ASR model `ogulcanakca/whisper-small-tr`, intended for use with the `faster-whisper` library.
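The exact command used for this repository is not documented, but a conversion of this kind is typically done with CTranslate2's converter CLI; a plausible sketch (output directory name and copied files are assumptions):

```shell
# Convert the Transformers checkpoint to CTranslate2 format with INT8 weights.
# Flags shown are a typical invocation, not necessarily the one used here.
pip install ctranslate2 transformers
ct2-transformers-converter \
  --model ogulcanakca/whisper-small-tr \
  --output_dir faster-whisper-small-tr \
  --quantization int8 \
  --copy_files tokenizer.json preprocessor_config.json
```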

## Performance Benchmarks

The model was benchmarked against the original Hugging Face Transformers implementation on an NVIDIA A100 GPU.

| Model Format   | Precision | Inference Time (Avg) | Speedup Factor |
|----------------|-----------|----------------------|----------------|
| Original PyTorch | FP16    | 10.35 sec            | 1x (Baseline)  |
| Faster-Whisper   | INT8    | 0.54 sec             | 19.2x Faster   |

Note: Benchmarks were conducted on a standard Common Voice audio sample.
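The speedup factor follows directly from the two average timings in the table:

```python
# Speedup = baseline inference time / optimized inference time,
# using the averages reported in the benchmark table above.
baseline_s = 10.35   # original PyTorch FP16
optimized_s = 0.54   # CTranslate2 INT8 (faster-whisper)

speedup = baseline_s / optimized_s
print(f"{speedup:.1f}x faster")  # → 19.2x faster
```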

## Usage

To use this model, install the `faster-whisper` library:

```shell
pip install faster-whisper
```
```python
from faster_whisper import WhisperModel

model_id = "ogulcanakca/faster-whisper-small-tr"

# Run on GPU with INT8 weights
model = WhisperModel(model_id, device="cuda", compute_type="int8")

# ...or run on CPU with INT8 (high performance on CPU too):
# model = WhisperModel(model_id, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5, language="tr")

print(f"Detected language '{info.language}' with probability {info.language_probability}")

# `segments` is a generator: transcription runs lazily as you iterate.
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

## Model Details

- **Base Model:** `ogulcanakca/whisper-small-tr` (fine-tuned on Common Voice 23.0 with JIT augmentation)
- **Quantization:** 8-bit integer (INT8)
- **Backend:** CTranslate2
- **Objective:** Low-latency real-time streaming and high-throughput batch processing
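For the high-throughput batch use case mentioned above, recent `faster-whisper` releases also ship a `BatchedInferencePipeline` that batches segments of a long file internally; a minimal sketch (the `batch_size` value is illustrative, not tuned):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Wrap the INT8 model in the batched pipeline for higher throughput on long
# recordings; batch_size=16 is an illustrative value, not a tuned one.
model = WhisperModel("ogulcanakca/faster-whisper-small-tr",
                     device="cuda", compute_type="int8")
batched = BatchedInferencePipeline(model=model)

segments, info = batched.transcribe("audio.mp3", batch_size=16, language="tr")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```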