---
base_model: ogulcanakca/whisper-small-tr
base_model_relation: quantized
language:
  - tr
license: cc0-1.0
tags:
  - ctranslate2
  - faster-whisper
  - int8
  - automatic-speech-recognition
  - robust-speech
  - quantization
  - whisper
spaces:
  - ogulcanakca/turkish-asr-demo
widget:
  - src: >-
      https://drive.google.com/file/d/1fgb1yOwXSba17HtLMIbcLh79j9nMtQzH/view?usp=sharing
    example_title: Sample Speech (Common Voice)
  - text: Use the Space demo above to test the model.
    example_title: Information
model-index:
  - name: faster-whisper-small-tr
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        metrics:
          - name: Real Time Factor (Speedup)
            type: rtf
            value: 19.21
---

# Faster Whisper Small Turkish (INT8 Quantized)

This model is an INT8-quantized CTranslate2 conversion of the robust Turkish ASR model `ogulcanakca/whisper-small-tr`, intended for use with the `faster-whisper` library.
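The exact command used for this repository is not documented, but a conversion of this kind is typically done with CTranslate2's converter CLI; a plausible sketch (output directory name and copied files are assumptions):

```shell
# Convert the Transformers checkpoint to CTranslate2 format with INT8 weights.
# Flags shown are a typical invocation, not necessarily the one used here.
pip install ctranslate2 transformers
ct2-transformers-converter \
  --model ogulcanakca/whisper-small-tr \
  --output_dir faster-whisper-small-tr \
  --quantization int8 \
  --copy_files tokenizer.json preprocessor_config.json
```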

## Performance Benchmarks

The model was benchmarked against the original Hugging Face Transformers implementation on an NVIDIA A100 GPU.

| Model Format   | Precision | Inference Time (Avg) | Speedup Factor |
|----------------|-----------|----------------------|----------------|
| Original PyTorch | FP16    | 10.35 sec            | 1x (Baseline)  |
| Faster-Whisper   | INT8    | 0.54 sec             | 19.2x Faster   |

Note: Benchmarks were conducted on a standard Common Voice audio sample.
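The speedup factor follows directly from the two average timings in the table:

```python
# Speedup = baseline inference time / optimized inference time,
# using the averages reported in the benchmark table above.
baseline_s = 10.35   # original PyTorch FP16
optimized_s = 0.54   # CTranslate2 INT8 (faster-whisper)

speedup = baseline_s / optimized_s
print(f"{speedup:.1f}x faster")  # → 19.2x faster
```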

## Usage

To use this model, install the `faster-whisper` library:

```shell
pip install faster-whisper
```
```python
from faster_whisper import WhisperModel

model_id = "ogulcanakca/faster-whisper-small-tr"

# Run on GPU with INT8 weights
model = WhisperModel(model_id, device="cuda", compute_type="int8")

# ...or run on CPU with INT8 (high performance on CPU too):
# model = WhisperModel(model_id, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5, language="tr")

print(f"Detected language '{info.language}' with probability {info.language_probability}")

# `segments` is a generator: transcription runs lazily as you iterate.
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

## Model Details

- **Base Model:** `ogulcanakca/whisper-small-tr` (fine-tuned on Common Voice 23.0 with JIT augmentation)
- **Quantization:** 8-bit integer (INT8)
- **Backend:** CTranslate2
- **Objective:** Low-latency real-time streaming and high-throughput batch processing
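For the high-throughput batch use case mentioned above, recent `faster-whisper` releases also ship a `BatchedInferencePipeline` that batches segments of a long file internally; a minimal sketch (the `batch_size` value is illustrative, not tuned):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Wrap the INT8 model in the batched pipeline for higher throughput on long
# recordings; batch_size=16 is an illustrative value, not a tuned one.
model = WhisperModel("ogulcanakca/faster-whisper-small-tr",
                     device="cuda", compute_type="int8")
batched = BatchedInferencePipeline(model=model)

segments, info = batched.transcribe("audio.mp3", batch_size=16, language="tr")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```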