faster-whisper-base-ct2 (float16)

A CTranslate2 conversion of OpenAI whisper-base (float16 quantization)

This repository provides openai/whisper-base converted to the CTranslate2 format with ct2-transformers-converter, quantized to float16.

The converted model is built for fast inference and can be used directly with faster-whisper (strongly recommended) or with native CTranslate2. Inference is roughly 4–10x faster than the vanilla Transformers implementation, VRAM usage drops substantially, and accuracy is nearly unchanged.

Conversion details

Conversion command:

ct2-transformers-converter --model D:/openai-whisper-base \
    --output_dir D:/faster-whisper-base-ct2 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization float16
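float16 stores each weight in 2 bytes instead of float32's 4, which is why the converted model is about half the size; each value keeps roughly 3 significant decimal digits. A quick standard-library illustration of that trade-off (not part of the conversion itself):

```python
import struct

def round_trip_f16(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision ('e' format)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# 2 bytes per float16 value vs 4 bytes per float32 value
print(struct.calcsize('<e'), struct.calcsize('<f'))

# the stored value is slightly rounded relative to the original
print(round_trip_f16(0.1234567))
```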


Usage example (faster-whisper):

from faster_whisper import WhisperModel
import gc
import torch

model_path = "D:/faster-whisper-base-ct2"

# compute_type="float16" matches this model; switch to "int8_float16" or "int8" if VRAM is limited
model = WhisperModel(
    model_path,
    device="cuda" if torch.cuda.is_available() else "cpu",
    compute_type="float16",
    local_files_only=True
)

audio_path = "D:/distil-whisper/output.mp3"

print("Transcribing (optimized for long audio)...")
segments, info = model.transcribe(
    audio_path,
    language="zh",  # set to None to auto-detect the language
    beam_size=5,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
    word_timestamps=False
)
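Note that `transcribe()` returns almost immediately: in faster-whisper, `segments` is a generator, and the actual decoding happens only as you iterate over it, so an unconsumed result does no work. A model-free sketch of the same lazy pattern:

```python
def lazy_transcribe(chunks):
    """Yield results one at a time; the work happens only during iteration."""
    for chunk in chunks:
        # a real pipeline would decode audio here, one chunk at a time
        yield chunk.upper()

segments = lazy_transcribe(["hello", "world"])  # returns instantly, no work yet
results = list(segments)                        # iterating drives the decoding
print(results)
```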

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")
print(f"Total duration: {info.duration:.2f} s")
print("Transcription:")

full_text = ""
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
    full_text += segment.text.strip() + " "

full_text = full_text.strip()

print("\nFull text:")
print(full_text)

del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

print("VRAM released.")
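The cleanup above is the standard pattern: drop the last reference to the model, run a garbage-collection pass, then clear PyTorch's CUDA cache (the two `torch.cuda` calls only matter on GPU). A model-free sketch, using `weakref` to confirm the object really is released:

```python
import gc
import weakref

class FakeModel:
    """Stand-in for a large model object."""

m = FakeModel()
ref = weakref.ref(m)   # watch the object without keeping it alive
del m                  # drop the last strong reference
gc.collect()           # force a collection pass (also frees reference cycles)
assert ref() is None   # the object has been freed
print("released")
```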