This repository contains the fine-tuned PEFT adapter weights for Thai Automatic Speech Recognition (ASR). The model is built on top of nectec/Pathumma-whisper-th-large-v3 and optimized using DoRA (Weight-Decomposed Low-Rank Adaptation) to handle highly challenging audio environments.
This model was specifically fine-tuned using the LOTUSDIS dataset provided by NECTEC, which features a wide variety of difficult acoustic conditions and microphone types.
The adapter uses DoRA with all_linear targeting instead of standard LoRA, achieving better magnitude and directional updates and pushing the Word Error Rate (WER) down to 35.8% on a highly difficult evaluation set. Transcripts were normalized with PyThaiNLP, resolving Thai floating-vowel issues and converting Arabic numerals to Thai words to match competition standards. Sketches of a possible adapter configuration and normalization step follow the usage example below.

You can load this model directly using the transformers and peft libraries:
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
device = "cuda:0" if torch.cuda.is_available() else "cpu"
base_model_id = "nectec/Pathumma-whisper-th-large-v3"
peft_model_id = "pmootr/pathumma-large-v3-dora-robust"
# 1. Load Base Model and Processor
processor = WhisperProcessor.from_pretrained(base_model_id)
base_model = WhisperForConditionalGeneration.from_pretrained(base_model_id, device_map=device)
# 2. Attach DoRA Adapter and Merge
model = PeftModel.from_pretrained(base_model, peft_model_id).merge_and_unload()
# 3. Transcribe Audio
def transcribe(audio_path):
    # Note: Ensure the audio is preprocessed (noise reduction) for best results
    audio_array, sr = librosa.load(audio_path, sr=16000)
    inputs = processor(audio_array, sampling_rate=sr, return_tensors="pt")
    forced_decoder_ids = processor.get_decoder_prompt_ids(language="thai", task="transcribe")
    with torch.no_grad():
        predicted_ids = model.generate(
            inputs.input_features.to(device, dtype=model.dtype),
            forced_decoder_ids=forced_decoder_ids,
            max_new_tokens=255,
            num_beams=5,
            repetition_penalty=1.2,
        )
    text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return text.strip()
# Example:
# print(transcribe("sample_audio.wav"))
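For reference, a PEFT configuration along these lines would produce the kind of adapter described above. This is a minimal sketch, not the published training script: only use_dora and all-linear targeting come from the description; the r, lora_alpha, and lora_dropout values are illustrative assumptions.

from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("nectec/Pathumma-whisper-th-large-v3")

# use_dora=True enables Weight-Decomposed Low-Rank Adaptation, which learns
# separate magnitude and direction updates; target_modules="all-linear" wraps
# every linear layer, matching the "all_linear targeting" described above.
dora_config = LoraConfig(
    r=32,                        # assumed rank, not published
    lora_alpha=64,               # assumed scaling, not published
    lora_dropout=0.05,           # assumed dropout, not published
    target_modules="all-linear",
    use_dora=True,
)

model = get_peft_model(base, dora_config)
model.print_trainable_parameters()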
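The repository likewise does not ship the PyThaiNLP normalization code, but a minimal version could combine pythainlp.util.normalize (which reorders and deduplicates vowel and tone marks, addressing the floating-vowel issue) with num_to_thaiword for digit conversion. The helper name normalize_thai is hypothetical:

import re
from pythainlp.util import normalize, num_to_thaiword

def normalize_thai(text: str) -> str:  # hypothetical helper, not part of this repo
    # Reorder and deduplicate Thai vowel and tone marks ("floating vowels").
    text = normalize(text)
    # Replace each run of Arabic digits with its Thai word form,
    # e.g. "25" -> "ยี่สิบห้า".
    text = re.sub(r"\d+", lambda m: num_to_thaiword(int(m.group())), text)
    return text.strip()

For WER scoring against competition references, this kind of normalization would typically be applied to both the reference and the predicted text, so that digit formatting differences are not counted as errors.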
Base model: openai/whisper-large-v3 (this adapter attaches to nectec/Pathumma-whisper-th-large-v3, which was itself fine-tuned from whisper-large-v3)