How do I reduce hallucination, and why does the last line of each 30-second chunk go missing from the transcription?
I am doing transcription with the north model (jiviai/audioX-north-v1). I do get transcriptions, but on longer audios the model hallucinates. I transcribe the audio in 30-second chunks, and the last line of each chunk does not get transcribed, so it goes missing from the final transcription.
For example, the audio contains: "i said hey how are you, and someone said i have been good how have you been".
I get "i said hey how are you, and someone said i have been good", but the small part "how have you been" is missing from the transcription.
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

class JiviService:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using JIVI SERVICE device: {self.device}")
        try:
            self.processor = WhisperProcessor.from_pretrained("jiviai/audioX-north-v1")
            self.model = WhisperForConditionalGeneration.from_pretrained(
                "jiviai/audioX-north-v1"
            ).to(self.device)
            self.model.config.forced_decoder_ids = None
        except Exception as e:
            print(f"Error loading model: {e}")
            raise RuntimeError(f"Could not load the transcription model: {e}")
This is how I am loading the model. I have two functions, one for transcribing the full audio and one for transcribing the chunks:
# audio_np is a 16 kHz mono float32 numpy array for one chunk
inputs = self.processor(audio_np, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(self.device)
predicted_ids = self.model.generate(input_features, task="transcribe", language="hi")
transcription = self.processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
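If the manual chunking is kept but the chunks are made to overlap, the per-chunk transcripts need to be stitched back together by dropping the words that were transcribed twice at the seam. A simple word-level sketch (function name is hypothetical; it assumes the overlap region was transcribed identically in both chunks, which is usually but not always true):

```python
def stitch(prev, nxt, max_overlap=10):
    """Merge two consecutive chunk transcripts by dropping the words
    at the start of `nxt` that repeat the tail of `prev`."""
    a, b = prev.split(), nxt.split()
    # try the longest plausible duplicated run first
    for n in range(min(max_overlap, len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return " ".join(a + b[n:])
    return " ".join(a + b)  # no duplicate found, plain concatenation
```

When the two transcriptions of the overlap differ slightly (which happens with ASR), a fuzzier match such as difflib.SequenceMatcher on the word lists is more robust.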
This is how I am using the chunk function to transcribe the audio.
Any help would be great.
Also, is there any way to get timestamps?
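On the timestamps question: assuming this checkpoint behaves like a standard Whisper fine-tune, the transformers ASR pipeline can handle both problems at once. It chunks long audio internally with overlapping strides (so boundary words are not lost) and can return per-segment timestamps via return_timestamps=True. A sketch, with my own function names (whether this particular fine-tune produces reliable timestamps is something to verify):

```python
def transcribe_with_timestamps(audio_np, model_id="jiviai/audioX-north-v1"):
    """Long-form transcription with per-segment timestamps using the
    transformers ASR pipeline, which chunks with overlapping strides."""
    # imported lazily so this sketch can be read/run without the heavy deps
    import torch
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
        device=0 if torch.cuda.is_available() else -1,
    )
    return asr(
        audio_np,
        return_timestamps=True,
        generate_kwargs={"task": "transcribe", "language": "hi"},
    )

def format_chunks(chunks):
    """Render the pipeline's output chunks as '[start-end] text' lines."""
    lines = []
    for c in chunks:
        start, end = c["timestamp"]
        lines.append(f"[{start:.1f}-{end:.1f}] {c['text'].strip()}")
    return "\n".join(lines)
```

The pipeline result is a dict with "text" (the full transcript) and "chunks" (a list of {"timestamp": (start, end), "text": ...} segments). For hallucination on long audio, it is also worth experimenting with generation parameters such as a lower temperature and no_repeat_ngram_size in generate_kwargs.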