metadata
tags:
  - autotrain
  - transformers
  - automatic-speech-recognition
base_model: openai/whisper-base
widget:
  - src: >-
      https://huggingface.co/datasets/mishig/sample_audio/resolve/main/sample1.wav
    example_title: Sample Audio 1
  - src: >-
      https://huggingface.co/datasets/mishig/sample_audio/resolve/main/sample2.wav
    example_title: Sample Audio 2

Model Trained Using AutoTrain

  • Problem type: Automatic Speech Recognition

Validation Metrics

No validation metrics available
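
Because no validation set was used, the card reports no accuracy figures. One common way to fill that gap for speech recognition is to compute word error rate (WER) on held-out recordings with known transcripts. The following is a minimal sketch, not part of the AutoTrain output: the function name `word_error_rate` is hypothetical, and the lowercase, whitespace-split normalization is an assumption that may not suit every language or dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A score of 0.0 means the hypothesis matches the reference word for word; values can exceed 1.0 when the hypothesis contains many extra words.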

Training Configuration

  • Base Model: openai/whisper-base
  • Training Data: autotrain-mfnfj-9bczx\autotrain-data
  • Validation Data: None
  • Epochs: 3
  • Batch Size: 8
  • Learning Rate: 3e-05
  • Optimizer: adamw_torch
  • Scheduler: linear
  • Mixed Precision: no
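
To make the schedule implied by these settings concrete, here is a rough sketch of how a linear scheduler decays the learning rate from 3e-05 to zero over training. It assumes no warmup, no gradient accumulation, and a single device, none of which the card states; `linear_lr`, `total_training_steps`, and the example dataset size are hypothetical and used only for illustration.

```python
import math


def total_training_steps(num_examples: int, batch_size: int = 8, epochs: int = 3) -> int:
    """Optimizer steps for the run, assuming one step per batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def linear_lr(step: int, total_steps: int, base_lr: float = 3e-05,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical dataset size; the actual number of training clips is not reported.
steps = total_training_steps(num_examples=800)
for step in (0, steps // 2, steps):
    print(step, linear_lr(step, steps))
```

With these assumptions the rate falls by the same amount at every step, so halfway through training it is half the configured value.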

Usage

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
import librosa

# Load the fine-tuned model and processor. The base model is Whisper,
# an encoder-decoder model, so it is loaded with the speech seq2seq
# class rather than a CTC head.
model = AutoModelForSpeechSeq2Seq.from_pretrained("Mathur08/autotrain-mfnfj-9bczx")
processor = AutoProcessor.from_pretrained("Mathur08/autotrain-mfnfj-9bczx")

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("path_to_audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate token IDs autoregressively, then decode them to text
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])