---
license: mit
base_model:
- ishmamzarif/bangla_asr_augmented_bangla-whisper-epoch-11
pipeline_tag: automatic-speech-recognition
---

## Description
This model is a fine-tuned version of ishmamzarif/bangla_asr_augmented_bangla-whisper-epoch-11.
That model is in turn a fine-tuned version of bangla-speech-processing/BanglaASR, trained on Bangla speech data.

## Environment:
- Python version: 3.12.12
- PyTorch version: 2.8.0+cu126
- Librosa version: 0.10.1
- NumPy version: 1.26.4

## Training Parameters:

- BATCH_SIZE = 4
- GRADIENT_ACCUMULATION_STEPS = 4  # Effective batch size = 16
- LEARNING_RATE = 2e-5
- WARMUP_STEPS = 400
- NUM_TRAIN_EPOCHS = 8
- LOGGING_STEPS = 50

## Validation Set Evaluation:

| **Epoch** | **Training Loss** | **Validation Loss** | **WER (%)** | **Normalized Levenshtein Similarity (%)** |
| --------- | ----------------- | ------------------- | ----------- | ----------------------------------------- |
| 0         | 2.3479            | 1.59398             | 26.519      | 83.03                                     |
| 2         | 1.5380            | 1.50034             | 18.011      | 87.15                                     |
| 4         | 1.4665            | 1.47125             | 12.486      | 91.06                                     |
| 6         | 1.4448            | 1.46236             | 10.607      | 91.97                                     |
| 7         | 1.4419            | 1.46210             | 10.441      | 92.12                                     |
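
This card does not say how the two metrics were computed. As a rough guide to reading the table, the sketch below uses common definitions: WER as the word-level edit distance divided by the number of reference words, and normalized Levenshtein similarity as one minus the character-level edit distance divided by the longer string's length. Both definitions, and the function names `edit_distance`, `wer`, and `normalized_levenshtein_similarity`, are assumptions for illustration and may differ from the evaluation code actually used.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (assumed definition)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def normalized_levenshtein_similarity(reference: str, hypothesis: str) -> float:
    """Character-level similarity in percent (assumed definition)."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - edit_distance(reference, hypothesis) / longest)


# One substituted word out of four reference words.
print(wer("a b c d", "a x c d"))
print(normalized_levenshtein_similarity("kitten", "sitting"))
```

Under these definitions, lower WER and higher similarity both indicate better transcriptions, which matches the trend across epochs in the table.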