---
language:
- bm
library_name: nemo
datasets:
- RobotsMali/an-be-kalan-bench
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- CTC
- QuartzNet
- legacy-model
- deprecated
- pytorch
- Bambara
- NeMo
license: cc-by-4.0
base_model: RobotsMali/stt-bm-quartznet15x5-v2
model-index:
- name: anbekalanNet
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: An be kalan Children's Reading Benchmark
      type: RobotsMali/an-be-kalan-bench
      split: test
      args:
        language: bm
    metrics:
    - name: Test WER
      type: wer
      value: 40.0
    - name: Test CER
      type: cer
      value: 15.0
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---
# anbekalanNet (QuartzNet 15x5 char CTC Series) — [LEGACY]
`anbekalanNet` is the final domain-specific release of the convolutional QuartzNet framework adapted for Bambara children's reading materials. It is a fine-tuned version of [`RobotsMali/stt-bm-quartznet15x5-v2`](https://huggingface.co/RobotsMali/stt-bm-quartznet15x5-v2). Like its predecessors, the model was fine-tuned using **NVIDIA NeMo** and trained with **CTC (Connectionist Temporal Classification) Loss**.
## **🚨 Obsolescence Notice**
This architecture is officially retired. Field testing and benchmark evaluations demonstrate that this convolutional foundation exhibits unstable alignment paths under tight, low-resource constraints compared to hybrid attention-transducer systems.
## NVIDIA NeMo: Installation
To load or run evaluations on this legacy checkpoint, install the standard [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) package:
```bash
pip install "nemo-toolkit[asr]"
```
## How to Use This Model
### Load Model with NeMo
```python
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="RobotsMali/anbekalanNet")
```
### Transcribe Audio
```python
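# Transcribe a list of audio files (16 kHz mono WAV expected, see Input / Output).
# Each result is a hypothesis object; the transcript is in its `.text` attribute.
hypotheses = asr_model.transcribe(['sample_audio.wav'])
print(hypotheses[0].text)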
```
### Input / Output
* **Input:** Accepts **16 kHz mono-channel audio (wav files)**.
* **Output:** Returns a hypothesis object for each audio file, whose `.text` attribute holds the lowercase, character-level transcript. The model does not output punctuation or capitalization.
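Since the model expects 16 kHz mono audio, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper name `is_16khz_mono` is hypothetical and not part of NeMo, and the `wave` module only reads uncompressed PCM WAV files.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # Hz, as stated in the Input description
EXPECTED_CHANNELS = 1          # mono


def is_16khz_mono(path: str) -> bool:
    """Return True if the PCM WAV file at `path` is 16 kHz and single-channel."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == EXPECTED_SAMPLE_RATE
            and wav_file.getnchannels() == EXPECTED_CHANNELS
        )
```

Files that fail this check would need to be resampled or downmixed with an audio tool of your choice before being passed to `asr_model.transcribe`.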
## Model Architecture
QuartzNet is a convolutional ASR model consisting of **1D time-channel separable convolutions** designed to minimize parameter count while maintaining acoustic representations. This specific variant utilizes a **15x5 block structure** with roughly 18 million parameters.
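To see why time-channel separable convolutions reduce parameter count, the sketch below compares the weight count of a regular 1D convolution with that of a depthwise convolution followed by a pointwise (1x1) convolution. The layer sizes are hypothetical values chosen for illustration, not read from this checkpoint's configuration, and biases and normalization layers are ignored.

```python
def standard_conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a regular 1D convolution (bias omitted)."""
    return in_channels * out_channels * kernel_size


def separable_conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a time-channel separable 1D convolution (bias omitted).

    Depthwise step: one kernel of length `kernel_size` per input channel.
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = in_channels * kernel_size
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration
C_IN, C_OUT, K = 256, 256, 33
standard = standard_conv1d_params(C_IN, C_OUT, K)
separable = separable_conv1d_params(C_IN, C_OUT, K)
print(f"standard:  {standard:,} weights")
print(f"separable: {separable:,} weights")
print(f"reduction: {standard / separable:.1f}x")
```

The saving grows with kernel size, because the kernel length multiplies only the depthwise term rather than the full channel-mixing term.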
## Training & Fine-Tuning Configurations
Four experimental setups were designed to test vocabulary limits and regularization effects. This final artifact (`anbekalanNet`) used the following strict parameters:
* **Optimization Window:** Regulated with an **Early Stopping mechanism** set to a **15-epoch patience window** monitored against validation metrics.
* **Convergence Behavior:** The model fit the training data almost completely (<4% training WER) while validation metrics plateaued early. Training was stopped at **epoch 30** to keep the encoder from overfitting further and losing its ability to generalize.
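The patience rule described above can be sketched in plain Python. This is an illustrative re-implementation, not the callback used during training: the function name `early_stop_epoch` is hypothetical, and it assumes the monitored quantity is a validation score where lower is better (for example validation WER).

```python
from typing import Iterable, Optional


def early_stop_epoch(val_scores: Iterable[float], patience: int = 15) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None.

    Lower scores are better. Training stops once `patience` consecutive
    epochs pass without a new best score.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score < best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None
```

With a patience of 15, a run can only stop at least 15 epochs after its best validation epoch, which is why a validation curve that plateaus early still trains for a while before shutdown.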
## Dataset
The model was fine-tuned on the combined **Main + Duplicate** expanded subsets (**45.6 hours** total) of the [RobotsMali/an-be-kalan-bench](https://huggingface.co/datasets/RobotsMali/an-be-kalan-bench) educational children's book corpus.
* **Main Split (1.6h):** Clean recordings of unique readings across 22 GAIFE books by 8 distinct speakers.
* **Duplicate Split (44h):** Redundant multi-speaker recordings of the same texts, included to add vocal variance (pitch, child vocal acoustics, and regional accents).
## Performance
The performance metrics below illustrate how expanding data volume rescued the QuartzNet framework from catastrophic lexical overfitting.
### Overall Evaluation Metrics
| Experiment | Training Data | SpecAugment | Training Volume | Test WER (%) ↓ | Test CER (%) ↓ |
| --- | --- | --- | --- | --- | --- |
| **anbekalanNet-exp3 (this release)** | **Main + Duplicate** | **None** | **45.6 Hours** | **40.0** | **15.0** |
| *anbekalanNet-exp1* | *Main Only* | *None* | *1.6 Hours* | *93.0* | *80.0* |
| *anbekalanNet-exp2* | *Main Only* | *Active* | *1.6 Hours* | *64.0* | *23.0* |
| *anbekalanNet-exp4* | *Main + Duplicate* | *Active* | *45.6 Hours* | *42.0* | *16.0* |

*All results were obtained with greedy decoding, without an external language model (LM).*
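For readers who want to reproduce the WER and CER columns on their own transcripts, the sketch below computes both from a Levenshtein edit distance. It is a minimal illustration, not the evaluation script used for this card: the function names are hypothetical, no text normalization is applied, and an empty reference raises `ZeroDivisionError`.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for a single utterance."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Corpus-level scores are commonly computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates; which convention was used for the table above is not stated here.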
## License
This legacy checkpoint is archived and released under the **CC-BY-4.0** license.
---
**Repository & Issues:** Issue tracking for this legacy series is hosted at [RobotsMali-AI/bambara-asr](https://github.com/RobotsMali-AI/bambara-asr/). No further architectural changes or fine-tuning updates are planned for this model series.