---
license: mit
language:
- kbd
datasets:
- anzorq/kbd_speech
- anzorq/sixuxar_yijiri_mak7
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# Circassian (Kabardian) ASR Model

This is a fine-tuned model for Automatic Speech Recognition (ASR) in Kabardian (`kbd`), based on the `facebook/w2v-bert-2.0` model.

The model was trained on a combination of the `anzorq/kbd_speech` dataset (filtered on `country=russia`) and the `anzorq/sixuxar_yijiri_mak7` dataset.

## Model Details

- **Base Model**: facebook/w2v-bert-2.0
- **Language**: Kabardian (`kbd`)
- **Task**: Automatic Speech Recognition (ASR)
- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- **Training Steps**: 5000

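## Usage

The snippet below is a minimal inference sketch. The repo id is a placeholder (the checkpoint's exact Hub id is not stated here), and `audio` must be supplied as a 16 kHz mono waveform; `w2v-bert-2.0` fine-tunes are loaded with `Wav2Vec2BertForCTC`:

```python
import torch
from transformers import AutoProcessor, Wav2Vec2BertForCTC

repo_id = "<repo_id>"  # placeholder: substitute the Hub id of this checkpoint
processor = AutoProcessor.from_pretrained(repo_id)
model = Wav2Vec2BertForCTC.from_pretrained(repo_id)
model.eval()

# `audio`: a 1-D float waveform sampled at 16 kHz,
# e.g. loaded with torchaudio or librosa
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```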
## Training

The model was fine-tuned with the following training arguments (with gradient accumulation, the effective batch size is 8 × 2 = 16 per device):

```python
TrainingArguments(
    output_dir='output',
    group_by_length=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,
    push_to_hub=True,
    report_to="wandb"
)
```

## Performance

The model's performance during training (validation loss was logged as `inf` at every evaluation step, while WER improved steadily):

| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 500  | 2.859600      | inf             | 0.870362 |
| 1000 | 0.355500      | inf             | 0.703617 |
| 1500 | 0.247100      | inf             | 0.549942 |
| 2000 | 0.196700      | inf             | 0.471762 |
| 2500 | 0.181500      | inf             | 0.361494 |
| 3000 | 0.152200      | inf             | 0.314119 |
| 3500 | 0.135700      | inf             | 0.275146 |
| 4000 | 0.113400      | inf             | 0.252625 |
| 4500 | 0.102900      | inf             | 0.277013 |
| 5000 | 0.078500      | inf             | 0.250175 |