---
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- facebook/wav2vec2-base
pipeline_tag: audio-classification
library_name: fairseq
tags:
- code
---
Multitask Speech Model with Wav2Vec2

This repository contains a multitask learning pipeline built on top of Wav2Vec2, designed to jointly perform:

Automatic Speech Recognition (ASR) (character-level CTC loss)

Speaker Identification

Emotion Recognition

The system is trained on a combined training dataset with parallel annotations: speech transcriptions, speaker identification labels, and emotion recognition labels.

📌 Features

Multitask model (Wav2Vec2MultiTasks) with shared Wav2Vec2 encoder and separate heads for:

Speech Recognition (CTC)

Speaker classification

Emotion classification
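
The shared-encoder layout above can be sketched roughly as follows. This is a minimal illustration, not the repository's Wav2Vec2MultiTasks class: the name `MultiTaskHeads`, the mean pooling over time, and the plain linear heads are all assumptions, and a random tensor stands in for the Wav2Vec2 encoder output (hidden size 768, as in wav2vec2-base).

```python
import torch
from torch import nn


class MultiTaskHeads(nn.Module):
    """Hypothetical task heads sitting on top of a shared encoder's hidden states."""

    def __init__(self, hidden_size, vocab_size, num_speakers, num_emotions):
        super().__init__()
        # Frame-level head: one set of character logits per encoder frame (for CTC).
        self.ctc_head = nn.Linear(hidden_size, vocab_size)
        # Utterance-level heads: one prediction per audio clip.
        self.speaker_head = nn.Linear(hidden_size, num_speakers)
        self.emotion_head = nn.Linear(hidden_size, num_emotions)

    def forward(self, hidden_states):
        # hidden_states: (batch, frames, hidden_size) from the shared encoder.
        ctc_logits = self.ctc_head(hidden_states)
        # Assumption: average over time to get one vector per utterance.
        pooled = hidden_states.mean(dim=1)
        return ctc_logits, self.speaker_head(pooled), self.emotion_head(pooled)


if __name__ == "__main__":
    heads = MultiTaskHeads(hidden_size=768, vocab_size=32, num_speakers=10, num_emotions=4)
    fake_encoder_output = torch.randn(2, 50, 768)  # stand-in for Wav2Vec2 output
    ctc_logits, speaker_logits, emotion_logits = heads(fake_encoder_output)
    print(ctc_logits.shape, speaker_logits.shape, emotion_logits.shape)
```

In a setup like this, the three losses (CTC for the frame-level logits, cross-entropy for the two classifiers) would typically be summed, possibly with weights; how this repository combines them is not stated here.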

Custom data preprocessing:

Cleans transcripts (removes punctuation & special characters)

Converts numbers into words

Builds a vocabulary and tokenizer

Filters short/invalid audio
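
The preprocessing steps listed above might look something like the sketch below. All function names are hypothetical, the special tokens (`<pad>`, `<unk>`, and `|` as word delimiter) follow a common Wav2Vec2 CTC convention rather than anything confirmed for this repository, and the 0.5 second minimum duration is an arbitrary placeholder.

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999999 in English words."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else " " + _ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + number_to_words(rest))
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + ("" if rest == 0 else " " + number_to_words(rest))


def _spell(match):
    digits = match.group(0)
    value = int(digits)
    if value < 1_000_000:
        return " " + number_to_words(value) + " "
    # Very long digit runs are read out digit by digit.
    return " " + " ".join(_ONES[int(d)] for d in digits) + " "


def clean_transcript(text):
    """Lowercase, convert numbers to words, drop punctuation, collapse spaces."""
    text = re.sub(r"[0-9]+", _spell, text.lower())
    text = re.sub(r"[^a-z' ]", " ", text)
    return " ".join(text.split())


def build_vocab(transcripts):
    """Map each character seen in the cleaned transcripts to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1, "|": 2}  # assumed special tokens
    for ch in sorted(set("".join(transcripts)) - {" "}):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a cleaned transcript into ids; spaces become the '|' delimiter."""
    return [vocab.get("|" if ch == " " else ch, vocab["<unk>"]) for ch in text]


def keep_example(num_samples, sample_rate=16000, min_seconds=0.5):
    """Reject empty clips and clips shorter than the assumed minimum duration."""
    return num_samples > 0 and num_samples / sample_rate >= min_seconds
```

For example, `clean_transcript("Hello, World! I have 42 apples.")` returns `"hello world i have forty two apples"`.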

Training, validation, and test splits with collators for CTC.
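
A CTC collator has to pad two variable-length sequences per example: the audio and the character labels. The sketch below shows the idea with plain Python lists; the field names (`input_values`, `labels`, `speaker_id`, `emotion_id`) and the use of -100 as the label padding value are assumptions borrowed from common Hugging Face practice, and a real collator would return tensors.

```python
def ctc_collate(batch, label_pad_id=-100):
    """Pad audio and label sequences in a batch to the longest item of each."""
    max_audio = max(len(ex["input_values"]) for ex in batch)
    max_label = max(len(ex["labels"]) for ex in batch)
    out = {"input_values": [], "attention_mask": [], "labels": [],
           "speaker_id": [], "emotion_id": []}
    for ex in batch:
        audio = list(ex["input_values"])
        labels = list(ex["labels"])
        audio_pad = max_audio - len(audio)
        # Audio is padded with silence; the mask marks the real samples.
        out["input_values"].append(audio + [0.0] * audio_pad)
        out["attention_mask"].append([1] * len(audio) + [0] * audio_pad)
        # Label padding uses a value the loss is expected to ignore.
        out["labels"].append(labels + [label_pad_id] * (max_label - len(labels)))
        # Classification targets are single integers and need no padding.
        out["speaker_id"].append(ex["speaker_id"])
        out["emotion_id"].append(ex["emotion_id"])
    return out
```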

Evaluation metrics:

Character Error Rate (CER) for character-level speech recognition

Accuracy for speaker and emotion classification
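
For reference, CER is usually defined as the character-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference characters. The sketch below is a dependency-free illustration of both metrics; the function names are hypothetical, and the repository may compute them with a library instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


def accuracy(labels, predictions):
    """Fraction of predictions that match their label."""
    if not labels:
        return 0.0
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)
```

Here `cer(["hello"], ["hallo"])` gives 0.2 (one substitution over five reference characters), and `accuracy([0, 1, 2, 1], [0, 1, 1, 1])` gives 0.75.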