ishtiakmoin committed on
Commit f313371 · verified · 1 Parent(s): 18f6fe5

Upload fine-tuned Bengali speaker diarization model
Files changed (5):
  1. README.md +25 -18
  2. USAGE.md +4 -0
  3. config.yaml +16 -0
  4. pipeline_config.json +4 -16
  5. pytorch_model.bin +3 -0
README.md CHANGED
@@ -1,25 +1,32 @@
 ---
-license: mit
+language:
+- bn
 tags:
-- speaker-diarization
-- pyannote
-- audio
+- speaker-diarization
+- pyannote
+- pyannote-audio
+- audio
+- voice
+- speech
+- bengali
+license: mit
+datasets:
+- custom
+metrics:
+- der
+model-index:
+- name: diarization_filtered_v1
+  results:
+  - task:
+      type: speaker-diarization
+      name: Speaker Diarization
+    metrics:
+    - type: der
+      value: Not computed
+      name: Diarization Error Rate
 ---
 
 # diarization_filtered_v1
 
-## Training config
-- **Pretrained model:** `pyannote/segmentation-3.0`
-- **Embedding model:** `pyannote/wespeaker-voxceleb-resnet34-LM`
-- **Duration:** 10.0s chunks
-- **Max speakers/chunk:** 4
-- **Batch size:** 64
-- **Learning rate:** 2e-05
-- **Max epochs:** 5
-
-## Usage
-```python
-from huggingface_hub import hf_hub_download
-ckpt = hf_hub_download(repo_id="ishtiakmoin/diarization_filtered_v1", filename="final_model.ckpt")
-config = hf_hub_download(repo_id="ishtiakmoin/diarization_filtered_v1", filename="pipeline_config.json")
-```
+This is a fine-tuned speaker diarization model based on pyannote.audio, specifically trained on Bengali audio data.
+
USAGE.md ADDED
@@ -0,0 +1,4 @@
+# Example Usage: diarization_filtered_v1
+
+This example shows how to use the model for speaker diarization.
+
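The added USAGE.md promises an example but the commit ships no runnable snippet. A minimal sketch of what usage could look like, built only from files in this commit: it parses the `optimal_parameters` block of the new pipeline_config.json, and the commented tail shows (as an assumption, not executed here) how those parameters could be plugged into pyannote.audio's `SpeakerDiarization` pipeline once the checkpoint is downloaded.

```python
import json

# "optimal_parameters" mirrored verbatim from the pipeline_config.json in this commit.
PIPELINE_CONFIG = """
{
  "model_type": "speaker-diarization",
  "pyannote_version": "3.3.2",
  "embedding_model": "pyannote/wespeaker-voxceleb-resnet34-LM",
  "optimal_parameters": {
    "segmentation": {"threshold": 0.5, "min_duration_off": 0.0},
    "clustering": {"method": "centroid", "threshold": 0.7, "min_cluster_size": 12}
  }
}
"""

config = json.loads(PIPELINE_CONFIG)
params = config["optimal_parameters"]
print(params["segmentation"]["threshold"])  # 0.5
print(params["clustering"]["method"])       # centroid

# With pyannote.audio installed and the repo files fetched (e.g. via
# huggingface_hub.hf_hub_download, as the previous README showed), the
# parameters above could be applied roughly like this -- assumed, not run here:
#
#   from pyannote.audio.pipelines import SpeakerDiarization
#   pipeline = SpeakerDiarization(
#       segmentation="final_model.ckpt",  # path to the downloaded checkpoint
#       embedding=config["embedding_model"],
#   )
#   pipeline.instantiate(params)
#   diarization = pipeline("audio.wav")
```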
config.yaml ADDED
@@ -0,0 +1,16 @@
+
+# Model configuration for pyannote.audio
+task:
+  name: SpeakerDiarization
+
+architecture:
+  name: PyanNet
+
+specifications:
+  duration: 5.0
+  sample_rate: 16000
+
+training:
+  batch_size: 32
+  learning_rate: 0.0001
+  max_epochs: 20
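A quick way to sanity-check the new config.yaml is to load it and derive the chunk size in samples from `duration` and `sample_rate`. A sketch, assuming PyYAML is available (it is not part of the standard library); the YAML is inlined here for illustration:

```python
import yaml  # assumption: PyYAML installed (pip install pyyaml)

# The config.yaml added in this commit, inlined for illustration.
CONFIG_YAML = """
task:
  name: SpeakerDiarization

architecture:
  name: PyanNet

specifications:
  duration: 5.0
  sample_rate: 16000

training:
  batch_size: 32
  learning_rate: 0.0001
  max_epochs: 20
"""

config = yaml.safe_load(CONFIG_YAML)

# Derived quantity: samples per training chunk = duration * sample_rate.
spec = config["specifications"]
samples_per_chunk = int(spec["duration"] * spec["sample_rate"])
print(samples_per_chunk)  # 80000
```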
pipeline_config.json CHANGED
@@ -1,28 +1,16 @@
 {
-  "segmentation_model": "work/models/final_model.ckpt",
+  "model_type": "speaker-diarization",
+  "pyannote_version": "3.3.2",
   "embedding_model": "pyannote/wespeaker-voxceleb-resnet34-LM",
-  "parameters": {
+  "optimal_parameters": {
     "segmentation": {
       "threshold": 0.5,
-      "min_duration_off": 0.1,
-      "min_duration_on": 0.5
+      "min_duration_off": 0.0
     },
     "clustering": {
       "method": "centroid",
       "threshold": 0.7,
       "min_cluster_size": 12
     }
-  },
-  "training_config": {
-    "pretrained_model": "pyannote/segmentation-3.0",
-    "duration": 10.0,
-    "max_speakers_per_chunk": 4,
-    "batch_size": 64,
-    "learning_rate": 2e-05,
-    "max_epochs": 5,
-    "warm_up": [
-      0.1,
-      0.1
-    ]
   }
 }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9633b666624d820d36d046dbc89ebeae69b4c4c5ec41e42bb5e5dd79d52d9c68
+size 17735492
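The committed pytorch_model.bin is a Git LFS pointer: a small text file recording the blob's sha256 oid and byte size, not the weights themselves. A sketch of parsing such a pointer and checking a downloaded blob against it; the check below runs on placeholder bytes, not the real 17 MB checkpoint, so it intentionally reports a mismatch:

```python
import hashlib

# The LFS pointer added in this commit, verbatim.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9633b666624d820d36d046dbc89ebeae69b4c4c5ec41e42bb5e5dd79d52d9c68
size 17735492
"""

def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(blob: bytes, pointer: dict) -> bool:
    """Check a downloaded blob against the pointer's size and digest."""
    return (len(blob) == pointer["size"]
            and hashlib.sha256(blob).hexdigest() == pointer["digest"])

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])           # sha256 17735492
print(matches_pointer(b"placeholder", info))  # False (placeholder bytes)
```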