Sam3000 committed
Commit 7f0b954 · verified · 1 Parent(s): 03ffe4a

Upload 5 files

Files changed (5):
  1. README.md +77 -0
  2. config.json +18 -0
  3. config.yaml +19 -0
  4. model.safetensors +3 -0
  5. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ library_name: transformers
+ language:
+ - bn
+ license: mit
+ base_model: pyannote/speaker-diarization-3.1
+ tags:
+ - speaker-diarization
+ - speaker-segmentation
+ - bangla
+ - bengali
+ - pyannote
+ - audio
+ - generated_from_trainer
+ datasets:
+ - Sam3000/speaker-diarization-dataset-bangla
+ model-index:
+ - name: bangla-segment
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # bangla-segment
+
+ This model is a fine-tuned version of [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) on the Sam3000/speaker-diarization-dataset-bangla dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4452
+ - Model Preparation Time: 0.0056
+ - DER: 0.1488
+ - False Alarm: 0.0317
+ - Missed Detection: 0.0372
+ - Confusion: 0.0799
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - num_epochs: 5
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | DER    | False Alarm | Missed Detection | Confusion |
+ |:-------------:|:-----:|:----:|:---------------:|:----------------------:|:------:|:-----------:|:----------------:|:---------:|
+ | 0.4657        | 1.0   | 170  | 0.4409          | 0.0056                 | 0.1506 | 0.0392      | 0.0198           | 0.0916    |
+ | 0.4403        | 2.0   | 340  | 0.4201          | 0.0056                 | 0.1507 | 0.0328      | 0.0317           | 0.0861    |
+ | 0.3691        | 3.0   | 510  | 0.4362          | 0.0056                 | 0.1485 | 0.0317      | 0.0350           | 0.0818    |
+ | 0.3602        | 4.0   | 680  | 0.4437          | 0.0056                 | 0.1493 | 0.0319      | 0.0377           | 0.0797    |
+ | 0.3875        | 5.0   | 850  | 0.4452          | 0.0056                 | 0.1488 | 0.0317      | 0.0372           | 0.0799    |
+
+
+ ### Framework versions
+
+ - Transformers 4.46.3
+ - PyTorch 2.4.1+cu118
+ - Datasets 3.1.0
+ - Tokenizers 0.20.3
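In pyannote's accounting, the diarization error rate (DER) decomposes into the sum of the false alarm, missed detection, and speaker confusion rates. A quick sanity check on the final-epoch numbers from the table above (values match only to the displayed rounding precision):

```python
# Final-epoch evaluation components reported in the training results table.
false_alarm = 0.0317
missed_detection = 0.0372
confusion = 0.0799

# DER is the sum of the three component error rates.
der = false_alarm + missed_detection + confusion
print(f"DER = {der:.4f}")  # matches the reported 0.1488
```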
config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "architectures": [
+     "SegmentationModel"
+   ],
+   "chunk_duration": 10.0,
+   "max_speakers_per_chunk": 3,
+   "max_speakers_per_frame": 2,
+   "min_duration": null,
+   "model_type": "pyannet",
+   "sample_rate": 16000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.46.3",
+   "warm_up": [
+     0.0,
+     0.0
+   ],
+   "weigh_by_cardinality": false
+ }
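Together, `chunk_duration` and `sample_rate` fix how much audio the segmentation model scores per forward pass. A minimal sketch of that arithmetic, using only fields copied from config.json above:

```python
import json

# Fields copied from config.json above (subset relevant to chunk sizing).
config = json.loads("""
{
  "chunk_duration": 10.0,
  "max_speakers_per_chunk": 3,
  "sample_rate": 16000
}
""")

# Each chunk covers chunk_duration seconds of audio, i.e.
# chunk_duration * sample_rate waveform samples per forward pass.
samples_per_chunk = int(config["chunk_duration"] * config["sample_rate"])
print(samples_per_chunk)                 # 160000
print(config["max_speakers_per_chunk"])  # at most 3 speakers scored per chunk
```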
config.yaml ADDED
@@ -0,0 +1,19 @@
+ version: 3.1.0
+
+ pipeline:
+   name: pyannote.audio.pipelines.SpeakerDiarization
+   params:
+     clustering: AgglomerativeClustering
+     embedding: pyannote/wespeaker-voxceleb-resnet34-LM
+     embedding_batch_size: 32
+     embedding_exclude_overlap: true
+     segmentation: Sam3000/OUTPUT_DIR
+     segmentation_batch_size: 32
+
+ params:
+   clustering:
+     method: centroid
+     min_cluster_size: 12
+     threshold: 0.7045654963945799
+   segmentation:
+     min_duration_off: 0.0
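The YAML above follows pyannote.audio's pipeline description format: the top-level `pipeline` block names the pipeline class and its model dependencies, while the second `params` block carries the tuned clustering and segmentation hyperparameters. A minimal sketch of reading it programmatically (assumes PyYAML is installed; the in-line string mirrors config.yaml above — in practice you would open the file):

```python
import yaml  # PyYAML, assumed installed

# In-line copy of config.yaml above.
raw = """
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: Sam3000/OUTPUT_DIR
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
"""

cfg = yaml.safe_load(raw)
print(cfg["pipeline"]["name"])                   # pipeline class to instantiate
print(cfg["params"]["clustering"]["threshold"])  # tuned clustering threshold
```

In normal use you would not parse this by hand: `pyannote.audio`'s `Pipeline.from_pretrained` consumes exactly this file to instantiate the diarization pipeline with the fine-tuned segmentation model plugged in.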
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4b22a94c5458f976484c163bc918e3959456b587828b43360ed06f7f76bd973
+ size 5899124
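model.safetensors is committed as a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the content-addressing oid, and the blob size. A small sketch of parsing that format, using the pointer contents shown above:

```python
# Git LFS pointer contents copied from the model.safetensors diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:e4b22a94c5458f976484c163bc918e3959456b587828b43360ed06f7f76bd973
size 5899124
"""

# Each line is "key value"; split on the first space only.
pointer = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
algo, digest = pointer["oid"].split(":", 1)

print(algo)                  # sha256 (64-hex-char digest addresses the real blob)
print(int(pointer["size"]))  # 5899124 bytes of actual model weights in LFS storage
```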
training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:37f7aeaf09d8f2875a4907bde327dc591bfae348eba35a307814adf532d735bc
3
+ size 5304