docs: AISHELL EER 0.48%
README.md
@@ -31,7 +31,15 @@ waveform → [Preprocessor fp32/CPU] → fbank [1,T,80]
CAM++ normalizes the fbank internally. The 192-d embedding is used with cosine
similarity for speaker verification and diarization clustering.
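A minimal sketch of the cosine scoring described above, assuming the embeddings are available as NumPy arrays with 192 values per utterance. The helper names (`cosine_similarity`, `same_speaker`, `cosine_matrix`) and the `threshold` parameter are hypothetical; the README does not specify a decision threshold or a clustering method.

```python
import numpy as np

EMBED_DIM = 192  # embedding size stated in the README


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two 1-D speaker embeddings."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("zero-norm embedding")
    return float(np.dot(a, b) / denom)


def same_speaker(a, b, threshold: float) -> bool:
    """Verification decision. `threshold` is a hypothetical tuning value,
    e.g. picked at the EER operating point on a development set."""
    return cosine_similarity(a, b) >= threshold


def cosine_matrix(embeddings) -> np.ndarray:
    """Pairwise cosine similarities for embeddings of shape [N, 192],
    a typical input to diarization clustering."""
    e = np.asarray(embeddings, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    return e @ e.T
```

Normalizing each row once in `cosine_matrix` turns all pairwise scores into a single matrix product, which is cheaper than calling `cosine_similarity` for every pair when clustering many segments.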
## Benchmark — AISHELL-1 speaker verification

| Metric | Value |
|--------|-------|
| **EER** | **0.48%** (20 speakers; 6,000 same-speaker / 6,000 different-speaker trials) |
| Same-speaker cosine similarity | 0.805 |
| Different-speaker cosine similarity | 0.256 |

AISHELL-1 (clean, read Mandarin speech) is an easier condition than the official CN-Celeb benchmark (EER ~6-7%). CoreML and PyTorch embeddings agree to a cosine similarity of 0.9997-0.99999.
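One common way to get an EER from same-speaker and different-speaker trial scores is a threshold sweep, sketched below under the assumption that scores are plain cosine values. The README does not say how its 0.48% figure was computed; tools that interpolate the ROC curve can report slightly different values. `compute_eer` and `backend_agreement` are hypothetical helper names, and `backend_agreement` only illustrates how a CoreML-vs-PyTorch cosine range like the one quoted could be measured.

```python
import numpy as np


def compute_eer(same_scores, diff_scores):
    """Approximate EER from same-speaker and different-speaker trial scores.

    Every observed score is tried as a threshold; a trial is accepted when
    score >= threshold. Returns (eer, threshold), where eer is the mean of
    FAR and FRR at the threshold where the two rates are closest.
    """
    same = np.sort(np.asarray(same_scores, dtype=np.float64))
    diff = np.sort(np.asarray(diff_scores, dtype=np.float64))
    if same.size == 0 or diff.size == 0:
        raise ValueError("need at least one trial of each kind")
    thresholds = np.unique(np.concatenate([same, diff]))
    # FRR: share of same-speaker trials rejected (score < threshold)
    frr = np.searchsorted(same, thresholds, side="left") / same.size
    # FAR: share of different-speaker trials accepted (score >= threshold)
    far = 1.0 - np.searchsorted(diff, thresholds, side="left") / diff.size
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0), float(thresholds[i])


def backend_agreement(reference, converted):
    """Row-wise cosine between embeddings of the same utterances from two
    backends (e.g. a PyTorch reference and a CoreML export), shape [N, D]."""
    r = np.asarray(reference, dtype=np.float64)
    c = np.asarray(converted, dtype=np.float64)
    num = np.sum(r * c, axis=1)
    den = np.linalg.norm(r, axis=1) * np.linalg.norm(c, axis=1)
    return num / den
```

For the trial setup in the table, `same_scores` and `diff_scores` would each hold 6,000 cosine scores. Sorting once and using `np.searchsorted` keeps the sweep at O(n log n) instead of rescanning every trial for each threshold.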
## License