Update README.md
README.md
The following table describes the available features:

| Feature ID | Description | Usage |
| ----------- | ----------- | ----------- |
| gfe_1, gfe_2 | Mean and variance of frame-level features as indicated in Figure 1, extracted before the ReLU and BatchNorm layers. | Furthest from the speaker embedding, probably useful in tasks less related to speaker characteristics. |
| pool | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before the ReLU layer. | Generally capture intra-speaker variance better than speaker embeddings. E.g., speaker profiling and emotion recognition. |
| attention | Same as the pooled statistics, but with the attention weights applied. | Generally capture intra-speaker variance better than speaker embeddings. E.g., speaker profiling and emotion recognition. |
| embedding | The standard ECAPA2 speaker embedding. | Best for tasks that depend directly on speaker identity (as opposed to speaker characteristics). E.g., speaker verification and speaker diarization. |

| Feature Type | Description | Usage | Labels |
| ----------- | ----------- | ----------- | ----------- |
| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
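
The `gfe_1`/`gfe_2`, `pool` and `attention` rows all describe statistics computed over the frame (time) axis. The pure-Python sketch below illustrates those ideas conceptually and is not ECAPA2's actual implementation. `stats_pool` computes per-channel mean and variance over frames. `attentive_stats_pool` applies attention weights, assuming one hypothetical score per frame that is softmax-normalized. The original ECAPA-TDNN, for instance, uses channel-dependent attention and reports standard deviation rather than variance; this sketch simplifies both. All function names and the `frames[t][c]` layout are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Assumed toy layout: frames[t][c] is channel c of frame t.
Frames = List[List[float]]


def stats_pool(frames: Frames) -> Tuple[List[float], List[float]]:
    """Per-channel mean and (population) variance over the frame axis."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    channels = range(len(frames[0]))
    mean = [sum(f[c] for f in frames) / n for c in channels]
    var = [sum((f[c] - mean[c]) ** 2 for f in frames) / n for c in channels]
    return mean, var


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one (hypothetical) score per frame."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pool(
    frames: Frames, scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Mean and variance with softmax attention weights applied to each frame."""
    if not frames or len(scores) != len(frames):
        raise ValueError("need one attention score per frame")
    weights = softmax(scores)
    channels = range(len(frames[0]))
    mean = [sum(w * f[c] for w, f in zip(weights, frames)) for c in channels]
    var = [
        sum(w * (f[c] - mean[c]) ** 2 for w, f in zip(weights, frames))
        for c in channels
    ]
    return mean, var


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]  # 2 frames x 2 channels of toy data
    mean, var = stats_pool(frames)
    print(mean + var)  # concatenated pooled vector: [2.0, 3.0, 1.0, 1.0]
    # Equal scores give uniform weights, so this matches stats_pool.
    print(attentive_stats_pool(frames, [0.0, 0.0]))
```

The usage lines concatenate the mean and variance into one vector, a common way to form a pooled representation. This README does not state whether ECAPA2 combines them in exactly this way.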