There is no need for `ecapa2.eval()` or `torch.no_grad()`; this is done automatically.
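For context, the snippet below shows the inference boilerplate a typical PyTorch model would need; per the note above, ECAPA2 applies the equivalent internally. The `Linear` module here is just a stand-in for illustration, not the ECAPA2 model:

```python
import torch

# Typical PyTorch inference boilerplate that ECAPA2 makes unnecessary.
model = torch.nn.Linear(4, 2)   # stand-in module, not the real ECAPA2 model
x = torch.randn(1, 4)

model.eval()                    # switch dropout/batch-norm to inference mode
with torch.no_grad():           # skip autograd bookkeeping
    y = model(x)

# No gradients are tracked on the output.
assert not y.requires_grad
```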
### Hierarchical Feature Extraction
For the extraction of other hierarchical features, the `label` argument can be used, which accepts a string of feature ids separated by '|':
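As a sketch of this convention, a label string for several features can be built and passed as follows. The `ecapa2` and `audio` names are assumed from the earlier usage example, and the model call itself is shown as a comment since it requires the downloaded model:

```python
# Build a '|'-separated label string from individual feature ids
# (ids taken from the feature tables below).
feature_ids = ["lfe3", "gfe1", "embedding"]
label = "|".join(feature_ids)
print(label)  # lfe3|gfe1|embedding

# Hypothetical call, assuming `ecapa2` and `audio` from the usage example:
# features = ecapa2(audio, label=label)
```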
The following table describes the available features:

| Label | Dimension | Description |
| --- | --- | --- |
| pool | 3072 | Pooled statistics before the bottleneck speaker embedding layer, extracted before the ReLU layer. |
| attention | 3072 | Same as the pooled statistics, but with the attention weights applied. |
| embedding | 192 | The standard ECAPA2 speaker embedding. |

The following table describes the available feature types:
| Feature Type | Description | Usage | Labels |
| --- | --- | --- | --- |
| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
| Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally capture intra-speaker variance better than speaker embeddings, e.g. speaker profiling, emotion recognition. | gfe1, gfe2, gfe3, pool |
| Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks directly depending on the speaker identity (as opposed to speaker characteristics), e.g. speaker verification, speaker diarization. | embedding |
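As a rough illustration of the table above, tasks can be mapped to feature labels. This mapping is one reading of the table, not an official recommendation; the choice among the global labels (gfe1, gfe2, gfe3, pool) is arbitrary here:

```python
# Illustrative task-to-label mapping derived from the feature-type table above.
task_to_label = {
    "speaker_verification": "embedding",  # depends directly on speaker identity
    "speaker_diarization": "embedding",   # depends directly on speaker identity
    "emotion_recognition": "gfe1",        # global features capture intra-speaker variance
    "speaker_profiling": "pool",          # pooled global statistics
}
print(task_to_label["emotion_recognition"])  # gfe1
```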
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
## Citation