Update README.md
README.md
CHANGED
|
@@ -29,7 +29,9 @@ ECAPA2 is a hybrid neural network architecture and training strategy for speaker
|
|
| 29 |
- **Paper [optional]:** [More Information Needed]
|
| 30 |
- **Demo [optional]:** [More Information Needed]
|
| 31 |
|
| 32 |
-
##
| 33 |
|
| 34 |
Extracting speaker embeddings is easy and only requires a few lines of code:
|
| 35 |
```
|
|
@@ -41,12 +43,20 @@ ecapa2_model = torch.load('model.pt')
|
|
| 41 |
embedding = ecapa2_model.extract_embedding(audio)
|
| 42 |
```
|
| 43 |
| 44 |
To extract other hierarchical features, a separate model function is provided:
|
| 45 |
```
|
| 46 |
-
feature = ecapa2_model.extract_feature(label='gfe1')
|
| 47 |
```
|
| 48 |
|
| 49 |
-
The
| 50 |
|
| 51 |
|
| 52 |
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
|
|
|
| 29 |
- **Paper [optional]:** [More Information Needed]
|
| 30 |
- **Demo [optional]:** [More Information Needed]
|
| 31 |
|
| 32 |
+
## Usage Guide
|
| 33 |
+
|
| 34 |
+
### Speaker Embedding Extraction
|
| 35 |
|
| 36 |
Extracting speaker embeddings is easy and only requires a few lines of code:
|
| 37 |
```
|
|
|
|
| 43 |
embedding = ecapa2_model.extract_embedding(audio)
|
| 44 |
```
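Speaker embeddings like the one extracted above are commonly compared with cosine similarity, for example in speaker verification. The following is a minimal sketch and not part of the ECAPA2 API: `cosine_similarity` is a hypothetical helper, and it assumes each embedding has already been converted to a flat list of floats (for a PyTorch tensor, for instance, via `embedding.flatten().tolist()`).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length float sequences."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real speaker embeddings (illustrative values only).
enrolled = [0.2, -0.1, 0.7]
test_utterance = [0.4, -0.2, 1.4]
score = cosine_similarity(enrolled, test_utterance)
```

A score close to 1 suggests the two utterances come from the same speaker. The decision threshold is application-specific and has to be tuned on held-out data.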
|
| 45 |
|
| 46 |
+
### Hierarchical Feature Extraction
|
| 47 |
+
|
| 48 |
To extract other hierarchical features, a separate model function is provided:
|
| 49 |
```
|
| 50 |
+
feature = ecapa2_model.extract_feature(label='gfe1', type='mean')
|
| 51 |
```
|
| 52 |
|
| 53 |
+
The following table describes the available features:
|
| 54 |
+
|
| 55 |
+
| Feature Type | Description | Usage | Labels |
|
| 56 |
+
| ----------- | ----------- | ----------- | ----------- |
|
| 57 |
+
| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
|
| 58 |
+
| Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally captures intra-speaker variance better than speaker embeddings, e.g. for speaker profiling or emotion recognition. | gfe1, gfe2, gfe3, pool |
|
| 59 |
+
| Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks that depend directly on speaker identity (as opposed to speaker characteristics), e.g. speaker verification or speaker diarization. | embedding |
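The label groups in the table above can be captured in a small lookup, so that a label is checked before it is passed to `extract_feature`. This is only a sketch: `FEATURE_LABELS` and `feature_type_for` are hypothetical names that are not part of the ECAPA2 model, and the labels are copied from the table.

```python
# Labels copied from the feature table; the group names are illustrative.
FEATURE_LABELS = {
    "local": ("lfe1", "lfe2", "lfe3", "lfe4"),
    "global": ("gfe1", "gfe2", "gfe3", "pool"),
    "speaker_embedding": ("embedding",),
}


def feature_type_for(label):
    """Return the feature group a label belongs to, or raise ValueError."""
    for feature_type, labels in FEATURE_LABELS.items():
        if label in labels:
            return feature_type
    valid = sorted(name for labels in FEATURE_LABELS.values() for name in labels)
    raise ValueError(f"unknown label {label!r}; expected one of {valid}")
```

Calling `feature_type_for('gfe1')` before extraction turns a mistyped label into an explicit error that lists the valid options.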
|
| 60 |
|
| 61 |
|
| 62 |
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|