Jenthe/ECAPA2

Jenthe committed on Oct 6, 2023

Commit 0e15756 (1 parent: 487a014)

Update README.md

Files changed (1)

README.md +13 -3

@@ -29,7 +29,9 @@ ECAPA2 is a hybrid neural network architecture and training strategy for speaker
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
-## How-to-use
 Extracting speaker embeddings is easy and only requires a few lines of code:
 ```
@@ -41,12 +43,20 @@ ecapa2_model = torch.load('model.pt')
 embedding = ecapa2_model.extract_embedding(audio)
 ```
 For the extraction of other hierarchical features, a separate model function is provided:
 ```
-feature = ecapa2_model.extract_feature(label='gfe1')
 ```
-The list of available labels consists of: 'lfe1', 'lfe2', 'lfe3', 'lfe4', 'gfe1', 'gfe2', 'pool' and 'embedding' (equal to model.extract_embedding()).
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
+## Usage Guide
+### Speaker Embedding Extraction
 Extracting speaker embeddings is easy and only requires a few lines of code:
 ```
 embedding = ecapa2_model.extract_embedding(audio)
 ```
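Once two embeddings have been extracted, a common way to compare them for speaker verification is cosine similarity. The README does not say which scoring method the author uses, so the following is only a minimal sketch under stated assumptions: each embedding has already been converted to a flat list of floats (for a PyTorch tensor, `embedding.flatten().tolist()` would do this), and the helper names `cosine_similarity` and `same_speaker` as well as the `0.5` threshold are hypothetical placeholders that would need tuning on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the trial if the score reaches the threshold.

    The default threshold is an arbitrary placeholder, not a value
    taken from the ECAPA2 README.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

A higher score means the two utterances are more likely to come from the same speaker; the decision threshold is normally chosen on a held-out set of trials.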
+### Hierarchical Feature Extraction
 For the extraction of other hierarchical features, a separate model function is provided:
 ```
+feature = ecapa2_model.extract_feature(label='gfe1', type='mean')
 ```
+The following table describes the available features:
+| Feature Type | Description | Usage | Labels |
+| ----------- | ----------- | ----------- | ----------- |
+| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
+| Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally capture intra-speaker variance better than speaker embeddings, e.g. for speaker profiling or emotion recognition. | gfe1, gfe2, gfe3, pool |
+| Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks that depend directly on speaker identity (as opposed to speaker characteristics), e.g. speaker verification or speaker diarization. | embedding |
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->