Jenthe committed on
Commit dcb24e0 · 1 Parent(s): b06190b

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -43,7 +43,7 @@ Download model:
 from huggingface_hub import hf_hub_download
 
 # automatically checks for cached file, optionally set `cache_dir` location
-model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='model.pt', cache_dir=None)
+model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', cache_dir=None)
 ```
 
 
@@ -55,10 +55,10 @@ Extracting speaker embeddings is easy and only requires a few lines of code:
 import torch
 import torchaudio
 
-ecapa2_model = torch.jit.load(model_file, map_location='cpu')
+ecapa2 = torch.jit.load(model_file, map_location='cpu')
 audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected
 
-embedding = ecapa2_model(audio)
+embedding = ecapa2(audio)
 ```
 
 For faster, 16-bit half-precision CUDA inference (recommended):
@@ -67,11 +67,11 @@ For faster, 16-bit half-precision CUDA inference (recommended):
 import torch
 import torchaudio
 
-ecapa2_model = torch.jit.load(model_file, map_location='cuda')
-ecapa2_model.half() # optional, but results in faster inference
+ecapa2 = torch.jit.load(model_file, map_location='cuda')
+ecapa2.half() # optional, but results in faster inference
 audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected
 
-embedding = ecapa2_model(audio)
+embedding = ecapa2(audio)
 ```
 
 There is no need for `ecapa2_model.eval()` or `torch.no_grad()`, this is done automatically.
@@ -82,13 +82,13 @@ For the extraction of other hierachical features, the `label` argument can be us
 
 ```python
 # default, only extract the embedding
-feature = ecapa2_model(audio, label='embedding')
+feature = ecapa2(audio, label='embedding')
 
 # concatenates the gfe_1, pool and embedding features
-feature = ecapa2_model(audio, label='gfe_1|pool|embedding')
+feature = ecapa2(audio, label='gfe_1|pool|embedding')
 
 # returns the same output as previous example, concatenation always follows the order of the network
-feature = ecapa2_model(audio, label='embedding|gfe_1|pool')
+feature = ecapa2(audio, label='embedding|gfe_1|pool')
 ```
 
 The following table describes the available features. All features consists of the mean and variance of the frame-level encodings at the indicated layer, expect for the speaker embedding.
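For reference, the snippets updated in this commit combine into the following end-to-end example. This is a minimal sketch assembled from the README excerpts shown in the diff (new file name `ecapa2.pt`, renamed variable `ecapa2`); `sample.wav` is a placeholder path and the `label` values are taken from the examples above.

```python
import torch
import torchaudio
from huggingface_hub import hf_hub_download

# download the TorchScript model using the file name introduced in this commit
# (automatically checks for a cached file; optionally set `cache_dir`)
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', cache_dir=None)

# load on CPU; the README also shows map_location='cuda' followed by
# ecapa2.half() for faster 16-bit inference
ecapa2 = torch.jit.load(model_file, map_location='cpu')

# 16 kHz audio expected; eval() and no_grad() are handled by the model itself
audio, sr = torchaudio.load('sample.wav')

# default: extract only the speaker embedding
embedding = ecapa2(audio)

# hierarchical features: concatenation of the gfe_1, pool and embedding outputs
features = ecapa2(audio, label='gfe_1|pool|embedding')
```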