---
license: mit
---
# Content Vec Best

Official Repo: [ContentVec](https://github.com/auspicious3000/contentvec)

This repo brings the fairseq ContentVec model to HuggingFace Transformers.

## How to use

To use this model, first define a `HubertModel` subclass that adds the final projection layer:
```python
from transformers import HubertModel
import torch.nn as nn


class HubertModelWithFinalProj(HubertModel):
    def __init__(self, config):
        super().__init__(config)

        # The final projection layer is only used for backward compatibility.
        # Following https://github.com/auspicious3000/contentvec/issues/6,
        # removing this layer is necessary to achieve the desired outcome.
        self.final_proj = nn.Linear(config.hidden_size, config.classifier_proj_size)
```
Then load the model with this class and extract features:
```python
import torch

# Dummy waveform of shape (batch, samples)
audio = torch.randn(1, 16000)
model = HubertModelWithFinalProj.from_pretrained("lengyue233/content-vec-best")
x = model(audio)["last_hidden_state"]
```
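To sanity-check the shape of `x`, it can help to estimate how many frames the convolutional feature extractor produces for a given number of input samples. The sketch below is an illustration, not part of this repo: `hubert_num_frames` is a hypothetical helper, and it assumes the checkpoint uses the default HuBERT kernel sizes and strides from `HubertConfig` (verify against `model.config.conv_kernel` and `model.config.conv_stride`).

```python
def hubert_num_frames(
    num_samples,
    kernels=(10, 3, 3, 3, 3, 2, 2),  # assumed: HubertConfig default conv_kernel
    strides=(5, 2, 2, 2, 2, 2, 2),   # assumed: HubertConfig default conv_stride
):
    """Output length of a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Standard output-length formula for a conv layer without padding
        length = (length - kernel) // stride + 1
    return length


frames = hubert_num_frames(16000)
print(frames)  # 49
```

Under these assumptions, the 16000-sample input above would give `x` a shape of `(1, 49, hidden_size)`.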
## How to convert

Download the ContentVec_legacy model from the official repo, then run:
```bash
python convert.py
```