	---
	license: mit
	---

	# Content Vec Best
	Official Repo: [ContentVec](https://github.com/auspicious3000/contentvec)
	This repo brings the fairseq ContentVec model to HuggingFace Transformers.

	## How to use
	To use this model, first define the following class:
	```python
	from transformers import HubertModel
	import torch.nn as nn


	class HubertModelWithFinalProj(HubertModel):
	    def __init__(self, config):
	        super().__init__(config)

	        # The final projection layer is only kept for backward compatibility.
	        # Following https://github.com/auspicious3000/contentvec/issues/6,
	        # removing this layer is necessary to achieve the desired outcome.
	        self.final_proj = nn.Linear(config.hidden_size, config.classifier_proj_size)
	```

	Then load the model and run it on an audio input:
	```python
	import torch

	# Dummy input: a batch of one waveform with 16000 samples
	audio = torch.randn(1, 16000)

	model = HubertModelWithFinalProj.from_pretrained("lengyue233/content-vec-best")

	x = model(audio)["last_hidden_state"]
	```
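
As a rough guide to the shape of `x` in the snippet above, the sketch below estimates how many output frames the model produces for a given input length. It assumes the convolutional feature extractor uses the default kernel sizes and strides of `HubertConfig` in Transformers; this repo's own config is not shown here, so treat those values as an assumption. `num_output_frames` is a hypothetical helper name, not part of Transformers.

```python
# Assumed values: the defaults of transformers' HubertConfig
# (conv_kernel / conv_stride). Check the model's config before relying on them.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Estimate the frames produced by a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Output length of one unpadded convolution layer
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(16000))
```

Under these assumptions, the 16000-sample input above yields 49 frames, so `x` would have shape `(1, 49, hidden_size)`.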

	## How to convert
	To convert the model yourself, download the ContentVec_legacy model from the official repo, and then run:
	```bash
	python convert.py
	```