Added Usage section to README

The model is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m).
Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition.

Usage
=====

A pre-trained model can be loaded with the [`allophant`](https://github.com/kgnlp/allophant) package from a Hugging Face checkpoint or a local file:
```python
from allophant.estimator import Estimator

device = "cpu"
model, attribute_indexer = Estimator.restore("kgnlp/allophant-shared", device=device)
supported_features = attribute_indexer.feature_names
# The phonetic feature categories supported by the model, including "phonemes"
print(supported_features)
```
Allophant supports decoding custom phoneme inventories, which can be constructed in multiple ways:

```python
# 1. For a single language:
inventory = attribute_indexer.phoneme_inventory("es")
# 2. For multiple languages, e.g. in code-switching scenarios:
inventory = attribute_indexer.phoneme_inventory(["es", "it"])
# 3. Any custom selection of phones for which features are available in the Allophoible database:
inventory = ['a', 'ai̯', 'au̯', 'b', 'e', 'eu̯', 'f', 'ɡ', 'l', 'ʎ', 'm', 'ɲ', 'o', 'p', 'ɾ', 's', 't̠ʃ']
```
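A multilingual inventory covers the phonemes of all selected languages, which is what makes it useful for code-switched speech. The idea can be sketched with plain Python set operations; the toy per-language inventories below are hypothetical illustrations, not entries from the Allophoible database:

```python
# Hypothetical toy inventories (illustrative only, not from Allophoible)
spanish = ['a', 'e', 'i', 'o', 'u', 'ɾ', 'ɲ']
italian = ['a', 'e', 'i', 'o', 'u', 'ʎ', 't̠ʃ']

# A multilingual inventory contains every phoneme of every selected language
combined = sorted(set(spanish) | set(italian))
print(combined)
```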
Audio files can then be loaded, resampled, and transcribed with the given inventory by first computing the log probabilities for each classifier:

```python
import torch
import torchaudio
from allophant.dataset_processing import Batch

# Load an audio file and resample the first channel to the sample rate used by the model
audio, sample_rate = torchaudio.load("utterance.wav")
audio = torchaudio.functional.resample(audio[:1], sample_rate, model.sample_rate)

# Construct a batch of zero-padded single-channel audio, lengths, and language IDs
# The language ID can be 0 for inference
batch = Batch(audio, torch.tensor([audio.shape[1]]), torch.zeros(1))
model_outputs = model.predict(
    batch.to(device),
    attribute_indexer.composition_feature_matrix(inventory).to(device)
)
```
Finally, the log probabilities can be decoded into the recognized phonemes or phonetic features:

```python
from allophant import predictions

# Create a feature mapping for your inventory and CTC decoders for the desired feature set
inventory_indexer = attribute_indexer.attributes.subset(inventory)
ctc_decoders = predictions.feature_decoders(inventory_indexer, feature_names=supported_features)

for feature_name, decoder in ctc_decoders.items():
    decoded = decoder(model_outputs.outputs[feature_name].transpose(1, 0), model_outputs.lengths)
    # Print the feature name and values for each utterance in the batch
    for [hypothesis] in decoded:
        # NOTE: token indices are offset by one due to the <BLANK> token used during decoding
        recognized = inventory_indexer.feature_values(feature_name, hypothesis.tokens - 1)
        print(feature_name, recognized)
```
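The one-token offset mentioned in the snippet above can be illustrated in isolation. The following is a minimal, self-contained sketch in plain Python (not the allophant API), assuming the `<BLANK>` label occupies index 0 so that CTC output token `i` maps to inventory entry `i - 1`:

```python
# Toy inventory; in practice this would be the decoded feature's value set
inventory = ['a', 'b', 'e', 'l', 'm']

def tokens_to_phonemes(tokens, inventory):
    """Map 1-based CTC token indices (0 = <BLANK>) to phoneme strings."""
    return [inventory[t - 1] for t in tokens if t != 0]

# A hypothetical decoded hypothesis: indices are offset by one
hypothesis_tokens = [2, 1, 4]
print("".join(tokens_to_phonemes(hypothesis_tokens, inventory)))  # prints "bal"
```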
Citation
========