alexmourachko committed · Commit a0f0e17 · Parent(s): 2209e43

update readme to match github

README.md CHANGED

---
license: cc-by-nc-4.0
---

# SONAR
[[Paper]](https://fb.workplace.com/groups/831302610278251/permalink/9713798772028546) (TODO: change for external link once published)
[[Demo]](#usage)

We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. It substantially outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks.

Speech segments can be embedded in the same SONAR embedding space using language-specific speech encoders trained in a teacher-student setting on speech transcription data. We also provide a single text decoder, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations.

*SONAR* stands for **S**entence-level multim**O**dal and la**N**guage-**A**gnostic **R**epresentations.

The full list of supported languages (along with download links) can be found [below](#supported-languages-and-download-links).
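
The teacher-student setup mentioned above can be pictured as regression in embedding space: a speech encoder (the student) is trained to reproduce the embeddings that the frozen text encoder (the teacher) assigns to the corresponding transcriptions. The sketch below is purely illustrative, not SONAR's actual training code: an `nn.Linear` stands in for the real speech encoder, and all shapes and tensors are toy assumptions.

```python
import torch
from torch import nn

# Toy stand-ins: 8 utterances with 80-dim pooled speech features, and the
# frozen teacher's 1024-dim text embeddings of their transcriptions
# (in reality these would come from the SONAR text encoder).
speech_feats = torch.randn(8, 80)
teacher_emb = torch.randn(8, 1024)

student = nn.Linear(80, 1024)  # stand-in for a real speech encoder
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

losses = []
for _ in range(100):
    opt.zero_grad()
    # MSE pulls the student's speech embeddings onto the teacher's text embeddings
    loss = nn.functional.mse_loss(student(speech_feats), teacher_emb)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")  # should decrease
```

Once trained this way, the student maps speech into the same space as text, which is what enables the zero-shot modality combinations described above.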

## Installing
SONAR depends mainly on [Fairseq2](https://github.com/facebookresearch/fairseq2) and can be installed using (tested with `python=3.8`):

```bash
pip install --upgrade pip
pip config set global.extra-index-url https://test.pypi.org/simple/
pip install -e .
```

## Usage
fairseq2 will automatically download models into your `$TORCH_HOME/hub` directory upon using the commands below.

### Compute text sentence embeddings with SONAR
```python
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
                                           tokenizer="text_sonar_basic_encoder")
sentences = ['My name is SONAR.', 'I can embed the sentences into vectorial space.']
t2vec_model.predict(sentences, source_lang="eng_Latn").shape
# torch.Size([2, 1024])
```
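
The embeddings returned by `t2vec_model.predict` are plain fixed-size vectors, so cross-lingual similarity is just a vector comparison in the shared space. Here is a dependency-free sketch of cosine scoring; the 4-dim toy vectors are assumptions standing in for real 1024-dim SONAR embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 4-dim stand-ins for 1024-dim SONAR embeddings of an English
# sentence, its French translation, and an unrelated sentence.
eng = [0.9, 0.1, 0.0, 0.4]
fra = [0.8, 0.2, 0.1, 0.5]        # close: a faithful translation
unrelated = [-0.3, 0.9, -0.7, 0.1]

print(cosine(eng, fra) > cosine(eng, unrelated))  # True
```

This is the kind of scoring that underlies the xsim / xsim++ similarity search results mentioned in the introduction.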

### Translate text with SONAR
```python
from sonar.inference_pipelines.text import TextToTextModelPipeline
t2t_model = TextToTextModelPipeline(encoder="text_sonar_basic_encoder",
                                    decoder="text_sonar_basic_decoder",
                                    tokenizer="text_sonar_basic_encoder")
sentences = ['My name is SONAR.', 'I can embed the sentences into vectorial space.']
t2t_model.predict(sentences, source_lang="eng_Latn", target_lang="fra_Latn")
# ['Mon nom est SONAR.', "Je peux intégrer les phrases dans l'espace vectoriel."]
```

### Compute speech sentence embeddings with SONAR
```python
from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
s2vec_model = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_eng")

s2vec_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
                     "./tests/integration_tests/data/audio_files/audio_2.wav"]).shape
# torch.Size([2, 1024])

import torchaudio
inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
assert sr == 16000, "Sample rate should be 16kHz"

s2vec_model.predict([inp]).shape
# torch.Size([1, 1024])
```
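
Because speech and text land in the same space, zero-shot speech-to-text retrieval reduces to a nearest-neighbor search over embeddings. A minimal stdlib sketch follows; the 3-dim toy vectors are assumptions standing in for the 1024-dim outputs of `s2vec_model.predict` and the text encoder.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def nearest(query, candidates):
    """Index of the candidate with the highest dot product (cosine on unit vectors)."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(c))) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy stand-ins: one speech embedding and three candidate text embeddings.
speech_emb = [0.7, 0.1, 0.2]
text_embs = [[0.0, 1.0, 0.0],    # unrelated sentence
             [0.6, 0.2, 0.3],    # the matching transcription
             [-0.5, 0.4, 0.1]]   # unrelated sentence

print(nearest(speech_emb, text_embs))  # 1: the matching transcription wins
```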

### Speech-to-text translation with SONAR
```python
from sonar.inference_pipelines.speech import SpeechToTextModelPipeline

s2t_model = SpeechToTextModelPipeline(encoder="sonar_speech_encoder_eng",
                                      decoder="text_sonar_basic_decoder",
                                      tokenizer="text_sonar_basic_decoder")

import torchaudio
inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
assert sr == 16000, "Sample rate should be 16kHz"

# passing loaded audio files
s2t_model.predict([inp], target_lang="eng_Latn")
# ['Television reports show white smoke coming from the plant.']

# passing multiple wav files
s2t_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
                   "./tests/integration_tests/data/audio_files/audio_2.wav"], target_lang="eng_Latn")
# ['Television reports show white smoke coming from the plant.',
#  'These couples may choose to make an adoption plan for their baby.']
```

### Predicting [cross-lingual semantic similarity](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/human_XSTS_eval) with BLASER 2 models
```python
import torch
from sonar.models.blaser.loader import load_blaser_model

blaser_qe = load_blaser_model("blaser_2_0_qe").eval()
emb = torch.ones([1, 1024])  # toy embedding standing in for real SONAR outputs
print(blaser_qe(src=emb, mt=emb).item())  # 4.9819
```

See more complete demo notebooks:

* [sonar text2text similarity and translation](examples/sonar_text_demo.ipynb)
* [sonar speech2text and other data pipeline examples](examples/inference_pipelines.ipynb)