Update README.md

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 license: cc-by-nc-4.0
-pipeline_tag:
+pipeline_tag: automatic-speech-recognition
 ---
 # SimulSeamless
 

@@ -16,7 +16,7 @@ In the case of [💬 Inference using docker](#💬-inference-using-docker), use
 `f1f5b9a69a47496630aa43605f1bd46e5484a2f4` for SimulEval.
 
 ## 🤖 Inference using your environment
-
+Set `--source` and `--target` as described in the
 [Fairseq Simultaneous Translation repository](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_text/docs/simulst_mustc_example.md#inference--evaluation):
 `${LIST_OF_AUDIO}` is the list of audio paths and `${TGT_FILE}` the segment-wise references in the
 target language.

@@ -30,7 +30,7 @@ tokenizer used, for example, to evaluate German) or `char` (e.g., to evaluate ch
 languages such as Chinese or Japanese).
 
 The simultaneous inference of SimulSeamless is based on
-[AlignAtt](ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md), thus the __f__ parameter (`${FRAME}`) and the
+[AlignAtt](https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md), thus the __f__ parameter (`${FRAME}`) and the
 layer from which to extract the attention scores (`${LAYER}`) have to be set accordingly.
 
 ### Instruction to replicate IWSLT 2024 results ↙️

@@ -97,41 +97,6 @@ To set, `${TGT_LANG}`, `${FRAME}`, `${LAYER}`, `${BLEU_TOKENIZER}`, `${LATENCY_U
 `${LIST_OF_AUDIO}`, `${TGT_FILE}`, `${SEG_SIZE}`, and `${OUT_DIR}` refer to
 [🤖 Inference using your environment](#-inference-using-your-environment).
 
-### Instruction to recreate the docker images
-
-To recreate the docker images, follow the steps below.
-
-1. Download SimulEval and this repository.
-2. Create a `Dockerfile` with the following content:
-```
-FROM python:3.9
-RUN pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
-ADD /SimulEval /SimulEval
-WORKDIR /SimulEval
-RUN pip install -e .
-WORKDIR ../
-ADD /fbk-fairseq /fbk-fairseq
-WORKDIR /fbk-fairseq
-RUN pip install -e .
-RUN pip install -r speech_requirements.txt
-WORKDIR ../
-RUN pip install sentencepiece
-RUN pip install transformers
-
-ENTRYPOINT simuleval --standalone --remote-port 2024 \
-    --agent-class examples.speech_to_text.simultaneous_translation.agents.v1_1.simul_alignatt_seamlessm4t.AlignAttSeamlessS2T \
-    --model-size medium --num-beams 5 --user-dir fbk-fairseq/examples \
-    --target-language $TGTLANG --frame-num $FRAME --extract-attn-from-layer $LAYER --device $DEV \
-    --sacrebleu-tokenizer ${BLEU_TOKENIZER} --eval-latency-unit ${LATENCY_UNIT}
-```
-3. Build the docker image:
-```
-docker build -t simulseamless .
-```
-4. Save the docker image:
-```
-docker save -o simulseamless.tar simulseamless:latest
-```
 
 ## 📍Citation
 ```bibtex