---
license: cc-by-nc-4.0
datasets:
- Unbabel/TowerBlocks-v0.1
language:
- en
- de
- fr
- nl
- it
- es
- pt
- ko
- ru
- zh
metrics:
- bleurt
- comet
base_model:
- double7/Tower-7b-MT-SFT
pipeline_tag: text-generation
---
# Model Card for Tower-7b-EAX

### Model Sources

- **Paper**: TODO
- **Link**: TODO
- **Repository**: TODO

## Model Details

### Model Description

Tower-7b-EAX is a language model specifically enhanced for translation between non-English language pairs (x2x directions).
The model is built on top of TowerBase, following a two-stage training approach: first, an English-centric supervised fine-tuning stage on parallel data (the SFT model is available at [Llama-2-7b-MT-SFT](https://huggingface.co/double7/Llama-2-7b-MT-SFT)), followed by a dedicated x2x optimization stage.
This approach strategically leverages the established English-centric capabilities of large language models to bootstrap comprehensive multilingual translation capabilities.

- **Model type:** A 7B parameter translation model built on top of TowerBase, enhanced for x2x language pairs through specialized optimization.
- **Language(s) (NLP):** English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Russian, Chinese
- **License:** CC-BY-NC-4.0, The LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

## Intended uses & limitations

Tower-7b-EAX is designed for direct translation between non-English language pairs, addressing a significant gap in current LLM translation capabilities.
The model maintains strong performance on English-centric translation while significantly improving x2x translation quality.

Here's how you can run the model with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "double7/Tower-7b-EAX"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, device_map="auto", torch_dtype="auto"
)

src_lang = "German"
trg_lang = "Chinese"
src_text = "Filmkarriere Collinges Filmdebüt in Die kleinen Füchse von 1941 brachte ihr eine Nominierung für den Academy Award als beste Nebendarstellerin ein."

prompt = f"Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:"

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": prompt},
]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, do_sample=False, max_new_tokens=256)
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
print(output_text)
# <s><|im_start|> user
# Translate the following text from German into Chinese:
# German: Filmkarriere Collinges Filmdebüt in Die kleinen Füchse von 1941 brachte ihr eine Nominierung für den Academy Award als beste Nebendarstellerin ein.
# Chinese:<|im_end|>
# <|im_start|> assistant
```

### Translation Instructions

Following [TowerInstruct](https://arxiv.org/pdf/2402.17733), we use diverse translation instructions during training, so you can describe translation requests in natural language, for example:

```python
prompt1 = f"Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:"

prompt2 = f"Please provide a translation from {src_lang} to {trg_lang} for the following text:\n{src_text}\nTarget:"

prompt3 = f"Translate this {src_lang} text into {trg_lang}:\nSource: {src_text}\nTranslation:"
```

We use `prompt1` for the evaluation.

### Out-of-Scope Use

The model is not guaranteed to perform well for languages other than the 10 languages it supports.

## Bias, Risks, and Limitations

Tower-7b-EAX has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).

## Prompt Format

Tower-7b-EAX was trained using the `ChatML` prompt template without any system prompts. An example follows:
```
<|im_start|>user
{USER PROMPT}<|im_end|>
<|im_start|>assistant
{MODEL RESPONSE}<|im_end|>
<|im_start|>user
[...]
```
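
For multi-turn use, the same layout can be produced with the tokenizer's chat template rather than by hand. The snippet below is a minimal sketch (the German/English example messages are made up for illustration); the exact rendered string depends on the chat template shipped with the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("double7/Tower-7b-EAX")

# A made-up two-turn conversation; apply_chat_template serializes it into the
# ChatML layout shown above and appends the final assistant turn header.
messages = [
    {"role": "user", "content": "Translate the following text from German into English:\nGerman: Guten Morgen.\nEnglish:"},
    {"role": "assistant", "content": "Good morning."},
    {"role": "user", "content": "Now translate the same sentence into French."},
]

print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```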

## Training Details

### Training Data

We use synthetic data for optimization, which is synthesized using [Tower-7b-MT-SFT](https://huggingface.co/double7/Tower-7b-MT-SFT), with translation data from [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.1) as seeds.
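
The exact synthesis recipe (prompting scheme, filtering, and preference construction) is defined in the paper. As a rough sketch only, sampling x2x candidate translations from the SFT model over seed source sentences might look like the following; the function name and decoding settings are illustrative assumptions, not the released pipeline:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

SFT_PATH = "double7/Tower-7b-MT-SFT"
tokenizer = AutoTokenizer.from_pretrained(SFT_PATH)
model = AutoModelForCausalLM.from_pretrained(SFT_PATH, device_map="auto", torch_dtype="auto")

def sample_x2x_candidates(src_lang, trg_lang, src_text, num_samples=4):
    """Sample candidate translations for a non-English pair from the SFT model (illustrative only)."""
    prompt = (
        f"Translate the following text from {src_lang} into {trg_lang}:\n"
        f"{src_lang}: {src_text}\n{trg_lang}:"
    )
    messages = [{"role": "user", "content": prompt}]
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs, do_sample=True, temperature=0.9,
        num_return_sequences=num_samples, max_new_tokens=256,
    )
    # Strip the prompt tokens and decode only the generated continuations.
    gen_only = outputs[:, inputs["input_ids"].shape[1]:]
    return tokenizer.batch_decode(gen_only, skip_special_tokens=True)
```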

### Training hyperparameters

The following hyperparameters were used during x2x training (the sketch after this list illustrates how the DPO beta and SFT coefficient enter the objective):
- learning_rate: 2e-07
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
- max_seq_length: 2048
- DPO beta: 0.4
- SFT coefficient: 2.0
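
The DPO beta and SFT coefficient suggest a preference objective with an added likelihood term on the preferred translation. The snippet below is a minimal sketch of one common reading of these settings, L = L_DPO + 2.0 * L_SFT; the exact objective used for Tower-7b-EAX is defined in the paper, and this function is illustrative only:

```python
import torch
import torch.nn.functional as F

def dpo_with_sft_loss(
    policy_chosen_logps: torch.Tensor,    # (B,) summed log-probs of preferred translations under the policy
    policy_rejected_logps: torch.Tensor,  # (B,) summed log-probs of dispreferred translations under the policy
    ref_chosen_logps: torch.Tensor,       # (B,) same quantities under the frozen SFT reference model
    ref_rejected_logps: torch.Tensor,     # (B,)
    chosen_token_logps: torch.Tensor,     # (B, T) per-token log-probs of the preferred translations
    chosen_mask: torch.Tensor,            # (B, T) 1 for target tokens, 0 for prompt/padding
    beta: float = 0.4,                    # "DPO beta" above
    sft_coeff: float = 2.0,               # "SFT coefficient" above
) -> torch.Tensor:
    # Standard DPO preference term on the implicit reward margin.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    dpo_loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # SFT (negative log-likelihood) term on the preferred translations.
    sft_loss = -(chosen_token_logps * chosen_mask).sum() / chosen_mask.sum()

    return dpo_loss + sft_coeff * sft_loss
```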

## Citation

TODO