---
license: cc-by-nc-4.0
datasets:
- Unbabel/TowerBlocks-v0.1
language:
- en
- de
- fr
- nl
- it
- es
- pt
- ko
- ru
- zh
metrics:
- bleurt
- comet
base_model:
- double7/Tower-7b-MT-SFT
pipeline_tag: text-generation
---
# Model Card for Tower-7b-EAX

### Model Sources

- **Paper**: TODO

- **Link**: TODO

- **Repository**: TODO


## Model Details

### Model Description

Tower-7b-EAX is a language model specifically enhanced for translation between non-English (x2x) language pairs.
The model is built on top of TowerBase, following a two-stage training approach: first, an English-centric supervised fine-tuning stage on parallel data (the SFT model is available at [Llama-2-7b-MT-SFT](https://huggingface.co/double7/Llama-2-7b-MT-SFT)), followed by a dedicated x2x optimization stage.
This approach strategically leverages the established English-centric capabilities of large language models to bootstrap comprehensive multilingual translation capabilities.

- **Model type:** A 7B parameter translation model built on top of TowerBase, enhanced for x2x language pairs through specialized optimization.
- **Language(s) (NLP):** English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Russian, Chinese
- **License:** CC-BY-NC-4.0 and the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.


## Intended uses & limitations

Tower-7b-EAX is designed for direct translation between non-English language pairs, addressing a significant gap in current LLM translation capabilities.
The model maintains strong performance on English-centric translation while significantly improving x2x translation quality.

Here's how you can run the model with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "double7/Tower-7b-EAX"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, device_map="auto", torch_dtype="auto"
)

src_lang = "German"
trg_lang = "Chinese"
src_text = "Filmkarriere Collinges Filmdebüt in Die kleinen Füchse von 1941 brachte ihr eine Nominierung für den Academy Award als beste Nebendarstellerin ein."

prompt = f"Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:"

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": prompt},
]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, do_sample=False, max_new_tokens=256)
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
print(output_text)
# <s><|im_start|> user
# Translate the following text from German into Chinese:
# German: Filmkarriere Collinges Filmdebüt in Die kleinen Füchse von 1941 brachte ihr eine Nominierung für den Academy Award als beste Nebendarstellerin ein.
# Chinese:<|im_end|>
# <|im_start|> assistant
```
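The decoded string above still contains the echoed prompt and the special tokens. A minimal sketch of recovering just the translation, reusing the `inputs`, `outputs`, and `tokenizer` variables from the example (this slicing is a common convention, not something specified by the model card):

```python
# Drop the prompt tokens so only the newly generated tokens remain,
# then decode without special tokens to get the bare translation.
prompt_length = inputs["input_ids"].shape[1]
translation = tokenizer.decode(
    outputs[0][prompt_length:], skip_special_tokens=True
).strip()
print(translation)
```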

### Translation Instructions

Following [TowerInstruct](https://arxiv.org/pdf/2402.17733), we use diverse translation instructions during training, so you can describe translation requests in natural language, for example:
```python
prompt1 = f"Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:"

prompt2 = f"Please provide a translation from {src_lang} to {trg_lang} for the following text:\n{src_text}\nTarget:"

prompt3 = f"Translate this {src_lang} text into {trg_lang}:\nSource: {src_text}\nTranslation:"
```

We use `prompt1` for the evaluation.
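Because the instructions are free-form, a small helper makes it easy to format requests for arbitrary x2x pairs. The `build_prompt` function below is a hypothetical convenience wrapper around the evaluation template (`prompt1`), not part of the released code:

```python
def build_prompt(src_lang: str, trg_lang: str, src_text: str) -> str:
    """Format an x2x translation request using the evaluation template."""
    return (
        f"Translate the following text from {src_lang} into {trg_lang}:\n"
        f"{src_lang}: {src_text}\n{trg_lang}:"
    )

# e.g. a Spanish -> Korean request
print(build_prompt("Spanish", "Korean", "La película recibió muy buenas críticas."))
```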

### Out-of-Scope Use

The model is not guaranteed to perform well for languages other than the 10 languages it supports.

## Bias, Risks, and Limitations

Tower-7b-EAX has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).


## Prompt Format

Tower-7b-EAX was trained using the `ChatML` prompt template without any system prompts. An example follows below:
```
<|im_start|>user
{USER PROMPT}<|im_end|>
<|im_start|>assistant
{MODEL RESPONSE}<|im_end|>
<|im_start|>user
[...]
```
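The `apply_chat_template` call from the usage example renders messages into exactly this layout. A minimal sketch with a hypothetical two-turn history, assuming the tokenizer ships this template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("double7/Tower-7b-EAX")

# A hypothetical multi-turn exchange rendered with the ChatML template.
messages = [
    {"role": "user", "content": "Translate 'Guten Morgen' from German into French:"},
    {"role": "assistant", "content": "Bonjour."},
    {"role": "user", "content": "Now translate it into Italian:"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```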


## Training Details

### Training Data

The x2x optimization stage uses synthetic data generated with [Tower-7b-MT-SFT](https://huggingface.co/double7/Tower-7b-MT-SFT), using translation data from [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.1) as seeds.



### Training hyperparameters

The following hyperparameters were used during x2x training:
- learning_rate: 2e-07
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
- max_seq_length: 2048
- DPO beta: 0.4
- SFT coefficient: 2.0 (see the objective sketch below)
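
The DPO beta and SFT coefficient suggest that the x2x stage combines a DPO preference loss with a supervised cross-entropy term on the preferred translations. A sketch of the combined objective under this assumption (the exact formulation belongs to the paper, which is still TODO above):

$$\mathcal{L} = \mathcal{L}_{\mathrm{DPO}} + 2.0 \cdot \mathcal{L}_{\mathrm{SFT}}, \qquad \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)$$

with $\beta = 0.4$, where $y_w$ and $y_l$ are the preferred and rejected translations and $\mathcal{L}_{\mathrm{SFT}} = -\log \pi_\theta(y_w \mid x)$.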


## Citation

TODO