pinzhenchen/Unbabel_Tower-Plus-9B

@@ -2,41 +2,38 @@
 base_model: google/gemma-2-9b
 license: cc-by-nc-sa-4.0
 language:
-  - de
-  - nl
-  - is
-  - es
-  - fr
-  - pt
-  - uk
-  - hi
-  - zh
-  - ru
-  - cs
-  - ko
-  - ja
-  - it
-  - en
-  - da
-  - pl
-  - hu
-  - sv
-  - 'no'
-  - ro
-  - fi
 library_name: transformers
-datasets:
-  - Widn/TowerBlocks-v4-250205
-  - Widn/TowerDPO-v4-sugarloaf-250227
 ---
 # Model Description:
-**Tower-v4-Sugarloaf** is built on top of Gemma 2 9B. The model goes through Continuous Pretraining (CPT), Instruction Tuning (IT), and Weighted Preference Optimization (WPO). During all stages we include parallel and multilingual data (covering 22 languages).
-This approach makes Tower v4 Sugarloaf one of the best multilingual LLMs under 10B parameters.
-- **Developed by:** Widn
 - **Model type:** A 9B parameter model fine-tuned on a mix of _translation-related tasks_ as well as  _general instruction-following_ datasets that include reasoning, code instructions, etc.
 - **Languages:** German, Spanish, French, Italian, Korean, Dutch, Russian, English, Portuguese (Portugal), Portuguese (Brazilian), Spanish (Latin America), Chinese (Simplified), Chinese (Traditional), Czech, Ukrainian, Hindi, Icelandic, Japanese, Polish, Swedish, Hungarian, Romanian, Danish, Norwegian (Nynorsk), Norwegian (Bokmål), Finnish
 - **License:** CC-BY-NC-4.0
@@ -80,7 +77,7 @@ sampling_params = SamplingParams(
   temperature=0,
   max_tokens=8192,
 )
-llm = LLM(model="Widn/Tower-4-Sugarloaf", tensor_parallel_size=1)
 messages = [{"role": "user", "content": "Translate: Hello, world! into Portuguese."}]
 outputs = llm.chat(messages, sampling_params)
 # Make sure your prompt_token_ids look like this
@@ -97,10 +94,10 @@ print (outputs[0].outputs[0].text)
 import torch
 from transformers import pipeline
-pipe = pipeline("text-generation", model="Widn/Tower-4-Sugarloaf", device_map="auto")
 # We use the tokenizer’s chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [{"role": "user", "content": "Translate: Hello, world! into Portuguese."}]
 input_ids = pipe.tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)
 outputs = pipe(messages, max_new_tokens=256, do_sample=False)
 print(outputs[0]["generated_text"])
-```

 base_model: google/gemma-2-9b
 license: cc-by-nc-sa-4.0
 language:
+- de
+- nl
+- is
+- es
+- fr
+- pt
+- uk
+- hi
+- zh
+- ru
+- cs
+- ko
+- ja
+- it
+- en
+- da
+- pl
+- hu
+- sv
+- 'no'
+- ro
+- fi
 library_name: transformers
 ---
 # Model Description:
+**Tower+ 9B** is built on top of Gemma 2 9B. The model goes through Continuous Pretraining (CPT), Instruction Tuning (IT), and Weighted Preference Optimization (WPO). During all stages we include parallel and multilingual data (covering 22 languages).
+This approach makes Tower+ 9B one of the best multilingual LLMs under 10B parameters.
+- **Developed by:** Unbabel
 - **Model type:** A 9B parameter model fine-tuned on a mix of _translation-related tasks_ as well as  _general instruction-following_ datasets that include reasoning, code instructions, etc.
 - **Languages:** German, Spanish, French, Italian, Korean, Dutch, Russian, English, Portuguese (Portugal), Portuguese (Brazilian), Spanish (Latin America), Chinese (Simplified), Chinese (Traditional), Czech, Ukrainian, Hindi, Icelandic, Japanese, Polish, Swedish, Hungarian, Romanian, Danish, Norwegian (Nynorsk), Norwegian (Bokmål), Finnish
 - **License:** CC-BY-NC-4.0
   temperature=0,
   max_tokens=8192,
 )
+llm = LLM(model="Unbabel/Tower-Plus-9B", tensor_parallel_size=1)
 messages = [{"role": "user", "content": "Translate: Hello, world! into Portuguese."}]
 outputs = llm.chat(messages, sampling_params)
 # Make sure your prompt_token_ids look like this
 import torch
 from transformers import pipeline
+pipe = pipeline("text-generation", model="Unbabel/Tower-Plus-9B", device_map="auto")
 # We use the tokenizer’s chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [{"role": "user", "content": "Translate: Hello, world! into Portuguese."}]
 input_ids = pipe.tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)
 outputs = pipe(messages, max_new_tokens=256, do_sample=False)
 print(outputs[0]["generated_text"])
+```
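
Both usage snippets pass the model a single-turn `messages` list with a `Translate: ... into <language>.` prompt. A minimal sketch of a helper that builds that list for other inputs is given here. The function name `build_translation_messages` is hypothetical and not part of the model card. The prompt wording is copied from the one example shown, so treat it as an assumption that the same template suits other languages.

```python
from typing import Dict, List


def build_translation_messages(text: str, target_language: str) -> List[Dict[str, str]]:
    """Build a single-turn chat in the shape used by the snippets in this card.

    Assumption: the "Translate: <text> into <language>." wording from the
    card's one example carries over to other texts and target languages.
    """
    prompt = f"Translate: {text} into {target_language}."
    return [{"role": "user", "content": prompt}]


if __name__ == "__main__":
    # Print the messages for two target languages; no model is loaded here.
    for language in ("Portuguese", "German"):
        print(build_translation_messages("Hello, world!", language))
```

The returned list can be passed as `messages` to `llm.chat(...)` or `pipe(...)` in the snippets from the card.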