---
license: apache-2.0
library_name: transformers
base_model: BSC-LT/salamandra-2b-instruct
pipeline_tag: text-generation
language:
- bg
- ca
- code
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fi
- fr
- ga
- gl
- hr
- hu
- it
- lt
- lv
- mt
- nl
- nn
- 'no'
- oc
- pl
- pt
- ro
- ru
- sh
- sk
- sl
- sr
- sv
- uk
---

# Salamandra-2b-instruct-fp8 Model Card

This model is the fp8-quantized version of [Salamandra-2b-instruct](https://huggingface.co/BSC-LT/salamandra-2b-instruct).

The model weights are quantized from FP16 to FP8 (8-bit weights) using the FP8 quantization algorithm
from [NeuralMagic](https://neuralmagic.com/blog/vllm-brings-fp8-inference-to-the-open-source-community/).
Inference with this model can be performed using [vLLM](https://docs.vllm.ai/en/stable/models/engine_args.html).

Salamandra is a highly multilingual model pre-trained from scratch that comes in three different
sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants,
promoted and financed by the Government of Catalonia through the [Aina Project](https://projecteaina.cat/)
and the _Ministerio para la Transformación Digital y de la Función Pública_ - Funded by EU – NextGenerationEU
within the framework of the [ILENIA Project](https://proyectoilenia.es/) with reference 2022/TL22/00215337.

This model card corresponds to the fp8-quantized version of Salamandra-2b-instruct.

The entire Salamandra family is released under a permissive [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

## How to Use

The following example code works under ``Python 3.9.16``, ``vllm==0.6.3.post1``, ``torch==2.4.0`` and ``torchvision==0.19.0``,
though it should also run on more recent versions of these libraries. This is an example of a conversational chatbot using the model:

```python
from vllm import LLM, SamplingParams

model_name = "BSC-LT/salamandra-2b-instruct-fp8"
llm = LLM(model=model_name)

messages = []

while True:
    user_input = input("user >> ")
    if user_input.lower() == "exit":
        print("Chat ended.")
        break

    messages.append({'role': 'user', 'content': user_input})

    outputs = llm.chat(messages,
                       sampling_params=SamplingParams(
                           temperature=0.5,
                           stop_token_ids=[5],
                           max_tokens=200)
                       )[0].outputs

    model_output = outputs[0].text
    print(f'assistant >> {model_output}')
    messages.append({'role': 'assistant', 'content': model_output})
```
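Alternatively, the model can be exposed through vLLM's OpenAI-compatible server and queried over HTTP. The following is a minimal sketch; the port and request parameters are illustrative, not prescribed by this model card:

```shell
# Start vLLM's OpenAI-compatible server with the quantized model
vllm serve BSC-LT/salamandra-2b-instruct-fp8 --port 8000

# From another terminal, send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "BSC-LT/salamandra-2b-instruct-fp8",
        "messages": [{"role": "user", "content": "Who are you?"}],
        "temperature": 0.5,
        "max_tokens": 200
      }'
```

Any OpenAI-compatible client library can also be pointed at this endpoint.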

### Author
International Business Machines (IBM).

### Copyright
International Business Machines (IBM).

### Contact
For further information, please send an email to <langtech@bsc.es>.

### Acknowledgements
We appreciate the collaboration with IBM in this work.
Specifically, the IBM team created the fp8-quantized version of the Salamandra-2b-instruct model released here.

### Disclaimer
Be aware that the model may contain biases or other unintended distortions.
When third parties deploy systems or provide services based on this model, or use the model themselves,
they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable
regulations, including those governing the use of Artificial Intelligence.

Barcelona Supercomputing Center and International Business Machines shall
not be held liable for any outcomes resulting from third-party use.

### License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)