---
license: apache-2.0
library_name: transformers
base_model: BSC-LT/salamandra-2b-instruct
pipeline_tag: text-generation
language:
- bg
- ca
- code
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fi
- fr
- ga
- gl
- hr
- hu
- it
- lt
- lv
- mt
- nl
- nn
- 'no'
- oc
- pl
- pt
- ro
- ru
- sh
- sk
- sl
- sr
- sv
- uk
---

# Salamandra-2b-instruct-fp8 Model Card

This model is the fp8-quantized version of [Salamandra-2b-instruct](https://huggingface.co/BSC-LT/salamandra-2b-instruct).

The model weights are quantized from FP16 to FP8 (8-bit weights) using the FP8 quantization algorithm
from [NeuralMagic](https://neuralmagic.com/blog/vllm-brings-fp8-inference-to-the-open-source-community/).
Inference with this model can be performed using [vLLM](https://docs.vllm.ai/en/stable/models/engine_args.html).

Salamandra is a highly multilingual model pre-trained from scratch that comes in three different
sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants,
promoted and financed by the Government of Catalonia through the [Aina Project](https://projecteaina.cat/)
and the _Ministerio para la Transformación Digital y de la Función Pública_ - Funded by EU – NextGenerationEU
within the framework of the [ILENIA Project](https://proyectoilenia.es/) with reference 2022/TL22/00215337.

This model card corresponds to the fp8-quantized version of Salamandra-2b-instruct.

The entire Salamandra family is released under a permissive [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

## How to Use

The following example code works under ``Python 3.9.16``, ``vllm==0.6.3.post1``, ``torch==2.4.0`` and ``torchvision==0.19.0``,
though it should also run on more recent versions of these libraries. This is an example of a conversational chatbot using the model:

```python
from vllm import LLM, SamplingParams

model_name = "BSC-LT/salamandra-2b-instruct-fp8"
llm = LLM(model=model_name)

messages = []

while True:
    user_input = input("user >> ")
    if user_input.lower() == "exit":
        print("Chat ended.")
        break

    messages.append({'role': 'user', 'content': user_input})

    outputs = llm.chat(messages,
                       sampling_params=SamplingParams(
                           temperature=0.5,
                           stop_token_ids=[5],
                           max_tokens=200)
                       )[0].outputs

    model_output = outputs[0].text
    print(f'assistant >> {model_output}')
    messages.append({'role': 'assistant', 'content': model_output})
```
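Alternatively, the model can be exposed through vLLM's OpenAI-compatible server and queried over HTTP. The following is a minimal sketch; the port and request parameters are illustrative, not prescribed by this model card:

```shell
# Start vLLM's OpenAI-compatible server with the quantized model
vllm serve BSC-LT/salamandra-2b-instruct-fp8 --port 8000

# From another terminal, send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "BSC-LT/salamandra-2b-instruct-fp8",
        "messages": [{"role": "user", "content": "Who are you?"}],
        "temperature": 0.5,
        "max_tokens": 200
      }'
```

Any OpenAI-compatible client library can also be pointed at this endpoint.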

### Author
International Business Machines (IBM).

### Copyright
International Business Machines (IBM).

### Contact
For further information, please send an email to <langtech@bsc.es>.

### Acknowledgements
We appreciate the collaboration with IBM in this work.
Specifically, the IBM team created the fp8-quantized version of the Salamandra-2b-instruct model released here.

### Disclaimer
Be aware that the model may contain biases or other unintended distortions.
When third parties deploy systems or provide services based on this model, or use the model themselves,
they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable
regulations, including those governing the use of Artificial Intelligence.

Barcelona Supercomputing Center and International Business Machines shall
not be held liable for any outcomes resulting from third-party use.

### License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)