Instructions for using LumiOpen/Poro-34B-chat with libraries, inference providers, notebooks, and local apps.

Libraries

How to use LumiOpen/Poro-34B-chat with Transformers:

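```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LumiOpen/Poro-34B-chat")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline applies the model's chat template and returns the conversation with the reply appended.
print(pipe(messages))
```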

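```python
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LumiOpen/Poro-34B-chat")
# bfloat16 halves memory compared with the float32 default (roughly 68 GB vs 136 GB of weights);
# device_map="auto" (requires the accelerate package) spreads layers across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "LumiOpen/Poro-34B-chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The tokenizer's chat template formats the conversation and opens the assistant turn.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping the prompt and special tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```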

Local Apps

vLLM

How to use LumiOpen/Poro-34B-chat with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (runs in the foreground, listening on port 8000 by default):
vllm serve "LumiOpen/Poro-34B-chat"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LumiOpen/Poro-34B-chat",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
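The curl call above can also be made from Python. A minimal sketch using only the standard library, assuming the vLLM server is running on its default port 8000 (for the SGLang examples below, the base URL would be `http://localhost:30000`). The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical; the request and response shapes follow the OpenAI-compatible chat completions format that both servers expose.

```python
import json
import urllib.request

# Assumption: a vLLM server started with `vllm serve` is listening on port 8000.
# For the SGLang server, use "http://localhost:30000" instead.
BASE_URL = "http://localhost:8000"


def build_chat_request(base_url: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(messages: list[dict], model: str = "LumiOpen/Poro-34B-chat", base_url: str = BASE_URL) -> str:
    """Send the conversation to the server and return the model's reply text."""
    request = build_chat_request(base_url, model, messages)
    with urllib.request.urlopen(request) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `chat([{"role": "user", "content": "What is the capital of France?"}])` should return the reply as a plain string. Keeping request building separate from sending makes it easy to inspect exactly what is sent, which is the kind of detail asked about in the discussion below.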

Use Docker

```
docker model run hf.co/LumiOpen/Poro-34B-chat
```

SGLang

How to use LumiOpen/Poro-34B-chat with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (runs in the foreground):
python3 -m sglang.launch_server \
    --model-path "LumiOpen/Poro-34B-chat" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LumiOpen/Poro-34B-chat",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LumiOpen/Poro-34B-chat" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LumiOpen/Poro-34B-chat",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use LumiOpen/Poro-34B-chat with Docker Model Runner:
```
docker model run hf.co/LumiOpen/Poro-34B-chat
```

Problem with random output and RAG usage and

by pesonen - opened Sep 12, 2024

Discussion

pesonen

Sep 12, 2024 (edited Sep 12, 2024)

Hi, we have two separate issues.

1. For some Finnish inputs, we get what seem to be random EU texts as output:

<|user|>Käännä suomeksi [Translate into Finnish]: The Commission shall be empowered to adopt delegated acts in accordance with Article 21 to supplement this Regulation by establishing the detailed rules for the application of the provisions of this Chapter. <|assistant|>Siirretään komissiolle valta antaa 21 artiklan mukaisesti delegoituja säädöksiä, joilla täydennetään tätä asetusta vahvistamalla tämän luvun säännösten soveltamista koskevat yksityiskohtaiset säännöt.

As a starting point, a shareholder has the right to carry out alterations on the balcony under their control. In practice, however, this right is more limited than inside the apartment. Alterations are restricted, first, by the requirement that the change must not cause nuisance or damage to the other shareholders or to the company.

Second, the change must not impair the aesthetic appearance of the building's facade or cause nuisance to the neighbours, even if the same installations would cause no harm if carried out inside the apartment. For example, a shareholder does not automatically have the right to install a large satellite dish on the balcony railing, because it affects the appearance of the building's facade. The same applies to awnings or similar fixtures installed by the shareholder.

A shareholder may, however, carry out necessary maintenance work on the balcony without the consent of the other shareholders. In addition, the articles of association may provide otherwise regarding the right to control the balcony and the responsibility for its maintenance.

2. For RAG, we are trying to instruct the model to use only the information in the context, but the instruction is ignored. Exactly the same system works with Llama 3.1 in Finnish, for example. Here is one version of the system prompt we have tried, in both Finnish and English:
You are a helpful assistant.

Answer questions briefly.
Use only the information inside the tags to answer.
Never make up an answer.
If the information cannot be found, say that you don't know the answer.

We have tried different quantized versions of the model, both with Ollama (GGUF and a minimal Modelfile) and with the transformers library.

jonabur

LumiOpen org Sep 12, 2024

For #1, are you using the ChatML format to prompt the model? Can you provide more information on exactly what you're sending?
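For comparison with the `<|user|>`/`<|assistant|>` text in the first post: ChatML marks each turn with `<|im_start|>` followed by the role and closes it with `<|im_end|>`. A minimal sketch of rendering messages that way by hand, assuming the common ChatML layout; `to_chatml` is a hypothetical helper, and Poro's own chat template (the one `tokenizer.apply_chat_template` uses) remains the authoritative format and may differ in whitespace.

```python
def to_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} dicts as a ChatML prompt string.

    Assumes the common ChatML layout: <|im_start|>, the role, a newline,
    the content, <|im_end|>, and a newline for every turn.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```

Printing `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` for the same messages is a quick way to check whether a given setup actually sends this layout to the model.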

For #2, unfortunately Poro's instruction tuning is pretty basic compared to Llama 3.1's. You might have more luck with a few-shot prompting approach, if that's possible for your use case.
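Following the few-shot suggestion, a minimal sketch that prepends worked example turns to a RAG conversation so the model sees both an answer taken from the context and a refusal when the context lacks the answer. The `<context>` tag name, the example texts, and the function names are hypothetical (the tag names in the first post's system prompt did not survive), and the system prompt is adapted from the one in that post.

```python
# Hypothetical tag name: the original prompt's tag names are unknown.
CONTEXT_TAG = "context"

SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer questions briefly. "
    f"Use only the information inside the <{CONTEXT_TAG}> tags to answer. "
    "Never make up an answer. "
    "If the information cannot be found, say that you don't know the answer."
)

# Hypothetical worked examples: one answerable from the context, one that is not.
FEW_SHOT_EXAMPLES = [
    {
        "context": "The library is open Monday to Friday from 9 to 17.",
        "question": "When is the library open?",
        "answer": "Monday to Friday from 9 to 17.",
    },
    {
        "context": "The library is open Monday to Friday from 9 to 17.",
        "question": "How many books does the library have?",
        "answer": "I don't know the answer.",
    },
]


def user_turn(context: str, question: str) -> str:
    """Wrap the retrieved context in tags, followed by the question."""
    return f"<{CONTEXT_TAG}>\n{context}\n</{CONTEXT_TAG}>\n\n{question}"


def build_rag_messages(context: str, question: str) -> list[dict]:
    """System prompt, then few-shot example turns, then the real query."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": user_turn(ex["context"], ex["question"])})
        messages.append({"role": "assistant", "content": ex["answer"]})
    messages.append({"role": "user", "content": user_turn(context, question)})
    return messages
```

The resulting list can be passed to `tokenizer.apply_chat_template` or sent as the `messages` field of the OpenAI-compatible API. For Finnish use, the examples would presumably be written in Finnish as well.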

pesonen

Sep 13, 2024 (edited Sep 13, 2024)

Sorry, I now see that this "random" output is very old history from our previous experiments. I'll have to check what is going on with the Ollama and Open WebUI combination.
Thanks, we'll try giving some examples.

pesonen

Sep 24, 2024

Follow-up question: we sometimes get tags in the output when using Ollama. This is with a minimal Modelfile containing just the FROM clause. We also tried using the bloom template
[INST] {{ .Prompt }} [/INST]
but Poro didn't like that. Is there a template that should be used with the Ollama server for optimal output?
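Independently of fixing the template, stray tags can be trimmed on the client side as a stopgap. A minimal sketch, assuming the leaked tags are chat-turn markers such as ChatML's `<|im_end|>`/`<|im_start|>` or the `<|user|>`/`<|assistant|>` markers seen earlier in this thread; the marker list and `trim_at_turn_marker` are hypothetical and should be adjusted to whatever actually appears in the output.

```python
# Assumption: these are the markers that leak into the output; adjust as needed.
TURN_MARKERS = ("<|im_end|>", "<|im_start|>", "<|user|>", "<|assistant|>")


def trim_at_turn_marker(text: str, markers: tuple[str, ...] = TURN_MARKERS) -> str:
    """Cut generated text at the earliest chat-turn marker and strip trailing whitespace."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()
```

Configuring a proper stop sequence in the server is the cleaner fix, since the model then stops generating instead of producing text that is thrown away.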

jonabur

LumiOpen org Sep 25, 2024

I don't know anything about Ollama, but the chat templating should be ChatML, e.g.

jonabur

LumiOpen org Sep 25, 2024

Also, is there any chance you're using the base model instead of the chat model?

pesonen

Sep 25, 2024

> Also, is there any chance you're using the base model instead of the chat model?

We are using poro-34b-chat.Q5_K_M.gguf
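Given the advice that Poro expects ChatML, an Ollama Modelfile would need a ChatML `TEMPLATE` in place of the `[INST] {{ .Prompt }} [/INST]` template tried above. A minimal sketch that writes such a Modelfile from Python, assuming the GGUF file sits in the working directory and using Ollama's `.System`/`.Prompt` template variables and the `stop` parameter. The function names are hypothetical, and the template is untested against Poro, so it should be checked against the model's own chat template.

```python
from pathlib import Path

# Assumption: the quantized GGUF file mentioned above is in the working directory.
GGUF_FILE = "poro-34b-chat.Q5_K_M.gguf"

# ChatML in Ollama's Go-template syntax: optional system turn, user turn,
# then an open assistant turn for the model to complete.
CHATML_TEMPLATE = (
    "{{ if .System }}<|im_start|>system\n{{ .System }}<|im_end|>\n{{ end }}"
    "{{ if .Prompt }}<|im_start|>user\n{{ .Prompt }}<|im_end|>\n{{ end }}"
    "<|im_start|>assistant\n"
)


def modelfile_text(gguf_file: str) -> str:
    """Return Modelfile contents: the weights, the ChatML template, and a stop token."""
    return (
        f"FROM ./{gguf_file}\n"
        f'TEMPLATE """{CHATML_TEMPLATE}"""\n'
        # Stop at the end-of-turn marker so the model does not run on into a new turn.
        'PARAMETER stop "<|im_end|>"\n'
    )


def write_modelfile(path: str = "Modelfile", gguf_file: str = GGUF_FILE) -> Path:
    """Write the Modelfile to disk and return its path."""
    out = Path(path)
    out.write_text(modelfile_text(gguf_file), encoding="utf-8")
    return out
```

After calling `write_modelfile()`, the model can be registered with `ollama create poro-chat -f Modelfile` and started with `ollama run poro-chat` (the name `poro-chat` is arbitrary). The stop parameter matters as much as the template: without it, the model can keep generating past the end of its turn, which may be one source of the stray tags.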
