How to use ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning with libraries and local apps.

Libraries

How to use ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning")
model = AutoModelForCausalLM.from_pretrained("ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
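
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the response layout (`choices[0].message.content`) assumes the server follows the OpenAI chat completions schema, as the curl example suggests.

```python
import json
import urllib.request

MODEL_ID = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = MODEL_ID) -> urllib.request.Request:
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant message text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs the server started above):
# print(ask("What is the capital of France?"))
```

For the SGLang server described later in this section, pass `base_url="http://localhost:30000"`.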

Use Docker

docker model run hf.co/ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning

SGLang

How to use ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning with Docker Model Runner:
```
docker model run hf.co/ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Llama-PLLuM-8B-instruct-ArtexIT-reasoning

Built with Llama

This repository contains a GRPO fine-tune of CYFRAGOVPL/Llama-PLLuM-8B-instruct trained on GSM8K (MIT-licensed). We publish both Hugging Face (safetensors) weights and GGUF artifacts (Q8_0, Q5_K_M); the GGUF files are intended for use with llama.cpp.

What is this?

Base: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
Context length: ~131k tokens (as reported in the GGUF header).
Message format: Llama [INST] ... [/INST] plus explicit reasoning and answer tags (see Prompt format).
Default chat template: the tokenizer's chat template includes a default system instruction that enforces the two-block format.

Prompt format

The model expects Llama chat formatting and supports explicit tags:

Reasoning: <think> ... </think>
Final answer: <answer> ... </answer>

Example

[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 156.</think>
<answer>156</answer>
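
To consume this format downstream, the two blocks can be pulled out of the generated text. The following is a minimal sketch, not part of the repository: `parse_reply` and `ParsedReply` are hypothetical names, and the regular expressions assume the tags appear literally as shown in the example above.

```python
import re
from typing import NamedTuple, Optional


class ParsedReply(NamedTuple):
    reasoning: Optional[str]
    answer: Optional[str]


# Non-greedy matches so only the first block of each kind is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_reply(text: str) -> ParsedReply:
    """Extract the first <think> and <answer> blocks; a missing block becomes None."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return ParsedReply(
        reasoning=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
    )


reply = parse_reply("<think>12*13 = 156.</think>\n<answer>156</answer>")
print(reply.answer)  # prints 156
```

Returning `None` for a missing block lets the caller decide how to handle replies that do not follow the two-block format.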

Quickstart

Transformers (PyTorch)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],
    add_generation_prompt=True,
    tokenize=False,
)
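# Chat templates typically insert the BOS token already, so do not add special tokens a second time.
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
# Prints the prompt followed by the completion, with special tokens left visible.
print(tok.decode(out[0], skip_special_tokens=False))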

Training (brief)

Method: GRPO (Group Relative Policy Optimization), a policy-gradient reinforcement learning method, used here with multiple reward functions.
Data: openai/gsm8k — License: MIT.
Goal: consistent two‑block outputs (reasoning + final answer) using the training tags.
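
The reward functions used in training are not listed here. Purely as an illustration of what "multiple reward functions" can look like for this goal, the sketch below pairs a format reward with a correctness reward. All function names and reward values (1.0 and 2.0) are assumptions, not the training configuration; the only dataset fact relied on is that GSM8K reference solutions end with `#### <number>`.

```python
import re

# Hypothetical format check: the completion starts with a <think> block
# and ends with an <answer> block.
_FORMAT_RE = re.compile(
    r"^\s*<think>.+?</think>\s*<answer>.+?</answer>\s*$", re.DOTALL
)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def gold_answer(gsm8k_solution: str) -> str:
    """Return the number after '####' in a GSM8K reference solution."""
    return gsm8k_solution.split("####")[-1].strip().replace(",", "")


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the two-block layout, else 0.0 (assumed values)."""
    return 1.0 if _FORMAT_RE.match(completion) else 0.0


def correctness_reward(completion: str, gsm8k_solution: str) -> float:
    """2.0 if the <answer> text equals the reference number, else 0.0 (assumed values)."""
    match = _ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().replace(",", "")
    return 2.0 if predicted == gold_answer(gsm8k_solution) else 0.0


completion = "<think>12*13 = 156.</think>\n<answer>156</answer>"
total = format_reward(completion) + correctness_reward(completion, "12 * 13 = 156\n#### 156")
print(total)  # prints 3.0
```

Keeping the two signals separate means a completion can earn credit for the layout even when the arithmetic is wrong, which is one common way to encourage consistent formatting.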

License & Attribution

This repository contains derivatives of Llama 3.1 and PLLuM:

Llama 3.1 Community License applies. When redistributing, you must:
- include a copy of the license and prominently display “Built with Llama”,
- include “Llama” at the beginning of any distributed model’s name if it was created, trained or fine‑tuned using Llama materials,
- keep a NOTICE file with the following line:
  Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
- comply with the Acceptable Use Policy (AUP).
PLLuM: please cite the PLLuM work (see Citation below).
Data: GSM8K is MIT‑licensed; include dataset attribution.

This repo includes:

LICENSE — full text of the Llama 3.1 Community License
USE_POLICY.md — pointer to the official Acceptable Use Policy
NOTICE — required Llama attribution line

If your (or your affiliates’) products exceeded 700M monthly active users on the Llama 3.1 release date, you must obtain a separate license from Meta before exercising the rights in the Llama 3.1 license.

Citation

If you use PLLuM in research or deployments, please cite:

@unpublished{pllum2025,
    title={PLLuM: A Family of Polish Large Language Models},
    author={PLLuM Consortium},
    year={2025}
}

Safetensors
Model size: 8B params
Tensor type: F16

Model tree for ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning
Base model: CYFRAGOVPL/Llama-PLLuM-8B-instruct-2412
Finetuned (2): this model
Quantizations: 2 models
