Llama-PLLuM-8B-instruct-ArtexIT-reasoning
Built with Llama
This repository contains a GRPO fine‑tune of CYFRAGOVPL/Llama-PLLuM-8B-instruct trained on GSM8K (MIT license).
We publish both Hugging Face (safetensors) and GGUF artifacts (Q8_0, Q5_K_M) for use with llama.cpp.
What is this?
- Base: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- Context: ~131k (based on GGUF header).
- Message format: Llama [INST] ... [/INST] + explicit reasoning / answer tags (see below).
- Default chat template: The tokenizer includes a default system instruction enforcing the two‑block format.
Prompt format
The model expects Llama chat formatting and supports explicit tags:
- Reasoning: <think> ... </think>
- Final answer: <answer> ... </answer>
Example
[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 156.</think>
<answer>156</answer>
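The two tagged blocks can be separated programmatically once the model has produced its output. The following is a minimal sketch, assuming the output contains at most one well-formed block of each kind; `parse_tagged_output` and `TaggedOutput` are hypothetical names used for illustration and are not part of this repository.

```python
import re
from typing import NamedTuple, Optional


class TaggedOutput(NamedTuple):
    reasoning: Optional[str]
    answer: Optional[str]


# Non-greedy match; DOTALL lets a block span multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_tagged_output(text: str) -> TaggedOutput:
    """Return the first <think> and <answer> blocks; a missing block is None."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return TaggedOutput(
        reasoning=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
    )


sample = "<think>12*13 = 156.</think>\n<answer>156</answer>"
parsed = parse_tagged_output(sample)
print(parsed.reasoning)  # 12*13 = 156.
print(parsed.answer)     # 156
```

Returning `None` for a missing block, rather than raising, lets the caller decide how to handle outputs that do not follow the format.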
Quickstart
Transformers (PyTorch)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],
    add_generation_prompt=True,
    tokenize=False,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=False))
Training (brief)
- Method: GRPO (policy‑gradient reinforcement learning with multiple reward functions).
- Data: openai/gsm8k (License: MIT).
- Goal: consistent two‑block outputs (reasoning + final answer) using the training tags.
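To illustrate what combining several reward functions in GRPO can look like, here is a minimal sketch. The two rewards (`format_reward`, `correctness_reward`), their 0/1 values, and their unweighted sum are assumptions made for illustration; they are not taken from this repository's training code. The group-relative step (reward minus the group mean, divided by the group standard deviation) follows the usual GRPO formulation; the population standard deviation is used here, and implementations may differ on that detail.

```python
import re
import statistics
from typing import List

# Hypothetical format check: the completion starts with a <think> block
# and ends with an <answer> block.
_FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL
)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the two-block layout, else 0.0."""
    return 1.0 if _FORMAT_RE.match(completion) else 0.0


def correctness_reward(completion: str, reference: str) -> float:
    """1.0 if the text inside <answer> equals the reference answer, else 0.0."""
    m = _ANSWER_RE.search(completion)
    return 1.0 if m and m.group(1).strip() == reference.strip() else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


completions = [
    "<think>12*13 = 156.</think>\n<answer>156</answer>",  # well-formed, correct
    "<think>12*13 = 146.</think>\n<answer>146</answer>",  # well-formed, wrong
    "156",                                                # no tags
]
rewards = [format_reward(c) + correctness_reward(c, "156") for c in completions]
print(rewards)  # [2.0, 1.0, 0.0]
print(group_advantages(rewards))
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so a well-formed but wrong answer still ranks above an untagged one.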
License & Attribution
This repository contains derivatives of Llama 3.1 and PLLuM:
- Llama 3.1 Community License applies. When redistributing, you must:
  - include a copy of the license and prominently display “Built with Llama”,
  - include “Llama” at the beginning of any distributed model’s name if it was created, trained or fine‑tuned using Llama materials,
  - keep a NOTICE file with the following line:
    Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
  - comply with the Acceptable Use Policy (AUP).
- PLLuM: please cite the PLLuM work (see Citation below).
- Data: GSM8K is MIT‑licensed; include dataset attribution.
This repo includes:
- LICENSE: full text of the Llama 3.1 Community License
- USE_POLICY.md: pointer to the official Acceptable Use Policy
- NOTICE: required Llama attribution line
If your (or your affiliates’) products exceeded 700M monthly active users on the Llama 3.1 release date, you must obtain a separate license from Meta before exercising the rights in the Llama 3.1 license.
Citation
If you use PLLuM in research or deployments, please cite:
@unpublished{pllum2025,
  title={PLLuM: A Family of Polish Large Language Models},
  author={PLLuM Consortium},
  year={2025}
}