---
language: [pl]
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- llama-3.1
- polish
- grpo
- reasoning
- safetensors
datasets:
- openai/gsm8k
base_model: CYFRAGOVPL/Llama-PLLuM-8B-instruct
base_model_relation: finetune
---
# Llama-PLLuM-8B-instruct-ArtexIT-reasoning
**Built with Llama**
This repository contains a GRPO fine‑tune of [`CYFRAGOVPL/Llama-PLLuM-8B-instruct`](https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct) trained on **GSM8K** (MIT).
We publish both **Hugging Face (safetensors)** and **GGUF** artifacts (Q8_0, Q5_K_M) for use with `llama.cpp`.
## What is this?
- **Base**: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- **Context**: ~131k (based on GGUF header).
- **Message format**: Llama `[INST] ... [/INST]` + explicit reasoning / answer tags (see below).
- **Default chat template**: The tokenizer includes a default system instruction enforcing the two‑block format.
## Prompt format
The model expects Llama chat formatting and supports explicit tags:
- **Reasoning**: `<think> ... </think>`
- **Final answer**: `<answer> ... </answer>`
**Example**
```text
[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 156.</think>
<answer>156</answer>
```
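A completion in this format can be split back into its two blocks with a small parser. The sketch below uses the standard library only; `parse_response` is an illustrative helper name, not something shipped in this repo.

```python
import re

def parse_response(text: str) -> tuple[str, str]:
    """Split a model completion into (reasoning, answer).

    Either part is returned as "" when its tag pair is missing,
    so callers can fall back gracefully on malformed outputs.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )

reasoning, answer = parse_response("<think>12*13 = 156.</think>\n<answer>156</answer>")
print(answer)  # → 156
```

`re.DOTALL` lets the reasoning block span multiple lines, which is the common case for word problems.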
## Quickstart
### Transformers (PyTorch)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# The bundled chat template injects the default system instruction
# that enforces the <think>/<answer> two-block format.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],  # "Name 3 cities in Poland."
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
# skip_special_tokens=False keeps the <think>/<answer> tags visible in the output
print(tok.decode(out[0], skip_special_tokens=False))
```
## Training (brief)
- **Method**: GRPO (policy‑gradient reinforcement learning with multiple reward functions).
- **Data**: `openai/gsm8k` — License: **MIT**.
- **Goal**: consistent two‑block outputs (reasoning + final answer) using the training tags.
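The exact reward functions used in training are not published here. As an illustration of how GRPO rewards for the two-block goal above could be structured, the sketch below defines a hypothetical format reward (is the completion exactly one `<think>` block followed by one `<answer>` block?) and an exact-match answer reward against a GSM8K gold answer; `format_reward` and `answer_reward` are assumed names, not the training code.

```python
import re

# Matches exactly one <think> block followed by one <answer> block and nothing else.
TWO_BLOCK = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 for a well-formed two-block completion, else 0.0."""
    return 1.0 if TWO_BLOCK.match(completion.strip()) else 0.0

def answer_reward(completion: str, gold: str) -> float:
    """1.0 when the extracted <answer> block exactly matches the gold answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

print(format_reward("<think>ok</think><answer>42</answer>"))  # → 1.0
print(format_reward("<answer>42</answer>"))                   # → 0.0
```

In GRPO, such per-completion rewards are computed over a group of sampled completions per prompt and normalized into advantages; combining a format term with a correctness term encourages outputs that are both well-structured and right.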
## License & Attribution
This repository contains derivatives of **Llama 3.1** and **PLLuM**:
- **Llama 3.1 Community License** applies. When redistributing, you must:
- include a copy of the license and **prominently display “Built with Llama”**,
- include **“Llama” at the beginning of any distributed model’s name** if it was created, trained or fine‑tuned using Llama materials,
- keep a **NOTICE** file with the following line:
`Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.`
- comply with the **Acceptable Use Policy (AUP)**.
- **PLLuM**: please cite the PLLuM work (see **Citation** below).
- **Data**: GSM8K is MIT‑licensed; include dataset attribution.
This repo includes:
- `LICENSE` — full text of the **Llama 3.1 Community License**
- `USE_POLICY.md` — pointer to the official **Acceptable Use Policy**
- `NOTICE` — required Llama attribution line
> If your (or your affiliates’) products or services exceed **700M monthly active users** as of the Llama 3.1 release date, you must request and obtain a separate license from Meta before exercising the rights granted under the Llama 3.1 license.
## Citation
If you use PLLuM in research or deployments, please cite:
```bibtex
@unpublished{pllum2025,
  title  = {PLLuM: A Family of Polish Large Language Models},
  author = {PLLuM Consortium},
  year   = {2025}
}
```