---
language: [pl]
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- llama-3.1
- polish
- grpo
- reasoning
- safetensors
datasets:
- openai/gsm8k
base_model: CYFRAGOVPL/Llama-PLLuM-8B-instruct
base_model_relation: finetune
---
|
|
|
|
|
# Llama-PLLuM-8B-instruct-ArtexIT-reasoning |
|
|
|
|
|
**Built with Llama** |
|
|
|
|
|
This repository contains a GRPO fine‑tune of [`CYFRAGOVPL/Llama-PLLuM-8B-instruct`](https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct), trained on **GSM8K** (MIT). We publish both **Hugging Face (safetensors)** and **GGUF** artifacts (Q8_0, Q5_K_M) for use with `llama.cpp` (see **Quickstart** below).
|
|
|
|
|
|
|
|
## What is this? |
|
|
- **Base**: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- **Context**: ~131k tokens (per the GGUF header).
- **Message format**: Llama `[INST] ... [/INST]` plus explicit reasoning / answer tags (see below).
- **Default chat template**: the tokenizer ships a default system instruction that enforces the two‑block format.
|
|
|
|
|
|
|
|
## Prompt format |
|
|
|
|
|
The model expects Llama chat formatting and is trained to emit explicit tags:
|
|
|
|
|
- **Reasoning**: `<think> ... </think>` |
|
|
- **Final answer**: `<answer> ... </answer>` |
|
|
|
|
|
**Example** |
|
|
```text
[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 12*10 + 12*3 = 120 + 36 = 156.</think>
<answer>156</answer>
```
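
Since the final answer is delimited by `<answer>` tags, downstream code can pull it out with a simple pattern match. A minimal sketch (the `extract_answer` helper is ours for illustration, not part of this repo):

```python
import re

def extract_answer(text: str) -> str | None:
    """Return the content of the last <answer>...</answer> block, if any."""
    matches = re.findall(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

print(extract_answer("<think>12*13 = 156.</think>\n<answer>156</answer>"))  # -> 156
```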
|
|
|
|
|
## Quickstart |
|
|
|
|
|
### Transformers (PyTorch) |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# The chat template injects the default system instruction that enforces
# the <think>/<answer> two-block format.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],  # "Name 3 cities in Poland."
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
# Keep special tokens so the <think>/<answer> tags stay visible.
print(tok.decode(out[0], skip_special_tokens=False))
```
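
### llama.cpp (GGUF)

The GGUF artifacts (Q8_0, Q5_K_M) can be run with `llama.cpp` or the `llama-cpp-python` bindings. A minimal sketch; the GGUF filename below is a guess, so check the repo's file listing for the exact name:

```python
from llama_cpp import Llama

# Hypothetical filename -- verify against the actual files in this repo.
llm = Llama(
    model_path="Llama-PLLuM-8B-instruct-ArtexIT-reasoning.Q5_K_M.gguf",
    n_ctx=8192,  # the model supports far more (~131k), at higher memory cost
)
out = llm("[INST] Rozwiąż: 12 * 13 = ? [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```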
|
|
|
|
|
|
|
|
## Training (brief) |
|
|
|
|
|
- **Method**: GRPO (policy‑gradient reinforcement learning with multiple reward functions). |
|
|
- **Data**: `openai/gsm8k` — License: **MIT**. |
|
|
- **Goal**: consistent two‑block outputs (reasoning + final answer) using the training tags. |
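
The exact reward functions are not spelled out here; as an illustration only, GRPO setups of this kind commonly combine a format reward (does the completion follow the two‑block structure?) with a correctness reward against the GSM8K gold answer. A hypothetical sketch, not the actual training code:

```python
import re

TWO_BLOCK = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> two-block format."""
    return 1.0 if TWO_BLOCK.match(completion.strip()) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted final answer matches the gold answer."""
    m = TWO_BLOCK.match(completion.strip())
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0
```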
|
|
|
|
|
|
|
|
## License & Attribution |
|
|
|
|
|
This repository contains derivatives of **Llama 3.1** and **PLLuM**: |
|
|
|
|
|
- **Llama 3.1 Community License** applies. When redistributing, you must:
  - include a copy of the license and **prominently display “Built with Llama”**,
  - include **“Llama” at the beginning of any distributed model’s name** if it was created, trained or fine‑tuned using Llama materials,
  - keep a **NOTICE** file with the following line:
    `Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.`
  - comply with the **Acceptable Use Policy (AUP)**.
- **PLLuM**: please cite the PLLuM work (see **Citation** below).
- **Data**: GSM8K is MIT‑licensed; include dataset attribution.
|
|
|
|
|
This repo includes: |
|
|
- `LICENSE` — full text of the **Llama 3.1 Community License** |
|
|
- `USE_POLICY.md` — pointer to the official **Acceptable Use Policy** |
|
|
- `NOTICE` — required Llama attribution line |
|
|
|
|
|
> If you (or your affiliates) had products or services with more than **700M monthly active users** as of the Llama 3.1 release date, you must obtain a separate license from Meta before exercising the rights under the Llama 3.1 Community License.
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use PLLuM in research or deployments, please cite: |
|
|
|
|
|
```bibtex
@unpublished{pllum2025,
  title  = {PLLuM: A Family of Polish Large Language Models},
  author = {PLLuM Consortium},
  year   = {2025}
}
```
|
|
|