---
language: [pl]
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- llama-3.1
- polish
- grpo
- reasoning
- safetensors
datasets:
- openai/gsm8k
base_model: CYFRAGOVPL/Llama-PLLuM-8B-instruct
base_model_relation: finetune
---

# Llama-PLLuM-8B-instruct-ArtexIT-reasoning

**Built with Llama**

This repository contains a GRPO fine‑tune of [`CYFRAGOVPL/Llama-PLLuM-8B-instruct`](https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct), trained on **GSM8K** (MIT). We publish both **Hugging Face (safetensors)** and **GGUF** artifacts (Q8_0, Q5_K_M) for use with `llama.cpp`.

## What is this?

- **Base**: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- **Context**: ~131k tokens (per the GGUF header).
- **Message format**: Llama `[INST] ... [/INST]` plus explicit reasoning / answer tags (see below).
- **Default chat template**: the tokenizer ships with a default system instruction that enforces the two‑block format.

## Prompt format

The model expects Llama chat formatting and supports explicit tags:

- **Reasoning**: ` ... `
- **Final answer**: ` ... `

**Example**

```text
[INST] Rozwiąż: 12 * 13 = ? [/INST] 12*13 = 156. 156
```

(Polish: "Rozwiąż" means "Solve".)

## Quickstart

### Transformers (PyTorch)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# "Podaj 3 miasta w Polsce." = "Name 3 cities in Poland."
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],
    add_generation_prompt=True,
    tokenize=False,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=False))
```

## Training (brief)

- **Method**: GRPO (policy‑gradient reinforcement learning with multiple reward functions).
- **Data**: `openai/gsm8k` — License: **MIT**.
- **Goal**: consistent two‑block outputs (reasoning + final answer) using the training tags.

## License & Attribution

This repository contains derivatives of **Llama 3.1** and **PLLuM**:

- **Llama 3.1 Community License** applies. When redistributing, you must:
  - include a copy of the license and **prominently display "Built with Llama"**,
  - include **"Llama" at the beginning of any distributed model's name** if it was created, trained or fine‑tuned using Llama materials,
  - keep a **NOTICE** file with the following line:
    `Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.`
  - comply with the **Acceptable Use Policy (AUP)**.
- **PLLuM**: please cite the PLLuM work (see **Citation** below).
- **Data**: GSM8K is MIT‑licensed; include dataset attribution.

This repo includes:

- `LICENSE` — full text of the **Llama 3.1 Community License**
- `USE_POLICY.md` — pointer to the official **Acceptable Use Policy**
- `NOTICE` — required Llama attribution line

> If your (or your affiliates') products exceeded **700M monthly active users** on the Llama 3.1 release date, you must obtain a separate license from Meta before exercising the rights granted under the Llama 3.1 license.

## Citation

If you use PLLuM in research or deployments, please cite:

```bibtex
@unpublished{pllum2025,
  title={PLLuM: A Family of Polish Large Language Models},
  author={PLLuM Consortium},
  year={2025}
}
```
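## Appendix: building the prompt by hand

When using the GGUF builds with `llama.cpp` directly (e.g. `llama-cli -p "..."`), there is no `apply_chat_template` call, so the `[INST] ... [/INST]` wrapper from the Prompt format section has to be reproduced manually. A minimal sketch — the helper name `build_prompt` is ours, and the tokenizer's embedded chat template remains the authoritative source for the exact formatting:

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Llama [INST] ... [/INST] format
    documented in the Prompt format section (sketch only; defer to
    the tokenizer's chat template for the canonical layout)."""
    return f"[INST] {user_message} [/INST]"

# The example prompt from this card ("Rozwiąż" = "Solve"):
prompt = build_prompt("Rozwiąż: 12 * 13 = ?")
print(prompt)  # [INST] Rozwiąż: 12 * 13 = ? [/INST]
```

The resulting string can then be passed as the raw prompt to `llama.cpp` or any other runtime that does not apply a chat template for you.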