---
language: [pl]
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- llama-3.1
- polish
- grpo
- reasoning
- safetensors
datasets:
- openai/gsm8k
base_model: CYFRAGOVPL/Llama-PLLuM-8B-instruct
base_model_relation: finetune
---
# Llama-PLLuM-8B-instruct-ArtexIT-reasoning
**Built with Llama**
This repository contains a GRPO fine‑tune of [`CYFRAGOVPL/Llama-PLLuM-8B-instruct`](https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct) trained on **GSM8K** (MIT).
We publish both **Hugging Face (safetensors)** and **GGUF** artifacts (Q8_0, Q5_K_M) for use with `llama.cpp`.
## What is this?
- **Base**: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- **Context**: ~131k (based on GGUF header).
- **Message format**: Llama `[INST] ... [/INST]` + explicit reasoning / answer tags (see below).
- **Default chat template**: The tokenizer includes a default system instruction enforcing the two‑block format.
## Prompt format
The model expects Llama chat formatting and supports explicit tags:
- **Reasoning**: `<think> ... </think>`
- **Final answer**: `<answer> ... </answer>`
**Example**
```text
[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 156.</think>
<answer>156</answer>
```
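A completion in this format can be split back into its two blocks with a small parser. The sketch below uses the standard library only; `parse_response` is an illustrative helper name, not something shipped in this repo.

```python
import re

def parse_response(text: str) -> tuple[str, str]:
    """Split a model completion into (reasoning, answer).

    Either part is returned as "" when its tag pair is missing,
    so callers can fall back gracefully on malformed outputs.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )

reasoning, answer = parse_response("<think>12*13 = 156.</think>\n<answer>156</answer>")
print(answer)  # → 156
```

`re.DOTALL` lets the reasoning block span multiple lines, which is the common case for word problems.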
## Quickstart
### Transformers (PyTorch)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# The bundled chat template injects the default system instruction
# that enforces the <think>/<answer> two-block format.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],  # "Name 3 cities in Poland."
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
# skip_special_tokens=False keeps the <think>/<answer> tags visible in the output
print(tok.decode(out[0], skip_special_tokens=False))
```
## Training (brief)
- **Method**: GRPO (policy‑gradient reinforcement learning with multiple reward functions).
- **Data**: `openai/gsm8k` — License: **MIT**.
- **Goal**: consistent two‑block outputs (reasoning + final answer) using the training tags.
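The exact reward functions used in training are not published here. As an illustration of how GRPO rewards for the two-block goal above could be structured, the sketch below defines a hypothetical format reward (is the completion exactly one `<think>` block followed by one `<answer>` block?) and an exact-match answer reward against a GSM8K gold answer; `format_reward` and `answer_reward` are assumed names, not the training code.

```python
import re

# Matches exactly one <think> block followed by one <answer> block and nothing else.
TWO_BLOCK = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 for a well-formed two-block completion, else 0.0."""
    return 1.0 if TWO_BLOCK.match(completion.strip()) else 0.0

def answer_reward(completion: str, gold: str) -> float:
    """1.0 when the extracted <answer> block exactly matches the gold answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

print(format_reward("<think>ok</think><answer>42</answer>"))  # → 1.0
print(format_reward("<answer>42</answer>"))                   # → 0.0
```

In GRPO, such per-completion rewards are computed over a group of sampled completions per prompt and normalized into advantages; combining a format term with a correctness term encourages outputs that are both well-structured and right.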
## License & Attribution
This repository contains derivatives of **Llama 3.1** and **PLLuM**:
- **Llama 3.1 Community License** applies. When redistributing, you must:
- include a copy of the license and **prominently display “Built with Llama”**,
- include **“Llama” at the beginning of any distributed model’s name** if it was created, trained or fine‑tuned using Llama materials,
- keep a **NOTICE** file with the following line:
`Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.`
- comply with the **Acceptable Use Policy (AUP)**.
- **PLLuM**: please cite the PLLuM work (see **Citation** below).
- **Data**: GSM8K is MIT‑licensed; include dataset attribution.
This repo includes:
- `LICENSE` — full text of the **Llama 3.1 Community License**
- `USE_POLICY.md` — pointer to the official **Acceptable Use Policy**
- `NOTICE` — required Llama attribution line
> If your (or your affiliates’) products or services exceed **700M monthly active users** as of the Llama 3.1 release date, you must request and obtain a separate license from Meta before exercising the rights granted under the Llama 3.1 license.
## Citation
If you use PLLuM in research or deployments, please cite:
```bibtex
@unpublished{pllum2025,
  title  = {PLLuM: A Family of Polish Large Language Models},
  author = {PLLuM Consortium},
  year   = {2025}
}
```