Dargk — Llama-3.2-3B-instruct GRPO LoRA (β=0.05)

LoRA adapter fine-tuned from meta-llama/Llama-3.2-3B-instruct using GRPO (Group Relative Policy Optimization) with a KL-divergence penalty of β=0.05.

This model was developed as part of the Dargk team's submission to the Voight-Kampff task at ELOQUENT Lab 2026, CLEF 2026. The task asks: can text generated by a language model be distinguished from text written by a human? Systems are scored by how often their outputs fool an AI-detection classifier into believing they are human-authored.

Model Details

  • Developed by: Dargk Team — Antonela Tommasel & Juan Manuel Rodriguez
  • Base model: meta-llama/Llama-3.2-3B-instruct
  • Model type: Causal LM — Fine-tuned, decoder-only transformer, 3B parameters
  • Language: English
  • License: Llama 3.2 Community License
  • Task: Text generation with human-like stylistic properties

Training

Objective

The model was fine-tuned to generate text that is classified as human-written by an AI-detection classifier. The reward signal is 1 − p(AI), where p(AI) is the probability assigned by Mdok2 — our fine-tuned AI-detection classifier (described below) — that a generated text is AI-authored. This is not RLHF: there is no human feedback. The signal comes entirely from Mdok2, which was itself trained on a labeled corpus of human-written and AI-generated text.
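The reward above can be sketched as a small function. This is a minimal illustration rather than the team's training code: `detector_p_ai` is a hypothetical callable standing in for Mdok2, and `toy_detector` is a made-up placeholder that exists only so the example runs.

```python
from typing import Callable, List


def human_likeness_rewards(
    completions: List[str],
    detector_p_ai: Callable[[str], float],
) -> List[float]:
    """Return 1 - p(AI) for each completion (higher means more human-like)."""
    rewards = []
    for text in completions:
        p_ai = detector_p_ai(text)
        # Clamp defensively so the reward always stays in [0, 1].
        p_ai = min(max(p_ai, 0.0), 1.0)
        rewards.append(1.0 - p_ai)
    return rewards


def toy_detector(text: str) -> float:
    """Hypothetical stand-in for Mdok2; NOT a real detector."""
    return 0.9 if "as an ai" in text.lower() else 0.2


rewards = human_likeness_rewards(
    ["As an AI, I think the weather is nice.", "We got lost twice on the way."],
    toy_detector,
)
```

In a real setup the callable would wrap the classifier's forward pass and return the probability of the AI-generated class.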

Reward model — Mdok2

Mdok2 is a binary sequence classifier (human-written vs. AI-generated) built on FacebookAI/roberta-large (355M parameters, encoder-only), fine-tuned with LoRA (r=64, α=16, dropout=0.1) on the PAN25 AI-generated text detection dataset (Task 1). It is inspired by but distinct from the original Mdok system. Text is preprocessed before classification: lowercased, with emails, @-mentions, and phone numbers replaced by placeholder tokens.
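The preprocessing step could look roughly like the sketch below. The placeholder strings and the regular expressions are assumptions made for illustration; the card does not specify the actual tokens or patterns used by Mdok2.

```python
import re

# Placeholder strings are assumptions; the source does not name the actual tokens.
EMAIL_TOKEN = "<email>"
MENTION_TOKEN = "<user>"
PHONE_TOKEN = "<phone>"

# Simplified patterns for illustration only; real-world inputs need more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{6,}\d(?!\w)")


def preprocess(text: str) -> str:
    """Lowercase the text and mask emails, @-mentions, and phone numbers."""
    text = text.lower()
    # Emails go first so their "@" is not mistaken for a mention.
    text = EMAIL_RE.sub(EMAIL_TOKEN, text)
    text = MENTION_RE.sub(MENTION_TOKEN, text)
    text = PHONE_RE.sub(PHONE_TOKEN, text)
    return text


cleaned = preprocess("Mail Bob@Example.com or ping @Alice at +1 (555) 123-4567.")
```

The same function would need to be applied to generated text before scoring it, so that the classifier sees inputs in the form it was trained on.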

Training data

Prompts were drawn from the Voight-Kampff task datasets for 2024, 2025, and 2026. Each prompt combines the task's suggested base prompt, a Content field (bullet-point description of a ~500-word text), and a Genre and Style field.
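As a rough illustration of how the three parts might be combined, the sketch below assembles one prompt. The base instruction is borrowed from the prompt in the Usage section; the exact layout, the field label wording, and the example values are assumptions, not the task's official format.

```python
from typing import List

# Assumed base instruction, modeled on the prompt shown in the Usage section.
BASE_PROMPT = "Write a text of about 500 words which covers the following items:"


def build_prompt(content_items: List[str], genre_and_style: str) -> str:
    """Join the base prompt, the Content bullets, and the Genre and Style field."""
    bullets = "\n".join(f"- {item}" for item in content_items)
    return f"{BASE_PROMPT}\n{bullets}\nGenre and Style: {genre_and_style}"


# Made-up example values, for illustration only.
example = build_prompt(
    ["A town council votes on a new bike lane", "Local shop owners object"],
    "Local newspaper report, neutral tone",
)
```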

Training configuration

| Parameter | Value |
|---|---|
| Algorithm | GRPO (TRL) |
| KL penalty β | 0.05 |
| GRPO group size G | 8 |
| Epochs | 10 |
| Learning rate | 5e-5 |
| Batch size | 1 (gradient accumulation 4) |
| Max completion length | 1000 tokens |
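To show what the group size G means in practice, here is a minimal sketch of the group-relative advantage that gives GRPO its name: G completions are sampled for one prompt, and each reward is normalized against the group's mean and standard deviation. The use of the population standard deviation, the `eps` value, and the reward numbers are assumptions for illustration; TRL's implementation may differ in these details, and the KL penalty (β) is applied separately in the loss.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up 1 - p(AI) rewards for G = 8 completions of a single prompt.
rewards = [0.9, 0.1, 0.5, 0.5, 0.7, 0.3, 0.8, 0.2]
advantages = group_relative_advantages(rewards)
```

Completions that fool the detector more often than their siblings get positive advantages, and the rest get negative ones, so no separate value model is needed.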

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model_id = "meta-llama/Llama-3.2-3B-instruct"
model_id = "jmrodri/Llama-3.2_voight-kampff_beta_005"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
model.eval()

prompt = "Write a text of about 500 words which covers the following items: ..."

chat = [
    {"role": "system", "content": "You are a helpful assistant that generates helpful answers. "
                                  "You will avoid pleasantries and small talk, focusing on the task at hand."},
    {"role": "system", "content": "You will avoid short paragraphs and bullet points."},
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": ""},
]

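# return_dict=True yields input_ids and attention_mask, which generate(**inputs) expects.
inputs = tokenizer.apply_chat_template(chat, return_dict=True, return_tensors="pt").to(model.device)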

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=600,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
    )

# Note: outputs[0] holds the prompt tokens followed by the completion, so decoded includes both.
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)

Intended use

This model was developed for participation in the ELOQUENT Lab 2026 Voight-Kampff shared task. It is intended for research into generative text quality, human-likeness evaluation, and AI-detection robustness.

For more information, see the repository Darkg Eloquent 2026.

Contact

Dargk Team
