How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="xupy21/ContextRL_Klear_AgentForge_8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("xupy21/ContextRL_Klear_AgentForge_8B")
model = AutoModelForCausalLM.from_pretrained("xupy21/ContextRL_Klear_AgentForge_8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
ContextRL-Klear-AgentForge-8B

This is the agentic (long-horizon) model released with the paper "Context-Aware RL for Agentic and Multimodal LLMs". It is fine-tuned from Klear-AgentForge-8B, a model specialized for complex agentic coding, using ContextRL. ContextRL is a context-aware reinforcement learning method that augments standard GRPO with an auxiliary context-selection objective, with the aim of improving fine-grained context grounding in long-horizon agent trajectories.
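For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage that standard GRPO is generally described as using: each sampled rollout's reward is normalized against the other rollouts for the same prompt. The function name, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for illustration, not details taken from the paper or its repository. The auxiliary context-selection objective is not specified in this card, so it is not sketched here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group: (r - mean) / (std + eps).

    Assumption: population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical outcome rewards for four rollouts of one task (1.0 = success).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Successful rollouts receive positive advantages and failed ones negative advantages; when every rollout in a group gets the same reward, all advantages are zero and the group contributes no learning signal.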

Results

Across 5 long-horizon benchmarks (2 in-distribution agentic coding, 3 out-of-distribution), ContextRL improves over the standard GRPO baseline by +3.2 points on average (taken over the six scores in the table below, where LongBench v2 contributes two) and improves on every individual benchmark.

| Benchmark | Base | RL (GRPO) | ContextRL (Ours) |
|---|---|---|---|
| SWE-Bench Verified | 26.6 | 28.0 | 30.2 |
| SWE-Bench Lite | 21.0 | 21.7 | 24.0 |
| LiveCodeBench v6 | 21.7 | 22.3 | 24.0 |
| LongBench v2 (Overall) | 27.4 | 27.0 | 29.6 |
| LongBench v2 (Long) | 21.3 | 24.1 | 28.7 |
| NIAH | 68.3 | 65.5 | 71.3 |

Metrics: SWE-Bench Verified/Lite resolve rate (%), LiveCodeBench v6 solve rate (%), LongBench v2 accuracy (%), NIAH mean recall (%). On the long-context tasks (LongBench v2, NIAH) where standard outcome-based GRPO struggles or regresses, ContextRL surpasses both the base model and the RL baseline, demonstrating strong out-of-distribution generalization.
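The per-benchmark gains and the reported average can be recomputed from the table. The short script below is only an arithmetic check on the published numbers; the variable and function names are illustrative and do not come from the repository.

```python
# Scores copied from the results table: (Base, RL (GRPO), ContextRL).
SCORES = {
    "SWE-Bench Verified": (26.6, 28.0, 30.2),
    "SWE-Bench Lite": (21.0, 21.7, 24.0),
    "LiveCodeBench v6": (21.7, 22.3, 24.0),
    "LongBench v2 (Overall)": (27.4, 27.0, 29.6),
    "LongBench v2 (Long)": (21.3, 24.1, 28.7),
    "NIAH": (68.3, 65.5, 71.3),
}


def gains_over_grpo(scores):
    """Return {benchmark: ContextRL minus GRPO}, rounded to one decimal."""
    return {
        name: round(ctx - grpo, 1)
        for name, (_base, grpo, ctx) in scores.items()
    }


def mean_gain(scores):
    """Average the per-row gains over all rows of the table."""
    gains = gains_over_grpo(scores)
    return round(sum(gains.values()) / len(gains), 1)


def grpo_regressions(scores):
    """List the rows where the GRPO baseline scores below the base model."""
    return [name for name, (base, grpo, _ctx) in scores.items() if grpo < base]


if __name__ == "__main__":
    for name, gain in gains_over_grpo(SCORES).items():
        print(f"{name}: +{gain}")
    print(f"Mean gain over GRPO: +{mean_gain(SCORES)}")
    print("GRPO below base on:", grpo_regressions(SCORES))
```

Averaged over the six rows, the gain over GRPO is 3.2 points, and the GRPO baseline falls below the base model on LongBench v2 (Overall) and NIAH.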

Usage

This model follows the same interface as its Klear-AgentForge-8B base and can be loaded with transformers. Training and evaluation code, data construction pipelines, and detailed configurations are available in the repository (https://github.com/xupy2003/ContextAwareRL). Please refer to the repo's README for environment setup, inference scripts, and reproduction instructions.

Model size: 8B params (Safetensors). Tensor type: BF16.