File size: 2,227 Bytes
af32841
 
7c49c9d
 
 
 
 
 
 
 
 
 
 
 
 
af32841
7c49c9d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
---
license: apache-2.0
base_model:
  - Klear-AgentForge-8B
pipeline_tag: text-generation
library_name: transformers
tags:
  - agentic
  - code
  - software-engineering
  - reinforcement-learning
  - grpo
  - context-aware
  - long-context

---

# ContextRL-Klear-AgentForge-8B

This is the **agentic (long-horizon) model** released with the paper
**Context-Aware RL for Agentic and Multimodal LLMs**.
It is fine-tuned from **Klear-AgentForge-8B**, a model specialized for complex agentic
coding, using **ContextRL**, a context-aware reinforcement learning method that augments
standard GRPO with an auxiliary *context-selection* objective to improve fine-grained
context grounding in long-horizon agent trajectories.

## Results

Across 5 long-horizon benchmarks (2 in-distribution agentic coding, 3 out-of-distribution),
ContextRL improves over the standard GRPO baseline by **+3.2 points** on average, while
improving every individual benchmark.

| Benchmark              | Base | RL (GRPO) | ContextRL (Ours) |
| ---------------------- | ---- | --------- | ---------------- |
| SWE-Bench Verified     | 26.6 | 28.0      | **30.2**         |
| SWE-Bench Lite         | 21.0 | 21.7      | **24.0**         |
| LiveCodeBench v6       | 21.7 | 22.3      | **24.0**         |
| LongBench v2 (Overall) | 27.4 | 27.0      | **29.6**         |
| LongBench v2 (Long)    | 21.3 | 24.1      | **28.7**         |
| NIAH                   | 68.3 | 65.5      | **71.3**         |

*Metrics: SWE-Bench Verified/Lite resolve rate (%), LiveCodeBench v6 solve rate (%), LongBench v2 accuracy (%), NIAH mean recall (%).* On the long-context tasks (LongBench v2, NIAH) where standard outcome-based GRPO struggles or regresses, ContextRL surpasses both the base model and the RL baseline, demonstrating strong out-of-distribution generalization.

## Usage

This model follows the same interface as its Klear-AgentForge-8B base and can be loaded
with `transformers`. Training and evaluation code, data construction pipelines, and
detailed configurations are available in the repository:
👉 **https://github.com/xupy2003/ContextAwareRL**
Please refer to the repo's README for environment setup, inference scripts, and
reproduction instructions.