---
title: RLM Model Evaluation
sdk: docker
hardware: t4-small
---

# RLM Model Evaluation

This Space evaluates the GRPO-trained needle-in-haystack model against its base model.
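The exact task format is not documented here. As a rough illustration only, a needle-in-haystack example can be built by hiding one fact among filler sentences and asking the model to retrieve it. The function name `make_needle_example`, the filler text, and the prompt wording below are all hypothetical and may differ from what this Space actually uses.

```python
import random


def make_needle_example(needle_value, n_filler=20, seed=0):
    """Build one hypothetical needle-in-haystack prompt and its expected answer."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    filler = [f"Filler sentence number {i}." for i in range(n_filler)]
    needle = f"The secret code is {needle_value}."

    # Pick an insertion point anywhere from the start to the end of the filler.
    position = rng.randint(0, n_filler)
    sentences = filler[:position] + [needle] + filler[position:]

    prompt = " ".join(sentences) + "\nQuestion: What is the secret code?\nAnswer:"
    return prompt, str(needle_value)


prompt, expected = make_needle_example(4721)
```

The returned `expected` string is what a model's answer would be compared against.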

## Models
- Base: Qwen/Qwen3-0.6B-Base
- Trained: mindchain/qwen3-0.6b-rlm-needle

## Expected Results
- Base: ~25% accuracy (consistent with random guessing)
- Trained: 50-75% accuracy (after GRPO training)
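
The accuracy figures above can be computed with a simple exact-match comparison. The following is a minimal sketch, assuming each model produces one string answer per example; the `accuracy` helper and the toy lists are hypothetical placeholders, not measured results or the Space's actual scoring code.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference (whitespace-stripped)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data for illustration only; these are NOT real model outputs.
references = ["4721", "1093", "5580", "2214", "9006"]
base_predictions = ["4721", "0000", "0000", "0000", "0000"]
trained_predictions = ["4721", "1093", "5580", "0000", "0000"]

base_acc = accuracy(base_predictions, references)
trained_acc = accuracy(trained_predictions, references)
print(f"Base: {base_acc:.0%}  Trained: {trained_acc:.0%}")
```

In a real run, the prediction lists would come from generating answers with `Qwen/Qwen3-0.6B-Base` and `mindchain/qwen3-0.6b-rlm-needle` on the same evaluation set.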