---
title: RLM Model Evaluation
sdk: docker
hardware: t4-small
---
# RLM Model Evaluation
Evaluates the trained needle-in-haystack model against the base model.
## Models
- Base: Qwen/Qwen3-0.6B-Base
- Trained: mindchain/qwen3-0.6b-rlm-needle
## Expected Results
- Base: ~25% accuracy (random guessing)
- Trained: 50-75% accuracy (after GRPO training)
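
The accuracy figures above could be computed along the following lines. This is a minimal sketch, not the Space's actual evaluation code: the scoring rule (case-insensitive substring match), the helper names `is_correct` and `accuracy`, and the toy outputs are all assumptions made for illustration, and the printed numbers are not measured results.

```python
def is_correct(output: str, needle: str) -> bool:
    """Assumed scoring rule: the needle appears in the output, ignoring case."""
    return needle.strip().lower() in output.lower()


def accuracy(outputs: list[str], needles: list[str]) -> float:
    """Fraction of outputs that contain their corresponding needle."""
    if len(outputs) != len(needles):
        raise ValueError("outputs and needles must have the same length")
    if not needles:
        return 0.0
    hits = sum(is_correct(o, n) for o, n in zip(outputs, needles))
    return hits / len(needles)


# Hypothetical toy data, NOT real model outputs or measured results.
needles = ["7421", "blue heron", "K9-ALPHA", "42"]
base_outputs = ["I am not sure.", "The answer is blue heron.", "unknown", "maybe 17"]
trained_outputs = ["The code is 7421.", "blue heron", "k9-alpha", "maybe 17"]

base_acc = accuracy(base_outputs, needles)
trained_acc = accuracy(trained_outputs, needles)
print(f"base={base_acc:.2f} trained={trained_acc:.2f}")
```

In a real run, `base_outputs` and `trained_outputs` would hold the generations from each model on the same set of prompts, so that both accuracies are computed over identical examples.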