---
title: RLM Model Evaluation
sdk: docker
hardware: t4-small
---

# RLM Model Evaluation

Evaluates the trained needle-in-haystack model against the base model.

## Models

- Base: Qwen/Qwen3-0.6B-Base
- Trained: mindchain/qwen3-0.6b-rlm-needle

## Expected Results

- Base: ~25% accuracy (random guessing)
- Trained: 50-75% accuracy (after GRPO training)
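
To make the comparison concrete, here is a minimal sketch of how accuracy could be scored for either model. It assumes a four-option multiple-choice format, which would be consistent with the ~25% chance baseline but is not confirmed by this README. The function names and the toy data are hypothetical and are not taken from the Space's code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty evaluation set")
    # Compare after stripping surrounding whitespace from both sides.
    correct = sum(p.strip() == a.strip() for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1.0 / num_options


# Hypothetical toy data for illustration only (not real model outputs).
answers = ["A", "C", "B", "D"]
toy_predictions = ["A", "C", "D", "D"]

print(f"toy accuracy: {accuracy(toy_predictions, answers):.2f}")
# Assumption: four answer options per question.
print(f"chance baseline: {chance_accuracy(4):.2f}")
```

Scoring both models with the same function and the same evaluation set keeps the base and trained numbers directly comparable.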