title: RLM Model Evaluation
sdk: docker
hardware: t4-small

RLM Model Evaluation

Evaluates the GRPO-trained needle-in-a-haystack model against its base model on the same task.

Models

Base: Qwen/Qwen3-0.6B-Base
Trained: mindchain/qwen3-0.6b-rlm-needle
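
A minimal sketch of how the two checkpoints listed above might be loaded with the Hugging Face `transformers` library. The helper name `load_model` is hypothetical, and the sketch assumes the trained repository holds a full checkpoint that `AutoModelForCausalLM` can load directly (not only an adapter); the README does not confirm either point.

```python
# Model IDs copied from the Models list in this README.
MODEL_IDS = {
    "base": "Qwen/Qwen3-0.6B-Base",
    "trained": "mindchain/qwen3-0.6b-rlm-needle",
}


def load_model(name):
    """Return (tokenizer, model) for 'base' or 'trained'.

    Hypothetical helper; assumes both repos are full causal-LM checkpoints.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = MODEL_IDS[name]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_model("base")` and `load_model("trained")` downloads the weights on first use, so it needs network access to the Hub.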

Expected Results

Base: ~25% accuracy (chance level, equivalent to random guessing)
Trained: 50-75% accuracy (expected after GRPO training)
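
The accuracy figures above could be computed along the following lines. This is only a sketch: the scoring rule (the expected needle appears as a substring of the model output), the `(prompt, needle)` example format, and the function names `accuracy` and `compare` are all assumptions, not the Space's actual evaluation code. The predictors are plain callables so the logic can be checked without loading any model.

```python
def accuracy(predict, examples):
    """Fraction of (prompt, needle) pairs where the needle appears in predict(prompt).

    Substring matching is an assumed scoring rule, not taken from the README.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for prompt, needle in examples if needle in predict(prompt))
    return correct / len(examples)


def compare(base_predict, trained_predict, examples):
    """Score both predictors on the same examples and return their accuracies."""
    return {
        "base": accuracy(base_predict, examples),
        "trained": accuracy(trained_predict, examples),
    }


# Toy usage with stub predictors standing in for the real models.
examples = [("p1", "a"), ("p2", "b"), ("p3", "c"), ("p4", "d")]
trained_answers = {"p1": "a", "p2": "b", "p3": "c", "p4": "x"}  # one miss

results = compare(
    base_predict=lambda prompt: "a",  # always gives the same answer
    trained_predict=lambda prompt: trained_answers[prompt],
    examples=examples,
)
print(results)
```

In a real run, each predictor would wrap a model's text generation, and both models would be scored on the same held-out prompts so the two numbers are directly comparable.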