	---
	title: RLM Model Evaluation
	sdk: docker
	hardware: t4-small
	---

	# RLM Model Evaluation

	Evaluates the GRPO-trained needle-in-a-haystack model against its base model.
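
For orientation, a needle-in-haystack item hides one fact (the needle) inside a run of unrelated filler text and asks the model to retrieve it. The sketch below builds one such item with the standard library only. The `make_item` helper, the filler sentences, the "secret code" wording, and the four-option multiple-choice layout are all illustrative assumptions; the task format this Space actually uses is not described here.

```python
import random
import string

# Hypothetical filler sentences; the real haystack text is not specified.
FILLER = [
    "The sky was clear over the harbor.",
    "A train left the station on time.",
    "The library closes at nine.",
    "Rain is expected later this week.",
]


def make_item(num_sentences: int, seed: int, num_choices: int = 4) -> dict:
    """Build one illustrative multiple-choice needle-in-haystack item."""
    rng = random.Random(seed)  # seeded so the item is reproducible

    # Draw distinct 4-digit codes: the first is the needle, the rest distractors.
    codes = rng.sample(range(1000, 10000), num_choices)
    secret = codes[0]
    needle = f"The secret code is {secret}."

    # Fill the haystack, then drop the needle at a random position.
    haystack = [rng.choice(FILLER) for _ in range(num_sentences)]
    haystack.insert(rng.randint(0, num_sentences), needle)

    # Shuffle the options so the correct label varies between items.
    options = codes[:]
    rng.shuffle(options)
    labels = string.ascii_uppercase[:num_choices]
    option_lines = [f"{label}. {code}" for label, code in zip(labels, options)]

    prompt = (
        " ".join(haystack)
        + "\n\nQuestion: What is the secret code?\n"
        + "\n".join(option_lines)
        + "\nAnswer:"
    )
    return {"prompt": prompt, "answer": labels[options.index(secret)]}


item = make_item(num_sentences=20, seed=0)
```

Seeding the generator per item keeps the evaluation set identical for both models, so any accuracy difference comes from the models and not from the sampled prompts.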

	## Models
	- Base: Qwen/Qwen3-0.6B-Base
	- Trained: mindchain/qwen3-0.6b-rlm-needle

	## Expected Results
	- Base: ~25% accuracy (random guessing)
	- Trained: 50-75% accuracy (after GRPO training)
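
The ~25% baseline matches what uniform guessing yields on a four-way multiple-choice task. That the evaluation uses four options is an inference from that figure, not something stated in this README. Under that assumption, a minimal scoring sketch looks like the following; the `accuracy` and `chance_level` helpers are hypothetical names, and the toy lists are made-up inputs for illustration, not measured results.

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of predictions that match the gold answer labels."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty evaluation set")
    # Compare case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def chance_level(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing over num_choices options."""
    return 1.0 / num_choices


# Toy inputs for illustration only (assumed, not real model outputs).
toy_answers = ["A", "C", "B", "D"]
toy_base_preds = ["A", "A", "A", "A"]
toy_trained_preds = ["a", "C", " B", "A"]

base_acc = accuracy(toy_base_preds, toy_answers)
trained_acc = accuracy(toy_trained_preds, toy_answers)
baseline = chance_level(4)
```

Comparing each model's accuracy against `chance_level` instead of against zero makes it clear whether the base model has any retrieval ability at all and how much of the trained model's score is real gain.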