Reverse Chinese Text (RL)
This model is a reinforcement learning (RL) fine-tune of the SFT model ivanleomk/reverse-chinese-text, trained to reverse Chinese text character by character.
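As a point of reference, the target behavior can be sketched in a few lines of Python. The function name `reverse_text` is hypothetical and is not part of the model or its training code; the sketch assumes that "character" means a single Unicode code point, which holds for common Chinese characters but would split combining sequences.

```python
def reverse_text(text: str) -> str:
    """Reverse a string one Unicode code point at a time."""
    # Python strings are sequences of code points, so slicing with a
    # negative step reverses the text character by character.
    return text[::-1]


print(reverse_text("床前明月光"))  # 光月明前床
```

The model is trained to produce this output directly from a prompt, which is harder for a language model than it looks because tokens rarely align with individual characters.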
Training Pipeline
- Base Model: PrimeIntellect/Qwen3-0.6B
- SFT: Fine-tuned on ivanleomk/reverse-chinese-poems → ivanleomk/reverse-chinese-text
- RL: Trained with GRPO using the ivanleomk/chinese-text-reverse verifier environment
RL Training Details
- Method: GRPO (Group Relative Policy Optimization)
- Training Steps: 50
- Learning Rate: 3e-6
- Batch Size: 128
- Rollouts per Example: 16
- Framework: Prime-RL
Benchmark Results
Evaluated on 1,000 samples from the test set:
| Model | Character Accuracy | Exact Match Rate |
|---|---|---|
| PrimeIntellect/Qwen3-0.6B (base) | 0.11% | 0.00% |
| ivanleomk/reverse-chinese-text (SFT) | 63.55% | 9.60% |
| ivanleomk/reverse-chinese-text-rl (RL) | 59.26% | 8.70% |
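The card does not state how the two metrics are computed. A minimal sketch of one plausible definition follows, assuming that character accuracy is the fraction of positions where the prediction matches the target (normalized by the longer of the two strings) and that exact match requires the strings to be identical. Both function names are hypothetical, and the actual evaluation script may differ.

```python
def character_accuracy(prediction: str, target: str) -> float:
    """Position-wise match rate between prediction and target (assumed definition)."""
    if not prediction and not target:
        return 1.0
    # Count positions where both strings hold the same character.
    matches = sum(p == t for p, t in zip(prediction, target))
    # Normalize by the longer string so extra or missing characters are penalized.
    return matches / max(len(prediction), len(target))


def exact_match(prediction: str, target: str) -> bool:
    """True only when the whole prediction equals the target."""
    return prediction == target


target = "光月明前床"
prediction = "光月明床前"  # last two characters swapped
print(character_accuracy(prediction, target))  # 0.6
print(exact_match(prediction, target))         # False
```

Under this definition a single swapped pair still earns partial credit on character accuracy while counting as a miss for exact match, which is consistent with the large gap between the two columns in the table.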
Notes on RL Performance
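The RL model regressed slightly from the SFT baseline: character accuracy fell from 63.55% to 59.26% and exact match rate from 9.60% to 8.70%. This is attributed to the binary (0/1) reward signal from the verifier. GRPO scores each rollout relative to the other rollouts for the same example, so when most rollouts in a group receive identical rewards, their advantages are zero and many training steps produce zero gradients. Future improvements should use partial rewards (character-level accuracy) to provide a denser learning signal.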
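The effect can be illustrated with a small sketch of group-relative advantages. It assumes the commonly described GRPO normalization (reward minus group mean, divided by group standard deviation); Prime-RL's exact formula may differ. The reward and advantage functions below are hypothetical illustrations, not the verifier environment's code.

```python
from statistics import mean, pstdev


def binary_reward(prediction: str, target: str) -> float:
    """1.0 only for a perfect reversal, otherwise 0.0."""
    return 1.0 if prediction == target else 0.0


def partial_reward(prediction: str, target: str) -> float:
    """Fraction of positions that match the target (assumed partial reward)."""
    if not prediction and not target:
        return 1.0
    matches = sum(p == t for p, t in zip(prediction, target))
    return matches / max(len(prediction), len(target))


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


target = "光月明前床"
# Four imperfect rollouts for the same prompt (illustrative, not model output).
rollouts = ["光月明床前", "床前明月光", "光月前明床", "月光明前床"]

binary = [binary_reward(r, target) for r in rollouts]
partial = [partial_reward(r, target) for r in rollouts]

# Every rollout fails the exact check, so all binary advantages are 0.0
# and this group contributes no gradient.
print(group_advantages(binary))
# Partial rewards differ between rollouts, so the advantages are non-zero.
print(group_advantages(partial))
```

With a low exact match rate, most groups look like the binary case above, which is why a character-level reward is expected to give the optimizer more to work with.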
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ivanleomk/reverse-chinese-text-rl")
tokenizer = AutoTokenizer.from_pretrained("ivanleomk/reverse-chinese-text-rl")

messages = [
    {"role": "system", "content": "You are a text reversal assistant. Given Chinese text, reverse it character by character."},
    # User prompt: "Please reverse the following text: 床前明月光"
    {"role": "user", "content": "请反转以下文字:床前明月光"},
]

# Apply the chat template and append the assistant generation prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
output = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Expected: 光月明前床
License
Apache 2.0
Model tree for ivanleomk/reverse-chinese-text-rl
- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned: PrimeIntellect/Qwen3-0.6B
- Finetuned: ivanleomk/reverse-chinese-text