Reverse Chinese Text (RL)

This model is a reinforcement-learning fine-tuned version of ivanleomk/reverse-chinese-text (the SFT model), trained on the task of reversing Chinese text character by character.

Training Pipeline

  1. Base Model: PrimeIntellect/Qwen3-0.6B
  2. SFT: Fine-tuned on ivanleomk/reverse-chinese-poems → ivanleomk/reverse-chinese-text
  3. RL: Trained with GRPO using ivanleomk/chinese-text-reverse verifier environment

RL Training Details

  • Method: GRPO (Group Relative Policy Optimization)
  • Training Steps: 50
  • Learning Rate: 3e-6
  • Batch Size: 128
  • Rollouts per Example: 16
  • Framework: Prime-RL
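The core of GRPO is that each prompt's rollouts are scored as a group and rewards are normalized within that group to form advantages. A minimal standalone sketch of that normalization step (hypothetical helper, not the Prime-RL implementation):

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each rollout = (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        # Every rollout in the group scored identically: zero advantages,
        # hence no gradient from this group.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# With a binary verifier, a group of 16 rollouts that all fail (or all
# succeed) contributes nothing to the update:
print(group_relative_advantages([0, 0, 0, 0]))  # [0.0, 0.0, 0.0, 0.0]
print(group_relative_advantages([1, 0, 0, 1]))  # [1.0, -1.0, -1.0, 1.0]
```

With 16 rollouts per example and a 0/1 verifier reward, this degenerate all-same-reward case is exactly the failure mode discussed in the notes below.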

Benchmark Results

Evaluated on 1,000 samples from the test set:

| Model | Character Accuracy | Exact Match Rate |
|---|---|---|
| PrimeIntellect/Qwen3-0.6B (base) | 0.11% | 0.00% |
| ivanleomk/reverse-chinese-text (SFT) | 63.55% | 9.60% |
| ivanleomk/reverse-chinese-text-rl (RL) | 59.26% | 8.70% |
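The two metrics can be computed as follows (a sketch with assumed definitions: character accuracy as position-wise matches over the longer of the two strings, exact match as string equality; the actual evaluation script may differ):

```python
def char_accuracy(pred: str, target: str) -> float:
    """Fraction of positions where pred and target agree."""
    if not target:
        return 0.0
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / max(len(pred), len(target))

def exact_match(pred: str, target: str) -> bool:
    return pred == target

target = "光月明前床"
print(char_accuracy("光月明前床", target))  # 1.0
print(char_accuracy("光月日前床", target))  # 0.8 (one wrong character)
print(exact_match("光月日前床", target))    # False
```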

Notes on RL Performance

The RL model regressed slightly from the SFT baseline. This is due to the binary (0/1) reward signal from the verifier: most rollouts in a group received identical rewards, so their group-relative advantages, and therefore the policy gradients, were zero for many training steps. Future runs should use partial rewards (e.g., character-level accuracy) to provide a denser learning signal.
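The suggested partial reward could look like the sketch below (hypothetical function names): two near-miss rollouts that the binary reward cannot tell apart receive distinct character-level rewards, giving GRPO a non-degenerate group to normalize.

```python
def binary_reward(pred: str, target: str) -> float:
    """Current verifier behavior: all-or-nothing."""
    return 1.0 if pred == target else 0.0

def partial_reward(pred: str, target: str) -> float:
    """Character-level accuracy: dense signal even for near-misses."""
    if not target:
        return 0.0
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / max(len(pred), len(target))

target = "光月明前床"
for pred in ["光月明前庄", "月光明前床"]:
    print(binary_reward(pred, target), partial_reward(pred, target))
# 0.0 0.8
# 0.0 0.6
```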

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ivanleomk/reverse-chinese-text-rl")
tokenizer = AutoTokenizer.from_pretrained("ivanleomk/reverse-chinese-text-rl")

messages = [
    {"role": "system", "content": "You are a text reversal assistant. Given Chinese text, reverse it character by character."},
    {"role": "user", "content": "请反转以下文字：床前明月光"}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Expected: 光月明前床

License

Apache 2.0
