ivanleomk committed on
Commit de91065 · verified · 1 Parent(s): 9431430

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +60 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ base_model: PrimeIntellect/Qwen3-0.6B
+ tags:
+ - text-generation
+ - chinese
+ - sft
+ - qwen3
+ datasets:
+ - ivanleomk/reverse-chinese-poems
+ language:
+ - zh
+ pipeline_tag: text-generation
+ ---
+
+ # Reverse Chinese Text (SFT)
+
+ This model is a fine-tuned version of [PrimeIntellect/Qwen3-0.6B](https://huggingface.co/PrimeIntellect/Qwen3-0.6B), trained to reverse Chinese text character by character.
+
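For reference, the target transformation is deterministic and fits in one line of Python. The sketch below is not part of the model or its training code: `reverse_text` is a hypothetical helper name used only for illustration, and it assumes that a "character" is a single Unicode code point, which holds for ordinary Chinese text.

```python
def reverse_text(text: str) -> str:
    """Reverse a string code point by code point (hypothetical reference helper)."""
    return text[::-1]


if __name__ == "__main__":
    print(reverse_text("床前明月光"))  # 光月明前床
```

A helper like this could serve as the ground truth when checking the model's output.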
+ ## Training
+
+ - **Base Model:** PrimeIntellect/Qwen3-0.6B
+ - **Method:** Supervised Fine-Tuning (SFT)
+ - **Dataset:** [ivanleomk/reverse-chinese-poems](https://huggingface.co/datasets/ivanleomk/reverse-chinese-poems)
+ - **Training Steps:** 200
+ - **Learning Rate:** 2e-5
+ - **Batch Size:** 16
+ - **Framework:** [Prime-RL](https://github.com/PrimeIntellect-ai/prime-rl)
+
+ ## Benchmark Results
+
+ Evaluated on 1,000 samples from the test set:
+
+ | Model | Character Accuracy | Exact Match Rate |
+ |-------|--------------------|------------------|
+ | PrimeIntellect/Qwen3-0.6B (base) | 0.10% | 0.00% |
+ | **ivanleomk/reverse-chinese-text (SFT)** | **63.55%** | **9.60%** |
+
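The README does not define how Character Accuracy and Exact Match Rate are computed. One plausible reading, sketched below, is position-wise agreement between prediction and target (normalised by the longer string) and strict string equality. The function names and the normalisation choice are assumptions made for illustration; the actual evaluation script may differ.

```python
from typing import List, Tuple


def character_accuracy(prediction: str, target: str) -> float:
    """Fraction of positions where prediction and target agree.

    Assumption: normalised by the longer string, so missing or extra
    characters count as errors. The real evaluation may differ.
    """
    longest = max(len(prediction), len(target))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    matches = sum(p == t for p, t in zip(prediction, target))
    return matches / longest


def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (mean character accuracy, exact match rate) over (prediction, target) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    char_acc = sum(character_accuracy(p, t) for p, t in pairs) / len(pairs)
    exact = sum(p == t for p, t in pairs) / len(pairs)
    return char_acc, exact


if __name__ == "__main__":
    pairs = [
        ("光月明前床", "光月明前床"),  # perfect reversal
        ("光明月前床", "光月明前床"),  # two characters swapped
    ]
    char_acc, exact = evaluate(pairs)
    print(f"character accuracy: {char_acc:.2%}, exact match: {exact:.2%}")
```

Under this definition a single swapped pair of characters still earns partial credit on character accuracy while counting as a miss for exact match, which is consistent with the gap between the two columns in the table.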
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("ivanleomk/reverse-chinese-text")
+ tokenizer = AutoTokenizer.from_pretrained("ivanleomk/reverse-chinese-text")
+
+ messages = [
+     {"role": "system", "content": "You are a text reversal assistant. Given Chinese text, reverse it character by character."},
+     {"role": "user", "content": "请反转以下文字:床前明月光"},
+ ]
+
+ # add_generation_prompt=True appends the assistant turn marker so the model answers
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
+ )
+ output = model.generate(**inputs, max_new_tokens=100)
+ # Decode only the newly generated tokens, not the echoed prompt
+ new_tokens = output[0][inputs["input_ids"].shape[1]:]
+ print(tokenizer.decode(new_tokens, skip_special_tokens=True))
+ # Target output: 光月明前床 (the model does not always reproduce it exactly)
+ ```
+
+ ## License
+
+ Apache 2.0