Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +174 -163
README.md CHANGED
@@ -4,8 +4,19 @@
 datasets:
 - HoangHa/Pensez-v0.1
 language:
-- en
-- fr
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
 ---

The rest of README.md is unchanged; the resulting model card follows.

<div align="center">

# Pensez: Less Data, Better Reasoning – Rethinking French LLM

[**About**](#about) | [**How to Run Locally**](#run-locally) | [**Models and Datasets**](#models-and-datasets) | [**Benchmarks**](#benchmarks) | [**Training Details**](#training-details)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630a5ef0e81e1dea2cedcec0/lbFwSuyLkixvcLWcMs7ZV.png)
</div>

## About

Pensez is a bilingual (French-English) reasoning model designed to deliver strong reasoning from significantly less training data. It is fine-tuned on a small, hand-curated dataset of everyday reasoning tasks and scientific questions.

Key strategies for improved reasoning:
- **Concise reasoning** for simple tasks, to prevent overthinking.
- **Extended reasoning** for complex domains such as mathematics, coding, and science.
- **Special tokens (`<think>...</think>`)** that explicitly delimit the model's reasoning process (illustrated in the sketch below).

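For illustration, here is roughly what an assistant turn with an explicit reasoning trace might look like (the literal formatting is an assumption; the model's chat template is authoritative):

```python
# Hypothetical sketch of an assistant message that embeds its reasoning trace
# in <think> tags; the real layout is produced by the model's chat template.
sample = {
    "role": "assistant",
    "content": (
        "<think>\n"
        "2 + 2 is a simple sum, so a short check is enough: 2 + 2 = 4.\n"
        "</think>\n"
        "2 + 2 = 4."
    ),
}
print(sample["content"])
```
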
Compared to models like [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), these optimizations yield stronger reasoning while preserving robust general understanding.

## Models and Datasets

### Model Versions

Pensez is built upon [Qwen 2.5 Instruct 7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained for five epochs, with a checkpoint released after each epoch.

| Model | Backbone | Size | Download Link |
|---------------|----------------------------------------|------|---------------|
| Pensez-v0.1-e1 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e1](https://huggingface.co/HoangHa/Pensez-v0.1-e1) |
| Pensez-v0.1-e2 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e2](https://huggingface.co/HoangHa/Pensez-v0.1-e2) |
| Pensez-v0.1-e3 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e3](https://huggingface.co/HoangHa/Pensez-v0.1-e3) |
| Pensez-v0.1-e4 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e4](https://huggingface.co/HoangHa/Pensez-v0.1-e4) |
| Pensez-v0.1-e5 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e5](https://huggingface.co/HoangHa/Pensez-v0.1-e5) |

### Dataset

Pensez was trained on the hand-curated [Pensez v0.1](https://huggingface.co/datasets/HoangHa/Pensez-v0.1) dataset containing 2,000 samples (1,000 French, 1,000 English).

| Dataset | Description | Size | Link |
|--------------|----------------------|-------|-------|
| Pensez v0.1 | SFT Training Dataset | 2K samples | [🤗 Pensez v0.1](https://huggingface.co/datasets/HoangHa/Pensez-v0.1) |

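The dataset can be loaded directly from the Hub with the `datasets` library:

```python
from datasets import load_dataset

# Pull the 2K-sample SFT dataset from the Hugging Face Hub.
ds = load_dataset("HoangHa/Pensez-v0.1")
print(ds)  # inspect the available splits and columns
```
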
## Benchmarks

Pensez was evaluated on French-specific benchmarks plus two English tasks, demonstrating strong reasoning ability and improved task-specific performance:

| Benchmark | Pensez-v0.1-e5 | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-7B-Instruct |
|-----------|---------------|-----------------------------|----------------------|
| Math-hard (fr) | 0.3458 | 0.3403 | 0.2253 |
| MMLU (fr) | 0.5766 | 0.4961 | 0.6612 |
| BoolQA (fr) | 0.9157 | 0.7079 | 0.9382 |
| Trivia (en) | 0.4421 | 0.2711 | 0.5316 |
| HellaSwag (en) | 0.5050 | 0.3540 | 0.5258 |

**Key Observations:**
- Pensez outperforms Qwen2.5-7B-Instruct on reasoning tasks (Math-hard fr).
- It is comparable to DeepSeek-R1-Distill-Qwen-7B in reasoning while maintaining much stronger general understanding.
- Knowledge-based tasks (MMLU, BoolQA, Trivia) degrade far less than they do for the R1 distill.

<details>
<summary>Click for detailed benchmark results</summary>

| Tasks | Pensez v0.1 e1 | Pensez v0.1 e2 | Pensez v0.1 e3 | Pensez v0.1 e4 | Pensez v0.1 e5 | Qwen 7B Instruct | R1 Distill |
|------------------------------------------------|---------------|---------------|---------------|---------------|---------------|-----------------|-----------|
| leaderboard_math_hard_fr | 0.0918 | 0.2547 | 0.2783 | 0.3035 | 0.3458 | 0.2253 | 0.3403 |
| leaderboard_math_algebra_hard_fr | 0.1029 | 0.3914 | 0.3971 | 0.5114 | 0.5000 | 0.4229 | 0.4771 |
| leaderboard_math_counting_and_prob_hard_fr | 0.0765 | 0.1378 | 0.1939 | 0.2041 | 0.2398 | 0.1224 | 0.2347 |
| leaderboard_math_geometry_hard_fr | 0.0388 | 0.1019 | 0.1408 | 0.1359 | 0.1748 | 0.1019 | 0.2330 |
| leaderboard_math_num_theory_hard_fr | 0.1198 | 0.2581 | 0.3502 | 0.3548 | 0.4332 | 0.3180 | 0.3963 |
| leaderboard_math_prealgebra_hard_fr | 0.1681 | 0.4425 | 0.4690 | 0.4956 | 0.5841 | 0.3274 | 0.4867 |
| leaderboard_math_precalculus_hard_fr | 0.0357 | 0.0714 | 0.1190 | 0.1190 | 0.1429 | 0.0595 | 0.2143 |
| leaderboard_mmlu_fr | 0.3806 | 0.3329 | - | - | 0.5766 | 0.6612 | 0.4961 |
| french_bench_arc_challenge | 0.5047 | 0.5021 | 0.4919 | 0.4859 | 0.4842 | 0.5518 | 0.3447 |
| french_bench_boolqa | 0.9326 | 0.9326 | 0.9326 | 0.9270 | 0.9157 | 0.9382 | 0.7079 |
| french_bench_fquadv2 | 0.4325 | 0.4400 | 0.4412 | 0.4375 | 0.4387 | 0.4800 | 0.2988 |
| french_bench_hellaswag | 0.4970 | 0.5055 | 0.5092 | 0.5058 | 0.5050 | 0.5258 | 0.3540 |
| french_bench_trivia | 0.4763 | 0.4763 | 0.4553 | 0.4395 | 0.4421 | 0.5316 | 0.2711 |

</details>

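The acknowledgements below credit [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), so scores like these could in principle be reproduced with its Python API. A sketch, with task names taken from the table above (exact API and task availability depend on the installed version):

```python
# Sketch of re-running a few of the French benchmarks with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HoangHa/Pensez-v0.1-e5,dtype=float16",
    tasks=["french_bench_boolqa", "french_bench_hellaswag", "french_bench_trivia"],
)
print(results["results"])
```
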
## Run Locally

You can run Pensez using Hugging Face's `transformers` library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "HoangHa/Pensez-v0.1-e5"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Example input
messages = [{"role": "user", "content": "Bonjour!"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a response (up to 2,500 new tokens)
generated_ids = model.generate(
    input_ids,
    max_new_tokens=2500,
    temperature=0.8,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(
    generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(f"Réponse: {response}")
```
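
Because Pensez emits its reasoning between `<think>` tags, the trace can be separated from the final answer. A minimal sketch (it assumes the tags survive decoding; if they are registered as special tokens, decode with `skip_special_tokens=False`):

```python
# Split a generation of the form "<think>...</think>answer" into its parts.
def split_reasoning(response: str) -> tuple[str, str]:
    if "</think>" in response:
        reasoning, _, answer = response.partition("</think>")
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", response.strip()  # no trace found: treat everything as the answer

reasoning, answer = split_reasoning(response)
print("Raisonnement:", reasoning)
print("Réponse finale:", answer)
```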

## Training Details

Pensez was trained with:
- **Packing Inputs Without Cross-Contamination Attention** ([Reference](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing))
- **Liger Kernel** ([Reference](https://github.com/linkedin/Liger-Kernel))
- **DeepSpeed ZeRO-3** ([Reference](https://github.com/deepspeedai/DeepSpeed))
- **NEFTune Noise** ([Reference](https://arxiv.org/abs/2310.05914)) for robustness (see the sketch below).

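NEFTune itself is easy to sketch: during fine-tuning it adds uniform noise, scaled by `alpha / sqrt(seq_len * hidden_dim)`, to the token embeddings. A minimal illustration (the `alpha` default here is the paper's common choice, not necessarily the value used for Pensez):

```python
import torch

def neftune_embeddings(embeds: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add NEFTune noise to token embeddings during training.

    embeds: (batch, seq_len, hidden_dim). Noise is uniform in [-1, 1],
    scaled by alpha / sqrt(seq_len * hidden_dim), as in the paper.
    """
    seq_len, hidden_dim = embeds.shape[1], embeds.shape[2]
    scale = alpha / (seq_len * hidden_dim) ** 0.5
    noise = torch.empty_like(embeds).uniform_(-1, 1) * scale
    return embeds + noise
```
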
| **Parameter** | **Value** |
|--------------|----------|
| Epochs | 5 |
| Global Batch Size | 200 |
| Learning Rate | 1e-5 |
| Scheduler | Cosine |
| Optimizer | AdamW |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Sequence Length | 16,384 |

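For concreteness, the cosine schedule with 5% linear warmup implied by the table can be written out as follows (a sketch; the training framework's actual scheduler may differ in details such as a floor on the learning rate):

```python
import math

def learning_rate(step: int, total_steps: int, peak_lr: float = 1e-5,
                  warmup_ratio: float = 0.05) -> float:
    """Cosine decay to zero after a linear warmup over the first 5% of steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```
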
More details: [Training Config]() | Loss curves: [Wandb](https://wandb.ai/hahuyhoanghhh41/llamafactory?nw=nwuserhahuyhoanghhh41)

## Citation

```bibtex
@misc{hoang2025pensez,
      title={Pensez: Less Data, Better Reasoning – Rethinking French LLM},
      author={Ha Huy Hoang},
      year={2025},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={},
}
```

## Acknowledgements

- [llama-factory](https://github.com/hiyouga/LLaMA-Factory)
- [Deepseek R1](https://github.com/deepseek-ai/DeepSeek-R1)
- [Qwen 2.5](https://github.com/QwenLM/Qwen2.5)
- [NEFTune Noise](https://arxiv.org/abs/2310.05914)
- [Packing Inputs Without Cross-Contamination Attention](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing)
- [Liger Kernel](https://github.com/linkedin/Liger-Kernel)
- [Deepspeed](https://github.com/deepspeedai/DeepSpeed)
- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [Hyperbolic](https://hyperbolic.xyz/)
- [Modal](https://modal.com/)