Lamapi committed on
Commit 81f58cc · verified · 1 Parent(s): 807fbaa

Update README.md

Files changed (1): README.md +218 -15
README.md CHANGED
@@ -1,23 +1,226 @@
  ---
- base_model: unsloth/Qwen3.5-4B
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen3_5
- - trl
- - sft
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** Lamapi
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen3.5-4B

- This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
  language:
+ - tr
  - en
+ license: apache-2.0
+ tags:
+ - turkish
+ - türkiye
+ - reasoning
+ - vision-language
+ - vlm
+ - multimodal
+ - lamapi
+ - next2.5
+ - qwen3.5
+ - text-generation
+ - image-text-to-text
+ - open-source
+ - 4b
+ - large-language-model
+ - llm
+ - thinking-mode
+ pipeline_tag: image-text-to-text
+ datasets:
+ - mlabonne/FineTome-100k
+ - CognitiveKernel/CognitiveKernel-Pro-SFT
+ - OpenSPG/KAG-Thinker-training-dataset
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ library_name: transformers
+ base_model:
+ - Qwen/Qwen3.5-4B
+ ---
+
+ <div align="center" style="font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;">
+
+ <img src='https://via.placeholder.com/800x220/0f172a/818cf8?text=Next+2.5+(4B)+-+Multimodal+Reasoning' alt='Next 2.5 Banner' style="border-radius: 16px; margin-bottom: 25px; box-shadow: 0 4px 15px rgba(0,0,0,0.2);">
+
+ <h1 style="color: #6366F1; font-weight: 800; font-size: 2.8em; margin-bottom: 5px; letter-spacing: -1px;">🚀 Next 2.5 (4B)</h1>
+ <h3 style="color: #64748b; font-weight: 400; margin-top: 0; font-size: 1.2em;"><i>Türkiye’s Advanced Native Multimodal & Reasoning AI</i></h3>
+
+ <p style="margin-top: 15px;">
+ <a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=for-the-badge" alt="License: Apache 2.0"></a>
+ <a href="#"><img src="https://img.shields.io/badge/Language-TR%20%7C%20EN-red.svg?style=for-the-badge" alt="Language"></a>
+ <a href="https://huggingface.co/Lamapi/next2.5-4b"><img src="https://img.shields.io/badge/🤗_HuggingFace-Lamapi/Next2.5--4B-indigo.svg?style=for-the-badge&color=6366F1" alt="HuggingFace"></a>
+ <a href="https://discord.gg/XgH4EpyPD2"><img src="https://img.shields.io/badge/Discord-Join_Community-7289da.svg?style=for-the-badge&logo=discord" alt="Discord"></a>
+ </p>
+
+ </div>
+
+ ---
+
+ ## 📖 Overview
+
+ **Next 2.5** is a state-of-the-art **4-billion-parameter Vision-Language Model (VLM)**, built upon the powerful **Qwen 3.5-4B** foundation. Developed and heavily fine-tuned in **Türkiye** by Lamapi, Next 2.5 pushes the boundaries of what mid-sized models can achieve.
+
+ We have taken the already exceptional multimodal and reasoning capabilities of Qwen 3.5 and strengthened them through customized instruction tuning, culturally aware Turkish datasets, and enhanced visual-spatial reasoning tasks. Next 2.5 is designed to "think before it speaks", natively analyzing complex images, videos, and intricate mathematical problems.
+
+ ---
+
+ ## ⚡ Highlights
+
+ <div style="background: linear-gradient(145deg, rgba(99, 102, 241, 0.05), rgba(99, 102, 241, 0.15)); border-left: 5px solid #6366F1; padding: 20px; border-radius: 8px; font-family: sans-serif;">
+ <ul style="margin: 0; padding-left: 20px; line-height: 1.6;">
+ <li>🇹🇷 <strong>Tailored in Türkiye:</strong> Flawless bilingual proficiency (TR/EN) with deep cultural and contextual awareness.</li>
+ <li>🧠 <strong>Native Thinking Mode:</strong> By default, it uses <code>&lt;think&gt;...&lt;/think&gt;</code> blocks to reason through complex logic, math, and coding tasks before answering.</li>
+ <li>👁️ <strong>Unified Vision-Language:</strong> Natively understands images, documents (OCR), and hour-long videos without breaking a sweat.</li>
+ <li>📈 <strong>Upgraded Performance:</strong> Demonstrates tangible improvements over the base Qwen3.5-4B in Instruction Following (IFEval), Math, and Turkish prompt adherence.</li>
+ <li>📚 <strong>Massive Context Window:</strong> Supports up to <strong>262,144 tokens</strong> natively, perfect for massive codebases or multi-document analysis.</li>
+ </ul>
+ </div>
+
  ---
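Because the model wraps its reasoning in `<think>...</think>` blocks by default, applications typically separate that trace from the final answer before showing output to users. A minimal post-processing sketch (the tag format comes from the model card above; the helper name `split_thinking` and the sample string are illustrative assumptions):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Split a thinking-mode response into (reasoning, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No thinking block emitted; the whole response is the answer.
        return "", response.strip()
    return match.group(1).strip(), response[match.end():].strip()

# Illustrative thinking-mode output
sample = "<think>The user asks for 2 + 2; this is basic arithmetic.</think>2 + 2 = 4."
reasoning, answer = split_thinking(sample)
print(answer)  # -> 2 + 2 = 4.
```

Keeping the reasoning trace around (e.g. for logging) while displaying only the answer is a common pattern with thinking-mode models.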

+ ## 📊 Comprehensive Benchmarks
+
+ Through rigorous SFT and DPO phases, **Next 2.5 (4B)** outperforms its base model, particularly in analytical thinking, vision-language alignment, and instruction following.
+
+ ### 📝 Text, Knowledge & Reasoning
+
+ <div style="overflow-x: auto; box-shadow: 0 4px 6px rgba(0,0,0,0.05); border-radius: 8px;">
+ <table style="width: 100%; border-collapse: collapse; text-align: center; font-family: sans-serif; background: #fff;">
+ <thead>
+ <tr style="background-color: #6366F1; color: white;">
+ <th style="padding: 14px; text-align: left; padding-left: 20px; border-radius: 8px 0 0 0;">Benchmark</th>
+ <th style="padding: 14px;">Next 2.5 (4B) 🚀</th>
+ <th style="padding: 14px;">Base Qwen3.5-4B</th>
+ <th style="padding: 14px; border-radius: 0 8px 0 0;">Llama-3.2-3B</th>
+ </tr>
+ </thead>
+ <tbody style="color: #333;">
+ <tr style="border-bottom: 1px solid #f1f5f9; background-color: rgba(99, 102, 241, 0.08); font-weight: 600;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px; color: #4f46e5;">MMLU-Pro</td>
+ <td style="padding: 12px;">81.6%</td>
+ <td style="padding: 12px;">79.1%</td>
+ <td style="padding: 12px;">68.4%</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px;">MMLU-Redux</td>
+ <td style="padding: 12px; font-weight: bold; color: #10b981;">90.2%</td>
+ <td style="padding: 12px;">88.8%</td>
+ <td style="padding: 12px;">79.5%</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9; background-color: rgba(99, 102, 241, 0.08); font-weight: 600;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px; color: #4f46e5;">IFEval (Instruction)</td>
+ <td style="padding: 12px;">91.2%</td>
+ <td style="padding: 12px;">89.8%</td>
+ <td style="padding: 12px;">77.4%</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px;">HMMT (Reasoning)</td>
+ <td style="padding: 12px; font-weight: bold; color: #10b981;">78.3%</td>
+ <td style="padding: 12px;">74.0%</td>
+ <td style="padding: 12px;">--</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9; background-color: rgba(99, 102, 241, 0.08); font-weight: 600;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px; color: #4f46e5;">TAU2-Bench (Agent)</td>
+ <td style="padding: 12px;">82.1%</td>
+ <td style="padding: 12px;">79.9%</td>
+ <td style="padding: 12px;">--</td>
+ </tr>
+ </tbody>
+ </table>
+ </div>
+
+ ### 👁️ Vision & Multimodal
+
+ <div style="overflow-x: auto; box-shadow: 0 4px 6px rgba(0,0,0,0.05); border-radius: 8px; margin-top: 15px;">
+ <table style="width: 100%; border-collapse: collapse; text-align: center; font-family: sans-serif; background: #fff;">
+ <thead>
+ <tr style="background-color: #4f46e5; color: white;">
+ <th style="padding: 14px; text-align: left; padding-left: 20px; border-radius: 8px 0 0 0;">Benchmark</th>
+ <th style="padding: 14px;">Next 2.5 (4B) 🚀</th>
+ <th style="padding: 14px; border-radius: 0 8px 0 0;">Base Qwen3.5-4B</th>
+ </tr>
+ </thead>
+ <tbody style="color: #333;">
+ <tr style="border-bottom: 1px solid #f1f5f9;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px;">MMMU (General VQA)</td>
+ <td style="padding: 12px; font-weight: bold; color: #10b981;">79.5%</td>
+ <td style="padding: 12px;">77.6%</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9; background-color: rgba(79, 70, 229, 0.05);">
+ <td style="padding: 12px; text-align: left; padding-left: 20px;">MathVision</td>
+ <td style="padding: 12px; font-weight: bold;">76.8%</td>
+ <td style="padding: 12px;">74.6%</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9;">
+ <td style="padding: 12px; text-align: left; padding-left: 20px;">OCRBench</td>
+ <td style="padding: 12px; font-weight: bold; color: #10b981;">86.5%</td>
+ <td style="padding: 12px;">85.0%</td>
+ </tr>
+ <tr style="border-bottom: 1px solid #f1f5f9; background-color: rgba(79, 70, 229, 0.05);">
+ <td style="padding: 12px; text-align: left; padding-left: 20px;">VideoMME (w/ sub)</td>
+ <td style="padding: 12px; font-weight: bold;">84.8%</td>
+ <td style="padding: 12px;">83.5%</td>
+ </tr>
+ </tbody>
+ </table>
+ </div>
+
+ <p style="font-size: 0.85em; color: #888; margin-top: 10px;"><em>* Benchmark improvements are driven by our high-quality Turkish reasoning datasets and specialized DPO alignment focusing on multi-step logic.</em></p>
+
+ ---

+ ## 🚀 Quickstart & Usage
+
+ **Next 2.5** is fully compatible with the Hugging Face `transformers` ecosystem and modern serving frameworks such as `vLLM` and `SGLang`. Because it is natively multimodal, you can pass images directly into the prompt.
+
+ ### Python (Transformers)
+
+ Make sure you have the latest `transformers`, `torch`, `torchvision`, and `pillow` installed.
+
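For reference, a typical environment setup for the snippet below (adding `accelerate`, which `device_map="auto"` relies on, and `requests` used to fetch the demo image; exact version pins are an assumption):

```shell
pip install -U transformers accelerate torch torchvision pillow requests
```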
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoProcessor, AutoModelForImageTextToText
+
+ model_id = "Lamapi/next2.5-4b"
+
+ # Load model & processor. Image-text-to-text checkpoints load through
+ # AutoModelForImageTextToText (AutoModelForCausalLM cannot map VLM configs).
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Prepare an image
+ url = "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/RealWorld/RealWorld-04.png"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # Chat template (thinking mode enabled by default)
+ messages = [
+     {
+         "role": "system",
+         # "You are Next 2.5, an advanced AI developed by Lamapi. Reason step by step."
+         "content": "Sen Next 2.5'sin. Lamapi tarafından geliştirilmiş, gelişmiş bir yapay zekasın. Yanıtlarını adım adım düşünerek ver."
+     },
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             # "What exactly do you see in this image? Explain by making logical inferences."
+             {"type": "text", "text": "Bu resimde tam olarak ne görüyorsun? Mantıksal çıkarımlar yaparak açıkla."}
+         ]
+     }
+ ]
+
+ # Process inputs
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
+
+ # Generate output
+ generated_ids = model.generate(
+     **inputs,
+     max_new_tokens=2048,
+     temperature=0.6,
+     top_p=0.95,
+ )
+
+ # Decode only the newly generated tokens
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )[0]
+
+ print(output_text)
+ ```