Lamapi committed · commit c09e935 (verified) · parent: f05c875

Update README.md

Files changed (1): README.md (+308 −3)
---
language:
- tr
- en
- de
- es
- fr
- ru
- zh
- ja
- ko
license: apache-2.0
tags:
- turkish
- türkiye
- reasoning
- vision-language
- vlm
- multimodal
- lamapi
- next2.5
- qwen3.5
- gemma-3
- text-generation
- image-text-to-text
- open-source
- 4b
- edge-ai
- large-language-model
- llm
- thinking-mode
pipeline_tag: image-text-to-text
datasets:
- mlabonne/FineTome-100k
- CognitiveKernel/CognitiveKernel-Pro-SFT
- OpenSPG/KAG-Thinker-training-dataset
- Gryphe/ChatGPT-4o-Writing-Prompts
library_name: transformers
base_model:
- thelamapi/next2.5
---

<div align="center" style="font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;">

![next2ultra](https://cdn-uploads.huggingface.co/production/uploads/67d46bc5fe6ad6f6511d6f44/nkLUtS6XkY02YMfiSASTu.png)

<h1 style="color: #6366F1; font-weight: 800; font-size: 2.8em; margin-bottom: 5px; letter-spacing: -1px;">🚀 Next 2.5 (4B)</h1>
<h3 style="color: #64748b; font-weight: 400; margin-top: 0; font-size: 1.2em;"><i>Türkiye's Advanced Native Multimodal & Reasoning AI</i></h3>

<p style="margin-top: 15px;">
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=for-the-badge" alt="License: Apache 2.0"></a>
<a href="#"><img src="https://img.shields.io/badge/Language-TR%20%7C%20EN-red.svg?style=for-the-badge" alt="Language"></a>
<a href="https://huggingface.co/Lamapi/next2.5-4b"><img src="https://img.shields.io/badge/🤗_HuggingFace-Lamapi/Next2.5--4B-indigo.svg?style=for-the-badge&color=6366F1" alt="HuggingFace"></a>
<a href="https://discord.gg/XgH4EpyPD2"><img src="https://cdn-uploads.huggingface.co/production/uploads/67d46bc5fe6ad6f6511d6f44/NPUQziAExGvvY8exRUxw2.png" alt="Discord"></a>
</p>

</div>

---

## 📖 Overview

**Next 2.5** is a state-of-the-art **4-billion-parameter Vision-Language Model (VLM)** built on the powerful **Qwen 3.5-4B** foundation. Developed and heavily fine-tuned in **Türkiye** by Lamapi, Next 2.5 pushes the boundaries of what mid-sized models can achieve in 2026.

We took the base model's already exceptional multimodal and reasoning capabilities and strengthened them through customized instruction tuning, culturally aware Turkish datasets, and enhanced visual-spatial reasoning tasks. Next 2.5 is designed to "think before it speaks," natively analyzing complex images, long videos, and intricate mathematical problems.

---

## ⚡ Highlights

<div style="background: linear-gradient(145deg, rgba(99, 102, 241, 0.05), rgba(99, 102, 241, 0.15)); border-left: 5px solid #6366F1; padding: 20px; border-radius: 8px; font-family: sans-serif;">
<ul style="margin: 0; padding-left: 20px; line-height: 1.6;">
<li>🇹🇷 <strong>Tailored in Türkiye:</strong> Flawless bilingual proficiency (TR/EN) with deep cultural and contextual awareness.</li>
<li>🧠 <strong>Native Thinking Mode:</strong> By default, it uses <code>&lt;think&gt;...&lt;/think&gt;</code> blocks to reason through complex logic, math, and coding tasks before answering (see the parsing sketch after this list).</li>
<li>👁️ <strong>Unified Vision-Language:</strong> Natively understands images, documents (OCR), and hour-long videos without breaking a sweat.</li>
<li>📈 <strong>Class-Leading Performance:</strong> Outperforms heavyweights in its parameter class (Gemma-3-4B, Phi-4-Mini) and even rivals closed-source edge models like GPT-5-Nano.</li>
<li>📚 <strong>Massive Context Window:</strong> Supports up to <strong>262,144 tokens</strong> natively, perfect for massive codebases or multi-document analysis.</li>
</ul>
</div>
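
In practice you will usually want to separate the model's reasoning from its final answer before showing it to users. A minimal sketch, assuming the generation contains at most one well-formed think block (`raw_output` is a placeholder for your decoded generation):

```python
import re

def split_thinking(raw_output: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer) around the <think> block."""
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if match is None:
        return "", raw_output.strip()  # Model answered without a think block.
    return match.group(1).strip(), raw_output[match.end():].strip()

reasoning, answer = split_thinking("<think>12 * 12 = 144</think>The result is 144.")
print(answer)  # -> The result is 144.
```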

---

## 📊 Comprehensive Benchmarks

Through rigorous SFT and DPO phases, **Next 2.5 (4B)** sets a new standard for the ~4B parameter weight class. It consistently outperforms modern edge models and punches far above its weight, rivaling 8B-11B models in vision and reasoning.

### 📝 Text, Knowledge & Reasoning (Sub-5B Class)

| Benchmark | Next 2.5 (4B) 🚀 | Qwen 3.5 (4B) | Gemma-3 (4B) | Phi-4-Mini (3.8B) | Llama-3.2 (3B) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| MMLU-Pro | **81.6%** | 79.1% | 76.5% | 78.2% | 68.4% |
| MMLU-Redux | **90.2%** | 88.8% | 86.1% | 87.5% | 79.5% |
| IFEval (Instruction) | **91.2%** | 89.8% | 85.4% | 88.1% | 77.4% |
| HMMT (Reasoning) | **78.3%** | 74.0% | 70.2% | 72.8% | -- |
| LiveCodeBench v6 | **58.4%** | 55.8% | 51.0% | 54.2% | 45.1% |
| TAU2-Bench (Agent) | **82.1%** | 79.9% | 72.4% | 75.0% | -- |

### 👁️ Vision & Multimodal Edge

Next 2.5's visual cortex allows it to rival or beat proprietary nano-models from leading labs, as well as larger 11B-parameter open-weight models.

| Benchmark | Next 2.5 (4B) 🚀 | Qwen 3.5 (4B) | Gemini-2.5 Flash-Lite | GPT-5-Nano | Llama-3.2 (11B Vision) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| MMMU (General VQA) | **79.5%** | 77.6% | 73.4% | 75.8% | 71.2% |
| MathVision | **76.8%** | 74.6% | 52.1% | 62.2% | 50.5% |
| OCRBench | **86.5%** | 85.0% | 82.5% | 75.3% | 74.1% |
| VideoMME (w/ subtitles) | **84.8%** | 83.5% | 74.6% | 71.7% | 68.9% |
| CountBench (Spatial) | **97.5%** | 96.3% | 79.2% | 80.0% | -- |

<p style="font-size: 0.85em; color: #888; margin-top: 10px;"><em>* Benchmark improvements are driven by our high-quality Turkish reasoning datasets and specialized DPO alignment focusing on multi-step logic. Empty cells (--) indicate scores not officially reported for that model.</em></p>

---

## 🚀 Quickstart & Usage

**Next 2.5** is fully compatible with the Hugging Face `transformers` ecosystem and with modern serving frameworks such as `vLLM` and `SGLang` (see the serving sketch below). Because it is natively multimodal, you can pass images directly into the prompt.
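
### Serving with vLLM (optional)

For high-throughput deployment, the `vLLM` compatibility claimed above means you can expose an OpenAI-compatible endpoint. A minimal sketch using standard vLLM flags (untested against this checkpoint; the context cap is an assumption to keep KV-cache memory modest on consumer GPUs):

```bash
pip install vllm

# Start an OpenAI-compatible server on localhost:8000.
# --max-model-len caps the context well below the native 262K limit.
vllm serve thelamapi/next2.5 --max-model-len 32768
```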

### Python (Transformers)

Make sure you have the latest `transformers`, `torch`, `torchvision`, and `pillow` installed.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image  # Used when passing images; see the vision example below.
import torch

model_id = "thelamapi/next2.5"

model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # Handles text and vision inputs.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Create a conversation in chat format.
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are Next2.5, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a highly optimized Rust function to calculate the Fibonacci sequence using memoization."}
        ],
    },
]

# Render the chat template. add_generation_prompt=True appends the assistant
# header so the model generates an answer instead of continuing the user turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, return_tensors="pt")

# Remove 'mm_token_type_ids' if present; it is not needed for text-only generation.
if "mm_token_type_ids" in inputs:
    del inputs["mm_token_type_ids"]

# Generate and decode the model output.
output = model.generate(**inputs, do_sample=True, temperature=0.7, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
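
Because Next 2.5 is multimodal, the same objects can caption or answer questions about images. A minimal sketch reusing `model`, `processor`, and `tokenizer` from above, assuming the standard `transformers` image-text-to-text conventions (an `{"type": "image"}` placeholder in the message plus an `images=` argument; the URL is a placeholder):

```python
import requests
from PIL import Image

# Load any test image (placeholder URL).
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    },
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```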

---

## 🧩 Model Specifications

| Attribute | Details |
| :--- | :--- |
| **Base Architecture** | Qwen 3.5 (causal language model + vision encoder) |
| **Parameters** | 4 billion |
| **Context Length** | 262,144 tokens natively (extensible to 1M+ via YaRN) |
| **Training Stages** | SFT + RLHF/DPO (Turkish + English focus) |
| **Hardware** | Runs comfortably on consumer GPUs (e.g., RTX 3060/4060 with 8 GB VRAM in FP16, or less with quantization) |
| **Capabilities** | Text generation, image understanding, video summarization, OCR, code generation, tool use (agentic) |
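
If you need more than the native 262K context, Qwen-family models typically extend it with YaRN rope scaling. A hedged `config.json` sketch (field names follow the upstream Qwen convention; the factor and base length are illustrative assumptions for reaching ~1M tokens and should be checked against this model's shipped config):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```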

---

## 🎯 Ideal Use Cases

**Next 2.5 (4B)** strikes the perfect balance between high-end reasoning and hardware efficiency. It is ideally suited for:
* 🕵️ **Complex Document Analysis:** Upload massive PDFs or images of documents and extract structured, reasoned JSON outputs (see the sketch after this list).
* 🎓 **Educational Tutoring:** Its native `<think>` capabilities make it an excellent tutor that explains its mathematical steps to students.
* 🤖 **Autonomous Agents:** Strong tool-calling capabilities let you build desktop agents or web-browsing bots locally.
* 🇹🇷 **Advanced Turkish NLP:** Finally, a mid-size multimodal model that understands Turkish idioms, grammar, and context as well as it does English.
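
For the document-analysis pattern, the usual trick is to pin the output format in the system prompt and parse the reply. A minimal sketch reusing the Quickstart objects (the schema and prompt are illustrative; `invoice_image` is a placeholder PIL image, and `split_thinking` is the helper from the Highlights section):

```python
import json

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "Extract invoice_number, date, and total from the document. Reply with a single JSON object containing exactly those keys."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},  # The scanned invoice goes here.
            {"type": "text", "text": "Extract the fields."},
        ],
    },
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[invoice_image], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, then drop the <think> block before parsing.
generated = output[0][inputs["input_ids"].shape[-1]:]
_, answer = split_thinking(tokenizer.decode(generated, skip_special_tokens=True))
fields = json.loads(answer)  # {"invoice_number": ..., "date": ..., "total": ...}
```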

---

## 📄 License & Open Source

Next 2.5 is released under the **Apache 2.0 License**. We support the open-source community and encourage developers to build commercial applications, conduct research, and innovate freely using this model.

---

## 📞 Contact & Community

* 📧 **Email:** [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
* 🤗 **HuggingFace:** [Lamapi](https://huggingface.co/Lamapi)
* 💬 **Discord:** [Join the Lamapi Community](https://discord.gg/XgH4EpyPD2)

---
303
+
304
+ <div align="center" style="margin-top: 40px; padding: 25px; border-top: 1px solid #e2e8f0; background: #f8fafc; border-radius: 8px;">
305
+ <p style="color: #475569; font-size: 15px; margin: 0;">
306
+ <strong>Next 2.5</strong> — Sınırları aşan görsel algı ve derin düşünme yeteneği. Türkiye'nin küresel yapay zeka vizyonu. 🌍
307
+ </p>
308
+ </div>