---
language:
- tr
- en
- de
- es
- fr
- ru
- zh
- ja
- ko
license: apache-2.0
tags:
- turkish
- türkiye
- reasoning
- vision-language
- vlm
- multimodal
- lamapi
- next2.5
- qwen3.5
- gemma-3
- text-generation
- image-text-to-text
- open-source
- 4b
- edge-ai
- large-language-model
- llm
- thinking-mode
pipeline_tag: image-text-to-text
datasets:
- mlabonne/FineTome-100k
- CognitiveKernel/CognitiveKernel-Pro-SFT
- OpenSPG/KAG-Thinker-training-dataset
- Gryphe/ChatGPT-4o-Writing-Prompts
library_name: transformers
---
`<think>...</think>` blocks to reason through complex logic, math, and coding tasks before answering.

| Benchmark | Next 2.5 (4B) 🚀 | Qwen 3.5 (4B) | Gemma-3 (4B) | Phi-4-Mini (3.8B) | Llama-3.2 (3B) |
|---|---|---|---|---|---|
| MMLU-Pro | 81.6% | 79.1% | 76.5% | 78.2% | 68.4% |
| MMLU-Redux | 90.2% | 88.8% | 86.1% | 87.5% | 79.5% |
| IFEval (Instruction) | 91.2% | 89.8% | 85.4% | 88.1% | 77.4% |
| HMMT (Reasoning) | 78.3% | 74.0% | 70.2% | 72.8% | -- |
| LiveCodeBench v6 | 58.4% | 55.8% | 51.0% | 54.2% | 45.1% |
| TAU2-Bench (Agent) | 82.1% | 79.9% | 72.4% | 75.0% | -- |

| Benchmark | Next 2.5 (4B) 🚀 | Qwen 3.5 (4B) | Gemini-2.5 Flash-Lite | GPT-5-Nano | Llama-3.2 (11B Vision) |
|---|---|---|---|---|---|
| MMMU (General VQA) | 79.5% | 77.6% | 73.4% | 75.8% | 71.2% |
| MathVision | 76.8% | 74.6% | 52.1% | 62.2% | 50.5% |
| OCRBench | 86.5% | 85.0% | 82.5% | 75.3% | 74.1% |
| VideoMME (w/ sub) | 84.8% | 83.5% | 74.6% | 71.7% | 68.9% |
| CountBench (Spatial) | 97.5% | 96.3% | 79.2% | 80.0% | -- |
* Benchmark improvements are driven by our high-quality Turkish reasoning datasets and specialized DPO alignment focusing on multi-step logic. Cells marked `--` indicate scores that were not officially reported for that model.
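
Since the model emits its reasoning inside `<think>...</think>` blocks before the final answer, downstream code will often want to separate the two. The following is a minimal sketch, not part of the model's official tooling: the helper name `split_reasoning` is hypothetical, and it assumes the tags survive decoding as plain text (depending on the tokenizer, decoding with `skip_special_tokens=True` may strip them).

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Hypothetical helper: assumes the reasoning is wrapped in literal
    <think>...</think> tags, as described in this model card.
    """
    # Collect the contents of every think block, in order.
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    # Whatever remains after removing the blocks is treated as the answer.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

If a completion contains no think block, the helper returns an empty reasoning string and the full text as the answer, so it can be applied to every response unconditionally.
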

---

## 🚀 Quickstart & Usage

**Next 2.5** is fully compatible with the Hugging Face `transformers` ecosystem and modern serving frameworks like `vLLM` and `SGLang`. Because it is natively multimodal, you can pass images directly into the prompt.

### Python (Transformers)

Make sure you have the latest `transformers`, `torch`, `torchvision`, and `pillow` installed.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "thelamapi/next2.5"

model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # For vision.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Create a message in chat format
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are Next2.5, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a highly optimized Rust function to calculate the Fibonacci sequence using memoization"}
        ],
    },
]

# Prepare input with Tokenizer.
# add_generation_prompt=True appends the assistant turn marker so the model
# starts a new reply instead of continuing the user message.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, return_tensors="pt")

# Remove 'mm_token_type_ids' if it's not needed for text-only generation
if "mm_token_type_ids" in inputs:
    del inputs["mm_token_type_ids"]

# Output from the model (the decoded text includes the prompt)
output = model.generate(**inputs, do_sample=True, temperature=0.7, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

---

## 🧩 Model Specifications

| Attribute | Details |
| :--- | :--- |
| **Base Architecture** | Qwen 3.5 (Causal Language Model + Vision Encoder) |
| **Parameters** | 4 Billion |
| **Context Length** | 262,144 tokens natively (Extensible to 1M+ via YaRN) |
| **Training Stage** | SFT + RLHF/DPO (Turkish + English focus) |
| **Hardware** | Runs comfortably on consumer GPUs (e.g., RTX 3060/4060 with 8GB VRAM in FP16, or less via quantization) |
| **Capabilities** | Text Generation, Image Understanding, Video Summarization, OCR, Code Generation, Tool Use (Agentic) |

---

## 🎯 Ideal Use Cases

**Next 2.5 (4B)** balances high-end reasoning with hardware efficiency. It is well suited for:

* 🕵️ **Complex Document Analysis:** Upload massive PDFs or images of documents and extract structured, reasoned JSON outputs.
* 🎓 **Educational Tutoring:** Its native `

Next 2.5: visual perception and deep reasoning that push past the limits. Türkiye's global AI vision. 🌍
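
The Quickstart states that images can be passed directly into the prompt, but its example is text-only. As a rough sketch of what an image request might look like, the helper below builds a chat message that pairs an image with a question. It assumes this model's chat template follows the common `transformers` multimodal convention of `{"type": "image", "url": ...}` content entries, which this card does not confirm; the function name `build_image_messages` and the example URL are hypothetical.

```python
from typing import Any


def build_image_messages(question: str, image_url: str) -> list[dict[str, Any]]:
    """Build a single-turn chat with one image and one text question.

    Assumption: the model's chat template accepts {"type": "image", "url": ...}
    entries, as many transformers vision-language models do.
    """
    return [
        {
            "role": "user",
            "content": [
                # The image comes first so the question can refer back to it.
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


# Hypothetical URL, used only to illustrate the structure.
messages = build_image_messages(
    "Extract the invoice total as JSON.",
    "https://example.com/invoice.png",
)
print(messages)
```

In recent `transformers` releases, a message list like this can typically be passed to `processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")` to obtain model inputs; check the model's own template before relying on that.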