Lamapi/next-4b

+---
+language: tr
+license: mit
+tags:
+- turkish
+- türkiye
+- english
+- ai
+- lamapi
+- gemma3
+- next
+- next-x1
+- efficient
+- text-generation
+- open-source
+- 4b
+- huggingface
+- large-language-model
+- llm
+- causal
+- transformer
+- artificial-intelligence
+- machine-learning
+- ai-research
+- natural-language-processing
+- nlp
+- finetuned
+- lightweight
+- creative
+- summarization
+- question-answering
+- chat-model
+- generative-ai
+- optimized-model
+- unsloth
+- trl
+- sft
+pipeline_tag: text-generation
+metrics:
+- bleu
+- accuracy
+---
+# 🚀 Next 4B
+### *Türkiye’s First Vision-Language Model — Efficient, Multimodal, and Reasoning-Focused*
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
+[![Language: English](https://img.shields.io/badge/Language-English-blue.svg)]()
+[![HuggingFace](https://img.shields.io/badge/🤗-Lamapi%2Fnext--4b-orange.svg)](https://huggingface.co/Lamapi/next-4b)
+---
+## 📖 Overview
+**Next 4B** is a **4-billion-parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to handle **both text and images** efficiently. It is **Türkiye’s first open-source vision-language model**, designed for:
+* Understanding and generating **text and image descriptions**.
+* Efficient reasoning and context-aware multimodal outputs.
+* Native Turkish support with multilingual capabilities.
+* Low-resource deployment using **8-bit quantization** for consumer-grade GPUs.
+This model is ideal for **researchers, developers, and organizations** who need a **high-performance multimodal AI** capable of **visual understanding, reasoning, and creative generation**.
+---
+## 🎯 Goals
+1. **Multimodal Intelligence:** Understand and reason over images and text.
+2. **Efficiency:** Run on modest GPUs using 8-bit quantization.
+3. **Accessibility:** Open-source availability for research and applications.
+4. **Cultural Relevance:** Optimized for Turkish language and context while remaining multilingual.
+---
+## ✨ Key Features
+| Feature                           | Description                                                             |
+| --------------------------------- | ----------------------------------------------------------------------- |
+| 🔋 Efficient Architecture         | Optimized for low VRAM; supports 8-bit quantization for consumer GPUs.  |
+| 🖼️ Vision-Language Capable       | Understands images, captions them, and performs visual reasoning tasks. |
+| 🇹🇷 Multilingual & Turkish-Ready | Handles complex Turkish text with high accuracy.                        |
+| 🧠 Advanced Reasoning             | Supports logical and analytical reasoning for both text and images.     |
+| 📊 Consistent & Reliable Outputs  | Reproducible responses across multiple runs.                            |
+| 🌍 Open Source                    | Transparent, community-driven, and research-friendly.                   |
+---
+## 📐 Model Specifications
+| Specification      | Details                                                                            |
+| ------------------ | ---------------------------------------------------------------------------------- |
+| Base Model         | Gemma 3                                                                       |
+| Parameter Count    | 4 Billion                                                                          |
+| Architecture       | Transformer, causal LLM + Vision Encoder                                           |
+| Fine-Tuning Method | Instruction & multimodal fine-tuning (SFT) on Turkish and multilingual datasets    |
+| Optimizations      | Q8_0 (8-bit quantized), F16, and F32 weights for low- and high-VRAM setups         |
+| Modalities         | Text & Image                                                                       |
+| Use Cases          | Image captioning, multimodal QA, text generation, reasoning, creative storytelling |
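+
+To give a rough sense of what the formats above mean for memory, the sketch below estimates weight storage only, from the nominal 4-billion parameter count. It assumes 32 bits per weight for F32, 16 for F16, and about 8.5 for Q8_0 (the GGUF Q8_0 layout stores one 16-bit scale per block of 32 int8 weights). The `weight_memory_gb` helper is hypothetical, and real usage is higher because of the KV cache, activations, and runtime overhead.
+
+```python
+# Rough weight-only memory estimate for a model with a nominal 4B parameters.
+# Assumption: the bits-per-weight values below; KV cache, activations and
+# runtime overhead are NOT included.
+PARAMS = 4_000_000_000
+
+BITS_PER_WEIGHT = {
+    "F32": 32.0,
+    "F16": 16.0,
+    # GGUF Q8_0: blocks of 32 int8 weights plus one fp16 scale -> (32*8 + 16) / 32 = 8.5 bits
+    "Q8_0": (32 * 8 + 16) / 32,
+}
+
+
+def weight_memory_gb(params: int, bits_per_weight: float) -> float:
+    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
+    return params * bits_per_weight / 8 / 1e9
+
+
+if __name__ == "__main__":
+    for fmt, bits in BITS_PER_WEIGHT.items():
+        print(f"{fmt:>5}: ~{weight_memory_gb(PARAMS, bits):.2f} GB")
+```
+
+Under these assumptions the weights take about 16 GB in F32, 8 GB in F16, and 4.25 GB in Q8_0, which is why the 8-bit variant is the one aimed at consumer GPUs.
+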
+---
+## 🚀 Installation & Usage
+### Python Example
+```python
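+from unsloth import FastModel
+from transformers import TextStreamer
+from PIL import Image
+model_path = "Lamapi/next-4b"
+# Load in 4-bit (bitsandbytes) for low VRAM; for Gemma 3 vision models the
+# returned "tokenizer" is a processor that handles both text and images
+model, tokenizer = FastModel.from_pretrained(model_path, load_in_4bit=True)
+# Example multimodal prompt: the {"type": "image"} item marks where the image goes
+messages = [
+    {"role": "system", "content": [{"type": "text", "text": "You are a creative, reasoning-focused vision-language assistant."}]},
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": "Describe the content of this image and its possible context."},
+    ]},
+]
+image = Image.open("example.jpg").convert("RGB")  # Your input image
+# Render the chat template as text; the processor call below tokenizes it together with the image
+prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+# The template already inserts <bos>, so do not add special tokens again
+inputs = tokenizer(images=image, text=prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
+streamer = TextStreamer(tokenizer, skip_prompt=True)
+# Generate multimodal output; do_sample=True makes temperature/top_p take effect
+_ = model.generate(**inputs, streamer=streamer, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.9)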
+```
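+
+The `temperature=0.7` and `top_p=0.9` arguments in the `generate` call control sampling. As an illustration only (this is not the `transformers` implementation), the sketch below applies temperature scaling and nucleus (top-p) filtering to a toy list of logits: temperature divides the logits before the softmax, and top-p keeps the smallest set of most-probable tokens whose cumulative probability reaches `p`. The function names and logit values are hypothetical.
+
+```python
+import math
+import random
+
+
+def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
+    """Divide logits by temperature, then apply a numerically stable softmax."""
+    scaled = [x / temperature for x in logits]
+    m = max(scaled)
+    exps = [math.exp(x - m) for x in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+
+def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
+    """Keep the smallest set of most-probable token ids whose cumulative
+    probability reaches top_p, then renormalize over that set."""
+    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
+    kept, cumulative = [], 0.0
+    for i in order:
+        kept.append(i)
+        cumulative += probs[i]
+        if cumulative >= top_p:
+            break
+    total = sum(probs[i] for i in kept)
+    return {i: probs[i] / total for i in kept}
+
+
+def sample_token(logits: list[float], temperature: float = 0.7, top_p: float = 0.9,
+                 rng: random.Random | None = None) -> int:
+    """Sample one token id: temperature scaling first, then top-p filtering."""
+    rng = rng or random.Random()
+    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
+    ids = list(candidates)
+    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]
+
+
+if __name__ == "__main__":
+    toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for 4 tokens
+    print(softmax_with_temperature(toy_logits, 1.0))
+    print(softmax_with_temperature(toy_logits, 0.7))  # sharper distribution
+    print(top_p_filter(softmax_with_temperature(toy_logits, 0.7), 0.9))
+    print(sample_token(toy_logits, rng=random.Random(0)))
+```
+
+Lower temperatures concentrate probability on the top tokens, and a smaller `top_p` shrinks the candidate set: with these toy logits, `top_p=0.9` keeps two tokens at temperature 0.7 but three at temperature 1.0.
+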
+---
+### 💡 Usage Examples
+| Category             | Example Prompt                                               |
+| -------------------- | ------------------------------------------------------------ |
+| 🖼️ Image Captioning | "Generate a detailed caption for this image in Turkish."     |
+| 🗣️ Conversation     | "Explain the relationship between the objects in the image." |
+| 📊 Analytical        | "Analyze this chart and summarize key points."               |
+| ✍️ Creative          | "Write a story based on the image content."                  |
+| 🎓 Cultural          | "Describe historical or cultural elements in the image."     |
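+
+A minimal sketch of turning the prompts in this table into chat messages, assuming the Gemma 3 style content list in which a `{"type": "image"}` entry marks where the image is placed. `EXAMPLE_PROMPTS` and `build_messages` are hypothetical names, not part of any library.
+
+```python
+from typing import Any
+
+# Hypothetical mapping of the table's categories to their example prompts
+EXAMPLE_PROMPTS = {
+    "Image Captioning": "Generate a detailed caption for this image in Turkish.",
+    "Conversation": "Explain the relationship between the objects in the image.",
+    "Analytical": "Analyze this chart and summarize key points.",
+    "Creative": "Write a story based on the image content.",
+    "Cultural": "Describe historical or cultural elements in the image.",
+}
+
+
+def build_messages(prompt: str, system: str | None = None,
+                   with_image: bool = True) -> list[dict[str, Any]]:
+    """Build a chat message list; the image item precedes the text so the
+    chat template places the image before the user's instruction."""
+    messages: list[dict[str, Any]] = []
+    if system:
+        messages.append({"role": "system", "content": [{"type": "text", "text": system}]})
+    content: list[dict[str, Any]] = []
+    if with_image:
+        content.append({"type": "image"})
+    content.append({"type": "text", "text": prompt})
+    messages.append({"role": "user", "content": content})
+    return messages
+
+
+if __name__ == "__main__":
+    print(build_messages(EXAMPLE_PROMPTS["Analytical"]))
+```
+
+The returned list can then be rendered with `apply_chat_template` and passed to the processor together with the image.
+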
+---
+## 📊 Performance & Benchmarks
+Next 4B has been evaluated for **text and image understanding**, reasoning, and generation:
+* **Perplexity (Turkish text):** ~12–15
+* **Tokens/sec on 4-bit consumer GPUs:** 500–1200
+* **Image captioning accuracy:** High fidelity for complex scenes
+* **Multimodal reasoning:** Consistent and coherent across images and text
+> These results indicate competitive performance for a **4B multimodal model** that can be deployed on standard GPUs with low latency.
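+
+Perplexity is the exponential of the mean negative log-likelihood per token, so a perplexity of 12 to 15 corresponds to an average of roughly 2.48 to 2.71 nats per token. A minimal sketch of the computation, assuming per-token natural-log probabilities are already available (the `perplexity` helper and the sample values are hypothetical):
+
+```python
+import math
+
+
+def perplexity(token_logprobs: list[float]) -> float:
+    """Perplexity = exp(-mean(log p)) over natural-log token probabilities."""
+    if not token_logprobs:
+        raise ValueError("need at least one token log-probability")
+    mean_nll = -sum(token_logprobs) / len(token_logprobs)
+    return math.exp(mean_nll)
+
+
+if __name__ == "__main__":
+    # Hypothetical: every token assigned probability 1/12 -> perplexity of about 12
+    print(perplexity([math.log(1 / 12)] * 5))
+    # Mean NLL (nats/token) implied by the reported perplexity range
+    for ppl in (12, 15):
+        print(ppl, round(math.log(ppl), 2))
+```
+
+Perplexity figures are only comparable when computed with the same tokenizer and the same evaluation text.
+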
+---
+## 📄 License
+This project is licensed under the **MIT License** — free to use, modify, and distribute. Attribution is appreciated.
+---
+## 📞 Contact & Support
+* 📧 **Email:** [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
+* 🤗 **HuggingFace:** [Lamapi](https://huggingface.co/Lamapi)
+---
+> **Next 4B** — Türkiye’s **first vision-language AI**, combining **multimodal understanding, reasoning, and efficiency**.
+[![Follow on HuggingFace](https://img.shields.io/badge/Follow-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/Lamapi)