Instructions to use Lamapi/next-12b-Q2_K-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Lamapi/next-12b-Q2_K-GGUF with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Lamapi/next-12b-Q2_K-GGUF")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Lamapi/next-12b-Q2_K-GGUF", dtype="auto")
```
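Since this repository hosts a GGUF quantization rather than standard safetensors weights, the generic snippet above may not load it as-is. Transformers can also load GGUF checkpoints by dequantizing them on the fly via the `gguf_file` argument (support depends on the architecture); a minimal sketch, assuming the filename `next-12b-q2_k.gguf` used in the llama-cpp-python example below:

```python
# Hedged sketch: load the GGUF file via transformers' GGUF integration.
# Assumes the repo ships next-12b-q2_k.gguf; the file is dequantized to
# full-precision tensors on load, so expect substantial RAM usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Lamapi/next-12b-Q2_K-GGUF"
gguf_file = "next-12b-q2_k.gguf"

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```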
- llama-cpp-python
How to use Lamapi/next-12b-Q2_K-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Lamapi/next-12b-Q2_K-GGUF",
    filename="next-12b-q2_k.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
```
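The same `Llama` object also exposes an OpenAI-style chat API, which applies the chat template stored in the GGUF metadata when one is present; a minimal sketch:

```python
# Chat-style inference with the same Llama instance as above.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what a GGUF file is."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```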
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Lamapi/next-12b-Q2_K-GGUF with llama.cpp:
Install with Homebrew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K

# Run inference directly in the terminal:
llama-cli -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
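Once `llama-server` is up, it listens on port 8080 by default and speaks the OpenAI API; a quick smoke test with curl (adjust the port if you passed `--port`):

```bash
# Chat completion against the local llama-server (default port 8080).
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [{"role": "user", "content": "Once upon a time,"}],
    "max_tokens": 512
  }'
```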
Install with WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K

# Run inference directly in the terminal:
llama-cli -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
Use pre-built binary
```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K

# Run inference directly in the terminal:
./llama-cli -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
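`llama-cli` can also take a prompt directly for a one-shot generation; for example, using its `-p` (prompt) and `-n` (number of tokens to generate) flags:

```bash
# One-shot generation: -p sets the prompt, -n caps the new tokens.
./build/bin/llama-cli -hf Lamapi/next-12b-Q2_K-GGUF:Q2_K -p "Once upon a time," -n 128
```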
Use Docker
```bash
docker model run hf.co/Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
- LM Studio
- Jan
- vLLM
How to use Lamapi/next-12b-Q2_K-GGUF with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Lamapi/next-12b-Q2_K-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-12b-Q2_K-GGUF",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```bash
docker model run hf.co/Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
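Because the server is OpenAI-compatible, the official `openai` Python client works as well; a minimal sketch, assuming the vLLM server from above on port 8000 (the same pattern applies to the llama.cpp and SGLang servers by changing `base_url`):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is unused
# locally but required by the client, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Lamapi/next-12b-Q2_K-GGUF",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```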
- SGLang
How to use Lamapi/next-12b-Q2_K-GGUF with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Lamapi/next-12b-Q2_K-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-12b-Q2_K-GGUF",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Lamapi/next-12b-Q2_K-GGUF" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-12b-Q2_K-GGUF",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
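SGLang also serves the chat completions route, which applies the model's chat template server-side; for example:

```bash
# Chat completion against the SGLang server started above.
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-12b-Q2_K-GGUF",
    "messages": [{"role": "user", "content": "Once upon a time,"}],
    "max_tokens": 512
  }'
```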
- Ollama
How to use Lamapi/next-12b-Q2_K-GGUF with Ollama:
```bash
ollama run hf.co/Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
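Once pulled, Ollama also exposes a local REST API on port 11434; a quick sketch with curl:

```bash
# Generate via Ollama's local REST API (default port 11434).
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/Lamapi/next-12b-Q2_K-GGUF:Q2_K",
  "prompt": "Once upon a time,",
  "stream": false
}'
```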
- Unsloth Studio
How to use Lamapi/next-12b-Q2_K-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Lamapi/next-12b-Q2_K-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Lamapi/next-12b-Q2_K-GGUF to start chatting
```
Use Hugging Face Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Lamapi/next-12b-Q2_K-GGUF to start chatting
```
- Docker Model Runner
How to use Lamapi/next-12b-Q2_K-GGUF with Docker Model Runner:
```bash
docker model run hf.co/Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
- Lemonade
How to use Lamapi/next-12b-Q2_K-GGUF with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Lamapi/next-12b-Q2_K-GGUF:Q2_K
```
Run and chat with the model
```bash
lemonade run user.next-12b-Q2_K-GGUF-Q2_K
```
List all available models
```bash
lemonade list
```
🚀 Next 12B (m200)
Türkiye's Advanced Vision-Language Model — High Performance, Multimodal, and Enterprise-Ready
📖 Overview
Next 12B is a 12-billion parameter multimodal Vision-Language Model (VLM) based on Gemma 3, fine-tuned to deliver exceptional performance in both text and image understanding. This is Türkiye's most advanced open-source vision-language model, designed for:
- Superior understanding and generation of text and image descriptions.
- Advanced reasoning and context-aware multimodal outputs.
- Professional-grade Turkish support with extensive multilingual capabilities.
- Enterprise-ready deployment with optimized quantization options.
This model is ideal for enterprises, researchers, and organizations that need a state-of-the-art multimodal AI capable of complex visual understanding, advanced reasoning, and creative generation.
Next 12B sets new standards for medium-sized models across all major benchmarks.
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
|---|---|---|---|---|
| Next 12B Version m200 | 91.8 | 78.4 | 94.3 | 81.2 |
| Next 4B preview Version s325 | 84.6 | 66.9 | 82.7 | 70.5 |
| Qwen 2.5 14B | 79.9 | 68.3 | 87.5 | 74.3 |
| Llama 3.1 8B | 73.0 | 62.4 | 80.6 | 51.9 |
Next 12B approaches frontier model performance while maintaining efficiency.
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
|---|---|---|---|---|
| Next Z1 Version l294 | 97.3 | 94.2 | 97.7 | 93.2 |
| Next 12B Version m200 | 91.8 | 78.4 | 94.3 | 81.2 |
| GPT-4o | 88.7 | 72.6 | 92.3 | 76.6 |
| Claude Sonnet 4 | ~88.3 | 75.8 | 90.8 | 78.3 |
🚀 Installation & Usage
Use with vision:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "Lamapi/next-12b"

model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # For vision.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Read image
image = Image.open("image.jpg")

# Create a message in chat format
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Who is in this image?"},
        ],
    },
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
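The decode above echoes the prompt along with the reply; a minimal sketch for printing only the newly generated tokens, slicing off the prompt length that `inputs["input_ids"]` provides:

```python
# Decode only the tokens generated after the prompt.
prompt_len = inputs["input_ids"].shape[-1]
generated = output[0][prompt_len:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```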
Use without vision:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Lamapi/next-12b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat message
messages = [
    {"role": "system", "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."},
    {"role": "user", "content": "Hello, how are you?"},
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
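For interactive use, transformers can stream tokens to the terminal as they are generated; a short sketch with `TextStreamer`, reusing the `model`, `tokenizer`, and `inputs` from above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated,
# skipping the prompt so only the reply is printed.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=50, streamer=streamer)
```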
🎯 Goals
- Advanced Multimodal Intelligence: Superior understanding and reasoning over images and text.
- Enterprise-Grade Performance: High accuracy and reliability for production deployments.
- Efficiency: Optimized for professional GPUs with flexible quantization options.
- Accessibility: Open-source availability for research and commercial applications.
- Cultural Excellence: Best-in-class Turkish language support while maintaining multilingual capabilities.
✨ Key Features
| Feature | Description |
|---|---|
| 🔋 Optimized Architecture | Balanced performance and efficiency; supports multiple quantization formats. |
| 🖼️ Advanced Vision-Language | Deep understanding of images with sophisticated visual reasoning capabilities. |
| 🇹🇷 Professional Turkish Support | Industry-leading Turkish language performance with extensive multilingual reach. |
| 🧠 Superior Reasoning | State-of-the-art logical and analytical reasoning for complex tasks. |
| 📊 Production-Ready | Reliable, consistent outputs suitable for enterprise applications. |
| 🌍 Open Source | Transparent, community-driven, and commercially friendly. |
📐 Model Specifications
| Specification | Details |
|---|---|
| Base Model | Gemma 3 |
| Parameter Count | 12 Billion |
| Architecture | Transformer, causal LLM + Enhanced Vision Encoder |
| Fine-Tuning Method | Advanced instruction & multimodal fine-tuning (SFT) on curated Turkish and multilingual datasets |
| Optimizations | Q8_0, Q4_K_M, F16, F32 quantizations for flexible deployment options |
| Modalities | Text & Image |
| Use Cases | Advanced image captioning, multimodal QA, text generation, complex reasoning, creative storytelling, enterprise applications |
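The quantization formats listed above refer to the wider next-12b family; the repository this card accompanies ships the Q2_K file. To fetch a specific GGUF file programmatically, a minimal sketch using `huggingface_hub` (the filename matches the one in the llama-cpp-python snippet above):

```python
from huggingface_hub import hf_hub_download

# Download a single GGUF file from the quantized repo; returns the local path.
path = hf_hub_download(
    repo_id="Lamapi/next-12b-Q2_K-GGUF",
    filename="next-12b-q2_k.gguf",
)
print(path)
```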
💡 Performance Highlights
- MMLU Excellence: 91.8% on MMLU benchmark, demonstrating comprehensive knowledge across diverse domains
- Mathematical Prowess: 81.2% on MATH benchmark, excelling in complex mathematical reasoning
- Problem Solving: 94.3% on GSM8K, showcasing superior word problem solving capabilities
- Professional Reasoning: 78.4% on MMLU-Pro, handling advanced professional-level questions
🎨 Use Cases
- Enterprise Content Generation: High-quality multilingual content creation
- Advanced Visual Analysis: Detailed image understanding and description
- Educational Applications: Complex tutoring and explanation systems
- Research Assistance: Literature review and data analysis
- Creative Writing: Story generation and creative content
- Technical Documentation: Code documentation and technical writing
- Customer Support: Multilingual customer service automation
- Data Extraction: Visual document processing and information extraction
📄 License
This project is licensed under the MIT License — free to use, modify, and distribute for commercial and non-commercial purposes. Attribution is appreciated.
📞 Contact & Support
- 📧 Email: lamapicontact@gmail.com
- 🤗 HuggingFace: Lamapi
Next 12B — Türkiye's most advanced vision-language AI, combining state-of-the-art multimodal understanding, superior reasoning, and enterprise-grade reliability.
Model tree for Lamapi/next-12b-Q2_K-GGUF
Base model: Lamapi/next-12b