Gemma 4 26B A4B (MoE) — Claude 4.6 Opus Reasoning LoRA

🧠 PEFT LoRA adapter (not a full model; it must be used together with the base model)

LoRA trained with Unsloth SFT from Borcherding/Gemma4-26B-A4B-Claude-4.6-Opus-Reasoning-Distilled: rank=8, alpha=8, targeting attention + MLP + experts, 944 MB

⚠️ This model is a Mixture-of-Experts (26B total parameters, 4B active) and uses less VRAM than the dense 31B model.

⚠️ The GGUF version keeps only the attention + MLP tensors (no expert tensors) and is 18 MB.


📦 What's in this repo

File Description
PEFT LoRA weights (for use with transformers/peft)
LoRA config (rank=8, alpha=8)
GGUF format (attention + MLP only, experts not included)

🚀 Quick Start

PEFT (transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Load the base model (MoE with 4B active parameters; needs less VRAM than the dense 31B)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")

# 2. Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "hotdogs/gemma4-26b-opus-lora")

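# 3. Run inference
messages = [{"role": "user", "content": "Solve this step by step: 3x + 7 = 22"}]
# return_dict=True yields input_ids + attention_mask so that **inputs works below
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))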

llama.cpp (GGUF)

# The GGUF adapter contains attention + MLP only (18 MB)
# --lora applies the adapter at scale 1.0; to use another scale, pass --lora-scaled instead of (not in addition to) --lora
./llama-server \
  -m gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --lora gguf/adapter_model.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 8192 -fa --jinja
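
Once the server is running, it can be queried over HTTP. The following is a minimal sketch using only the Python standard library, assuming the server is reachable on localhost at port 8080 (the port used in the command above) and that its OpenAI-compatible `/v1/chat/completions` endpoint is enabled. The helper names `build_chat_request` and `ask` are illustrative and not part of this repo.

```python
import json
import urllib.request

# Assumption: llama-server is reachable locally on the port used above.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion payload for a single user turn."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def ask(prompt: str, url: str = SERVER_URL) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Solve this step by step: 3x + 7 = 22"))` should print the model's reply.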

Ollama Modelfile

FROM gemma4:26b
ADAPTER ./gguf/adapter_model.gguf
PARAMETER temperature 0.7
SYSTEM "You are a thoughtful AI that reasons step by step."
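
After registering the Modelfile (for example with `ollama create gemma4-opus-lora -f Modelfile`), the model can be called through Ollama's local REST API. This is a rough sketch, assuming Ollama runs on its default port 11434; the model name `gemma4-opus-lora` and the helper names are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumptions: default Ollama port, and a model created from the Modelfile
# above under a hypothetical name (it is not defined anywhere in this repo).
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL_NAME = "gemma4-opus-lora"


def build_ollama_request(prompt: str, model: str = MODEL_NAME) -> dict:
    """Build a non-streaming /api/chat payload with one user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }


def chat(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to Ollama and return the assistant's reply text."""
    data = json.dumps(build_ollama_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

The system prompt and temperature from the Modelfile apply automatically, so the request does not need to repeat them.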

📊 Adapter Details

Parameter Value
Base Model google/gemma-4-26B-A4B-it
Source Borcherding/Gemma4-26B-A4B-Claude-4.6-Opus-Reasoning-Distilled
Training Unsloth SFT
Rank 8
Alpha 8
Target Modules Attention + MLP + Experts (full LoRA)
PEFT Size 944 MB
GGUF Size 18 MB (attn+MLP only)
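
To make the rank and alpha values above concrete, here is a small pure-Python sketch of the standard LoRA update, y = Wx + (alpha / rank) · B(Ax). With rank=8 and alpha=8 the scaling factor is 1.0. The layer dimensions used in the comments and the usage note are illustrative assumptions, not the real dimensions of this model.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, rank, alpha):
    """Compute W x + (alpha / rank) * B (A x) for plain Python lists."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = lora_scaling(rank, alpha)
    return [b + scale * d for b, d in zip(base, delta)]
```

For example, `lora_scaling(8, 8)` gives `1.0`, and a hypothetical 4096 x 4096 projection at rank 8 would add `lora_param_count(4096, 4096, 8)` = 65536 parameters.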

⚠️ GGUF note: the expert tensors (120 tensors, MoE-specific) were stripped because llama.cpp does not support them yet, leaving only the attention + MLP tensors (410 tensors).
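
The stripping step described in the note can be pictured as a filter over tensor names. The sketch below is only an illustration: it assumes expert tensors can be recognized by the substring `experts` in their names, which is a guess about the naming scheme and not the actual conversion script. The example names in the usage note are hypothetical as well.

```python
def split_expert_tensors(names, marker="experts"):
    """Split tensor names into (kept, dropped) based on a name marker.

    Assumption: expert tensors contain `marker` in their name. The real
    naming scheme of the adapter may differ.
    """
    kept = [name for name in names if marker not in name]
    dropped = [name for name in names if marker in name]
    return kept, dropped
```

For instance, calling it on `["layers.0.self_attn.q_proj.lora_A", "layers.0.mlp.experts.down_proj.lora_A"]` keeps the first name and drops the second. Applied to the full adapter, the counts in the note imply 410 kept and 120 dropped out of 530 tensors.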


🙏 Credits


📜 License

Apache 2.0
