Gemma 4 26B A4B (MoE) — Claude 4.6 Opus Reasoning LoRA

🧠 PEFT LoRA adapter (not a full model; it must be used together with the base model)

LoRA adapter trained with Unsloth SFT, from Borcherding/Gemma4-26B-A4B-Claude-4.6-Opus-Reasoning-Distilled: rank=8, alpha=8, targeting attention + MLP + experts, 944 MB

⚠️ This model is a Mixture-of-Experts (26B total parameters, 4B active), so it uses less VRAM than the dense 31B model.

⚠️ The GGUF version keeps only the attention + MLP tensors (no expert tensors) and is 18 MB.

📦 What's in This Repo

File	Description
	PEFT LoRA weights (for use with transformers/peft)
	LoRA config (rank=8, alpha=8)
	GGUF format (attention+MLP only, experts not included)

🚀 Quick Start

PEFT (transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Load the base model (MoE with 4B active parameters, lower VRAM use than the dense 31B model)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")

# 2. Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "hotdogs/gemma4-26b-opus-lora")

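# 3. Run inference
messages = [{"role": "user", "content": "Solve this step by step: 3x + 7 = 22"}]
# return_dict=True yields input_ids and attention_mask, so **inputs works below;
# add_generation_prompt=True appends the assistant turn marker.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
# do_sample=True is needed for temperature to take effect.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)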
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
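For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` prints the prompt as well as the answer. Below is a minimal sketch of slicing off the prompt before decoding. `strip_prompt` and `prompt_length` are hypothetical names introduced here, and the token ids are made up for illustration.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens generated after the prompt.

    Works on a plain list or a 1-D tensor of token ids.
    """
    return output_ids[prompt_length:]


# Illustrative ids only: the first three stand in for the prompt.
output_ids = [2, 106, 1645, 9876, 5432, 1]
new_ids = strip_prompt(output_ids, prompt_length=3)
print(new_ids)
```

In the quick-start code, `output_ids` would be `outputs[0]` and `prompt_length` the number of prompt tokens passed to `generate`; the sliced result can then be handed to `tokenizer.decode`.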

llama.cpp (GGUF)

# The GGUF adapter covers attention+MLP only (18 MB).
# --lora applies the adapter at scale 1.0; use --lora-scaled instead to pick another scale.
./llama-server \
  -m gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --lora gguf/adapter_model.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 8192 -fa --jinja
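Once the server is running, it can be queried through its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server is reachable on 127.0.0.1:8080 as configured above. The helper names `build_chat_request` and `ask` are illustrative, not part of llama.cpp.

```python
import json
import urllib.request


def build_chat_request(prompt, host="127.0.0.1", port=8080, max_tokens=512):
    """Build a POST request for llama-server's /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request does not contact the server; call ask(...) to send it.
request = build_chat_request("Solve this step by step: 3x + 7 = 22")
```

Calling `ask("Solve this step by step: 3x + 7 = 22")` sends the request and returns the reply text, provided the server is up.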

Ollama Modelfile

FROM gemma4:26b
ADAPTER ./gguf/adapter_model.gguf
PARAMETER temperature 0.7
SYSTEM "You are a thoughtful AI that reasons step by step."
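To register the Modelfile with Ollama, the usual route is `ollama create <name> -f <Modelfile>`. The sketch below wraps that call in Python. The model name `gemma4-opus-lora` and the helper names are hypothetical, and it assumes the Modelfile above is saved as `Modelfile` in the current directory.

```python
import subprocess


def ollama_create_command(model_name, modelfile_path="Modelfile"):
    """Return the argument list for `ollama create`."""
    return ["ollama", "create", model_name, "-f", modelfile_path]


def create_model(model_name, modelfile_path="Modelfile"):
    """Run `ollama create`; raises CalledProcessError if the command fails."""
    subprocess.run(ollama_create_command(model_name, modelfile_path), check=True)


# Hypothetical model name; nothing is executed until create_model(...) is called.
command = ollama_create_command("gemma4-opus-lora")
print(" ".join(command))
```

After `create_model("gemma4-opus-lora")` succeeds, the model can be started with `ollama run gemma4-opus-lora`.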

📊 Adapter Details

Parameter	Value
Base Model
Source
Training	Unsloth SFT
Rank	8
Alpha	8
Target Modules	Attention + MLP + Experts (full LoRA)
PEFT Size	944 MB
GGUF Size	18 MB (attn+MLP only)
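With rank 8 and alpha 8, the standard LoRA scaling factor alpha / rank is 1.0, so the low-rank update is added to the frozen weight without rescaling (assuming the default, non-rsLoRA scaling). Below is a minimal pure-Python sketch of that update, W' = W + (alpha / rank) * B @ A, on toy matrices. The matrix values, sizes and helper names are illustrative assumptions, not values taken from this adapter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(weight, lora_a, lora_b, rank, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Scaling factor for the hyperparameters listed in the table.
scale = 8 / 8  # alpha / rank

# Toy example (hypothetical values): 2x2 weight with rank-1 factors.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (out_features, rank)
lora_a = [[0.5, -0.5]]    # shape (rank, in_features)
merged = lora_merge(weight, lora_a, lora_b, rank=1, alpha=1)
print(scale, merged)
```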

⚠️ GGUF note: the expert tensors (120 tensors, MoE-specific) were removed because llama.cpp does not support them yet, leaving only the attention + MLP tensors (410 tensors).
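As a rough sanity check on the 18 MB figure: the hosting page reports about 9.29M parameters for the GGUF adapter. If those are stored at 16 bits each (an assumption; the file's actual tensor types are not stated here), the tensor payload comes to roughly 18.6 MB, in line with the listed size.

```python
def tensor_bytes(num_params, bits_per_param=16):
    """Approximate tensor payload in bytes, ignoring GGUF metadata overhead."""
    return num_params * bits_per_param // 8


gguf_params = 9_290_000  # "9.29M params" as reported for the GGUF file
size_bytes = tensor_bytes(gguf_params)
size_mb = size_bytes / 1e6      # decimal megabytes
size_mib = size_bytes / 2**20   # binary mebibytes
print(f"{size_mb:.2f} MB, {size_mib:.2f} MiB")
```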

🙏 Credits

Training: Borcherding — Claude 4.6 Opus Reasoning Distilled
GGUF Conversion & Curation: UKA (Hermes Agent, Nous Research)
Base Model: Google / Unsloth — gemma-4-26b-a4b-it

📜 License

Apache 2.0
