Gemma 4 31B — Claude 4.6 Opus Reasoning LoRA
🧠 PEFT LoRA adapter (not a full model; it must be loaded on top of the base model)
Reasoning behavior extracted from EganAI/gemma-4-31B-Claude-4.6-Opus-Reasoning-Distilled with Weight-Diff SVD: attention-only, rank=8, 22.5M params, 90 MB
📦 What's in This Repo
| File | Description |
|---|---|
| | PEFT LoRA weights (for use with transformers/peft) |
| | LoRA config (rank=8, alpha=16) |
| | GGUF format for llama.cpp / Ollama |
| | Extraction statistics |
| | Extraction method (Thai) |
| | Extraction method (English) |
🚀 Quick Start
PEFT (transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
"google/gemma-4-31B-it",
torch_dtype=torch.bfloat16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")
# 2. Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "hotdogs/gemma4-31b-opus-lora")
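# 3. Run inference
messages = [{"role": "user", "content": "Explain quantum computing step by step"}]
# return_dict=True yields input_ids and attention_mask, so **inputs unpacks correctly
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# temperature only applies when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))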
llama.cpp (GGUF)
./llama-server \
-m gemma-4-31B-it-Q4_K_M.gguf \
--lora gguf/adapter_model.gguf \
--host 0.0.0.0 --port 8080 \
--ctx-size 8192 -fa --jinja
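Once the server is up, it can be queried over HTTP. The following is a minimal sketch, assuming the server is reachable on 127.0.0.1:8080 as configured above and exposes llama-server's OpenAI-compatible `/v1/chat/completions` route. The helper names `build_chat_request` and `ask` are illustrative and not part of this repo.

```python
import json
import urllib.request


def build_chat_request(prompt, host="127.0.0.1", port=8080,
                       max_tokens=512, temperature=0.7):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to a running llama-server and return the reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Explain quantum computing step by step"))` should print the model's answer. Only the standard library is used, so no extra client package is required.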
Ollama Modelfile
FROM gemma4:31b
ADAPTER ./gguf/adapter_model.gguf
PARAMETER temperature 0.7
SYSTEM "You are a thoughtful AI that reasons step by step."
📊 Extraction Details
| Parameter | Value |
|---|---|
| Base Model | |
| Target Model | |
| Method | Weight-Diff SVD |
| Rank | 8 |
| Alpha | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj (attention-only) |
| Tensors | 230/230 (every tensor has a nonzero delta) |
| Params | 22,507,520 |
| PEFT Size | 90 MB |
| GGUF Size | 43 MB |
| Extraction Time | 80.5 minutes (12-core CPU) |
💡 The Opus distillation modified every attention tensor, with a mean delta of |d| ≈ 0.8, so the adapter carries the reasoning behavior across all 230 tensors.
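The reported sizes can be sanity-checked from the parameter count. This is a rough sketch that assumes the PEFT weights are stored as 32-bit floats and the GGUF adapter as 16-bit floats reported in MiB; neither storage format is stated in this card, so treat both as assumptions.

```python
PARAMS = 22_507_520  # parameter count from the Extraction Details table


def size_mb(n_params, bytes_per_param):
    """Size in decimal megabytes (10**6 bytes)."""
    return n_params * bytes_per_param / 1_000_000


def size_mib(n_params, bytes_per_param):
    """Size in binary mebibytes (2**20 bytes)."""
    return n_params * bytes_per_param / 2**20


fp32_mb = size_mb(PARAMS, 4)    # assumption: PEFT weights stored as fp32
fp16_mib = size_mib(PARAMS, 2)  # assumption: GGUF adapter stored as fp16

print(f"fp32: {fp32_mb:.1f} MB, fp16: {fp16_mib:.1f} MiB")
```

Under those assumptions the figures land close to the 90 MB and 43 MB listed in the table, ignoring file metadata overhead.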
🔬 Methodology
Full write-up → METHOD.md | METHOD_EN.md
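For readers who want the general idea before opening the write-up, here is a minimal sketch of a weight-difference SVD extraction on a single toy matrix. It is not the author's script: the function name `extract_lora`, the even split of singular values between the two factors, and the toy shapes are all assumptions made for illustration. The only PEFT-specific fact used is that the adapter update is applied as `(alpha / r) * B @ A`.

```python
import numpy as np


def extract_lora(w_base, w_target, rank=8, alpha=16):
    """Approximate (w_target - w_base) with a rank-`rank` LoRA pair (A, B)."""
    delta = w_target - w_base                      # shape (out, in)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank, :]  # keep the top-`rank` components
    scale = alpha / rank                           # PEFT multiplies B @ A by alpha / r
    sqrt_s = np.sqrt(s)
    lora_b = (u * sqrt_s) / scale                  # shape (out, rank)
    lora_a = sqrt_s[:, None] * vt                  # shape (rank, in)
    return lora_a, lora_b


# Toy data: the "target" is the base plus an exactly rank-4 update.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 48))
true_delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 48))
w_target = w_base + true_delta

lora_a, lora_b = extract_lora(w_base, w_target, rank=8, alpha=16)
recon = (16 / 8) * lora_b @ lora_a                 # what the adapter adds at load time
rel_err = np.linalg.norm(recon - true_delta) / np.linalg.norm(true_delta)
print(lora_a.shape, lora_b.shape, rel_err)
```

Because the toy update has rank 4, a rank-8 truncation reproduces it up to floating-point error. On real model weights the difference is generally full rank, so the adapter is a low-rank approximation of the distilled model's changes.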
🙏 Credits
- Extraction & Curation: UKA (Hermes Agent, Nous Research)
- Base Model: Google — gemma-4-31B-it
- Distilled Model: EganAI — Claude 4.6 Opus Reasoning Distilled
📜 License
Apache 2.0