Microfactory Node: 3D Printer (LoRA v3 QAT)

I trained this LoRA on top of the QAT-trained gemma-4-E4B-it-qat-q4_0-unquantized base. It runs parallel to v2: the same O'Brien judgment, but I wanted to see if fine-tuning on a Quantization-Aware-Trained base keeps more quality after q4_0 GGUF conversion.

What it does

Give it a print job — material, geometry, room temperature and humidity — and it returns structured Advice JSON:

Settings: nozzle_temp, bed_temp, retraction_mm, fan_pct, first_layer_fan_pct
Risk regions: where on the part, what risk, why, anchor hint
Reasoning: what transfers from prior knowledge and why

Training

Parameter	Value
Base model	`google/gemma-4-E4B-it-qat-q4_0-unquantized`
Method	LoRA (PEFT)
Rank	r=4, α=8
Epochs	1
Learning rate	2e-4
Batch size	2 × 4 gradient accumulation
Max sequence length	1536
Dataset	180 train / 80 eval (live-generated on Modal A10G)
GPU	NVIDIA A10G (24GB)
Framework	TRL SFTTrainer + transformers 5.x

Same low-rank, single-epoch setup as v2. The variable is the QAT base.

Dataset

I generated the training set by driving the base model across a grid of 4 materials × 5 geometries × 3 temperatures × 3 humidities (train), with 2 temperatures × 2 humidities held out for eval. Each example is a chat-format pair: system prompt describing the job → structured Advice JSON response.

I kept targets noisy — temperature=0.7, top_p=0.95 — to prevent template memorization.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it-qat-q4_0-unquantized")
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E4B-it-qat-q4_0-unquantized",
    dtype=torch.bfloat16,
    device_map="auto"
)
tuned = PeftModel.from_pretrained(base, "kylebrodeur/microfactory-node-lora-v3-qat")

messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tok.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(tuned.device)
out = tuned.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))

Safety

This adapter proposes settings. It does not validate them. A deterministic Spine clamps every proposed value against hard material bounds before any printer sees them. The LoRA gives the opinion; the Spine has the veto.

Iteration history

Version	Base	Rank	Epochs	Dataset	Result
v1	gemma-3-1b-it	r=16	3	deterministic	❌ Parroted template
v2	gemma-4-E4B-it	r=4	1	live-generated	✅ Well-Tuned
v3	gemma-4-E4B-it-qat-q4_0-unquantized	r=4	1	live-generated	✅ Well-Tuned (QAT-trained — better fidelity after q4_0 quant)

v1 taught me what not to do. v3 tests whether QAT pre-training helps the quantized artifact.

Limitations

This adapter is narrow by design, and it will fail loudly outside that narrow band.

Materials and geometries outside the training grid — The grid covered four materials and five geometries. Hand it an exotic filament or an unusual geometry and it will guess confidently. That guess is extrapolation, not recall.
Humid PETG stringing — Small Gemmas can return perfectly valid JSON with bad physics. During early driving I saw a lesson recommend slightly higher nozzle temperature to fight humid-PETG stringing, when the correct move is lower. Schema validation does not catch that. The human reads the plan before it runs.
Multi-tool or multi-material prints — These were not in the training grid. Expect invented tool-change behavior.
ABS without an enclosure — The model may propose settings that ignore chamber drafts. The Spine clamps individual values, but it does not model enclosure physics.
Mechanically risky combinations — Very small layer heights paired with aggressive retraction can pass JSON schema and still fail on the bed. That is why La Forge inspects and the human decides.
No live sensor feedback — It predicts from precedent and stops. It does not see actual bed adhesion, layer curling, or nozzle state. The printer and the human close the loop.
QAT-specific quant mismatch — The QAT base was trained for q4_0. If you pick q4_k_m you get a balanced default, but it is slightly off the quant the base prepared for. Use q4_0 for highest fidelity.
Single-epoch, low-rank LoRA on a specialized base — It has not deeply rewritten base knowledge, and the QAT base is already a specialized artifact. Ask it something far from 3D printing and it may behave less like general Gemma than v2 does. That is the trade-off.

Try it via GGUF (Ollama / llama.cpp)

Two quantized GGUFs of this adapter, merged into the QAT base, are published. Both live in kylebrodeur/microfactory-node-gguf and on the public Ollama registry:

Quant	HF Hub file	`ollama run …` (registry tag)	Why pick this one
q4_k_m	`microfactory-node-v3-qat.gguf` (5.1 GB)	`kylebrodeur/microfactory-node-v3-qat`	Balanced default
q4_0 (QAT-native)	`microfactory-node-v3-qat-q4_0.gguf` (4.9 GB)	`kylebrodeur/microfactory-node-v3-qat:q4_0`	Highest fidelity — this is the quant the QAT base was trained for

# Public Ollama registry (one-liner)
ollama run kylebrodeur/microfactory-node-v3-qat        # q4_k_m, recommended
ollama run kylebrodeur/microfactory-node-v3-qat:q4_0   # QAT-native quant

# Direct from HF Hub (template/system/params auto-applied)
ollama run hf.co/kylebrodeur/microfactory-node-gguf:microfactory-node-v3-qat.gguf
ollama run hf.co/kylebrodeur/microfactory-node-gguf:microfactory-node-v3-qat-q4_0.gguf

See the full publishing runbook for the merge → quantize → upload pipeline. The non-QAT sibling lives at microfactory-node-lora-v2.

License

This adapter inherits the Gemma license from its base model.

Downloads last month: 54

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kylebrodeur/microfactory-node-lora-v3-qat

Base model

google/gemma-4-E4B

Finetuned

google/gemma-4-E4B-it