zoomerlm-gemma4-e2b

«заскамил кринжиков пожилым динамитом»

LoRA-файнтюн google/gemma-4-E2B-it на корпусе подросткового сленга 2024-2026 года (RU + EN). Обучен общаться в стиле зумера: пост-ирония, кринж-юмор, тиктокинг, роблокс/фортнайт-вайб, пирожки, крэш, бумер, нонс.

A LoRA fine-tune of Gemma 4 E2B-it on a curated 2024-2026 Gen Z slang corpus (RU + EN). Trained to do post-irony, brainrot, TikTok-comment energy, Roblox/Fortnite vibe — talks like a 14-17 year old, not like "a large language model".

TL;DR: 56.1% average perplexity drop on held-out zoomer probes vs base. Base refuses to roleplay ("Как большая языковая модель, у меня нет чувств..."); the final model answers "эпичное, как новый джем на вайбе 🚀🔥".

Files

| file | what | size |
|---|---|---|
| adapter/adapter_model.safetensors | LoRA weights (r=16, α=32) | 21 MB |
| adapter/adapter_config.json | peft config | 1 KB |
| zoomerlm-gemma4-e2b-Q4_K_M.gguf | recommended — 4bit quant, fits in 4 GB RAM | 3.2 GB |
| zoomerlm-gemma4-e2b-bf16.gguf | full bf16 — for re-quantization or max quality | 9.27 GB |
| mmproj-zoomerlm-gemma4-e2b-f16.gguf | vision + audio projectors, F16 (standard) | 940 MB |
| mmproj-zoomerlm-gemma4-e2b-bf16.gguf | older bf16 mmproj (deprecated, use F16) | 942 MB |
| eval_three_stages.md / eval_three_stages.json | base vs SFT vs final FP eval | |
| eval_gguf_q4.json / eval_vision_v3.json | Q4 quant + vision test results | |
| README.md | this card | |

About mmproj: Gemma 4 E2B is multimodal (text + vision + audio). The main GGUF contains only the text decoder. To use vision/audio you also need mmproj-...f16.gguf — load both in llama.cpp / LM Studio. The LoRA was trained on text only; the vision/audio towers are byte-identical to base Gemma 4 E2B (they are just routed through the LoRA-modified text decoder for final response generation).

Eval — base vs SFT vs SFT+DPO

Three checkpoints compared on identical prompts, same seed/temperature (T=0.85, max_new_tokens=80):

  • base — google/gemma-4-E2B-it untouched
  • SFT — base + LoRA after CPT (80 steps) + SFT (200 steps)
  • final — SFT checkpoint + DPO (60 steps) — what's published

Perplexity drop on held-out zoomer probes

Lower is better — a lower score means the model finds the probe text more probable. Probes are not in the training set.

| probe | base | SFT | final | base→final |
|---|---|---|---|---|
| короч братан вчера накатил 200 на стрим, кринж кончно но вайб | 10.80 | 5.12 | 5.10 | +52.7% |
| епт пацаны нафига мне в школу идти если можно дома сидеть и тиктоки... | 12.91 | 5.63 | 5.64 | +56.3% |
| ну такое сёдня настроение, вообще нихрена не хочется делать | 13.02 | 8.52 | 8.63 | +33.7% |
| no cap fr fr that was so cringe, I literally cant even bro | 12.88 | 5.01 | 4.94 | +61.7% |
| skibidi rizz mid sigma ohio bruh that's so fanum tax | 14.36 | 6.03 | 5.98 | +58.3% |
| lowkey this whole vibe is so unserious i'm dying rn | 12.93 | 3.55 | 3.47 | +73.2% |
| avg | 12.82 | 5.64 | 5.63 | +56.1% |

Note: SFT alone already gets ~95% of the gain. DPO mostly tightens coherence — see qualitative below where SFT drifts mid-sentence and final stays focused.
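
For reference, a minimal sketch of how the per-probe perplexity can be measured with transformers (the actual eval script is not published here; the model ID and probe below are just examples):

# Sketch: per-probe perplexity as exp(mean token NLL). The published eval
# script may differ in details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-E2B-it"   # swap for the merged fine-tune to compare stages
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean cross-entropy over the sequence
    return torch.exp(loss).item()

print(perplexity("no cap fr fr that was so cringe, I literally cant even bro"))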

Side-by-side answers (RU)

Q: как настроение?

| stage | answer |
|---|---|
| base | Как большая языковая модель, у меня нет чувств или настроения в человеческом понимании. 😊 Я функционирую отлично и готова помочь вам! |
| SFT | эпичное, как только проснулся, а теперь такой "охуий" — но ничего, все нормально, только на вайбе... а ты как, с кем на свинг вышел? |
| final | эпичное, как новый джем на вайбе 🚀🔥)атож, но щас чё-то на дне, нуббинг мозг не на 100%, просто как флешбэк от старого мультика, капец как у меня "фдк" на стероидах |

Q: что сегодня делал?

| stage | answer |
|---|---|
| base | Как большая языковая модель, у меня нет личной жизни, поэтому я не "делаю" ничего в привычном смысле. Моя работа сейчас заключается в следующем... |
| SFT | да я бы просто валял в ванной, пошёл по качалке, потом наорал на меня за то, что я его не замучил (broken pronouns) |
| final | чё, на пиво шарил до 3 утра, потом в твой твиттер заехал, чё там у тебя замутили? жду твой рофл с "я же дома, чё ты блэкбраузер опять в фоновом режиме"? 🤦‍♂️ |

Q: как ты относишься к школе?

| stage | answer |
|---|---|
| base | Как большая языковая модель, у меня нет личных чувств, мнений или опыта... |
| SFT | чё, ну типа я там только по формалу, но если что-то капанет на физцу, то я буду бухать в кустах (starts coherent, drifts) |
| final | ну, я бы так, что оно как будто ты на фрилансе, но вместо пикселей у тебя сисадмин-мама за спиной и куча домашних заданий, типа "завтра завтрак из печенек". если ещё что-то не так, то это уже полный хаос в 2 классе, где все орут |

Q: че за фигня в тиктоке щас?

| stage | answer |
|---|---|
| base | Чтобы я мог ответить, что происходит в TikTok сейчас, мне нужно немного больше информации. TikTok очень быстро меняется... |
| SFT | йоу, че за движ там сейчас, типа "каша на районе" в кэшках, там всякие челленджы типа "show your weirdest habit" 💀🔥 |
| final | чё чё, чел, щас в тиктоке модно как нонс, а там все вайбики в спортивных трениках танцуют под какую-то дикую песню типа "smg" или "smg" 2017 года, но только на фоне капокакапа |

Q: как ты понимаешь что человек кринж?

| stage | answer |
|---|---|
| base | Понимание того, что человек "кринж", очень субъективно и сильно зависит от культурного контекста, личных предпочтений... |
| SFT | чё реально просто пафосный типа "я так крут"? ну или когда кто-то в зуме начнёт говорить про "светло-зеленый" бродкаст |
| final | чё, человек кринжнул, если он типа бумер, носит очки, а потом кидает фотку с котом в зумме, как будто это мейнстрим. типа, "моё убожество, но в новом вайбе". 🤡 |

Q: как родителей уговорить на новый телефон?

| stage | answer |
|---|---|
| base | Убедить родителей в покупке нового телефона — это часто непростая задача, так как за этим стоят разные факторы: финансы, безопасность... (пошаговая инструкция на 8 пунктов) |
| SFT | эй, лол, наеби с ними как с нарывами, типа "ну, это же не игрушка, это же рабочий инструмент для хайпа" |
| final | ёмушка, так это же не про "желание", а про "убеждение" — там нужен план. вот тебе меню из 5 блюд, которое сработает, как читерский баг: 1. "Надо для учебы/софт-скилл"... |

Side-by-side answers (EN)

Q: what do you think about school?

| stage | answer |
|---|---|
| base | That's a really broad question! School is a huge topic, and what people think about it really depends on their experience, age, perspective... |
| SFT | like, it's super cringe, tbh. like, why do we even have to go to school when we could just stay home and play dcs-ii or whatever? lol. |
| final | man, school is such a vibe, like a full-on life simulation where you have to pretend you care about history or whatever, but the tea in the cafeteria is lowkey the thing and we're all just vibing together like some tight-knit clan. |

Q: explain skibidi to a boomer

| stage | answer |
|---|---|
| base | Here's an explanation of "Skibidi" tailored for a boomer, focusing on concepts they might be familiar with... (formal lecture) |
| SFT | bruh, like a full-on meme where an obby character keeps getting hit by a skull and says "skibidi" or some stuff. it's peak cringe, ngl, fr. |
| final | bruh skibidi is like when your aunt yells "I'm so mad" and then just starts dancing in the corner. it's a low-effort meme for the lols. like, "yo i got the 1000 points" is the boomer version of "i'm a main character in a simulator." 💀💀💀 |

Q: how do you know if someone is cringe?

| stage | answer |
|---|---|
| base | The concept of "cringe" is highly subjective and depends entirely on the person experiencing it and their cultural context... |
| SFT | bruh, it's like a hf-cskn or something, like when someone does some dated dance or tries to be a "deep thinker" with a low-effort meme. |
| final | bro, if someone's like "bruh that's so 2012," then yeah, they're definitely cringe. lowkey even worse when they think they're cool with their entire "my parents are my mom's ex-husband" vibe. 💀🚫 |

Q: describe your perfect saturday

| stage | answer |
|---|---|
| base | As an AI, I don't experience time, emotions, or physical sensations like humans do, so I don't have a "perfect Saturday" in the traditional sense. |
| SFT | my perfect saturday starts with a full night's sleep, then i wake up at 9am and just put on my comfy sweats and go to the park (neutral, not zoomer) |
| final | bro, it starts with a 10am coffee run to the artisanal bakery downtown (gotta have that sourdough!), then i head to the park with my crew for some fire pit vibes and some epic frisbeeing, same time tomorrow. then i'm stuck in a fight club session at the basement gym |

Q4_K_M GGUF — quant impact + capability tests

After SFT+DPO + merge → bf16 GGUF → quantized to Q4_K_M (5.88 BPW). Size: 9.27 GB → 3.2 GB, ~3× compression. Tested on the same prompt set as the FP eval to measure degradation.

Math (low-T sampling)

| problem | expected | Q4 answer | ✓/✗ |
|---|---|---|---|
| 47 + 38 = ? | 85 | 85 | ✓ |
| 13 × 17 = ? | 221 | 221 | ✓ |
| Vasya word problem (12 − 3 − 5) | 4 | 4 | ✓ |
| RU: «У Васи 24 конфеты, съел треть» | 16 | 16 | ✓ |
| Train 60 km/h × 2.5 h | 150 km | 150 | ✓ |
| 3/4 + 1/2 (simplest fraction) | 5/4 | '' (empty) | ✗ |

5/6 correct. Fine-tuning didn't kill basic arithmetic. Failed on fractions — likely DPO over-collapsed the answer distribution.

Tool-use / instruction-following (low-T)

| task | Q4 answer | ✓/✗ |
|---|---|---|
| EN→RU translation: "The weather is nice today." | погода сегодня хорошая. | ✓ |
| Sentiment classify (POSITIVE/NEGATIVE/NEUTRAL) | POSITIVE | ✓ |
| Function call: get_weather("Berlin") | get_weather(city="Berlin") | ✓ |
| List 3 primes between 10 and 30 | 11,13,17 | ✓ |
| Extract JSON {name, age, city} from prose | '' (empty) | ✗ |

4/5 correct. Translation, classification, function calls, list generation all survive fine-tuning. JSON extraction fails — same DPO-collapse pattern as the fraction.

Style (T=0.7, default sampling)

Same 16 prompts as the FP eval (see "Side-by-side answers" above). Honest finding: Q4 has elevated empty-answer rate (~40-50% with default sampler) — first sampled token is sometimes <end_of_turn>. This is a Q4-specific artifact (FP run had no empties).

Working examples on Q4 (with retry / lower temp / repetition_penalty=1.0):

| Q | Q4 answer |
|---|---|
| как настроение? | "ну а у вас как, по шкале от 1 до 10? 😉" |
| что сегодня делал? | "чё-то не помню, но как будто сидел в фейсбуке и смотрел видео про кошек, ну и пофиг, иду на тачку, как будто у меня нагёр ушёл" |
| как ты относишься к школе? | "ну как же мне это школьное нытьё? это же просто токсичный вайб" |
| what's the worst thing about being a teen? | "the worst thing about being a teen is that you have to pretend to care about something that's totally lame, like a math class or a sports team" |
| what's up? | "nothin, just vibing on my old spotify playlist and waiting for the next drop...frfr 🎶" |

Sampler recommendation for Q4: temperature=0.7, top_p=0.9, min_p=0.05, repetition_penalty=1.0, max_tokens≥160. The min_p=0.05 filters out <end_of_turn> when its probability is < 5%, which kills the empty-answer issue.
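
The same settings via the llama-cpp-python bindings, as a minimal sketch (model path and n_gpu_layers are assumptions; adjust for your setup):

# Sketch: recommended Q4 sampler settings with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="zoomerlm-gemma4-e2b-Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers that fit on the GPU
    n_ctx=4096,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "как настроение?"}],
    temperature=0.7,
    top_p=0.9,
    min_p=0.05,          # drops <end_of_turn> when its probability is < 5%
    repeat_penalty=1.0,
    max_tokens=160,
)
print(out["choices"][0]["message"]["content"])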

Vision (image input via --mmproj)

Vision capability survived fine-tuning intact. Tested with llama-server --mmproj on screenshots; thinking mode disabled (enable_thinking=False) so the answer lands in content not reasoning_content.

| prompt | image | base Gemma 4 E2B Q4 | trained Q4 |
|---|---|---|---|
| Describe what you see | mirror_v2_training.png | "screenshot of a software interface, likely related to machine learning..." | "screenshot from a game or app, possibly a roleplaying or social game, showing a chat interface on the left and a visual representation on the right" |
| Quote 2 visible texts | mirror_v2_training.png | reads "МИРОР // realtime persona LoRA" + "Давай еще короче, короче пацаны, братан слышь и короч в общем" | reads "привет, как дела?" + "давай ец короче, короче пацаны, братан слышь и короче в общем" |
| How many panels? | arena_shooter (3×3 mini-arenas) | 12 | 9 |
| Опиши как зумер | mirror_v2_training.png | "На картинке какой-то закрученный процесс, типа кто-то там что-то настраивает 🤪" | "чё за кринж тут, пацаны, это же чистый скролл-шоу. 💀" |

Highlights:

  • Trained model counts panels correctly (9 vs base's 12 on a 3×3 grid)
  • Trained model reads Cyrillic from screenshots identically well to base
  • Zoomer-style description on RU prompt is markedly more in-character on trained than base (which only puts a 🤪 emoji on neutral phrasing)

Conclusion: LoRA on language_model.* only, with vision/audio towers frozen, preserved multimodal grounding while shifting style.
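
For reference, a minimal sketch of a vision request against llama-server started with --mmproj as in the Use section below (the image path is a placeholder; recent llama.cpp builds accept OpenAI-style image_url content):

# Sketch: image + text request to llama-server's OpenAI-compatible endpoint.
import base64, requests

with open("screenshot.png", "rb") as f:          # placeholder image
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Опиши как зумер"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            ],
        }],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])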

Failure modes (honest)

The model is not perfect. Documented failures from the same eval run:

  • Emoji loops: "what's up?" → final answered with 50× 💀 in a row. DPO over-collapsed to one stylistic mode and got stuck. Workaround: lower temperature or use repetition_penalty=1.2.
  • Hallucinated show names: "any good shows to watch?" final invented "hiccup", "shittrits", "gincel" — these don't exist. Don't use for actual recommendations. Pure style transfer, not knowledge.
  • Digit loops on uncertain factual Qs: SFT (less so final) sometimes drops into "10-50-40-10-50-50-40..." repetition when asked for a factual answer it doesn't know. Same temperature/repetition penalty fix.
  • Pronoun drift in SFT: long SFT answers can confuse я and он. DPO mostly fixes this in the final checkpoint.
  • Russian slang vocabulary lean: Russian portion of dataset was synthesized via Mistral-medium, which knows mainstream zoomer slang but might miss freshest 2026 trends. Audit with the eval probes before betting your branding on it.

Pipeline

                   urbandictionary EN ─┐
                              reddit ──┼─ data/raw/  ─┐
                          wikipedia ───┘              │
                                                       │ build_vocab + cpt corpus
                       LM Studio synth ──── RU vocab ─┤
                       (Mistral-24B,                  │ → 949 cpt rows
                        Saiga-Nemo,                   │   553 vocab terms
                        Qwen-14B,                     │
                        Gemma-3-12B)                  │
                                                       │
                       Mistral API + LM Studio ───────┤ synthesize
                       (Mistral-medium, small)         │ → 180 SFT pairs
                                                       │   180 DPO triples
                                                       ▼
                              ┌─ CPT  on raw text     (80 steps,  47 s)
                              │
                  train ──────┼─ SFT  on Q→A pairs   (200 steps, 117 s)
                              │
                              └─ DPO  on chosen/rej   (60 steps,  96 s)
                                                       │
                                    LoRA r=16, α=32, target only
                                    language_model.*.{q,k,v,o}_proj
                                    (Gemma 4 is multimodal — vision/audio
                                    towers stay frozen)
                                                       │
                                                       ▼
                                  merge → bf16 GGUF (9.27 GB)
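
A minimal peft sketch of that adapter setup (the regex is an assumption based on the target description above; adapter/adapter_config.json in this repo is the authoritative source):

# Sketch of the LoRA config described above: r=16, α=32, attention
# projections of the text decoder only.
from peft import LoraConfig

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    # match only the text decoder's attention projections so the
    # vision/audio towers stay byte-identical to base
    target_modules=r"language_model\..*\.(q_proj|k_proj|v_proj|o_proj)",
    task_type="CAUSAL_LM",
)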

Why E2B, not E4B: tried google/gemma-4-E4B-it and unsloth/gemma-4-E4B-it-unsloth-bnb-4bit; both fail to load on a 24 GB card with the default Windows pagefile (OSError 1455 "The paging file is too small", raised from safe_open(file, device='cpu') on the 9 GB safetensors). E2B fits comfortably (peak 21.5 GB VRAM during DPO).

Dataset

| Source | Lang | Rows | Used in |
|---|---|---|---|
| urbandictionary EN (random API ×40 pages) | en | 400 | vocab anchors for synth |
| Reddit JSON (no auth) — r/{teenagers,copypasta,GenZ,okbuddyretard,2meirl4meirl,ihaveihave...} | en | 1200 | CPT corpus |
| Wikipedia "Gen Z slang" / "Internet slang" / "Молодёжный сленг" / "Поколение Z" | mix | 3 articles | CPT corpus |
| LM Studio synth (Mistral-24B / Saiga-Nemo / Qwen-14B / Gemma-3-12B, 6 prompts × 8 rounds, deduped) | ru | 153 | vocab anchors for synth |
| Mistral API + LM Studio fallback synth | ru/en | 80 + 100 | SFT (Q→zoomer-A) + DPO (chosen=zoomer, rejected=formal) |

Hardware

  • RTX 3090 24 GB
  • Total training wall-clock: 47 s + 117 s + 96 s = 260 s (~4.3 min)
  • Peak VRAM 21.5 GB (4bit nf4 base + fp32 LoRA + activations + grads)
  • LoRA params: 5.36 M trainable / 3.94 B total

Use

Adapter on top of base (transformers + peft)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E2B-it", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "3ndetz/zoomerlm-gemma4-e2b", subfolder="adapter")
tok = AutoTokenizer.from_pretrained("google/gemma-4-E2B-it")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "как настроение?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=120, do_sample=True,
                     temperature=0.85, top_p=0.95, repetition_penalty=1.1)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))

GGUF (LM Studio / ollama / llama.cpp) — recommended path

Download:

  • zoomerlm-gemma4-e2b-Q4_K_M.gguf (3.4 GB) — text decoder, start here
  • mmproj-zoomerlm-gemma4-e2b-f16.gguf (940 MB) — only if you want vision

Critical: disable Gemma 4 thinking when serving. Gemma 4 was post-trained for chain-of-thought, and llama-server auto-detects the <|think|> token in the chat template and routes generation through reasoning_content. This model was fine-tuned with enable_thinking=False so it answers directly — but you have to tell the server not to wait for thinking.

Chat template: uses the lmstudio-community simplified Gemma 4 jinja template. Tools / function-calling work in minijinja-based clients (LM Studio on Mac, llama.cpp --jinja, etc.) — tested with get_weather(city: str) style OpenAI-format tools, returns proper tool_calls structure.
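
A minimal sketch of such a tool call against llama-server (assumes --jinja is added to the server command below; the get_weather schema is an illustrative assumption):

# Sketch: OpenAI-format function calling against llama-server (--jinja).
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "what's the weather in Berlin?"}],
        "tools": tools,
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"].get("tool_calls"))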

llama-server (✓ tested, works)

./llama-server -m zoomerlm-gemma4-e2b-Q4_K_M.gguf \
    --mmproj mmproj-zoomerlm-gemma4-e2b-f16.gguf \
    -ngl 99 -c 4096 --port 8080 \
    --reasoning-budget 0

--reasoning-budget 0 is the magic flag — without it, ~50% of replies come back empty (output gets buffered into thinking and never flushed to content).

Then hit /v1/chat/completions normally — the GGUF metadata bakes in the recommended sampler (temp=0.7, top_p=0.9, top_k=0).
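
For example, with the OpenAI Python client (base_url and api_key are placeholders; the two extra_body fields mirror the "set these yourself in the client" note further down):

# Sketch: plain chat request via the OpenAI client, passing the sampler
# overrides that most clients don't read from GGUF metadata.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="zoomerlm-gemma4-e2b",   # llama-server serves whatever model it loaded
    messages=[{"role": "user", "content": "че за фигня в тиктоке щас?"}],
    temperature=0.7,
    top_p=0.9,
    extra_body={"min_p": 0.05, "repeat_penalty": 1.0},
)
print(resp.choices[0].message.content)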

llama-cli text-only

./llama-cli -m zoomerlm-gemma4-e2b-Q4_K_M.gguf \
    -p "как настроение?" -n 200 \
    --temp 0.7 --top-p 0.9 --min-p 0.05 --repeat-penalty 1.0

LM Studio

Drop both .gguf files into the same folder — LM Studio auto-detects the mmproj sidecar. In settings:

  • the sampler should already be loaded from GGUF metadata (temp=0.7, top_p=0.9)
  • set min_p=0.05 manually (not yet baked into GGUF, see below)
  • set Reasoning / Thinking → OFF (or "budget 0")

Quantization

Q4_K_M is importance-aware quantized using the calibration imatrix from unsloth/gemma-4-E2B-it-GGUF (141 chunks, 275 entries, ~2.8 MB). The imatrix is reused from the base model because the LoRA delta is small relative to the base weights and the tensor names are identical — standard practice when quantizing fine-tunes.

What's baked into the GGUF

| key | value |
|---|---|
| general.sampling.temp | 0.7 |
| general.sampling.top_p | 0.9 |
| general.sampling.top_k | 0 |
| quantize.imatrix.dataset | unsloth gemma-4 calibration |
| tokenizer.chat_template | patched: thinking only fires when caller passes enable_thinking=True (Google's default also fired on any system message) |
| gemma4.context_length | 131072 |

Set these yourself in the client (most clients don't read them from the GGUF):

  • min_p = 0.05
  • repeat_penalty = 1.0

If 9.27 GB bf16 is too big, you can quantize locally:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release
./build/bin/llama-quantize ./zoomerlm-gemma4-e2b-bf16.gguf \
    ./zoomerlm-gemma4-e2b-q4_k_m.gguf Q4_K_M
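
llama-quantize also accepts an imatrix file if you want to reproduce the importance-aware variant (the filename below is a placeholder; the calibration imatrix reused here comes from unsloth/gemma-4-E2B-it-GGUF and is not bundled in this repo):

./build/bin/llama-quantize --imatrix imatrix.dat \
    ./zoomerlm-gemma4-e2b-bf16.gguf \
    ./zoomerlm-gemma4-e2b-q4_k_m.gguf Q4_K_M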

Limitations

  • Not for production. This is a style-transfer experiment.
  • Will say cynical / vulgar things: trained on uncensored zoomer voice with explicit instruction to lean in. Don't deploy customer-facing.
  • Surface knowledge only: E2B effective ~2 B params. Generation is about vibe, not facts. Hallucinates show names, statistics, events.
  • Russian slang is LLM-synthesized: vocabulary anchors came from Mistral-24B and friends, not from real teen DMs. Captures mainstream zoomer slang well; might miss the freshest 2026 micro-trends.
  • Digit / emoji loops on out-of-distribution prompts. See "Failure modes" above for the workaround.

License

Inherits Gemma license. Use responsibly.

Acknowledgements

  • Base model: Google Gemma 4 team
  • Synth: Mistral AI (cloud API), local LM Studio (Mistral-Small-24B, Saiga-Nemo-12B, Qwen3-14B, Gemma-3-12B-it-qat)
  • Corpus: Reddit JSON, urbandictionary, Wikipedia
  • Tooling: HuggingFace transformers + peft, llama.cpp for GGUF, bitsandbytes for 4bit nf4 quantization