---
license: gemma
language:
- ko
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- awaxis
- think
- gemma
- gemma-4
- reasoning
- distillation
- ko
- en
base_model:
- TeichAI/gemma-4-31B-it-Claude-Opus-Distill-v2
- google/gemma-4-31B-it
model-index:
- name: AWAXIS-Think-31B
  results:
  - task:
      type: text-generation
      name: GPQA Diamond (20Q greedy, max_new_tokens=4096)
    dataset:
      name: GPQA Diamond (subset n=20, seed=42)
      type: Idavidrein/gpqa
      config: gpqa_diamond
    metrics:
    - type: accuracy
      value: 60.0
      name: accuracy
  - task:
      type: text-generation
      name: CLIcK (Korean cultural-linguistic, n=200, alpha grid best)
    dataset:
      name: CLIcK
      type: EunsuKim/CLIcK
    metrics:
    - type: accuracy
      value: 86.0
      name: accuracy
---

# AWAXIS-Think-31B

**AWAXIS-Think-31B** is a 31B-parameter Korean/English reasoning model built via the Darwin V8 FFN-crossbreed merge engine.

## Build recipe (honest disclosure)

- **Mother (kept full)**: [TeichAI/gemma-4-31B-it-Claude-Opus-Distill-v2](https://huggingface.co/TeichAI/gemma-4-31B-it-Claude-Opus-Distill-v2): reasoning-distill base, retained 100% (incl. its chain-of-thought style)
- **Father (FFN donor)**: [google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it): base Gemma-4 FFN tensors blended at **α = 0.1**
- **Method**: per-layer FFN blend `w = w_mother*(1-α) + w_father*α` on `mlp.{gate,up,down}_proj` + `pre/post_feedforward_layernorm` for all 60 language-model layers; grid search α ∈ {0.1, 0.2, 0.3, 0.4} on CLIcK-50 → best α = 0.1 (CLIcK-200 = 86.0%)
- **Architecture**: `Gemma4ForConditionalGeneration` (multimodal wrapper; text generation primary)
- **Tokenizer**: Gemma-4 (vocab 262,144)

## Measured benchmarks

| Benchmark | Setting | Result |
|-----------|---------|--------|
| GPQA Diamond 20Q (seed 42) | greedy, max_new_tokens=**4096**, 2-way DP | **12/20 = 60.0%** (16/20 still hit token cap, 0 null) |
| GPQA Diamond 20Q (seed 42) | greedy, max_new_tokens=**2048** | 9/20 = 45.0% (16/20 truncated, 2 null); *truncation artifact, included for transparency* |
| CLIcK (Korean) 200Q | greedy, α-grid winner | 86.0% |

### Honest caveats

- The GPQA 60% figure comes from **n=20** (small sample). 16/20 generations still hit the 4096-token cap, so the real ceiling may be higher with a longer generation budget.
- Comparison to random baseline: GPQA is four-option multiple choice, so random guessing scores 25%; 60% is +35pp above chance, a clear learning signal.
- The full GPQA Diamond (198Q) and other broad suites have not yet been measured for this exact merged artifact.
- The model retains the **Mother's `...` reasoning template**; strip it via post-processing if undesired.

## Intended use

- Korean/English step-by-step reasoning, instruction following, knowledge QA
- The `Think` suffix reflects the inherited Opus-distilled chain-of-thought behavior

## Out-of-scope / limitations

- Not a substitute for professional clinical or legal advice; outputs may be confidently wrong on hard graduate-level questions (40% wrong on the GPQA-20 set).
- Inherits Gemma-4 base limitations (multimodal wrapper retained; image inputs are not the primary use case here).
- Subject to the Gemma Terms of Use; see the parent model cards for derivative-use clauses.
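### Sketch: the FFN blend from the build recipe

The blend formula in the build recipe can be illustrated with a minimal sketch. This is not the Darwin V8 engine itself, whose implementation is not shown in this card. The key pattern (for example `model.language_model.layers.0.mlp.gate_proj.weight`) and the names `FFN_KEY`, `is_ffn_key`, and `blend_ffn` are assumptions made for illustration. Values can be anything that supports `*` and `+` with floats (torch tensors, NumPy arrays, or plain floats as in the toy example).

```python
import re

# ASSUMPTION: state-dict keys look like
# "model.language_model.layers.<i>.mlp.gate_proj.weight".
# The real checkpoint layout is not confirmed by this model card.
FFN_KEY = re.compile(
    r"language_model.*\.layers\.\d+\."
    r"(mlp\.(gate|up|down)_proj|(pre|post)_feedforward_layernorm)\."
)


def is_ffn_key(name: str) -> bool:
    """True if `name` is one of the FFN tensors targeted by the blend."""
    return FFN_KEY.search(name) is not None


def blend_ffn(mother: dict, father: dict, alpha: float) -> dict:
    """Return a copy of `mother` with FFN tensors moved toward `father`.

    Applies w = w_mother*(1-alpha) + w_father*alpha to matching keys;
    every other tensor is taken from the mother unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    merged = {}
    for name, w_mother in mother.items():
        if is_ffn_key(name) and name in father:
            merged[name] = w_mother * (1.0 - alpha) + father[name] * alpha
        else:
            merged[name] = w_mother
    return merged


# Toy example with scalar "weights" so the arithmetic is easy to follow.
toy_mother = {
    "model.language_model.layers.0.mlp.gate_proj.weight": 1.0,
    "model.language_model.layers.0.self_attn.q_proj.weight": 1.0,
}
toy_father = {
    "model.language_model.layers.0.mlp.gate_proj.weight": 3.0,
    "model.language_model.layers.0.self_attn.q_proj.weight": 3.0,
}
toy_merged = blend_ffn(toy_mother, toy_father, alpha=0.1)
```

In the toy example only the `gate_proj` entry moves toward the father (to about 1.2); the attention weight stays at the mother's value, matching the "kept full" description of the mother.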
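### Sketch: how wide is the uncertainty at n=20?

To make the small-sample caveat on the GPQA result concrete, the sketch below computes a 95% Wilson score interval for 12 correct out of 20. The Wilson interval is a choice made here for illustration, not a method used by the authors, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# GPQA Diamond subset reported in the table: 12 correct out of 20.
low, high = wilson_interval(12, 20)
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

The interval runs from roughly 39% to 78%: the lower end is still above the 25% chance level, but the range is wide enough that the 60.0% point estimate should be read as provisional until the full 198-question set is measured.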
## Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Anserwise/AWAXIS-Think-31B"

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="eager",  # required for the Gemma4 multimodal wrapper
)

# Korean prompt: "Please introduce yourself in Korean."
msgs = [{"role": "user", "content": "한국어로 자신을 소개해 주세요."}]
text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inp = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding; decode only the newly generated tokens, not the prompt.
out = model.generate(**inp, max_new_tokens=2048, do_sample=False)
print(tok.decode(out[0][inp["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## License

Gemma Terms of Use (inherited from base). Use of this model is bound by the [Google Gemma Terms](https://ai.google.dev/gemma/terms).

## Acknowledgements

- TeichAI for the Opus-Distill base
- Google DeepMind for Gemma-4

---

*Built with Darwin V8 FFN-crossbreed merge engine. Measured numbers above are exact; nothing inflated.*