---
base_model: schneewolflabs/A0i-12B
datasets:
- schneewolflabs/BigDenker-SFT
library_name: transformers
pipeline_tag: text-generation
tags:
- reasoning
- thinking
- chain-of-thought
- mistral
- sft
- lora
license: apache-2.0
---

# A1

A1 is a reasoning-tuned version of [`schneewolflabs/A0i-12B`](https://huggingface.co/schneewolflabs/A0i-12B) (Mistral Nemo-class, 12B). It was supervised-fine-tuned on [`schneewolflabs/BigDenker-SFT`](https://huggingface.co/datasets/schneewolflabs/BigDenker-SFT) to produce explicit `<think>` chain-of-thought before its final answer.

## What's different from the base model

The base A0i-12B does not reason: given a prompt, it answers directly. A1 produces a reasoning trace inside a `<think>` block and then the answer, in the Qwen3 thinking convention.

The reasoning tokens were added **without resizing the vocabulary**. A0i's tokenizer ships with 986 unused reserved slots (``…``); ten of these were repurposed in place (token IDs unchanged, so embeddings were *not* resized):

| ID | token | ID | token |
|----|-------|----|-------|
| 14 | `` | 19 | `` |
| 15 | `` | 20 | `<\|vision_start\|>` |
| 16 | `` | 21 | `<\|vision_end\|>` |
| 17 | `` | 22 | `<\|image_pad\|>` |
| 18 | `` | 23 | `<\|video_pad\|>` |

Before training, these rows (which were zero/untrained in the base) were initialized from the mean of their surface string's sub-token embeddings (computed separately for `embed_tokens` and `lm_head`, which are untied), with a small symmetry-breaking perturbation. They were then trained as part of the finetune.

> Note: the vision/tool tokens exist for chat-template completeness. A1 is a **text-only** model and was not trained on vision or tool-calling data.

## Usage

A1 uses a Qwen3-style chat template (bundled as `chat_template.jinja`). The template injects `<think>\n` as the assistant prefix, so the model continues with its reasoning, emits `</think>`, then the answer. **Always use the chat template**; using the model without it will not trigger reasoning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer (with the bundled chat template) and the model in bf16.
tok = AutoTokenizer.from_pretrained("schneewolflabs/A1")
model = AutoModelForCausalLM.from_pretrained(
    "schneewolflabs/A1", dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "A bat and ball cost $1.10. The bat is $1.00 more than the ball. How much is the ball?"},
]

# Render the conversation with the chat template; add_generation_prompt=True
# appends the assistant prefix that starts the reasoning block.
enc = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Greedy decoding; leave enough room for the reasoning trace plus the answer.
out = model.generate(**enc, max_new_tokens=1024, do_sample=False)

# Decode only the newly generated tokens, keeping special tokens visible.
print(tok.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=False))
```

The output has the form `…reasoning…</think>\n\n…final answer…<|im_end|>`.

## Training

- **Method:** SFT, 1 epoch, via the Merlina training system (grimoire `SFTLoss`, prompt-masked completion).
- **Adaptation:** LoRA (r=64, α=128, dropout 0.05) on attention + MLP projections, plus `embed_tokens` and `lm_head` as fully-trained `modules_to_save` (required so the repurposed token rows actually learn).
- **Hyperparameters:** lr 2e-5 (cosine, 5% warmup), effective batch 16 (bs 1 × grad-accum 16), max sequence length 4096, bf16, seed 42.
- **Split:** 90% train / 10% held-out eval (random, seed 42).
- **Result:** train loss 1.22 → 0.687; held-out eval loss ≈ 0.653 (eval loss is below the final train loss, so there is no sign of overfitting at 1 epoch).
- The conservative learning rate plus the semantic embedding initialization were chosen to add reasoning while limiting drift of the base model's general token representations (the full embedding/lm_head matrices were trainable).

## Evaluation notes

Behavioral checks show coherent step-by-step reasoning that **generalizes beyond the training distribution**: for example, it solves the bat-and-ball problem correctly ($0.05) and explicitly rejects the common intuitive-trap answer.

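For behavioral checks like the one above, it can help to separate the reasoning trace from the final answer before comparing against an expected result. The following is a minimal sketch, not part of the released tooling: the helper name `split_reasoning` is hypothetical, the `</think>` closing tag is assumed from the Qwen3 thinking convention, and the sample string is hand-written for illustration rather than real model output.

```python
def split_reasoning(text, close_tag="</think>", eos="<|im_end|>"):
    """Split a decoded completion into (reasoning, answer).

    Assumes the completion starts inside the reasoning block (the chat
    template already emitted the opening tag) and that `close_tag` ends it.
    Both tag strings are assumptions based on the Qwen3 convention.
    """
    # Drop the end-of-turn marker and anything after it.
    text = text.split(eos, 1)[0]
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        # No closing tag found: generation was likely cut off mid-reasoning.
        return text.strip(), ""
    return reasoning.strip(), answer.strip()


# Hand-written sample in the documented output shape (not real model output).
sample = (
    "Let the ball cost x. Then the bat costs x + 1.00, so 2x + 1.00 = 1.10."
    "</think>\n\nThe ball costs $0.05.<|im_end|>"
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

Returning an empty answer when the closing tag is missing makes truncated generations (for example, when `max_new_tokens` is too small) easy to detect instead of silently treating partial reasoning as the answer.
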
## Limitations

- **Always-on thinking:** the template starts every assistant turn with `<think>`; the model reasons even on trivial prompts. A non-thinking path exists via the template (`enable_thinking=False` injects an empty `<think>` block) but was not specifically tuned.
- **Single-source SFT:** trained on one dataset/style (BigDenker), so reasoning phrasing is fairly homogeneous.
- **One epoch / conservative LR:** a deliberate, safe first pass, not exhaustively tuned.
- Inherits all limitations and biases of the base model and the SFT data.

## Provenance

Base: `schneewolflabs/A0i-12B` · Data: `schneewolflabs/BigDenker-SFT` · Tokenizer/template: repurposed reserved tokens + Qwen3 chat template.
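
The initialization of the repurposed token rows (mean of the surface string's sub-token embeddings plus a small perturbation, applied separately to the untied `embed_tokens` and `lm_head`) could look roughly like the sketch below. It uses toy NumPy matrices rather than the real model; the function name, the noise scale of `1e-3`, and the sub-token ID lists are all hypothetical, since the card only states that the perturbation was "small". In practice the sub-token IDs would presumably come from tokenizing each new token's surface string with the base tokenizer.

```python
import numpy as np


def init_rows_from_subtokens(weight, new_ids, subtoken_ids, noise_scale=1e-3, seed=42):
    """Overwrite rows `new_ids` of `weight` in place.

    Each new row becomes the mean of the rows listed in the matching entry
    of `subtoken_ids`, plus small Gaussian noise so that tokens sharing the
    same sub-tokens do not start out identical. `noise_scale` is an
    illustrative value, not the one used for A1.
    """
    rng = np.random.default_rng(seed)
    for new_id, pieces in zip(new_ids, subtoken_ids):
        mean = weight[pieces].mean(axis=0)
        weight[new_id] = mean + rng.normal(0.0, noise_scale, size=mean.shape)
    return weight


# Toy stand-ins for the untied input and output embedding matrices.
rng = np.random.default_rng(0)
vocab_size, hidden = 32, 8
embed_tokens = rng.normal(size=(vocab_size, hidden))
lm_head = rng.normal(size=(vocab_size, hidden))

# Reserved rows start out zero/untrained, as described for the base model.
new_ids = [14, 15]
embed_tokens[new_ids] = 0.0
lm_head[new_ids] = 0.0

# Hypothetical sub-token IDs for the two surface strings.
pieces = [[24, 25, 26], [24, 27, 25, 26]]
expected = embed_tokens[pieces[0]].mean(axis=0)

# The matrices are untied, so each one is initialized from its own rows.
init_rows_from_subtokens(embed_tokens, new_ids, pieces)
init_rows_from_subtokens(lm_head, new_ids, pieces)
print(np.abs(embed_tokens[14] - expected).max())
```

Starting from the sub-token mean places each new row near vectors the model already understands, which is consistent with the stated goal of limiting drift in the base model's token representations.
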