---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3
- wind-edge
- custom-code
- edge-llm
- instruct
- distillation
base_model:
- North-ML1/Wind-Edge-1.6-Base
---

# Wind-Edge-1.6-Instruct

Wind-Edge-1.6-Instruct is a compact, custom Qwen3-compatible assistant model for local and edge inference. It was built from a depth-pruned Wind-Edge base and tuned with a Claude-heavy public distillation SFT mix, code/math instruction data, and a final behavior polish pass.

This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.

## Recommended Usage

Load the model with `trust_remote_code=True`; the custom loader re-applies tied weights from `model.safetensors`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arthu1/Wind-Edge-1.6-Instruct"

# trust_remote_code is required so the custom loader can re-apply tied weights.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]

# Render the chat template to a prompt string with thinking mode disabled.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    # Stop on either the tokenizer's EOS token or the chat end-of-turn token.
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

## Suggested Settings

For chat:

- `enable_thinking=False`
- `temperature`: 0.55 to 0.7
- `top_p`: 0.85 to 0.92
- `repetition_penalty`: 1.05 to 1.08
- `max_new_tokens`: 128 to 512

For deterministic tests:

- `do_sample=False`
- `repetition_penalty=1.06`
- Keep prompts short and direct.

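The two settings lists above can be captured as reusable keyword arguments for `model.generate`. The following is a minimal sketch, not part of the repository: the names `CHAT_RANGES`, `chat_kwargs`, and `deterministic_kwargs` are hypothetical, the chat defaults are taken from the Recommended Usage example, and the `max_new_tokens` default for deterministic tests is an assumption, since no value is given for that case.

```python
# Ranges copied from the "For chat" list; all names here are hypothetical.
CHAT_RANGES = {
    "temperature": (0.55, 0.7),
    "top_p": (0.85, 0.92),
    "repetition_penalty": (1.05, 1.08),
    "max_new_tokens": (128, 512),
}


def chat_kwargs(temperature=0.6, top_p=0.9, repetition_penalty=1.06, max_new_tokens=256):
    """Return sampling kwargs for chat, rejecting values outside the suggested ranges."""
    kwargs = {
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }
    for name, value in kwargs.items():
        low, high = CHAT_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the suggested range {low} to {high}")
    # Sampling must be enabled for temperature and top_p to take effect.
    kwargs["do_sample"] = True
    return kwargs


def deterministic_kwargs(max_new_tokens=256):
    """Return greedy-decoding kwargs for repeatable tests."""
    # max_new_tokens is an assumption; the card gives no value for this case.
    return {
        "do_sample": False,
        "repetition_penalty": 1.06,
        "max_new_tokens": max_new_tokens,
    }
```

With `model` and `inputs` prepared as in Recommended Usage, either dictionary can be unpacked into the call, for example `model.generate(**inputs, **deterministic_kwargs())`. The `eos_token_id` list from that example still needs to be passed separately.
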

The bundled chat template injects a minimal default identity system message if no system message is supplied:

```text
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
```

## Training Summary

- Source family: Qwen3-compatible Wind-Edge architecture
- Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
- Final SFT:
  - 12M tokens of no-thinking distillation SFT
  - Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
  - Bad self-identity teacher rows filtered
- 6M-token system-template adaptation pass
- 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior

## Quick Sanity Outputs

Expected behavior after the final polish:

- `hi` -> short greeting as Wind-Edge-1.6
- `Who are you?` -> identifies as Wind-Edge-1.6, not human
- `sort this list: [3, 1, 2]` -> `[1, 2, 3]`
- `60 miles in 1.5 hours` -> `40 mph`

## Limitations

Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may overgeneralize from prompts, and it is best used with concise instructions and verification for important work.

## Citation

See `wind_edge_1_6_paper.html` in this repository for a short technical write-up of the build and tuning process.