---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3
- wind-edge
- custom-code
- edge-llm
- instruct
- distillation
base_model:
- North-ML1/Wind-Edge-1.6-Base
---

# Wind-Edge-1.6-Instruct

Wind-Edge-1.6-Instruct is a compact, Qwen3-compatible assistant model with custom loading code, intended for local and edge inference. It was built from a depth-pruned Wind-Edge base and tuned with a Claude-heavy public distillation SFT mix, code and math instruction data, and a final behavior-polish pass.

This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.

## Recommended Usage

Load the model with `trust_remote_code=True`. The repository ships a custom loader that re-applies tied weights from `model.safetensors`, and that loader only runs when remote code is trusted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arthu1/Wind-Edge-1.6-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
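
To confirm that the tied weights were re-applied after loading, one option is to compare the input embedding and output head tensors. The sketch below assumes the model exposes the standard `get_input_embeddings()` and `get_output_embeddings()` accessors and that tying means both layers share one tensor. `weights_are_tied` is a hypothetical helper, not part of this repository.

```python
def weights_are_tied(model):
    """Return True if the output head shares its weight tensor with the input embeddings.

    Assumption: tying is implemented by sharing storage, so both weights
    report the same data pointer. A loader that copies values instead of
    sharing them would need a value comparison (for example torch.equal).
    """
    embed = model.get_input_embeddings()
    head = model.get_output_embeddings()
    if embed is None or head is None:
        # Some architectures expose no separate output head.
        return False
    return embed.weight.data_ptr() == head.weight.data_ptr()
```

After loading, `print(weights_are_tied(model))` gives a quick signal. A `False` result does not necessarily mean the load failed; it may only mean the loader copies the weights rather than sharing them.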

## Suggested Settings

For chat:

- `enable_thinking=False`
- `temperature` between 0.55 and 0.7
- `top_p` between 0.85 and 0.92
- `repetition_penalty` between 1.05 and 1.08
- `max_new_tokens` between 128 and 512

For deterministic tests:

- `do_sample=False`
- `repetition_penalty=1.06`
- Keep prompts short and direct.
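
The two presets above can be kept in one place and passed to `model.generate()`. This is a minimal sketch: `generation_kwargs` is a hypothetical helper, and the chat values are single points picked from inside the suggested ranges (the same ones used in the usage example), not tuned recommendations.

```python
def generation_kwargs(mode="chat", max_new_tokens=256):
    """Build keyword arguments for model.generate() for one of two presets."""
    if mode == "chat":
        # Sampling preset: values chosen from within the suggested ranges.
        return {
            "do_sample": True,
            "temperature": 0.6,
            "top_p": 0.9,
            "repetition_penalty": 1.06,
            "max_new_tokens": max_new_tokens,
        }
    if mode == "deterministic":
        # Greedy decoding: temperature and top_p are left out because
        # they only apply when sampling is enabled.
        return {
            "do_sample": False,
            "repetition_penalty": 1.06,
            "max_new_tokens": max_new_tokens,
        }
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `model.generate(**inputs, **generation_kwargs("deterministic"))` runs a greedy pass with the settings listed for deterministic tests.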

The bundled chat template injects a minimal default identity system message if no system message is supplied:

```text
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
```
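
Because the default identity is only injected when no system message is present, supplying your own system message should take its place. The sketch below builds a message list either way; `build_messages` is a hypothetical helper, and the assumption that a supplied system message fully replaces the default follows from the description above rather than from inspecting the template.

```python
def build_messages(user_text, system_text=None):
    """Return a chat message list, with an optional system message first."""
    messages = []
    if system_text is not None:
        # Assumption: a supplied system message suppresses the default identity.
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


# With no system_text, the bundled template adds its default identity message.
default_chat = build_messages("Who are you?")

# With a system_text, the caller controls the system prompt.
custom_chat = build_messages("Who are you?", "Answer in one short sentence.")
```

Either list can be passed to `tokenizer.apply_chat_template(...)` with `tokenize=False`, `add_generation_prompt=True`, and `enable_thinking=False`. Printing the resulting prompt string is the simplest way to see which system message was actually used.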

## Training Summary

- Source family: Qwen3-compatible Wind-Edge architecture
- Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
- Final SFT:
  - 12M tokens of no-thinking distillation SFT
  - Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
  - Teacher rows with an incorrect self-identity filtered out
  - 6M-token system-template adaptation pass
  - 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior

## Quick Sanity Outputs

Expected behavior after the final polish:

- `hi` -> short greeting as Wind-Edge-1.6
- `Who are you?` -> identifies as Wind-Edge-1.6, not human
- `sort this list: [3, 1, 2]` -> `[1, 2, 3]`
- `60 miles in 1.5 hours` -> `40 mph`
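
These checks can be scripted as a loose smoke test. The sketch below assumes substring matching is good enough and that `generate_fn` is any callable you provide that maps a prompt string to the model's reply; both `CHECKS` and `run_sanity_checks` are hypothetical names. The `hi` prompt is left out because a "short greeting" has no single expected string.

```python
# (prompt, substring expected somewhere in the reply)
CHECKS = [
    ("Who are you?", "Wind-Edge-1.6"),
    ("sort this list: [3, 1, 2]", "[1, 2, 3]"),
    ("60 miles in 1.5 hours", "40"),
]


def run_sanity_checks(generate_fn):
    """Run each prompt through generate_fn and report whether the expected text appears."""
    results = {}
    for prompt, expected in CHECKS:
        reply = generate_fn(prompt)
        results[prompt] = expected in reply
    return results
```

Greedy decoding (`do_sample=False`) keeps the results repeatable between runs. A failed check is a prompt for manual inspection, not proof of a broken model, since a correct answer can be phrased in a way the substring misses.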

## Limitations

Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may also overgeneralize from prompts, so keep instructions concise and verify its output before relying on it for important work.

## Citation

See `wind_edge_1_6_paper.html` in this repository for a short technical write-up of the build and tuning process.