---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3
- wind-edge
- custom-code
- edge-llm
- instruct
- distillation
base_model:
- North-ML1/Wind-Edge-1.6-Base
---
# Wind-Edge-1.6-Instruct
Wind-Edge-1.6-Instruct is a compact custom Qwen3-compatible assistant model for local and edge inference. It was built from a depth-pruned Wind-Edge base and tuned with a Claude-heavy public distillation SFT mix, code/math instruction data, and a final behavior polish pass.
This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.
## Recommended Usage
Use `trust_remote_code=True`; the custom loader re-applies tied weights from `model.safetensors`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arthu1/Wind-Edge-1.6-Instruct"

# trust_remote_code=True is required so the custom loader can run.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Render the chat template to a prompt string with thinking disabled.
messages = [{"role": "user", "content": "Who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a reply, stopping at either the tokenizer EOS or <|im_end|>.
out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
## Suggested Settings
For chat:
- `enable_thinking=False`
- `temperature=0.55` to `0.7`
- `top_p=0.85` to `0.92`
- `repetition_penalty=1.05` to `1.08`
- `max_new_tokens=128` to `512`
For deterministic tests:
- `do_sample=False`
- `repetition_penalty=1.06`
- Keep prompts short and direct.
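
The two profiles above can be kept as plain keyword dictionaries and unpacked into `model.generate`. This is a minimal sketch, not part of the repository: `CHAT_KWARGS`, `DETERMINISTIC_KWARGS`, and `generation_kwargs` are hypothetical names, the chat values are single points chosen from inside the suggested ranges, and `max_new_tokens=256` for the deterministic profile is an assumption.

```python
# Hypothetical helper: the names and the specific values are illustrative.
# Chat values are picked from inside the ranges suggested above.
CHAT_KWARGS = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.06,
    "max_new_tokens": 256,
}

# Greedy decoding for repeatable tests. Sampling parameters are left out
# because they have no effect when do_sample is False.
DETERMINISTIC_KWARGS = {
    "do_sample": False,
    "repetition_penalty": 1.06,
    "max_new_tokens": 256,  # assumption: not specified for this profile
}


def generation_kwargs(deterministic: bool = False, **overrides) -> dict:
    """Return a copy of the chosen profile with optional overrides applied."""
    base = DETERMINISTIC_KWARGS if deterministic else CHAT_KWARGS
    return {**base, **overrides}
```

With a loaded model this could be used as `model.generate(**inputs, **generation_kwargs(deterministic=True))`. Note that `enable_thinking=False` is passed to `tokenizer.apply_chat_template`, not to `generate`, so it is not part of either dictionary.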
The bundled chat template injects a minimal default identity system message if no system message is supplied:
```text
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
```
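
Since the default identity line is only injected when no system message is supplied, a caller can opt in or out by how the message list is built. The sketch below illustrates this; `build_messages` is a hypothetical helper, and the claim that an explicit system message replaces the default rests on the template behavior described above rather than on an inspection of the template itself.

```python
# Hypothetical helper for illustration; not part of the repository.
def build_messages(user_text, system_text=None):
    """Build a chat message list, optionally with an explicit system message.

    When system_text is None, no system message is added here, so the
    bundled chat template is expected to inject its default identity line.
    """
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


# Default identity: leave the system slot empty.
default_chat = build_messages("Who are you?")

# Custom system message: assumed to take the place of the default line.
custom_chat = build_messages(
    "Summarize this paragraph in one sentence.",
    system_text="You are a concise summarization assistant.",
)
```

Either list can then be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)` as in the usage example.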
## Training Summary
- Source family: Qwen3-compatible Wind-Edge architecture
- Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
- Final SFT:
  - 12M tokens of no-thinking distillation SFT
  - Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
  - Bad self-identity teacher rows filtered
  - 6M-token system-template adaptation pass
  - 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior
## Quick Sanity Outputs
Expected behavior after the final polish:
- `hi` -> short greeting as Wind-Edge-1.6
- `Who are you?` -> identifies as Wind-Edge-1.6, not human
- `sort this list: [3, 1, 2]` -> `[1, 2, 3]`
- `60 miles in 1.5 hours` -> `40 mph`
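
These checks can be automated as a small smoke test. The sketch below is an illustration under stated assumptions: `SANITY_CASES` and `run_sanity_checks` are hypothetical names, `generate_fn` stands for any callable that maps a prompt string to the model's reply (for example a thin wrapper around the usage snippet), and the substring checks are deliberately loose because exact wording may vary between runs.

```python
# Hypothetical smoke test; the substring checks are loose assumptions about
# wording, not exact expected outputs.
SANITY_CASES = [
    ("hi", "Wind-Edge"),
    ("Who are you?", "Wind-Edge-1.6"),
    ("sort this list: [3, 1, 2]", "[1, 2, 3]"),
    ("60 miles in 1.5 hours", "40"),
]


def run_sanity_checks(generate_fn):
    """Run each prompt through generate_fn and return the failing cases.

    Each failure is a (prompt, expected_substring, reply) tuple; an empty
    list means every reply contained its expected substring.
    """
    failures = []
    for prompt, expected in SANITY_CASES:
        reply = generate_fn(prompt)
        if expected not in reply:
            failures.append((prompt, expected, reply))
    return failures


# Reference values computed directly, independent of the model.
assert sorted([3, 1, 2]) == [1, 2, 3]
assert 60 / 1.5 == 40.0  # miles / hours = mph
```

Running this with the deterministic settings makes failures easier to reproduce than with sampling enabled.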
## Limitations
Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may overgeneralize from prompts, and it is best used with concise instructions and verification for important work.
## Citation
See `wind_edge_1_6_paper.html` in this repository for a short technical write-up of the build and tuning process.