---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3
- wind-edge
- custom-code
- edge-llm
- instruct
- distillation
base_model:
- North-ML1/Wind-Edge-1.6-Base
---

# Wind-Edge-1.6-Instruct

Wind-Edge-1.6-Instruct is a compact custom Qwen3-compatible assistant model for local and edge inference. It was built from a depth-pruned Wind-Edge base and tuned with a Claude-heavy public distillation SFT mix, code/math instruction data, and a final behavior polish pass.

This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.

## Recommended Usage

Use `trust_remote_code=True`; the custom loader re-applies tied weights from `model.safetensors`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "North-ML1/Wind-Edge-1.6-Instruct"

# trust_remote_code=True is required so the custom loader is used.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
# Render the chat template to a prompt string with thinking disabled.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    # Stop on either the tokenizer's EOS token or the end-of-turn marker.
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

## Suggested Settings

For chat:

- `enable_thinking=False`
- `temperature`: 0.55 to 0.7
- `top_p`: 0.85 to 0.92
- `repetition_penalty`: 1.05 to 1.08
- `max_new_tokens`: 128 to 512

For deterministic tests:

- `do_sample=False`
- `repetition_penalty=1.06`
- Keep prompts short and direct.

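
The two groups of settings above can be kept as plain keyword-argument dictionaries and unpacked into `model.generate`. The sketch below is one way to do that. The names `CHAT_PRESET`, `DETERMINISTIC_PRESET`, and `generation_kwargs` are illustrative and not part of this repository, the chat values are single points picked from the suggested ranges, and the `max_new_tokens` value for deterministic runs is an assumption because the list does not specify one.

```python
# Hypothetical presets; chat values are single points from the suggested ranges.
CHAT_PRESET = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.06,
    "max_new_tokens": 256,
}

DETERMINISTIC_PRESET = {
    "do_sample": False,          # greedy decoding, no sampling parameters
    "repetition_penalty": 1.06,
    "max_new_tokens": 256,       # assumption: not stated for deterministic tests
}


def generation_kwargs(mode="chat", **overrides):
    """Return a copy of the preset for `mode` with any overrides applied."""
    presets = {"chat": CHAT_PRESET, "deterministic": DETERMINISTIC_PRESET}
    if mode not in presets:
        raise ValueError(f"unknown mode: {mode!r}")
    kwargs = dict(presets[mode])  # copy so the presets are never mutated
    kwargs.update(overrides)
    return kwargs
```

With a loaded model and tokenized `inputs`, a deterministic run would then look like `model.generate(**inputs, **generation_kwargs("deterministic"))`.
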
The bundled chat template injects a minimal default identity system message if no system message is supplied:

```text
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
```
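
The template itself ships with the tokenizer. The Python sketch below only mirrors the behavior described above, so it is easy to see when the default applies. `with_default_system` is a hypothetical helper, not something provided by this repository, and treating any message with the `system` role as "supplied" is an assumption about how the template decides.

```python
DEFAULT_SYSTEM = (
    "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."
)


def with_default_system(messages):
    """Prepend the default identity message when no system message is present."""
    if any(m.get("role") == "system" for m in messages):
        return list(messages)  # caller supplied a system message; leave as-is
    return [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)
```

To replace the default identity, pass your own system message in `messages`.
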
## Training Summary

- Source family: Qwen3-compatible Wind-Edge architecture
- Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
- Final SFT:
  - 12M tokens of no-thinking distillation SFT
  - Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
  - Bad self-identity teacher rows filtered
  - 6M-token system-template adaptation pass
  - 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior

## Quick Sanity Outputs

Expected behavior after the final polish:

- `hi` -> short greeting as Wind-Edge-1.6
- `Who are you?` -> identifies as Wind-Edge-1.6, not human
- `sort this list: [3, 1, 2]` -> `[1, 2, 3]`
- `60 miles in 1.5 hours` -> `40 mph`

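
These checks can be scripted. The sketch below assumes a `generate_fn` callable that takes a prompt string and returns the model's reply as a string, for example a thin wrapper around the loading code in Recommended Usage. `SANITY_CASES`, `run_sanity_checks`, and the substring matching rule are illustrative choices, not part of this repository. The `hi` case is left out because "short greeting" does not reduce to a fixed substring.

```python
# Hypothetical harness: each case pairs a prompt with substrings the reply should contain.
SANITY_CASES = (
    ("Who are you?", ("Wind-Edge-1.6",)),
    ("sort this list: [3, 1, 2]", ("[1, 2, 3]",)),
    ("60 miles in 1.5 hours", ("40",)),  # 60 miles / 1.5 hours = 40 mph
)


def run_sanity_checks(generate_fn, cases=SANITY_CASES):
    """Run each prompt through generate_fn and return (prompt, reply, passed) tuples."""
    results = []
    for prompt, needles in cases:
        reply = generate_fn(prompt)
        passed = all(needle in reply for needle in needles)
        results.append((prompt, reply, passed))
    return results
```

Substring matching is deliberately loose; a stricter check would parse the reply before comparing it.
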
## Limitations

Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may overgeneralize from prompts, and it is best used with concise instructions and verification for important work.

## Citation

See `wind_edge_1_6_paper.html` in this repository for a short technical write-up of the build and tuning process.