Willow Alpha
An early-stage version of Forge-1V
Small language model research by North ML.
Overview
Willow Alpha is an early-stage base model checkpoint in the Forge-1V model line.
This model is currently experimental and should be treated as a research checkpoint rather than a polished assistant model. It is useful for testing architecture, pretraining quality, tokenizer behavior, evaluation pipelines, and future SFT/RLHF improvements.
Model Details
| Field | Value |
|---|---|
| Model name | Willow Alpha |
| Project | Forge-1V |
| Organization | North ML |
| Model type | Causal Language Model |
| Language | English |
| License | MIT |
| Status | Early-stage / Alpha |
Evaluation Results
All benchmarks below were run in 0-shot mode.
| Benchmark | Metric | Score | Runtime |
|---|---|---|---|
| HellaSwag | acc_norm | 26.71% | 318.67s |
| PIQA | acc_norm | 53.86% | 38.85s |
| WinoGrande | acc | 50.67% | 23.73s |
| BoolQ | acc | 40.21% | 144.80s |
| ARC-Easy | acc_norm | 34.68% | 51.41s |
| ARC-Challenge | acc_norm | 25.60% | 37.69s |
| OpenBookQA | acc_norm | 25.00% | 21.14s |
| CommonsenseQA | acc | 20.31% | 27.66s |
| LAMBADA | acc | 0.23% | 96.28s |
| BLiMP | acc | 59.23% | 354.79s |
| MMLU | acc | 23.89% | 388.62s |
| WikiText-2 | word_perplexity | 12524.42 | 182.89s |
| WikiText-2 | byte_perplexity | 5.84 | 181.42s |
| SciQ | acc_norm | 35.60% | 87.15s |
| COPA | acc | 64.00% | 17.21s |
| RACE | acc | 23.16% | 334.70s |
| SWAG | acc_norm | 29.13% | 252.00s |
| TruthfulQA MC2 | acc | 48.74% | 126.29s |
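The table mixes `acc` and `acc_norm` without saying how they differ. The sketch below assumes the common multiple-choice convention (the one used by lm-evaluation-harness): `acc` picks the answer with the highest total log-likelihood, and `acc_norm` divides each log-likelihood by the answer's length first. The model card does not confirm which tool produced these numbers, and the log-likelihoods, choices, and the `pick_answer` helper here are hypothetical.

```python
def pick_answer(logliks, choices, normalize):
    """Return the index of the selected choice.

    normalize=False mimics `acc` (raw log-likelihood);
    normalize=True mimics `acc_norm` (log-likelihood per character).
    """
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(logliks, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: the longer answer has a lower total
# log-likelihood but a better per-character log-likelihood.
choices = ["yes", "a considerably longer answer"]
logliks = [-4.0, -9.0]

acc_pick = pick_answer(logliks, choices, normalize=False)
norm_pick = pick_answer(logliks, choices, normalize=True)
print(acc_pick, norm_pick)
```

Under this convention the two metrics can disagree on the same item, which is why a benchmark's `acc` and `acc_norm` should not be compared directly.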
Evaluation Summary
| Category | Result |
|---|---|
| Number of completed benchmark runs | 18 |
| Successful runs | 18 |
| Failed runs | 0 |
| Best accuracy-style score | COPA (64.00%) |
| Best language-structure score | BLiMP (59.23%) |
| MMLU score | 23.89% |
| WikiText-2 byte perplexity | 5.84 |
| WikiText-2 word perplexity | 12524.42 |
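The two WikiText-2 perplexities look very different but can be read as one measurement. A minimal sketch, assuming both are computed from the same total log-likelihood and normalized by word count and by byte count respectively (the usual convention, not confirmed by this model card): the ratio of their logarithms then gives the implied average bytes per word, and the byte perplexity converts directly to bits per byte.

```python
import math

# Values reported in the tables above.
word_ppl = 12524.42
byte_ppl = 5.84

# If ppl = exp(-LL / N), then ln(word_ppl) / ln(byte_ppl) = N_bytes / N_words.
bytes_per_word = math.log(word_ppl) / math.log(byte_ppl)

# Bits per byte is the base-2 logarithm of the byte perplexity.
bits_per_byte = math.log2(byte_ppl)

print(f"implied bytes per word: {bytes_per_word:.2f}")
print(f"bits per byte: {bits_per_byte:.2f}")
```

Under that assumption the byte-level figure is the easier one to compare across tokenizers, since it does not depend on how the text is split into words.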
Notes
Willow Alpha is still in a very early stage. Some results are near-random or unstable, especially on knowledge-heavy and long-context tasks.
The strongest early signals are:
- COPA: 64.00%
- BLiMP: 59.23%
- PIQA: 53.86%
- WinoGrande: 50.67%
- TruthfulQA MC2: 48.74%
The weakest areas are:
- LAMBADA
- WikiText-2 word perplexity
- CommonsenseQA
- MMLU
- RACE
These results suggest the model has picked up some early signal on commonsense reasoning (COPA) and grammar (BLiMP), but it still needs substantially more pretraining, higher-quality data, and post-training before it is useful as a general assistant.
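To make "near-random" concrete, the sketch below subtracts a uniform-guessing baseline from each multiple-choice score. The chance levels are assumptions based on each benchmark's usual number of answer options (ARC is treated as 4-way, although a few items have a different count). LAMBADA, TruthfulQA MC2, and the perplexity rows are left out because they have no simple chance level.

```python
# Assumed uniform-guessing accuracy (%), from the usual number of choices.
CHANCE = {
    "HellaSwag": 25.0, "PIQA": 50.0, "WinoGrande": 50.0, "BoolQ": 50.0,
    "ARC-Easy": 25.0, "ARC-Challenge": 25.0, "OpenBookQA": 25.0,
    "CommonsenseQA": 20.0, "BLiMP": 50.0, "MMLU": 25.0, "SciQ": 25.0,
    "COPA": 50.0, "RACE": 25.0, "SWAG": 25.0,
}

# Scores (%) copied from the evaluation table.
SCORES = {
    "HellaSwag": 26.71, "PIQA": 53.86, "WinoGrande": 50.67, "BoolQ": 40.21,
    "ARC-Easy": 34.68, "ARC-Challenge": 25.60, "OpenBookQA": 25.00,
    "CommonsenseQA": 20.31, "BLiMP": 59.23, "MMLU": 23.89, "SciQ": 35.60,
    "COPA": 64.00, "RACE": 23.16, "SWAG": 29.13,
}

# Margin over chance in percentage points, largest first.
margin = {name: round(SCORES[name] - CHANCE[name], 2) for name in CHANCE}
ranked = sorted(margin.items(), key=lambda kv: kv[1], reverse=True)

for name, points in ranked:
    print(f"{name:<14} {points:+6.2f}")
```

Ranking by margin rather than raw score keeps 2-way tasks such as PIQA and WinoGrande from looking stronger than 4-way tasks simply because their chance level is higher.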
Intended Use
Willow Alpha is intended for:
- Research
- Benchmarking
- Pretraining experiments
- Fine-tuning experiments
- Small language model development
- Forge-1V pipeline testing
It is not yet recommended for production use.
Limitations
This model may:
- Produce incorrect information
- Fail basic reasoning tasks
- Struggle with factual knowledge
- Generate repetitive or low-quality text
- Perform poorly on long-context tasks
- Require additional supervised fine-tuning
Citation
@misc{willow-alpha,
title = {Willow Alpha},
author = {North ML},
year = {2026},
note = {Early-stage Forge-1V checkpoint}
}