Instructions for using North-ML1/willow-alpha-gguf with local apps and inference libraries.

How to use North-ML1/willow-alpha-gguf with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf North-ML1/willow-alpha-gguf:F16
# Run inference directly in the terminal:
llama cli -hf North-ML1/willow-alpha-gguf:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf North-ML1/willow-alpha-gguf:F16
# Run inference directly in the terminal:
llama cli -hf North-ML1/willow-alpha-gguf:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf North-ML1/willow-alpha-gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf North-ML1/willow-alpha-gguf:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf North-ML1/willow-alpha-gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf North-ML1/willow-alpha-gguf:F16

Use Docker

docker model run hf.co/North-ML1/willow-alpha-gguf:F16

vLLM

How to use North-ML1/willow-alpha-gguf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "North-ML1/willow-alpha-gguf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "North-ML1/willow-alpha-gguf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
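
The same request can be sketched in Python using only the standard library. This is a minimal sketch rather than part of the original instructions: the helper names `build_chat_request` and `ask` are hypothetical, and the URL assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request

# Assumed endpoint: the same address used in the curl example above.
SERVER_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "North-ML1/willow-alpha-gguf"


def build_chat_request(prompt, url=SERVER_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the first choice's text.

    Requires a running server at SERVER_URL.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the model's reply as a string. Since this is an early base checkpoint, the reply may not be a well-formed answer.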

Ollama
How to use North-ML1/willow-alpha-gguf with Ollama:
```
ollama run hf.co/North-ML1/willow-alpha-gguf:F16
```

Unsloth Studio

How to use North-ML1/willow-alpha-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for North-ML1/willow-alpha-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for North-ML1/willow-alpha-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for North-ML1/willow-alpha-gguf to start chatting

Docker Model Runner
How to use North-ML1/willow-alpha-gguf with Docker Model Runner:
```
docker model run hf.co/North-ML1/willow-alpha-gguf:F16
```

Lemonade

How to use North-ML1/willow-alpha-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull North-ML1/willow-alpha-gguf:F16

Run and chat with the model

lemonade run user.willow-alpha-gguf-F16

List all available models

lemonade list

Willow Alpha

An early-stage version of Forge-1V

Small language model research by North ML.

Overview

Willow Alpha is an early-stage base model checkpoint in the Forge-1V model line.

This model is currently experimental and should be treated as a research checkpoint rather than a polished assistant model. It is useful for testing architecture, pretraining quality, tokenizer behavior, evaluation pipelines, and future SFT/RLHF improvements.

Model Details

| Field | Value |
| --- | --- |
| Model name | Willow Alpha |
| Project | Forge-1V |
| Organization | North ML |
| Model type | Causal Language Model |
| Language | English |
| License | MIT |
| Status | Early-stage / Alpha |

Evaluation Results

All benchmarks below were run in 0-shot mode.

| Benchmark | Metric | Score | Runtime |
| --- | --- | --- | --- |
| HellaSwag | acc_norm | 26.71% | 318.67s |
| PIQA | acc_norm | 53.86% | 38.85s |
| WinoGrande | acc | 50.67% | 23.73s |
| BoolQ | acc | 40.21% | 144.80s |
| ARC-Easy | acc_norm | 34.68% | 51.41s |
| ARC-Challenge | acc_norm | 25.60% | 37.69s |
| OpenBookQA | acc_norm | 25.00% | 21.14s |
| CommonsenseQA | acc | 20.31% | 27.66s |
| LAMBADA | acc | 0.23% | 96.28s |
| BLiMP | acc | 59.23% | 354.79s |
| MMLU | acc | 23.89% | 388.62s |
| WikiText-2 | word_perplexity | 12524.42 | 182.89s |
| WikiText-2 | byte_perplexity | 5.84 | 181.42s |
| SciQ | acc_norm | 35.60% | 87.15s |
| COPA | acc | 64.00% | 17.21s |
| RACE | acc | 23.16% | 334.70s |
| SWAG | acc_norm | 29.13% | 252.00s |
| TruthfulQA MC2 | acc | 48.74% | 126.29s |
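
To put the accuracy scores in context, it can help to subtract the uniform random-guess baseline for each multiple-choice task. The sketch below does that. The number of answer options per task is an assumption based on the usual format of each benchmark (the model card does not state it), and ARC-Easy is treated as four-way even though a few of its items have a different number of options. LAMBADA, WikiText-2, and TruthfulQA MC2 are left out because they are not scored as a pick-one-of-k accuracy.

```python
# Scores copied from the table above (percent).
SCORES = {
    "HellaSwag": 26.71, "PIQA": 53.86, "WinoGrande": 50.67, "BoolQ": 40.21,
    "ARC-Easy": 34.68, "ARC-Challenge": 25.60, "OpenBookQA": 25.00,
    "CommonsenseQA": 20.31, "BLiMP": 59.23, "MMLU": 23.89, "SciQ": 35.60,
    "COPA": 64.00, "RACE": 23.16, "SWAG": 29.13,
}

# ASSUMPTION: answer options per item, taken from each benchmark's usual
# format. These counts are not stated in the model card.
NUM_CHOICES = {
    "HellaSwag": 4, "PIQA": 2, "WinoGrande": 2, "BoolQ": 2,
    "ARC-Easy": 4, "ARC-Challenge": 4, "OpenBookQA": 4,
    "CommonsenseQA": 5, "BLiMP": 2, "MMLU": 4, "SciQ": 4,
    "COPA": 2, "RACE": 4, "SWAG": 4,
}


def margin_over_chance(task):
    """Return the score minus the uniform-guess baseline, in percentage points."""
    chance = 100.0 / NUM_CHOICES[task]
    return round(SCORES[task] - chance, 2)


# Print tasks from the largest margin over chance to the smallest.
for task in sorted(SCORES, key=margin_over_chance, reverse=True):
    print(f"{task:15s} {margin_over_chance(task):+6.2f} points vs. chance")
```

A margin close to zero means the score is hard to tell apart from guessing. A clearly negative margin, as for BoolQ under this two-way assumption, points to a systematic bias toward the wrong label rather than to noise.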

Evaluation Summary

| Category | Result |
| --- | --- |
| Number of completed benchmark runs | 18 |
| Successful runs | 18 |
| Failed runs | 0 |
| Best accuracy-style score | COPA (64.00%) |
| Best language-structure score | BLiMP (59.23%) |
| MMLU score | 23.89% |
| WikiText-2 byte perplexity | 5.84 |
| WikiText-2 word perplexity | 12524.42 |
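
The two WikiText-2 perplexities look very different, but they can describe the same underlying log-likelihood normalized by different units. The sketch below assumes that both numbers were computed from one total log-likelihood, divided by the word count and by the byte count respectively. This is a common convention in evaluation harnesses, but the model card does not say how the metrics were computed.

```python
import math

word_perplexity = 12524.42  # from the table above
byte_perplexity = 5.84      # from the table above

# Bits per byte is the base-2 logarithm of the byte-level perplexity.
bits_per_byte = math.log2(byte_perplexity)

# ASSUMPTION: both perplexities share one total log-likelihood, so
# ln(word_ppl) / ln(byte_ppl) equals bytes per word in the evaluated text.
implied_bytes_per_word = math.log(word_perplexity) / math.log(byte_perplexity)

print(f"bits per byte: {bits_per_byte:.2f}")
print(f"implied bytes per word: {implied_bytes_per_word:.2f}")
```

Under that assumption the figures work out to roughly 2.5 bits per byte and a little over 5 bytes per word, which is a plausible average word length for English text including the separating space. The large word-level number is therefore the byte-level one compounded over each word, not a separate failure.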

Notes

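Willow Alpha is still at a very early stage. Some results are near-random or unstable, especially on knowledge-heavy and long-context tasks. For example, MMLU (23.89%) and ARC-Challenge (25.60%) sit at about the 25% four-way guessing baseline.

The strongest early signals, shown with the random-guess baseline for the two-way tasks, are:

COPA: 64.00% (chance: 50%)
BLiMP: 59.23% (chance: 50%)
PIQA: 53.86% (chance: 50%)
WinoGrande: 50.67% (chance: 50%, so effectively at chance)
TruthfulQA MC2: 48.74%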
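
Whether a score above 50% reflects real signal depends on how many items were evaluated. The sketch below computes a normal-approximation z-score against the chance baseline. The item counts are assumptions taken from the commonly used evaluation splits of these benchmarks. The model card does not state them, so the resulting figures are illustrative only.

```python
import math

# (accuracy, ASSUMED number of evaluation items, chance accuracy)
# The item counts are assumptions and are not stated in the model card.
RESULTS = {
    "COPA": (0.6400, 100, 0.5),
    "PIQA": (0.5386, 1838, 0.5),
    "WinoGrande": (0.5067, 1267, 0.5),
}


def z_score(accuracy, num_items, chance):
    """Normal-approximation z-score of an accuracy against a chance baseline."""
    standard_error = math.sqrt(chance * (1.0 - chance) / num_items)
    return (accuracy - chance) / standard_error


for task, (accuracy, num_items, chance) in RESULTS.items():
    print(f"{task:12s} z = {z_score(accuracy, num_items, chance):.2f}")
```

Under these assumed sample sizes, COPA and PIQA land more than two standard errors above chance, while WinoGrande stays within one. A small test set such as COPA's still leaves a wide margin of uncertainty around the headline number.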

The weakest areas are:

LAMBADA: 0.23%
WikiText-2 word perplexity: 12524.42
CommonsenseQA: 20.31% (chance: 20%)
MMLU: 23.89% (chance: 25%)
RACE: 23.16% (chance: 25%)

These results suggest the model has some early reasoning and grammar signal, but still needs substantially more pretraining, higher-quality data, and post-training before being useful as a general assistant.

Intended Use

Willow Alpha is intended for:

Research
Benchmarking
Pretraining experiments
Fine-tuning experiments
Small language model development
Forge-1V pipeline testing

It is not yet recommended for production use.

Limitations

This model may:

Produce incorrect information
Fail basic reasoning tasks
Struggle with factual knowledge
Generate repetitive or low-quality text
Perform poorly on long-context tasks
Require additional supervised fine-tuning

Citation

@misc{willow-alpha,
  title = {Willow Alpha},
  author = {North ML},
  year = {2026},
  note = {Early-stage Forge-1V checkpoint}
}

GGUF

Model size: 0.3B params
Architecture: llama
Hardware compatibility: 4-bit, 16-bit