Forge 1 Mini GGUF

GGUF builds for North-ML1/Forge-1-Mini.

Use the embedded ChatML template and stop on <|im_end|>.
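For callers that build the prompt themselves instead of relying on the embedded template, the sketch below shows the standard ChatML layout. It assumes the embedded template follows plain ChatML with no default system message; the helper name to_chatml is hypothetical and not part of any library.

```python
# Minimal ChatML formatter (hypothetical helper, assuming plain ChatML).
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open the assistant turn; generation should stop at the next <|im_end|>.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = to_chatml([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```

When create_chat_completion is used, as in the Python example further down, the template is applied by the library and this manual step is not needed.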

Modern llama.cpp conversation mode (-cnv), run as a single turn (-st) with at most 64 generated tokens and temperature 0:

llama-cli -m Forge-1-Mini-Q4_K_M.gguf -cnv -p "Hi" -st -n 64 --temp 0

The GGUF metadata explicitly sets tokenizer.ggml.eot_token_id=5, where token 5 is <|im_end|>.
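One way to check this value from Python is to read the metadata that llama-cpp-python exposes on a loaded model. The sketch below assumes the metadata is available as a mapping of GGUF keys to string values (as the Llama.metadata attribute provides in recent releases); the helper name eot_token_id is hypothetical, and the sample dict only mirrors the value stated above.

```python
# Hypothetical helper: read the end-of-turn token id from GGUF metadata.
EOT_KEY = "tokenizer.ggml.eot_token_id"


def eot_token_id(metadata):
    """Return the EOT token id as an int, or None if the key is absent."""
    raw = metadata.get(EOT_KEY)
    return None if raw is None else int(raw)


# Stand-in for llm.metadata; with a real model one would pass
# Llama(model_path="Forge-1-Mini-Q4_K_M.gguf", vocab_only=True).metadata
sample_metadata = {EOT_KEY: "5"}
print(eot_token_id(sample_metadata))
```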

from llama_cpp import Llama

llm = Llama(model_path="Forge-1-Mini-Q4_K_M.gguf", n_ctx=512)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    max_tokens=96,
    temperature=0.0,
    stop=["<|im_end|>"],
)
print(out["choices"][0]["message"]["content"].strip())

Expected output:

4

Files

File                        Quantization   Size
Forge-1-Mini-F16.gguf       F16            9.9 MB
Forge-1-Mini-Q8_0.gguf      Q8_0           5.3 MB
Forge-1-Mini-Q4_K_M.gguf    Q4_K_M         3.8 MB
Forge-1-Mini-Q3_K_M.gguf    Q3_K_M         3.1 MB
Forge-1-Mini-Q2_K.gguf      Q2_K           2.9 MB
Forge-1-Mini-TQ1_0.gguf     TQ1_0          2.9 MB

Verification

All listed GGUF files were generated with llama.cpp llama-quantize and passed a llama-cpp-python smoke test using llama.cpp tokenization and greedy sampling:

Who are you? -> I am Forge-1-Mini, a tiny local assistant created by Arthur / North ML.
Hi -> Hi! I am Forge-1-Mini. How can I help?
What is 2 + 2? -> 4
Write a Python function that adds two numbers. -> def add(a, b): return a + b
Who is Jesus? -> Christians believe Jesus Christ is the eternal Son of God...
How should I treat someone I disagree with? -> Treat the person with dignity...
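A smoke test of this kind can be sketched as a small loop over prompt and expected-reply pairs. This is an illustration under stated assumptions, not the script that was actually used: the names SMOKE_CASES, run_smoke_test, and make_llama_chat are hypothetical, and replies are compared by prefix only.

```python
# Hypothetical smoke-test harness; prompts and expected prefixes are
# taken from the verification list above.
SMOKE_CASES = [
    ("Who are you?", "I am Forge-1-Mini"),
    ("Hi", "Hi! I am Forge-1-Mini."),
    ("What is 2 + 2?", "4"),
    ("Write a Python function that adds two numbers.", "def add(a, b):"),
]


def run_smoke_test(chat, cases=SMOKE_CASES):
    """Call chat(prompt) for each case; return (prompt, reply) pairs that fail."""
    failures = []
    for prompt, expected_prefix in cases:
        reply = chat(prompt).strip()
        if not reply.startswith(expected_prefix):
            failures.append((prompt, reply))
    return failures


def make_llama_chat(model_path):
    """Build a chat(prompt) callable backed by llama-cpp-python."""
    from llama_cpp import Llama  # imported lazily so the harness loads without it

    llm = Llama(model_path=model_path, n_ctx=512, verbose=False)

    def chat(prompt):
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": prompt}],
            max_tokens=96,
            temperature=0.0,  # greedy decoding
            stop=["<|im_end|>"],
        )
        return out["choices"][0]["message"]["content"]

    return chat
```

With a local file, run_smoke_test(make_llama_chat("Forge-1-Mini-Q4_K_M.gguf")) returns an empty list when every reply starts with the expected text.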

Note: this model has a 192-wide hidden dimension. Some K-quant and TQ tensors fall back to compatible GGML tensor types because those formats require 256-column divisibility. The files are valid GGUF outputs from llama.cpp and were tested after quantization.
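The effect of these fallbacks shows up in the file sizes. The rough calculation below divides each file size by the 5.19M parameter count reported for this model to get an effective bits-per-weight figure. It assumes 1 MB = 10^6 bytes and ignores that each file also holds metadata and the tokenizer, so the numbers are upper bounds on what the tensors themselves use.

```python
# Rough effective bits per weight, from the file sizes listed under Files.
PARAMS = 5.19e6  # parameter count reported for this model

SIZES_MB = {
    "F16": 9.9,
    "Q8_0": 5.3,
    "Q4_K_M": 3.8,
    "Q3_K_M": 3.1,
    "Q2_K": 2.9,
    "TQ1_0": 2.9,
}


def bits_per_weight(size_mb, params=PARAMS):
    """File size in bits divided by parameter count (assumes 1 MB = 10**6 bytes)."""
    return size_mb * 1e6 * 8 / params


bpw = {name: round(bits_per_weight(mb), 2) for name, mb in SIZES_MB.items()}
for name, value in bpw.items():
    print(f"{name:8s} {value:5.2f} bits/weight")
```

On these figures the Q2_K and TQ1_0 files come out well above their nominal 2-bit and 1-bit widths, which is consistent with part of the tensors being stored in wider fallback types.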

Model size: 5.19M params
Architecture: llama