---
license: mit
language:
- en
tags:
- llama
- pytorch
- causal-lm
- gguf
- north-ml
- forge
---

## Forge 1 Mini

Forge 1 Mini is a tiny Forge-series chat model. It is intended for basic chat, simple completions, rewriting, classification, routing, and short direct answers.

This repo includes:

- `model.safetensors`: corrected Hugging Face checkpoint.
- `tokenizer.model`: SentencePiece tokenizer with ChatML markers.
- `forge-1-mini-f16.gguf`: llama.cpp-compatible F16 GGUF.

### llama.cpp / llama-cpp-python

Use the embedded ChatML template and stop on `<|im_end|>`.

```python
from llama_cpp import Llama

# Load the F16 GGUF with a 512-token context window.
llm = Llama(model_path="forge-1-mini-f16.gguf", n_ctx=512)

# The chat-completion call formats the messages with the model's chat template.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    max_tokens=96,
    temperature=0.0,  # greedy decoding
    stop=["<|im_end|>"],  # end the reply at the ChatML end-of-turn marker
)
print(out["choices"][0]["message"]["content"].strip())
```

Expected answer:

```text
4
```

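Where the embedded template is not applied automatically (for example, with raw completion calls), the ChatML layout can be rendered by hand. The sketch below assumes the standard ChatML framing with `<|im_start|>` and `<|im_end|>` markers. `build_chatml_prompt` is a hypothetical helper, not part of this repo, and the template embedded in the GGUF may differ in details such as system-prompt handling or whitespace.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Assumption: standard ChatML framing; the model's own template may differ.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```

A prompt built this way would still need `<|im_end|>` as a stop sequence, as in the chat-completion example.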
### Local Verification

The uploaded GGUF passed a llama.cpp smoke test using llama-cpp-python tokenization and greedy sampling:

```text
Who are you? -> I am Forge-1-Mini, a tiny local assistant created by Arthur / North ML.
Hi -> Hi! I am Forge-1-Mini. How can I help?
What is 2 + 2? -> 4
Write a Python function that adds two numbers. -> def add(a, b): return a + b
Who is Jesus? -> Christians believe Jesus Christ is the eternal Son of God...
How should I treat someone I disagree with? -> Treat the person with dignity...
```

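A smoke test of this kind can be approximated with a small harness that checks each reply against an expected prefix. This is a minimal sketch, not the script used to produce the results above: `run_smoke_test`, `CASES`, and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that returns canned replies so the example runs without the model.

```python
def run_smoke_test(generate, cases):
    """Call generate(prompt) for each case and check the reply's prefix.

    Returns a list of (prompt, reply, passed) tuples.
    """
    results = []
    for prompt, expected_prefix in cases:
        reply = generate(prompt).strip()
        results.append((prompt, reply, reply.startswith(expected_prefix)))
    return results


# Two prompt/prefix pairs taken from the transcript above.
CASES = [
    ("Hi", "Hi! I am Forge-1-Mini."),
    ("What is 2 + 2?", "4"),
]


def fake_generate(prompt):
    # Hypothetical stand-in for the model: returns canned replies.
    canned = {
        "Hi": "Hi! I am Forge-1-Mini. How can I help?",
        "What is 2 + 2?": "4",
    }
    return canned.get(prompt, "")


for prompt, reply, passed in run_smoke_test(fake_generate, CASES):
    print(f"{'PASS' if passed else 'FAIL'}: {prompt} -> {reply}")
```

To run it against the real model, pass a function that wraps `llm.create_chat_completion(...)` with `temperature=0.0` and returns the message content.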
## Model Family Notes

| Model | Parameters | Hosting | Estimated Cost per 1M Output Tokens | Ability |
|---|---:|---|---:|---|
| **Forge 1 Mini** | 5.2M | Open-source, can host anywhere. | **$0.01-$0.05** | Basic chat, simple completions, rewriting, classification, routing, and short direct answers |
| **Forge 1** | ~40M | Open-source, can host anywhere. | **$0.10-$0.30** | Better conversational ability, basic coding, structured responses, simple reasoning, and tool routing |
| **Forge 1 Reasoning** | ~40M | Hosted on North servers, proprietary. | **$0.20-$1.00** | Reasoning-tuned checkpoint with planning, self-checking, multiple-pass generation, and priority processing |
| **Forge 1 Ultra** | ~150M | Hosted on North servers, proprietary. | **$0.15-$0.80** | Strongest native Forge model; better coding, instruction following, longer responses, tool use, and software-engineering tasks |

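The per-million prices translate into a workload cost by simple scaling. The sketch below copies the estimated ranges from the table; `estimate_cost_usd` is a hypothetical helper, and its output is only as reliable as those estimates.

```python
# Estimated (low, high) USD cost per 1M output tokens, copied from the table.
PRICE_PER_MILLION_USD = {
    "Forge 1 Mini": (0.01, 0.05),
    "Forge 1": (0.10, 0.30),
    "Forge 1 Reasoning": (0.20, 1.00),
    "Forge 1 Ultra": (0.15, 0.80),
}


def estimate_cost_usd(model, output_tokens):
    """Return the (low, high) estimated cost in USD for output_tokens."""
    low, high = PRICE_PER_MILLION_USD[model]
    scale = output_tokens / 1_000_000
    return low * scale, high * scale


low, high = estimate_cost_usd("Forge 1 Mini", 250_000)
print(f"${low:.4f} to ${high:.4f}")  # prints "$0.0025 to $0.0125"
```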