
🚀 Echo-1 (0.5B Parameters - GGUF)

Echo-1 is a 0.5B-parameter model optimized for edge devices and based on the Qwen-2.5-Instruct architecture. It was fine-tuned via LoRA, and the adapter was fully merged into a single GGUF file. The model is built for fast local inference while retaining structural reasoning and context tracking.

✨ Key Features

  • Qwen-2.5 Foundation: Inherits the base model's efficient tokenizer, instruction following, and coherent text generation.
  • Merged Standalone Build: No dependency on external adapter weights, multi-layer configs, or Python runtimes. The file loads directly in any standard GGUF engine (llama.cpp, node-llama-cpp, Ollama).
  • Ultra-Low Memory Footprint: At 0.5B parameters, the model suits local-first computing environments, private automated utilities, and background processes on low-spec consumer systems.

🧠 Prompt Structure (ChatML Syntax)

Because Echo-1 is fine-tuned from Qwen-2.5-Instruct, it uses the standard ChatML template markers. To match the format the model was trained on and to avoid truncated responses, construct your inputs exactly like this:

<|im_start|>system
You are Echo-1, a helpful assistant.<|im_end|>
<|im_start|>user
Write a short paragraph explaining the benefits of local-first AI.<|im_end|>
<|im_start|>assistant
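
The template above can also be assembled programmatically. The following is a minimal sketch, not part of Echo-1 or any library: `buildChatMLPrompt` is a hypothetical helper name, and the set of accepted roles (`system`, `user`, `assistant`) is an assumption based on common ChatML usage. It wraps each message in the `<|im_start|>` / `<|im_end|>` markers and leaves an open assistant turn at the end for the model to complete.

```javascript
// Hypothetical helper (not part of Echo-1 or node-llama-cpp): renders a list
// of chat messages into the ChatML layout shown above.
function buildChatMLPrompt(messages) {
    // Assumed role set; adjust if your use case needs other roles.
    const allowedRoles = new Set(["system", "user", "assistant"]);

    const turns = messages.map(({role, content}) => {
        if (!allowedRoles.has(role)) {
            throw new Error(`Unsupported role: ${role}`);
        }
        // Each turn has the form <|im_start|>{role}\n{content}<|im_end|>\n
        return `<|im_start|>${role}\n${content}<|im_end|>\n`;
    });

    // End with an open assistant turn so the model generates the reply.
    return turns.join("") + "<|im_start|>assistant\n";
}

const prompt = buildChatMLPrompt([
    {role: "system", content: "You are Echo-1, a helpful assistant."},
    {role: "user", content: "Write a short paragraph explaining the benefits of local-first AI."}
]);
console.log(prompt);
```

For multi-turn conversations, earlier assistant replies can be passed as `assistant` messages, so the whole history is rendered in the same format before the final open turn.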

💻 Sample Implementation (Node.js)

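You can run this model locally using node-llama-cpp. The example below targets the node-llama-cpp v3 API and passes the raw ChatML sequences directly to the model, so special tokens must be enabled when tokenizing the prompt:

import {fileURLToPath} from "url";
import path from "path";
import {getLlama} from "node-llama-cpp";

// __dirname is not defined in ES modules, so derive it from import.meta.url
const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "echo-1-0.5b.gguf")
});
const context = await model.createContext();
const sequence = context.getSequence();

const prompt = `<|im_start|>system\nYou are Echo-1.<|im_end|>\n<|im_start|>user\nWhat is 15 + 27?<|im_end|>\n<|im_start|>assistant\n`;
// Pass true so <|im_start|> and <|im_end|> are parsed as special tokens
const tokens = model.tokenize(prompt, true);

// evaluate() yields one token at a time; stop at end-of-generation or the length cap
const maxTokens = 128;
const response = [];
for await (const token of sequence.evaluate(tokens)) {
    if (model.isEogToken(token) || response.length >= maxTokens) break;
    response.push(token);
}

console.log("Echo-1 response:");
console.log(model.detokenize(response));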

📄 License

The merged weights are distributed under the Apache 2.0 License, consistent with the terms and the commercial and private usage permissions granted by the original Qwen team.
