How to use from the llama-cpp-python library
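# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="Jershone/Echo-1",
    filename="Echo-1.gguf",
)

# Run a single chat turn and print only the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])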
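
In non-streaming mode, `create_chat_completion` returns an OpenAI-style dictionary rather than a bare string. The sketch below shows one way to pull the reply text out defensively. `extract_reply` is a hypothetical helper name, and `sample_response` is a hand-written stand-in so the snippet runs without the model; it is not real Echo-1 output.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from a non-streaming chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Each choice carries a message with "role" and "content" keys.
    return choices[0]["message"]["content"].strip()


# Hand-written stand-in for a real response, for illustration only.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Paris. "},
            "finish_reason": "stop",
        }
    ]
}

reply = extract_reply(sample_response)
print(reply)
```

With a loaded model, the same helper would be called as `extract_reply(llm.create_chat_completion(messages=[...]))`.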

🚀 Echo-1 (0.5B Parameters - GGUF)

Echo-1 is an edge-optimized 0.5B-parameter model based on the Qwen-2.5-Instruct architecture. It was fine-tuned via LoRA and the adapter was fully merged into a single GGUF binary, so it is aimed at fast local inference while keeping structured reasoning and context tracking.

✨ Key Features

  • Qwen-2.5 Foundation: Inherits deep tokenization efficiency, vastly improved instruction following, and stable textual coherence from the base architecture.
  • Merged Standalone Build: No external adapter weights, multi-layer configs, or Python runtime are required. The single file loads directly in any standard GGUF engine (llama.cpp, node-llama-cpp, Ollama).
  • Ultra-Low Memory Footprint: Extremely lightweight structure makes it ideal for local-first computing environments, private automated utilities, and background processes on low-spec consumer systems.
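
To put a rough number on the memory footprint, the size of the weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes the nominal 0.5B parameter count and three common GGUF storage types, because this card does not state which quantization `Echo-1.gguf` uses. Real files mix tensor types, and runtime memory also includes the KV cache and compute buffers.

```python
# Bits per weight for a few common GGUF storage types (block scales included).
# Which of these, if any, applies to Echo-1.gguf is an assumption.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def estimate_weight_mib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in MiB."""
    return n_params * bits_per_weight / 8 / (1024 ** 2)


N_PARAMS = 0.5e9  # nominal parameter count from the model card

estimates = {
    name: round(estimate_weight_mib(N_PARAMS, bits), 1)
    for name, bits in BITS_PER_WEIGHT.items()
}
for name, mib in estimates.items():
    print(f"{name}: ~{mib} MiB")
```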

🧠 Prompt Structure (ChatML Syntax)

Because Echo-1 is fine-tuned on top of Qwen-2.5-Instruct, it utilizes standard ChatML template markers. For precise structural alignment and to prevent response cutoff, construct your inputs exactly like this:

<|im_start|>system
You are Echo-1, a helpful assistant.<|im_end|>
<|im_start|>user
Write a short paragraph explaining the benefits of local-first AI.<|im_end|>
<|im_start|>assistant
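
The template above can be generated from a list of role/content messages instead of being typed by hand. The following is a minimal sketch; `build_chatml_prompt` is a hypothetical helper name, not part of any library. It is only needed for raw-prompt completion, since chat-style APIs typically apply the template for you.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render role/content messages as ChatML and open the assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Echo-1, a helpful assistant."},
    {
        "role": "user",
        "content": "Write a short paragraph explaining the benefits of local-first AI.",
    },
])
print(prompt)
```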

💻 Sample Implementation (Node.js)

You can run this model locally with node-llama-cpp. When you evaluate tokens directly instead of going through a chat wrapper, include the raw ChatML markers in the prompt string yourself:

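// Assumes the node-llama-cpp v3 API (getLlama / loadModel / getSequence) in an ES module.
import {getLlama} from "node-llama-cpp";
import path from "path";
import {fileURLToPath} from "url";

// __dirname is not defined in ES modules, so derive it from import.meta.url.
const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "echo-1-0.5b.gguf")
});

const context = await model.createContext();
const sequence = context.getSequence();

const prompt = `<|im_start|>system\nYou are Echo-1.<|im_end|>\n<|im_start|>user\nWhat is 15 + 27?<|im_end|>\n<|im_start|>assistant\n`;
// Pass true so the ChatML markers are tokenized as special tokens, not plain text.
const tokens = model.tokenize(prompt, true);

// evaluate() is an async generator that yields one token at a time.
const maxTokens = 128; // arbitrary cap for this example
const response = [];
for await (const token of sequence.evaluate(tokens)) {
    response.push(token);
    if (response.length >= maxTokens) break;
}

console.log("Echo-1 response:");
console.log(model.detokenize(response));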
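
For comparison with the Python snippet at the top of this card, the same raw-prompt approach can be sketched with llama-cpp-python's `create_completion`, which accepts a prompt string along with `max_tokens` and `stop` arguments. `complete_chatml` and `FakeLlama` are hypothetical names. The fake stands in for a loaded `Llama` instance so the sketch runs without downloading the model, and its canned reply is not real Echo-1 output.

```python
from typing import Any, Protocol


class CompletionModel(Protocol):
    """Anything exposing a create_completion method, such as llama_cpp.Llama."""

    def create_completion(self, prompt: str, **kwargs: Any) -> dict[str, Any]: ...


def complete_chatml(llm: CompletionModel, user_text: str, max_tokens: int = 128) -> str:
    """Send one user turn as a raw ChatML prompt and return the reply text."""
    prompt = (
        "<|im_start|>system\nYou are Echo-1.<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    # Stop at the end-of-turn marker so it does not appear in the reply.
    result = llm.create_completion(prompt, max_tokens=max_tokens, stop=["<|im_end|>"])
    return result["choices"][0]["text"].strip()


class FakeLlama:
    """Illustrative stand-in: records the call and returns a canned reply."""

    def __init__(self) -> None:
        self.last_prompt = ""
        self.last_kwargs: dict[str, Any] = {}

    def create_completion(self, prompt: str, **kwargs: Any) -> dict[str, Any]:
        self.last_prompt = prompt
        self.last_kwargs = kwargs
        return {"choices": [{"text": " 42", "finish_reason": "stop"}]}


fake = FakeLlama()
answer = complete_chatml(fake, "What is 15 + 27?")
print(answer)
```

To run it against the real model, pass the `llm` object created by `Llama.from_pretrained(...)` in place of `fake`.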

📄 License

The merged weights are distributed under the Apache 2.0 License, in keeping with the terms and the commercial and private usage permissions granted by the original Qwen team.

Format: GGUF
Model size: 0.5B params
Architecture: qwen2