Instructions to use Jershone/Echo-CodeEX with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Jershone/Echo-CodeEX with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jershone/Echo-CodeEX",
    filename="EchoAI-CodeEX.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Jershone/Echo-CodeEX with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jershone/Echo-CodeEX
# Run inference directly in the terminal:
llama-cli -hf Jershone/Echo-CodeEX
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jershone/Echo-CodeEX
# Run inference directly in the terminal:
llama-cli -hf Jershone/Echo-CodeEX
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jershone/Echo-CodeEX
# Run inference directly in the terminal:
./llama-cli -hf Jershone/Echo-CodeEX
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jershone/Echo-CodeEX
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jershone/Echo-CodeEX
Use Docker
docker model run hf.co/Jershone/Echo-CodeEX
- LM Studio
- Jan
- vLLM
How to use Jershone/Echo-CodeEX with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jershone/Echo-CodeEX"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jershone/Echo-CodeEX",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Jershone/Echo-CodeEX
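The curl call in the vLLM instructions can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `extract_reply` are hypothetical, and the response shape is assumed to follow the OpenAI chat-completions format that the server advertises.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body (assumed shape)."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000/v1",  # vLLM default port, as in the curl example
    "Jershone/Echo-CodeEX",
    "What is the capital of France?",
)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(extract_reply(response.read()))
```

Pointing `base_url` at `http://localhost:8080/v1` should target a local `llama-server` instance in the same way, assuming it is running on the port used in the Pi and Hermes configurations.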
- Ollama
How to use Jershone/Echo-CodeEX with Ollama:
ollama run hf.co/Jershone/Echo-CodeEX
- Unsloth Studio
How to use Jershone/Echo-CodeEX with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jershone/Echo-CodeEX to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jershone/Echo-CodeEX to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jershone/Echo-CodeEX to start chatting
- Pi
How to use Jershone/Echo-CodeEX with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jershone/Echo-CodeEX
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Jershone/Echo-CodeEX"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Jershone/Echo-CodeEX with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jershone/Echo-CodeEX
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jershone/Echo-CodeEX
Run Hermes
hermes
- Docker Model Runner
How to use Jershone/Echo-CodeEX with Docker Model Runner:
docker model run hf.co/Jershone/Echo-CodeEX
- Lemonade
How to use Jershone/Echo-CodeEX with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jershone/Echo-CodeEX
Run and chat with the model
lemonade run user.Echo-CodeEX-{{QUANT_TAG}}
List all available models
lemonade list
Echo-CodeEX (0.5B Parameters - GGUF)
Echo-CodeEX is a specialized, edge-optimized 0.5B-parameter model built for offline programming assistance, code execution logic, and structured syntax manipulation. It is fine-tuned from Qwen-2.5-Instruct and fully merged into a standalone GGUF file, aiming for fast syntax completion on low-resource hardware.
Key Features
- Syntax Grounded: Fine-tuned to prioritize code construction, scripting loops, and algorithmic optimizations over open-ended narrative generation.
- Unified GGUF Engine: No dependency on external adapter weights or a multi-layer Python environment. Loads directly in standard local runtimes (llama.cpp, node-llama-cpp, Ollama).
- Fill-in-the-Middle (FIM) Ready: Inherits the fill-in-the-middle token patterns of the Qwen architecture, enabling inline logic insertions and multi-line code predictions.
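As an illustration of the fill-in-the-middle feature, here is a hedged sketch of how a FIM prompt might be assembled. It assumes the prefix-suffix-middle token layout published for Qwen2.5-Coder (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`); the model card does not state which FIM tokens Echo-CodeEX responds to, and `build_fim_prompt` is a hypothetical helper.

```python
# Assumption: Qwen2.5-Coder-style FIM special tokens. Not confirmed for Echo-CodeEX.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap so the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# The model is expected to produce the missing function body.
code_before_gap = "def add(a, b):\n    "
code_after_gap = "\n\nprint(add(1, 2))\n"
fim_prompt = build_fim_prompt(code_before_gap, code_after_gap)
print(fim_prompt)
```

A prompt like this would go to a raw completion call rather than a chat call, since a chat template would wrap it in ChatML turns.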
Code Prompt Engineering Structure
To avoid conversational filler and get direct code output, structure your inputs strictly in the ChatML layout. State the system prompt explicitly to receive clean code blocks:
<|im_start|>system
You are Echo-CodeEX, an expert code generation assistant. Respond only with structured code blocks and clean syntax commentaries.<|im_end|>
<|im_start|>user
Write a clean Python function to parse JSON strings safely.<|im_end|>
<|im_start|>assistant
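The ChatML layout above can be assembled programmatically. This is a minimal sketch; `build_chatml_prompt` is a hypothetical helper, and it leaves the assistant turn open so that the model continues from there.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render role/content messages as ChatML, ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {
        "role": "system",
        "content": (
            "You are Echo-CodeEX, an expert code generation assistant. "
            "Respond only with structured code blocks and clean syntax commentaries."
        ),
    },
    {
        "role": "user",
        "content": "Write a clean Python function to parse JSON strings safely.",
    },
])
print(prompt)
```

Runtimes that apply the chat template themselves, such as the `create_chat_completion` call shown earlier, take the message list directly, so a helper like this is only needed when sending a raw prompt string.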
Sample Implementation (Node.js)
You can spin this specialized model up locally inside your developer environment using node-llama-cpp:
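// Assumes node-llama-cpp v3 (ES modules); other major versions expose a different API.
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

// __dirname is not defined in ES modules, so derive it from import.meta.url
const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "echo-codeex.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({contextSequence: context.getSequence()});

const prompt = `<|im_start|>system\nYou are Echo-CodeEX.<|im_end|>\n<|im_start|>user\nWrite a basic bash script to check if a file exists.<|im_end|>\n<|im_start|>assistant\n`;
// Pass `true` so the ChatML markers are tokenized as special tokens
const tokens = model.tokenize(prompt, true);

console.log("Generating script output...");
const response = await completion.generateCompletion(tokens, {
    temperature: 0.1, // Kept low to enforce syntax consistency over creativity
    maxTokens: 256 // Cap the output length
});
console.log(response);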
License
This model's merged weights are distributed under the Apache 2.0 License, fully compliant with the core permissions and commercial deployment conditions set by the original Qwen development team.