
🦊 Fox 1.5

Benchmark Board

| Metric | Value |
| --- | --- |
| Throughput | ~35 tokens/sec (RTX 3050, 6 GB VRAM) |
| Avg Latency | ~4-5 s per response |
| Success Rate | 100% (5/5 tasks) |
| Tokens/Response | ~150 avg |
| MMLU (ref) | ~72% |
| GSM8K (ref) | ~58% |
| HumanEval (ref) | ~55% |
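
The throughput, response-length, and latency rows can be cross-checked with one line of arithmetic. The sketch below assumes latency is dominated by token generation and ignores prompt processing, which the card does not break out:

```python
# Figures copied from the Benchmark Board above.
tokens_per_second = 35       # approximate throughput on an RTX 3050
tokens_per_response = 150    # approximate average response length

# Assumption: decode time only, no prompt processing or model load time.
decode_seconds = tokens_per_response / tokens_per_second
print(f"Estimated decode time: {decode_seconds:.1f} s per response")
```

The estimate falls inside the reported ~4-5 s latency range, so the three rows are consistent with each other.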

Task Results

| Task | Prompt | Check | Result |
| --- | --- | --- | --- |
| Math | "A farmer has 17 sheep. All but 9 run away. How many sheep left?" | 9 | Pass |
| Coding | "Write a Python function to check if a number is prime." | def | Pass |
| Knowledge | "What is the capital of Greece?" | athens | Pass |
| Logic | "If all cats are animals and some animals are pets, then some cats are pets. True or false?" | true | Pass |
| Translation | "Translate to Greek: Hello, how are you?" | γεια | Pass |
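
The Check column reads like an expected substring. The card does not say how checks are evaluated, so the sketch below assumes a case-insensitive substring match; `passes` and the sample responses are hypothetical and are not output from Fox 1.5:

```python
def passes(response: str, check: str) -> bool:
    """Return True if the expected substring occurs in the response.

    Assumption: matching is case-insensitive; casefold() also handles Greek.
    """
    return check.casefold() in response.casefold()


# Made-up responses for illustration only.
samples = [
    ("The capital of Greece is Athens.", "athens"),
    ("Γεια σου, τι κάνεις;", "γεια"),
    ("def is_prime(n): ...", "def"),
]

for response, check in samples:
    print(f"{check!r}: {'pass' if passes(response, check) else 'fail'}")
```

A substring check is cheap but loose: it confirms a keyword appeared, not that the whole answer is correct.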

Quick Facts

| Property | Value |
| --- | --- |
| Base Model | Qwen2.5-7B-Instruct |
| Quantization | GPTQ 4-bit |
| Parameters | 7B |
| Context Length | 32K tokens |
| Size | 5.3 GB |
| VRAM Required | ~6 GB |
| License | Apache 2.0 |
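
VRAM use is not just the weights: the key/value cache grows with the length of the context. The rough estimate below is a sketch only. The layer count, KV-head count, and head size are assumptions taken from the published Qwen2.5-7B configuration, not from this card, and a 16-bit cache is assumed:

```python
def kv_cache_bytes(context_tokens: int,
                   num_layers: int = 28,      # assumed from the Qwen2.5-7B config
                   num_kv_heads: int = 4,     # assumed (grouped-query attention)
                   head_dim: int = 128,       # assumed
                   bytes_per_value: int = 2   # assumed 16-bit cache
                   ) -> int:
    """Approximate KV-cache size: one key and one value vector per layer per token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * context_tokens


WEIGHTS_GB = 5.3  # model size from the Quick Facts table

for ctx in (2_048, 8_192, 32_768):
    cache_gb = kv_cache_bytes(ctx) / 1e9
    print(f"{ctx:>6} tokens: ~{cache_gb:.2f} GB cache + {WEIGHTS_GB} GB weights")
```

The cache scales linearly with context length, so long prompts need headroom beyond the weight size.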

Capabilities

  • Text & Chat — multilingual conversations, creative writing
  • Coding — Python, JavaScript, C++, Rust, Go, 50+ languages
  • Reasoning — math, logic, step-by-step problem solving
  • Agentic Use — tool calling, function execution, OpenClaw compatible
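
For tool calling, the application still has to parse the model's output and run the matching function. The sketch below assumes the model emits calls as JSON inside `<tool_call>` tags, as the Qwen2.5 base chat template does; this card does not confirm the format. `get_weather` and `TOOLS` are hypothetical:

```python
import json
import re


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def run_tool_calls(model_output: str) -> list:
    """Find every tool call in the text, execute it, and collect the results."""
    results = []
    for payload in TOOL_CALL_RE.findall(model_output):
        call = json.loads(payload)
        results.append(TOOLS[call["name"]](**call["arguments"]))
    return results


sample = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Athens"}}\n</tool_call>'
print(run_tool_calls(sample))
```

In a real loop, each result is appended to the conversation as a tool message and the model is called again to produce the final answer.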

Run it

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "teolm30/Fox-1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [{"role": "user", "content": "Explain quantum entanglement in simple terms"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Send inputs to whichever device device_map="auto" placed the model on.
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

For 4-bit GPTQ loading: pip install auto-gptq optimum
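
A quick pre-flight check can confirm those packages are importable before a multi-gigabyte download starts. This is a minimal sketch; `missing_packages` is a hypothetical helper, and the import names (`auto_gptq`, `optimum`) are the modules those pip packages install:

```python
import importlib.util

# pip package name -> importable module name
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "auto-gptq": "auto_gptq",
    "optimum": "optimum",
}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```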

Limitations

  • Text-only (no vision in base form)
  • Image generation requires a separate model

Built by T_craftClaw 🔥 | Owner: teolm30

🤖 Run with Ollama

ollama run hf.co/teolm30/Fox-1.5
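
Once Ollama has the model, it can also be queried from Python over Ollama's local REST API. The sketch below assumes the Ollama server is running on its default port (11434); `build_request` and `generate` are hypothetical helper names:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "hf.co/teolm30/Fox-1.5"


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `print(generate("What is the capital of Greece?"))` prints the model's reply.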