How to use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="teolm30/Fox-1.5",
	filename="model.gguf",
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
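The call above returns the completion object rather than printing it. As a minimal sketch, assuming the response follows the OpenAI-style layout (a dict with a `choices` list whose entries hold a `message`), a small helper can pull out the reply text. The name `extract_reply` and the sample dict are hypothetical and exist only for illustration.

```python
def extract_reply(completion: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices", [])
    if not choices:
        return ""
    # Fall back to an empty string if the message or its content is missing.
    return choices[0].get("message", {}).get("content", "") or ""


# Hand-written stand-in for a real response, used only to show the shape.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Paris."}}
    ]
}
print(extract_reply(sample))
```

With a loaded model, the same helper would be applied to the return value of `llm.create_chat_completion(...)`.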

🦊 Fox 1.5

Benchmark Board

| Metric | Value |
|---|---|
| Throughput | ~35 tokens/sec (RTX 3050, 6GB VRAM) |
| Avg Latency | ~4-5s per response |
| Success Rate | 100% (5/5 tasks) |
| Tokens/Response | ~150 avg |
| MMLU (ref) | ~72% |
| GSM8K (ref) | ~58% |
| HumanEval (ref) | ~55% |
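The throughput, response length, and latency figures above are roughly consistent with one another. The sketch below divides the average response length by the throughput; it assumes prompt processing time is negligible, which the source does not state.

```python
tokens_per_response = 150  # "~150 avg" from the table above
tokens_per_second = 35     # "~35 tokens/sec" from the table above

# Generation time only; prompt processing is ignored (assumption).
estimated_latency = tokens_per_response / tokens_per_second
print(f"Estimated generation time: {estimated_latency:.1f} s")
```

The result is about 4.3 seconds, which falls inside the reported 4-5 second range.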

Task Results

| Task | Prompt | Check | Result |
|---|---|---|---|
| Math | "A farmer has 17 sheep. All but 9 run away. How many sheep left?" | 9 | |
| Coding | "Write a Python function to check if a number is prime." | def | |
| Knowledge | "What is the capital of Greece?" | athens | |
| Logic | "If all cats are animals and some animals are pets, then some cats are pets. True or false?" | true | |
| Translation | "Translate to Greek: Hello, how are you?" | γεια | |
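The Check values look like keywords expected somewhere in each response. The source does not say how they were applied, so the sketch below assumes a case-insensitive substring match. The names `passes_check` and `success_rate` are hypothetical, and the sample responses are written by hand, not actual model output.

```python
def passes_check(response: str, keyword: str) -> bool:
    """True if the keyword appears in the response, ignoring case."""
    return keyword.casefold() in response.casefold()


def success_rate(results: list[bool]) -> float:
    """Fraction of checks that passed (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


# Canned responses for illustration only, paired with their check keywords.
samples = [
    ("9 sheep are left.", "9"),
    ("def is_prime(n): ...", "def"),
    ("The capital of Greece is Athens.", "athens"),
]
results = [passes_check(response, keyword) for response, keyword in samples]
print(f"{sum(results)}/{len(results)} passed ({success_rate(results):.0%})")
```

A substring check of this kind is lenient: a response such as "that is not true" would still match the keyword `true`, so a pass shows only that the keyword appeared.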

Quick Facts

| Property | Value |
|---|---|
| Base Model | Qwen2.5-7B-Instruct |
| Quantization | GPTQ 4-bit |
| Parameters | 7B |
| Context Length | 32K tokens |
| Size | 5.3GB |
| VRAM Required | ~6GB |
| License | Apache 2.0 |

Capabilities

  • Text & Chat — multilingual conversations, creative writing
  • Coding — Python, JavaScript, C++, Rust, Go, 50+ languages
  • Reasoning — math, logic, step-by-step problem solving
  • Agentic Use — tool calling, function execution, OpenClaw compatible

Run it

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "teolm30/Fox-1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Explain quantum entanglement in simple terms"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move the inputs to whichever device the model was placed on
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# The decoded output contains the prompt followed by the generated reply
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
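For decoder-only models, `generate` returns the prompt tokens followed by the new ones, so decoding `outputs[0]` repeats the prompt. One common way to print only the reply is to slice off the prompt length first. The helper name `new_tokens_only` is hypothetical, and plain lists stand in for token-id tensors so the sketch runs without a model.

```python
def new_tokens_only(output_ids, prompt_length: int):
    """Drop the first prompt_length ids, keeping only the generated ones."""
    return output_ids[prompt_length:]


# Plain lists stand in for token-id tensors; the ids are made up.
prompt_ids = [101, 102, 103]
generated = prompt_ids + [7, 8, 9]
print(new_tokens_only(generated, len(prompt_ids)))

# With the snippet above, the equivalent call would be:
#   reply_ids = new_tokens_only(outputs[0], inputs["input_ids"].shape[1])
#   print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```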

For 4-bit GPTQ loading: pip install auto-gptq optimum
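Before loading the model, it can help to confirm that those packages are importable. The sketch below assumes the import names are `auto_gptq` and `optimum` (pip names and import names can differ), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in module_names if find_spec(name) is None]


# Import names assumed for the two pip packages mentioned above.
GPTQ_MODULES = ["auto_gptq", "optimum"]
missing = missing_modules(GPTQ_MODULES)
print("Missing:", missing or "none")
```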

Limitations

  • Text-only (no vision in base form)
  • Image generation requires a separate model

Built by T_craftClaw 🔥 | Owner: teolm30

🤖 Run with Ollama

ollama run hf.co/teolm30/Fox-1.5
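Once the model is available in Ollama, it can also be queried from Python over Ollama's local HTTP API. This is a minimal sketch that assumes the server is running at its default local address and uses the `/api/chat` endpoint with streaming turned off. The helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Default local Ollama address (assumption; adjust if the server runs elsewhere).
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "hf.co/teolm30/Fox-1.5"):
    """Build a POST request carrying a single-turn, non-streaming chat payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        return json.loads(response.read())["message"]["content"]
```

Calling `ask("What is the capital of Greece?")` would return the model's reply as a string, provided the Ollama server is running and the model has been pulled.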