# 🦊 Fox 1.5

## Benchmark Board
| Metric | Value |
| --- | --- |
| Throughput | ~35 tokens/sec (RTX 3050, 6GB VRAM) |
| Avg Latency | ~4-5s per response |
| Success Rate | 100% (5/5 tasks) |
| Tokens/Response | ~150 avg |
| MMLU (ref) | ~72% |
| GSM8K (ref) | ~58% |
| HumanEval (ref) | ~55% |
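The throughput figure can be reproduced with a simple wall-clock measurement. A minimal sketch (the `measure_throughput` helper is an assumption for illustration, not the harness used for the numbers above):

```python
import time

def measure_throughput(model, tokenizer, prompt: str, max_new_tokens: int = 150) -> float:
    """Rough tokens/sec: generated-token count over wall-clock generation time."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    generated = outputs.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed
```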
## Task Results
| Task | Prompt | Check | Result |
| --- | --- | --- | --- |
| Math | "A farmer has 17 sheep. All but 9 run away. How many sheep left?" | 9 | ✅ |
| Coding | "Write a Python function to check if a number is prime." | def | ✅ |
| Knowledge | "What is the capital of Greece?" | athens | ✅ |
| Logic | "If all cats are animals and some animals are pets, then some cats are pets. True or false?" | true | ✅ |
| Translation | "Translate to Greek: Hello, how are you?" | γεια | ✅ |
## Quick Facts
| Property | Value |
| --- | --- |
| Base Model | Qwen2.5-7B-Instruct |
| Quantization | GPTQ 4-bit |
| Parameters | 7B |
| Context Length | 32K tokens |
| Size | 5.3GB |
| VRAM Required | ~6GB |
| License | Apache 2.0 |
## Capabilities
- Text & Chat — multilingual conversations, creative writing
- Coding — Python, JavaScript, C++, Rust, Go, 50+ languages
- Reasoning — math, logic, step-by-step problem solving
- Agentic Use — tool calling, function execution, OpenClaw compatible (see the sketch below)
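Because the base model is Qwen2.5-7B-Instruct, tool calling should work through the standard `tools` argument of `apply_chat_template` in `transformers`. A minimal sketch, assuming the Qwen2.5 chat template; the `get_weather` tool is a made-up illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teolm30/Fox-1.5", trust_remote_code=True)

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"  # illustrative stub

messages = [{"role": "user", "content": "What's the weather in Athens?"}]
# The chat template serializes the tool's signature and docstring into the
# prompt; the model is expected to reply with a structured tool call.
text = tokenizer.apply_chat_template(
    messages, tools=[get_weather], tokenize=False, add_generation_prompt=True
)
print(text)
```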
## Run it
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "teolm30/Fox-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "Explain quantum entanglement in simple terms"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
For 4-bit GPTQ loading, install the quantization backends first: `pip install auto-gptq optimum`.
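With those packages installed, the GPTQ weights should load through the same `from_pretrained` call. A minimal sketch, assuming the repo ships its quantization config alongside the weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# auto-gptq/optimum provide the 4-bit kernels; transformers detects the
# quantization config in the repo, so no extra arguments are needed.
model = AutoModelForCausalLM.from_pretrained(
    "teolm30/Fox-1.5", device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("teolm30/Fox-1.5", trust_remote_code=True)
```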
## Limitations
- Text-only (no vision in base form)
- Image generation requires a separate model
Built by T_craftClaw 🔥 | Owner: teolm30