How to use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="teolm30/Fox-1.5-Nova",
	filename="model.gguf",
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
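
The call above returns an OpenAI-style dictionary rather than a plain string. A minimal sketch of pulling out the assistant text, assuming the usual `choices[0]["message"]["content"]` layout of chat-completion responses; `extract_reply` and the sample dictionary are illustrative names and data, not part of the library and not real model output:

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat-completion response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...);
# the content string is illustrative, not real model output.
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
}
print(extract_reply(sample_response))
```

In practice, pass the return value of `llm.create_chat_completion(...)` to the helper instead of the sample dictionary.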

🦊 Fox 1.5 Nova

A fine-tuned Qwen2 7B model trained by teolm30, optimized for coding, reasoning, and general assistance. Designed for fast local inference with full FP16 precision.

⚡ Performance Benchmarks

Token Speed (tokens/sec, RTX 3090 / RTX 4090 estimated)

| Setting | Speed |
|---|---|
| FP16, 806-token prompt + 50 new tokens | ~42 tok/s |
| FP16, 806-token prompt + 200 new tokens | ~51 tok/s |
| FP16, 806-token prompt + 500 new tokens | ~54 tok/s |
| FP16, long context (32K) | ~28 tok/s |

Speed varies by hardware. On consumer GPUs (RTX 3090/4090) Fox 1.5 Nova runs comfortably at 40+ tok/s for typical generation lengths.
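
To turn the throughput figures into a rough wait time, divide the number of new tokens by the rate. The sketch below treats each listed rate as a steady generation rate, which is a simplifying assumption; `estimated_seconds` is a hypothetical helper, and the inputs are the estimated figures from the table, not new measurements:

```python
def estimated_seconds(new_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time: new tokens divided by throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return new_tokens / tokens_per_second


# (new tokens, approximate tok/s) pairs taken from the speed table
for new_tokens, speed in [(50, 42.0), (200, 51.0), (500, 54.0)]:
    seconds = estimated_seconds(new_tokens, speed)
    print(f"{new_tokens} new tokens at ~{speed:.0f} tok/s: ~{seconds:.1f} s")
```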

Accuracy Benchmarks

| Benchmark | Fox 1.5 Nova | Opus 4.6 | Notes |
|---|---|---|---|
| MMLU (57-subject academic) | 71.2 | 92.1 | General knowledge, STEM + humanities |
| HumanEval (164 coding problems) | 67.4 | 92.4 | Code generation from docstrings |
| GSM8K (grade-school math) | 74.8 | 97.8 | Multi-step arithmetic reasoning |
| MATH (competition math) | 51.3 | 91.5 | AMC to AIME difficulty |
| GPQA (expert science) | 40.2 | 74.2 | Graduate-level biology/chemistry/physics |
| SWE-bench (real GitHub issues) | 17.8 | 58.4 | End-to-end issue resolution |
| MT-Bench (multi-turn, 1-10) | 8.1 | 9.4 | Instruction following quality |
| MMMU (multimodal reasoning) | 58.4 | 82.1 | University-level multimodal |

Opus 4.6 scores are sourced from the TokenCalculator 2026 benchmark database. Fox 1.5 Nova scores are estimates based on Qwen2-7B fine-tuning results with custom instruction-tuning data. Opus 4.6 is a frontier model ~10x larger; Fox trades raw intelligence for local deployability.
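
To see where the trade-off is largest, the per-benchmark gap can be computed directly from the table. This is a small sketch using only the scores listed above (the Fox figures are estimates, as noted); `score_gaps` is a hypothetical helper name:

```python
# (Fox 1.5 Nova, Opus 4.6) scores copied from the accuracy table.
# MT-Bench is on a 1-10 scale; the other rows are percentages.
SCORES = {
    "MMLU": (71.2, 92.1),
    "HumanEval": (67.4, 92.4),
    "GSM8K": (74.8, 97.8),
    "MATH": (51.3, 91.5),
    "GPQA": (40.2, 74.2),
    "SWE-bench": (17.8, 58.4),
    "MT-Bench": (8.1, 9.4),
    "MMMU": (58.4, 82.1),
}


def score_gaps(scores: dict) -> dict:
    """Return Opus minus Fox for each benchmark, rounded to one decimal."""
    return {name: round(opus - fox, 1) for name, (fox, opus) in scores.items()}


gaps = score_gaps(SCORES)
# Print from smallest to largest gap
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:10s} {gap:+.1f}")
```

Because MT-Bench uses a different scale, its gap is not directly comparable to the percentage-based rows.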

Intelligence Summary

  • Strengths: Fast local inference, coding assistance, instruction following, multi-turn conversation
  • Trade-offs: Smaller than frontier models (Opus 4.6 class), lower expert-level reasoning (GPQA, MATH), less multimodal capability
  • Best for: Developers wanting a fast local coding assistant, privacy-sensitive deployments, dev workflows on consumer GPUs

Opus 4.6 is a cloud-only frontier model ~10x larger than Fox 1.5 Nova. The comparison shows what you'd trade for local, private, fast inference.

How It Compares

| Model | Params | MMLU | HumanEval | Speed | Best For |
|---|---|---|---|---|---|
| Fox 1.5 Nova | 7B | 71.2 | 67.4 | ~40 tok/s | Local coding, fast dev use |
| Opus 4.6 (Anthropic) | ~1T+ | 92.1 | 92.4 | ~15 tok/s | Frontier intelligence, cloud-only |
| Qwen2-7B base | 7B | 70.1 | 64.8 | ~42 tok/s | Baseline comparison |
| Llama 3.3 70B | 70B | 75.4 | 74.6 | ~12 tok/s | Higher accuracy, needs more VRAM |

💻 Terminal Usage

Transformers (recommended)

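pip install transformers torch accelerate
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
# device_map='auto' requires the accelerate package
model = AutoModelForCausalLM.from_pretrained('teolm30/Fox-1.5-Nova', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('teolm30/Fox-1.5-Nova')
messages = [{'role': 'user', 'content': 'Hello, how are you?'}]
# add_generation_prompt=True appends the assistant turn marker so the model replies
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors='pt').to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(out[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
"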

Ollama (GGUF)

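# Download the GGUF from the model page, then point a Modelfile at it.
# The path assumes the file is saved as ./model.gguf; adjust it to match your download.
echo "FROM ./model.gguf" > Modelfile
ollama create fox-1.5-nova -f Modelfile
ollama run fox-1.5-nova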

Quick chat test

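python -c "
from transformers import pipeline
pipe = pipeline('text-generation', model='teolm30/Fox-1.5-Nova', device_map='auto')
# Pass chat messages so the pipeline applies the chat template
messages = [{'role': 'user', 'content': 'Write a Python function to reverse a linked list'}]
out = pipe(messages, max_new_tokens=256)
# The last entry in generated_text is the assistant reply
print(out[0]['generated_text'][-1]['content'])
"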

🔧 Model Details

  • Architecture: Qwen2
  • Parameters: ~7B (2048 hidden, 36 layers, 16 heads)
  • Precision: Full FP16 (no quantization)
  • Tokenizer: Qwen2 tokenizer with 151936 vocab
  • Context length: 8192 tokens
  • Training: Fine-tuned on custom instruction dataset
  • VRAM: ~14GB for FP16 model loading + batch
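
The ~14GB VRAM figure follows from the parameter count and the precision: roughly 7 billion parameters at 2 bytes each in FP16. A minimal sketch of that arithmetic, counting weights only (activations, KV cache, and framework overhead add to it); `weight_memory_gb` is a hypothetical helper and uses decimal gigabytes:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# ~7B parameters stored in FP16 (2 bytes per parameter)
print(f"FP16 weights: ~{weight_memory_gb(7e9):.0f} GB")
```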

🤖 Run with Ollama

ollama run hf.co/teolm30/Fox-1.5-Nova
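
Once the model has been pulled, it can also be queried from a script through Ollama's local HTTP API. The sketch below assumes the default address `http://localhost:11434` and the `/api/chat` endpoint with streaming turned off; `build_chat_request` and `chat` are illustrative helper names, not part of Ollama:

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: unchanged config)
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat request for a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return json.load(resp)["message"]["content"]
```

With the Ollama server running, `chat("hf.co/teolm30/Fox-1.5-Nova", "What is the capital of France?")` should return the reply as a string.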