Tessera 4

The Frontier of Efficiency: ORPO-Distilled Reasoning

Tessera 4 is a specialized mini-model designed to prove that massive scale is not a requirement for world-class reasoning. By utilizing ORPO (Odds Ratio Preference Optimization) and a high-signal distillation process from DeepSeek-R1, Tessera 4 achieves frontier-level performance in logic and mathematics while remaining small enough to run on consumer hardware (8GB VRAM).

🚀 The Reasoning Breakthrough

Tessera 4 was trained with a specific focus: Logical Accuracy over General Trivia. While we purposely allowed MMLU scores to sit at 66%, the trade-off resulted in a reasoning engine that surpasses its own teacher (DeepSeek-R1) and rivals GPT-5-class thresholds on core logic benchmarks.

📊 Benchmark Comparison

Benchmark	Tessera 4	DeepSeek-R1	Llama 3.1 400B
GSM8K	95%	80.1% (Base)	90%+
ARC-Challenge	93%	90-92%	90%+
MMLU	66%	75%+	85%+

Note: Benchmarks conducted on randomized high-signal subsets to verify zero-shot reasoning capabilities.

🛠️ Technical Specifications

Training Duration: ~8 Hours
Hardware: 1x RTX 3090
Methodology: ORPO Distillation
Optimization: Focused on Chain-of-Thought (CoT) path correction, eliminating the "verbose fluff" typical of larger reasoning models.

💻 Hardware Requirements & Format

Format: Full 16 Bit, 3090
VRAM: Recommended 8GB+
Compatibility: Optimized for LM Studio, Ollama, and llama.cpp.

🧠 Reasoning Showcase

All results generated at Q4_K_M quantization (4-bit).

🔢 1. High-Precision Math (15 Factorial)

Test: Calculate 15! step-by-step. Result: 1,307,674,368,000 (100% Correct)

Tessera 4 demonstrates zero-shot numerical stability, maintaining digit precision across 14 layers of multiplication.

📐 2. Unit Conversion & Physics

Test: A train travels 60km in 45 minutes. Find the speed in km/h. Result: 80 km/h (Correct)

The model correctly identifies the need to convert minutes to hours (0.75h) before applying the distance/time formula.

👽 3. Deep Logical Branching (The 3 Aliens)

Test: A complex "Truth-Teller, Liar, Alternator" puzzle. Result: Successfully identified Z=Truth, X=Alternator, Y=Liar (Correct)

Tessera 4 successfully tracked nested state changes and caught a logical contradiction in a secondary hypothesis branch.

🚗 4. Physical Grounding (The Car Wash)

Test: 100ft walk vs drive for a car wash. Result: Drive (Correct)

The model demonstrated common-sense grounding by realizing the "car" must be physically present at the car wash, overriding the "short walking distance" heuristic.

💬 Prompt Format

To achieve the scores listed above, you must use the correct prompt template. Since this is distilled from R1, it utilizes the DeepSeek-V3/R1 style:

<|im_start|>system
You are a highly logical reasoning engine. Think step-by-step.<|im_end|>
<|im_start|>user
[Your Question Here]<|im_end|>
<|im_start|>assistant
<|thought|>