Occam-2B-CCoT (v0.1)
Presenting a new Clear Chain of Thought (CCoT) concept
Model Overview
Occam-2B-CCoT is a 2-billion parameter Small Language Model (SLM) built on the Qwen3.5-2B architecture. It has been fine-tuned to execute highly efficient, "Clear Chain of Thought" (CCoT) reasoning.
Modern SLMs frequently fall into "hallucination loops," generating thousands of tokens in their <think> blocks before ultimately timing out and hitting the context wall. This model fixes that.
The Result: An 80%+ reduction in reasoning tokens. For developers, this means drastically faster time-to-first-token (TTFT) and massive API cost savings in production workloads.
--> Datasets and paper with detailed results will be available soon!
The Experiment: Applied Philosophical Reasoning
This fine-tuning experiment was driven by a core philosophical principle: to think clearly and precisely.
We name it after William of Occam and his principle that "entities must not be multiplied beyond necessity." Instead of allowing the model to endlessly debate itself in the latent space, Occam-2B-CCoT has been philosophically conditioned to filter out noise, structure its logic efficiently, and reason directly. It cuts away unnecessary tokens to find the most direct path to the correct answer.
GPU Economics & Token Efficiency
In production, compute costs scale linearly with generated tokens. Occam-2B radically reduces this computational overhead:
- Massive Compute Reduction (~82%): On the MMLU-Pro benchmark, a sequential generation workload that would theoretically take the base model 73 hours (due to constant 1024-token wall hits) was completed by Occam-2B in just ~13 hours. This is an 82.4% reduction in total generated tokens.
- Smarter Zero-Shot Performance: The model has internalized logic so well that its zero-shot accuracy actually increased by +6.7% over the base model.
- Cheaper API Deployments: On graduate-level GPQA questions, Occam-2B averaged 242.5 words per correct answer compared to the base model's 651.0 words—making it 62.7% cheaper per query.
- Flawless Translations: It completely eliminates token exhaustion on nuanced translations, perfectly translating prompts across 5 languages in under 130 tokens (saving ~900 tokens per prompt).
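Using the GPQA figures above, the per-query savings can be checked directly (treating average words per correct answer as a proxy for generated tokens, and hence for cost):

```python
# Average words per correct answer on graduate-level GPQA questions
# (figures from the bullet above).
base_words = 651.0
ccot_words = 242.5

# Relative savings per query: fewer generated words -> proportionally lower cost.
savings = (1 - ccot_words / base_words) * 100
print(f"Per-query savings: {savings:.1f}%")  # -> 62.7%
```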
Token Efficiency (When the Model is Correct) - MMLU-Pro
| Metric | Base Model | CCoT Model | Difference |
|---|---|---|---|
| Average Words | 682.3 | 118.6 | 82.6% fewer |
| Median Words | 480.0 | 98.0 | 79.6% fewer |
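The reduction percentages in the table can be reproduced from the raw word counts:

```python
# Words per correct answer on MMLU-Pro (numbers from the table above),
# as (base model, CCoT model) pairs.
stats = {"average": (682.3, 118.6), "median": (480.0, 98.0)}

for name, (base, ccot) in stats.items():
    reduction = (base - ccot) / base * 100
    print(f"{name}: {reduction:.1f}% fewer words")
```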
Multimodal & Agentic Workflows
- Zero Catastrophic Forgetting: This strict text-logic fine-tuning is completely non-destructive to the native early-fusion multimodal pathways. The model successfully handles complex vision tasks, like OCR coordinate mapping, with zero degradation.
- Local Tool Calling: Occam-2B excels as a CodeAgent (e.g., using Hugging Face's smolagents). It natively writes and executes Python code blocks to interact with external tools cleanly in fractions of a second.
Known Limitations
- The 2B Parameter Reality: While its logic engine is incredibly sharp, its factual memory is bound by its 2-Billion parameter size.
- Confident Confabulations: If asked about highly specific cultural concepts or obscure trivia, it may logically fabricate an answer if it lacks the data in its weights. For strict factual use cases, pair this model with a RAG pipeline.
- The "Over-Thinker" Quirk: Because it is wired for strict logic, it may occasionally apply mathematical rigor to trivial daily tasks in its hidden <think> block (e.g., spending 400 tokens mapping out a basic grocery list), though the final output remains accurate.
How to Prompt
To activate the model's efficient reasoning capabilities, use one of the following system prompts.
Recommended System Prompt (Optimized for Logic & General Use):
You are a clear and precise logical assistant.
Think exactly to analyze the problem.
Once you are done thinking, provide your final answer outside the block.
For more complex assignments:
You are a clear and precise logical assistant.
Think exactly to analyze the problem.
If evaluating an argument, explicitly check if the conclusion must logically follow from the premises before deciding.
Once you are done thinking, provide your final answer outside the block.
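With either prompt, the reasoning lands inside a `<think>...</think>` block and the final answer follows it. A small sketch of stripping the reasoning from a completion (the sample output is illustrative, not a real model response):

```python
def extract_final_answer(output: str) -> str:
    """Return the text after the closing </think> tag, i.e. the final answer."""
    _, sep, answer = output.partition("</think>")
    return answer.strip() if sep else output.strip()

sample = "<think>2 premises, 1 valid conclusion.</think>\nThe argument is valid."
print(extract_final_answer(sample))  # -> The argument is valid.
```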
INSTRUCTIONS FOR USING TOOLS:
You have access to a tool called `your_tool_name`. YOU MUST USE IT.
To solve the user's problem, you must write a Python code block like this:
```python
result = your_tool_name(parameter=123)
final_answer(result)
```
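Putting the pieces together, the system prompt and tool instructions can be combined into a standard chat-message list. The exact wiring into your inference stack (e.g., a chat template) is up to you; `build_messages` below is a hypothetical helper, not part of the model's tooling.

```python
SYSTEM_PROMPT = (
    "You are a clear and precise logical assistant.\n"
    "Think exactly to analyze the problem.\n"
    "Once you are done thinking, provide your final answer outside the block."
)

TOOL_INSTRUCTIONS = (
    "INSTRUCTIONS FOR USING TOOLS:\n"
    "You have access to a tool called `your_tool_name`. YOU MUST USE IT.\n"
    "To solve the user's problem, you must write a Python code block."
)

def build_messages(user_query: str, use_tools: bool = False) -> list[dict]:
    """Assemble an OpenAI-style message list for the model."""
    system = SYSTEM_PROMPT + ("\n\n" + TOOL_INSTRUCTIONS if use_tools else "")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]

msgs = build_messages("Is this syllogism valid?", use_tools=True)
print(msgs[0]["role"])  # -> system
```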