+# Geilim-1B-Instruct (忌廉)
+> **Deep Causal Internal Reasoning**
+> No verbose CoT, no `<think>` tags, just concise answers powered by implicit reasoning.
+---
+## 💡 Introduction
+Recent advances in reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:
+**Problems with External CoT:**
+1. **Verbosity Tax**: Models generate hundreds of tokens in `<think>` tags before answering, increasing latency and cost
+2. **Autoregressive Dependency**: Models must "see" their reasoning to follow it, forcing sequential token generation
+3. **Token Inefficiency**: Users pay for reasoning traces they often don't need when only the final answer matters
+4. **Production Overhead**: Verbose outputs are impractical for real-time APIs and edge deployment
+**Our Insight**: What if reasoning could happen *internally* in the model's hidden states, without generating verbose traces?
+**Geilim-1B-Instruct** addresses these limitations through a hybrid architecture combining:
+- **ASPP (Adjacency-Structured Parallel Propagation)**: Graph-based causal chains for structured reasoning
+- **π-flow (Probability Flow Dynamics)**: Internal refinement in probability space without token generation
+- **Hybrid Gating**: Learnable balance between structured and attention-based processing
+The result: deep reasoning capability with concise outputs, the best of both worlds.
+---
+## 🎯 Core Value Proposition
+**Geilim-1B-Instruct is the anti-verbose reasoning model.**
+| Model Type | Reasoning Approach | Output Style |
+|------------|-------------------|--------------|
+| **Baseline** (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
+| **CoT Models** (DeepSeek R1, o1) | External reasoning chains | Verbose `<think>` tags, long outputs |
+| **Geilim-1B-Instruct** | **Internal reasoning** | **Concise answers, reasoning in hidden states** |
+**Key Differentiator**: Geilim performs deep causal reasoning **internally** through its ASPP+π-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.
+---
+## 🏗️ Architecture Overview
+Geilim-1B-Instruct combines three key components for implicit reasoning:
+### 1. **ASPP Operator** (Adjacency-Structured Parallel Propagation)
+- **Union-Find graph structure**: Linear causal chain where each token only connects to its parent
+- **Iterative message passing**: `h_i^(t+1) = φ(h_i^(t), h_parent[i]^(t))`
+- **K-step evolution**: Adaptive 2-8 steps of causal propagation
+- **Complexity**: O(n) per propagation step, since each token reads only from its parent
+**Why it matters**: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.
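The update rule above can be illustrated with a dependency-free toy. This is a minimal sketch, not Geilim's actual operator: the function name `aspp_propagate`, the `mix` weight, and the choice of `tanh` for φ are assumptions made for illustration. Only the parent-only update pattern comes from the description above.

```python
import math

def aspp_propagate(hidden, parent, num_steps=2, mix=0.5):
    """Toy parent-only message passing: h_i <- phi(h_i, h_parent[i]).

    hidden: list of per-token vectors (lists of floats)
    parent: parent[i] is the index token i reads from (parent[i] == i marks a root)
    phi is a hypothetical choice here: tanh of a convex mix of self and parent.
    """
    for _ in range(num_steps):
        # Every token is updated from the previous state, so the step is parallel.
        hidden = [
            [math.tanh((1 - mix) * s + mix * p) for s, p in zip(h, hidden[parent[i]])]
            for i, h in enumerate(hidden)
        ]
    return hidden

# Linear causal chain: token 0 is the root, token i reads from token i - 1.
tokens = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
parent = [0, 0, 1, 2]

after_one = aspp_propagate(tokens, parent, num_steps=1)
after_three = aspp_propagate(tokens, parent, num_steps=3)
```

In this toy chain the root's signal moves one hop per step, so token 3 is still zero after one step and becomes non-zero only after three. Each step performs one update per token, which is where the linear cost per step comes from.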
+### 2. **π-flow** (Probability Flow Dynamics)
+- **Velocity field learning**: `h' = h + α * v(h)` where `v(h)` is a learned refinement
+- **Multi-step refinement**: Iterates in probability space to converge on the correct answer
+- **Gated application**: Model learns when to refine (complex questions) vs when to skip (simple questions)
+- **Internal convergence**: Reasoning happens in hidden states, not in generated text
+**Why it matters**: π-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.
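The refinement rule `h' = h + α * v(h)` can be sketched in a few lines of plain Python. Everything below is illustrative: `pi_flow_refine`, `toy_velocity`, and `TARGET` are hypothetical names, and the hand-written velocity field simply points at a fixed target, standing in for whatever the learned `v(h)` does in the real model. The scalar `gate` argument is likewise an assumption about how gated application could enter the update.

```python
def pi_flow_refine(h, velocity, steps=2, scale=0.5, gate=1.0):
    """Apply h <- h + gate * scale * v(h) for a fixed number of steps."""
    for _ in range(steps):
        v = velocity(h)
        h = [x + gate * scale * dx for x, dx in zip(h, v)]
    return h

# Hypothetical stand-in for the learned velocity field: it points from the
# current state toward a fixed target, so each step closes part of the gap.
TARGET = [1.0, -2.0, 0.5]

def toy_velocity(h):
    return [t - x for t, x in zip(TARGET, h)]

h0 = [0.0, 0.0, 0.0]
refined = pi_flow_refine(h0, toy_velocity, steps=2, scale=0.5)
skipped = pi_flow_refine(h0, toy_velocity, steps=2, scale=0.5, gate=0.0)
print(refined)  # [0.75, -1.5, 0.375]
print(skipped)  # [0.0, 0.0, 0.0]
```

With a scale of 0.5 each step halves the remaining distance to the target, so two steps cover three quarters of it. A gate of 0 leaves the state untouched, which mirrors the "skip refinement" case for simple questions.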
+### 3. **Hybrid Gating Mechanism**
+```
+output = gate * ASPP(x) + (1-gate) * Attention(x)
+```
+- Combines structured causal reasoning (ASPP) with flexible attention
+- Learnable balance between graph-based and sequence-based processing
+- Applied to all 30 layers of the base model (Llama-3.2-1B)
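The gating formula above is a convex combination of two branch outputs. The sketch below shows that arithmetic on plain lists. It assumes a single scalar gate produced by a sigmoid over a learnable logit; the source does not say whether the real gate is scalar, per-token, or per-dimension, and `hybrid_mix` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hybrid_mix(aspp_out, attn_out, gate_logit):
    """Blend two branch outputs: gate * ASPP(x) + (1 - gate) * Attention(x)."""
    gate = sigmoid(gate_logit)  # squashes the learnable logit into (0, 1)
    return [gate * a + (1.0 - gate) * b for a, b in zip(aspp_out, attn_out)]

aspp_out = [2.0, 4.0]   # placeholder output of the structured branch
attn_out = [0.0, 0.0]   # placeholder output of the attention branch

balanced = hybrid_mix(aspp_out, attn_out, gate_logit=0.0)
mostly_aspp = hybrid_mix(aspp_out, attn_out, gate_logit=10.0)
print(balanced)  # [1.0, 2.0]
```

A logit of 0 weights both branches equally, while a large positive logit lets the structured branch dominate.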
+---
+## 🧠 Why π-flow Eliminates Verbosity
+### The Problem with Traditional CoT
+**External Reasoning Models** (DeepSeek R1, o1-style):
+```
+User: What is 15 * 8?
+Model: <think>
+Let me break this down step by step:
+1. First, I'll multiply 15 by 8
+2. 15 * 8 = 15 * (10 - 2)
+3. Using distributive property: 15*10 - 15*2
+4. 150 - 30 = 120
+Therefore, the answer is 120.
+</think>
+The answer is 120.
+```
+- **Output**: 200+ characters
+- **Latency**: High (many tokens to generate)
+- **Cost**: Expensive (charged per token)
+### Geilim's Internal Reasoning
+**Geilim-1B-Instruct** (ASPP+π-flow):
+```
+User: What is 15 * 8?
+Model: 120
+```
+- **Output**: 3 characters
+- **Latency**: Low (minimal generation)
+- **Cost**: Minimal
+- **Reasoning**: Happened internally through:
+  1. ASPP causal chain propagating arithmetic relationships
+  2. π-flow refining probability distribution across answer space
+  3. Convergence to correct answer in hidden states
+---
+## 🔬 Technical Mechanism
+### How π-flow Achieves Internal Reasoning
+1. **Probability Space Operations**
+   - Instead of generating tokens to explore answers, π-flow refines probability distributions directly
+   - `v(h)`: Learned velocity field that corrects the model's initial judgment
+   - Multi-step: `h^(0) → h^(1) → h^(2)` (2 refinement steps)
+2. **Convergence Without Output**
+   - Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
+   - π-flow breaks this: reasoning occurs across all positions in parallel
+   - The model converges internally before generating any output token
+3. **Adaptive Complexity**
+   - `pi_flow_use_gate=True`: Model learns when refinement is needed
+   - Simple questions: Direct output (gate ≈ 0, skip refinement)
+   - Complex questions: Internal multi-step refinement (gate ≈ 1, apply π-flow)
+   - User always sees concise output regardless
+4. **Synergy with ASPP**
+   - ASPP provides causal structure (parent-child dependencies)
+   - π-flow refines along these dependencies
+   - **Result**: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding
+---
+## 📊 Configuration
+### Model Architecture
+- **Base Model**: Llama-3.2-1B-Instruct (1.26B params)
+- **Total Parameters**: ~1.4B (140M additional ASPP+π-flow params)
+- **Hybrid Layers**: All 30 layers (universal reasoning capability)
+### ASPP Settings
+```yaml
+aspp_hidden_dim: 512         # vs 2048 model hidden_size (reduce overfitting)
+aspp_num_steps: 2-8          # learnable via sigmoid gating
+aspp_dropout: 0.15
+aspp_num_neighbors: 1        # Union-Find: parent-only connections
+```
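The `aspp_num_steps` comment says the step count is learnable via sigmoid gating but does not give the mapping. One plausible reading, shown purely as an assumption, is to squash a learnable scalar with a sigmoid and rescale it onto the 2-8 range. The name `num_steps_from_logit` and the rounding rule are hypothetical.

```python
import math

MIN_STEPS, MAX_STEPS = 2, 8  # range taken from the ASPP settings above

def num_steps_from_logit(logit):
    """Map an unconstrained scalar to an integer step count in [2, 8]."""
    gate = 1.0 / (1.0 + math.exp(-logit))  # sigmoid in (0, 1)
    return MIN_STEPS + round(gate * (MAX_STEPS - MIN_STEPS))

low = num_steps_from_logit(-20.0)
mid = num_steps_from_logit(0.0)
high = num_steps_from_logit(20.0)
print(low, mid, high)  # 2 5 8
```

Rounding is not differentiable, so a trainable version would need a soft relaxation or a straight-through estimator; this sketch only shows how a sigmoid output can be bounded to the configured range.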
+### π-flow Settings
+```yaml
+pi_flow: True                # Enable probability flow refinement
+pi_flow_steps: 2             # 2-step refinement
+pi_flow_scale: 0.5           # Moderate refinement strength
+pi_flow_use_gate: True       # Adaptive gating
+```
+---
+## 🚀 Quick Start
+### Installation
+```bash
+pip install transformers torch
+```
+### Basic Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Load model
+model_path = "NoesisLab/Geilim-1B-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+# Generate response
+prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
+messages = [{"role": "user", "content": prompt}]
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=128,
+    temperature=0.7,
+    do_sample=True,
+    top_p=0.9,
+)
+response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+print(response)  # Expected: "37" or "37 apples are left." (concise!)
+```
+### Advanced Usage
+```python
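+# Continues from Basic Usage: assumes `model` and `tokenizer` are already loaded.
+# Geilim prefers concise outputs, but can show its work if prompted.
+prompt = "Explain how you would solve: What is 15 * 23?"
+messages = [{"role": "user", "content": prompt}]
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+# Recommended settings for implicit reasoning
+generation_config = {
+    "max_new_tokens": 128,        # Keep low to encourage conciseness
+    "temperature": 0.7,           # Moderate sampling
+    "do_sample": True,
+    "top_p": 0.9,
+    "repetition_penalty": 1.1,    # Prevent loops
+}
+outputs = model.generate(**inputs, **generation_config)
+print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))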
+```
+---
+## 🎓 Training Details
+### Dataset
+- **Mixed-Benchmark-Dataset** (composite reasoning benchmarks)
+  - 25% GSM8K (math reasoning)
+  - 30% HellaSwag (commonsense)
+  - 20% ARC (science QA)
+  - 10% OpenHermes (high-quality responses)
+  - 15% Capybara (multi-turn conversations)
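To make the mixture concrete, the sketch below turns the percentages above into per-source example counts. It assumes the shares are measured in examples rather than tokens, which the source does not specify, and the total of 2000 is an arbitrary illustration rather than the real dataset size. `MIXTURE_PERCENT` and `examples_per_source` are hypothetical names.

```python
# Percent shares copied from the dataset list above.
MIXTURE_PERCENT = {
    "GSM8K": 25,
    "HellaSwag": 30,
    "ARC": 20,
    "OpenHermes": 10,
    "Capybara": 15,
}

def examples_per_source(total, mixture=MIXTURE_PERCENT):
    """Split `total` examples across sources according to integer percent shares."""
    if sum(mixture.values()) != 100:
        raise ValueError("mixture shares must sum to 100")
    counts = {name: total * pct // 100 for name, pct in mixture.items()}
    # Integer division can leave a few examples unassigned; give them to the largest source.
    leftover = total - sum(counts.values())
    largest = max(mixture, key=mixture.get)
    counts[largest] += leftover
    return counts

counts = examples_per_source(2000)
print(counts)
# {'GSM8K': 500, 'HellaSwag': 600, 'ARC': 400, 'OpenHermes': 200, 'Capybara': 300}
```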
+### Training Configuration
+- **Framework**: TRL SFTTrainer with packing
+- **Epochs**: 2
+- **Batch Size**: Effective 8 (per_device=2, grad_accum=4)
+- **Learning Rate**: 2e-4 with 10% warmup
+- **Precision**: bfloat16 with gradient checkpointing
+- **Optimizer**: AdamW (weight_decay=0.1, max_grad_norm=1.0)
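The hyperparameters above can be collected into a plain dictionary to check the arithmetic behind "Effective 8". The keys mirror argument names accepted by TRL's `SFTConfig`, but this is a sketch and not the project's training script: the single-device assumption and the 1000-step schedule are illustrative, and the two helper functions are hypothetical.

```python
import math

# Values copied from the training configuration listed above.
train_args = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "warmup_ratio": 0.10,
    "bf16": True,
    "gradient_checkpointing": True,
    "weight_decay": 0.1,
    "max_grad_norm": 1.0,
    "packing": True,
}

def effective_batch_size(args, num_devices=1):
    """Sequences contributing to one optimizer step (num_devices is an assumption)."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )

def warmup_steps(args, total_steps):
    """Number of optimizer steps spent in warmup for a given schedule length."""
    return math.ceil(total_steps * args["warmup_ratio"])

batch = effective_batch_size(train_args)            # 2 * 4 * 1
warmup = warmup_steps(train_args, total_steps=1000)  # hypothetical schedule length
print(batch, warmup)  # 8 100
```

On more than one device the effective batch size scales with `num_devices`, so the stated value of 8 is consistent with single-device training.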
+### Training Philosophy
+Unlike CoT models trained on verbose reasoning chains, Geilim is trained on **answer-focused data** where:
+- Correct answers are rewarded
+- Reasoning quality is learned implicitly through ASPP+π-flow gradients
+- The model learns to converge internally rather than generate external reasoning
+---
+## 📈 Evaluation
+### Reasoning Quality Tests
+Geilim is evaluated on:
+1. **Math reasoning** (GSM8K-style arithmetic)
+2. **Commonsense reasoning** (HellaSwag, PIQA)
+3. **Logic puzzles** (multi-hop deduction)
+4. **Reading comprehension** (information tracking)
+5. **Causal reasoning** (cause-effect relationships)
+### Key Metrics
+- **Answer correctness** (primary goal)
+- **Response conciseness** (< 150 chars = concise)
+- **Reasoning traces** (should be absent from output, present in hidden states)
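The first two output-level metrics, plus the absence of visible reasoning traces, can be checked with simple string tests. The helpers below are a minimal sketch and are not taken from `test_geilim.py`: the function names are hypothetical, correctness is approximated by a naive substring match, and trace detection only looks for literal `<think>` tags. Whether reasoning is present in hidden states cannot be checked this way.

```python
import re

CONCISE_LIMIT = 150  # characters, from the conciseness metric above

def is_concise(response, limit=CONCISE_LIMIT):
    """True when the stripped response is shorter than the character limit."""
    return len(response.strip()) < limit

def has_reasoning_trace(response):
    """True when the response contains an explicit <think> or </think> tag."""
    return re.search(r"</?think>", response, flags=re.IGNORECASE) is not None

def score(response, expected):
    """Naive per-response report; `expected` is matched as a substring."""
    return {
        "correct": expected in response,
        "concise": is_concise(response),
        "trace_free": not has_reasoning_trace(response),
    }

concise_report = score("37 apples are left.", expected="37")
verbose_report = score("<think>150 - 30 = 120</think>The answer is 120.", expected="120")
print(concise_report)  # {'correct': True, 'concise': True, 'trace_free': True}
```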
+### Test Script
+```bash
+python test_geilim.py
+```
+Compares Geilim against the Llama-3.2-1B-Instruct baseline on 8 reasoning tasks.
+### Run Benchmarks
+```bash
+python run_lmeval.py
+```
+Evaluates on: WinoGrande, ARC (easy/challenge), HellaSwag, PIQA.
+---
+## 🎯 Use Cases
+### Ideal For:
+- **Production APIs**: Low latency, low token cost
+- **Real-time applications**: Minimal generation overhead
+- **Cost-sensitive deployments**: Pay only for the answer, not the reasoning
+- **User-facing chat**: Clean outputs without technical reasoning traces
+- **Mobile/edge devices**: Smaller token budgets
+### Not Ideal For:
+- **Educational use cases**: When you want to show reasoning steps to users
+- **Debugging/verification**: When explicit reasoning helps validate answers
+- **Research**: When analyzing reasoning chains is the goal
+---
+## 🆚 Comparison Table
+| Feature | Geilim-1B-Instruct | DeepSeek R1 | Llama-3.2-1B |
+|---------|-----------|-------------|--------------|
+| **Model Size** | 1.4B | 1.5B (R1-Distill-Qwen-1.5B; full R1 is 671B) | 1.26B |
+| **Reasoning Type** | Internal (ASPP+π-flow) | External (CoT) | Limited |
+| **Output Style** | Concise answers | Verbose `<think>` tags | Direct answers |
+| **Latency** | Low | High (many tokens) | Low |
+| **Cost per query** | Low | High | Low |
+| **Reasoning depth** | Deep (hidden states) | Deep (explicit) | Shallow |
+| **Token efficiency** | High | Low | Medium |
+---
+## 📚 Technical References
+### Core Papers & Concepts
+- **Union-Find Data Structure**: Parent-only connections for efficient causal propagation
+- **Probability Flow ODEs**: Continuous refinement in probability space (inspired by diffusion models)
+- **Hybrid Architectures**: Combining structured (graph) and unstructured (attention) reasoning
+### Related Work
+- DeepSeek R1: External reasoning chains
+- o1 series: Long-form CoT reasoning
+- SmolLM2: Efficient small language models
+- Graph Neural Networks: Structured message passing
+---
+## 🔧 Development
+### Custom Model Registration
+- **Model type**: `asterisk` (registered with HuggingFace AutoModel)
+- **Config class**: `AsteriskConfig` (extends LlamaConfig)
+- **Model class**: `AsteriskForCausalLM` (extends LlamaForCausalLM)
+- **Loading**: Requires `trust_remote_code=True`
+### Training Your Own
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Train Geilim-1B-Instruct
+python train_geilim.py
+```
+---
+## 🌟 Key Takeaways
+1. **No verbose CoT**: Geilim performs reasoning internally, outputs concisely
+2. **ASPP+π-flow**: Causal graph structure + probability flow refinement
+3. **Deep causal understanding**: Reasoning happens in hidden states, not generated text
+4. **Production-ready**: Low latency, low cost, clean outputs
+5. **Reasoning depth without verbosity**: Aims to match CoT models while keeping outputs short
+---
+## 📝 Citation
+If you use Geilim-1B-Instruct in your research or applications, please cite:
+```bibtex
+@misc{geilim2026,
+  title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
+  author={NoesisLab},
+  year={2026},
+  howpublished={HuggingFace Model Hub},
+  url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
+}
+```
+---
+## 🤝 Acknowledgments
+- **Base Model**: Llama-3.2-1B-Instruct by Meta
+- **Training Framework**: TRL by HuggingFace
+- **Inspiration**: DeepSeek R1, for demonstrating the value of reasoning; Geilim pursues the same goal with concise outputs
+---
+## 📄 License
+Llama 3.2 Community License
+---
+## 🔗 Links
+- **Model Hub**: https://huggingface.co/NoesisLab/Geilim-1B-Instruct
+- **Repository**: https://github.com/Liuxingyu1111111/Asterisk-R1
+---
+**Built with ❤️ for the era of efficient reasoning models.**
+*Geilim (忌廉) - Cantonese for "cream" - smooth, concise, and rich in substance.*