---
library_name: transformers
language:
- en
tags:
- reasoning
- implicit-reasoning
- chain-of-thought
- llama
- asterisk
- aspp
- pi-flow
- deep-reasoning
license: apache-2.0
base_model: meta-llama/Llama-3.2-1B-Instruct
model_name: Geilim-1B-Instruct
datasets:
- gsm8k
- hellaswag
- ai2_arc
pipeline_tag: text-generation
inference: true
---

# Geilim-1B-Instruct (忌廉)

> **Deep Causal Internal Reasoning**
>
> No verbose CoT, no `<think>` tags, just concise answers powered by implicit reasoning.

---

## 💡 Introduction

Recent reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:

**Problems with External CoT:**

1. **Verbosity Tax**: Models generate hundreds of tokens in `<think>` tags before answering, increasing latency and cost
2. **Autoregressive Dependency**: Models must "see" their reasoning to follow it, forcing sequential token generation
3. **Token Inefficiency**: Users pay for reasoning traces they often don't need when only the final answer matters
4. **Production Overhead**: Verbose outputs are impractical for real-time APIs and edge deployment

**Our Insight**: What if reasoning could happen *internally* in the model's hidden states, without generating verbose traces?

**Geilim-1B-Instruct** addresses these limitations through a hybrid architecture combining:

- **ASPP (Adjacency-Structured Parallel Propagation)**: Graph-based causal chains for structured reasoning
- **π-flow (Probability Flow Dynamics)**: Internal refinement in probability space without token generation
- **Hybrid Gating**: Learnable balance between structured and attention-based processing

The result: Deep reasoning capability with concise outputs - the best of both worlds.
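
To make the three components concrete, here is a minimal toy sketch of the update rules given in the Architecture Overview section: the ASPP step `h_i^(t+1) = φ(h_i^(t), h_parent[i])`, the π-flow update `h' = h + α * v(h)`, and the gated blend `gate * ASPP(x) + (1-gate) * Attention(x)`. It is not the model's implementation. The function names (`aspp_step`, `pi_flow_refine`, `hybrid_gate`), the scalar "hidden states", the averaging stand-in for the learned `φ`, and the toy velocity field are all assumptions made for illustration.

```python
def aspp_step(h, parent, phi):
    """One parallel ASPP step: h_i <- phi(h_i, h_parent[i]), all read from the old state."""
    return [phi(h[i], h[parent[i]]) for i in range(len(h))]


def pi_flow_refine(h, velocity, alpha=0.5, steps=2):
    """Repeat the update h <- h + alpha * v(h) for a fixed number of steps."""
    for _ in range(steps):
        v = velocity(h)
        h = [x + alpha * dx for x, dx in zip(h, v)]
    return h


def hybrid_gate(aspp_out, attn_out, gate):
    """Blend the two branches: gate * ASPP(x) + (1 - gate) * Attention(x)."""
    return [gate * a + (1.0 - gate) * b for a, b in zip(aspp_out, attn_out)]


def average(a, b):
    """Hypothetical stand-in for the learned update function phi."""
    return (a + b) / 2.0


def pull_to_zero(h):
    """Hypothetical stand-in for the learned velocity field v(h)."""
    return [-x for x in h]


# Toy setup (all values hypothetical): 4 tokens with scalar hidden states.
h0 = [1.0, 2.0, 3.0, 4.0]
parent = [0, 0, 1, 2]  # linear chain; token 0 is treated as its own parent

aspp_out = aspp_step(aspp_step(h0, parent, average), parent, average)  # K = 2 steps
refined = pi_flow_refine(h0, pull_to_zero, alpha=0.5, steps=2)
mixed = hybrid_gate(aspp_out, h0, gate=0.5)  # h0 stands in for Attention(x)

print(aspp_out)  # [1.0, 1.25, 2.0, 3.0]
print(refined)   # [0.25, 0.5, 0.75, 1.0]
print(mixed)     # [1.0, 1.625, 2.5, 3.5]
```

Each ASPP step touches every token once, which is where the linear cost per step comes from. None of the three functions emits a token, which is the sense in which the refinement is "internal". How the real model orders and composes these operations inside a layer is not specified here, so the sketch runs them independently.
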

---

## 🎯 Core Value Proposition

**Geilim-1B-Instruct is the anti-verbose reasoning model.**

| Model Type | Reasoning Approach | Output Style |
|------------|-------------------|--------------|
| **Baseline** (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
| **CoT Models** (DeepSeek R1, o1) | External reasoning chains | Verbose `<think>` tags, long outputs |
| **Geilim-1B-Instruct** | **Internal reasoning** | **Concise answers, reasoning in hidden states** |

**Key Differentiator**: Geilim performs deep causal reasoning **internally** through the ASPP+π-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.

---

## 🏗️ Architecture Overview

Geilim-1B-Instruct combines three key components for implicit reasoning:

### 1. **ASPP Operator** (Adjacency-Structured Parallel Propagation)

- **Union-Find graph structure**: Linear causal chain where each token only connects to its parent
- **Iterative message passing**: `h_i^(t+1) = φ(h_i^(t), h_parent[i]^(t))`
- **K-step evolution**: Adaptive 2-8 steps of causal propagation
- **Complexity**: O(n) per propagation step - efficient linear-time reasoning

**Why it matters**: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.

### 2. **π-flow** (Probability Flow Dynamics)

- **Velocity field learning**: `h' = h + α * v(h)`, where `v(h)` is a learned refinement direction and `α` scales the step
- **Multi-step refinement**: Iterates in probability space to converge on the correct answer
- **Gated application**: Model learns when to refine (complex questions) vs when to skip (simple questions)
- **Internal convergence**: Reasoning happens in hidden states, not in generated text

**Why it matters**: π-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.

### 3. **Hybrid Gating Mechanism**

```
output = gate * ASPP(x) + (1-gate) * Attention(x)
```

- Combines structured causal reasoning (ASPP) with flexible attention
- Learnable balance between graph-based and sequence-based processing
- Applied to every decoder layer of the base model (16 in Llama-3.2-1B)

---

## 🧠 Why π-flow Eliminates Verbosity

### The Problem with Traditional CoT

**External Reasoning Models** (DeepSeek R1, o1-style):

```
User: What is 15 * 8?

Model: <think>
Let me break this down step by step:
1. First, I'll multiply 15 by 8
2. 15 * 8 = 15 * (10 - 2)
3. Using distributive property: 15*10 - 15*2
4. 150 - 30 = 120
Therefore, the answer is 120.
</think>

The answer is 120.
```

- **Output**: 250+ characters
- **Latency**: High (many tokens to generate)
- **Cost**: Expensive (charged per token)

### Geilim's Internal Reasoning

**Geilim-1B-Instruct** (ASPP+π-flow):

```
User: What is 15 * 8?
Model: 120
```

- **Output**: 3 characters
- **Latency**: Low (minimal generation)
- **Cost**: Minimal
- **Reasoning**: Happened internally through:
  1. ASPP causal chain propagating arithmetic relationships
  2. π-flow refining the probability distribution across the answer space
  3. Convergence to the correct answer in hidden states

---

## 🔬 Technical Mechanism

### How π-flow Achieves Internal Reasoning

1. **Probability Space Operations**
   - Instead of generating tokens to explore answers, π-flow refines probability distributions directly
   - `v(h)`: Learned velocity field that corrects the model's initial judgment
   - Multi-step: `h^(0) → h^(1) → h^(2)` (2 refinement steps)

2. **Convergence Without Output**
   - Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
   - π-flow breaks this: reasoning occurs in parallel across all positions simultaneously
   - The model converges internally before generating any output token

3. **Adaptive Complexity**
   - `pi_flow_use_gate=True`: Model learns when refinement is needed
   - Simple questions: Direct output (gate ≈ 0, skip refinement)
   - Complex questions: Internal multi-step refinement (gate ≈ 1, apply π-flow)
   - User always sees concise output regardless

4. **Synergy with ASPP**
   - ASPP provides causal structure (parent-child dependencies)
   - π-flow refines along these dependencies
   - **Result**: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding

---

## 📊 Configuration

### Model Architecture

- **Base Model**: Llama-3.2-1B-Instruct (1.26B params)
- **Total Parameters**: ~1.4B (140M additional ASPP+π-flow params)
- **Hybrid Layers**: All decoder layers (16 in Llama-3.2-1B), for universal reasoning capability

### ASPP Settings

```python
aspp_hidden_dim: 512       # vs 2048 model hidden_size (reduce overfitting)
aspp_num_steps: 2-8        # learnable via sigmoid gating
aspp_dropout: 0.15
aspp_num_neighbors: 1      # Union-Find: parent-only connections
```

### π-flow Settings

```python
pi_flow: True              # Enable probability flow refinement
pi_flow_steps: 2           # 2-step refinement
pi_flow_scale: 0.5         # Moderate refinement strength
pi_flow_use_gate: True     # Adaptive gating
```

---

## 🚀 Quick Start

### Installation

```bash
# accelerate is needed for device_map="auto"
pip install transformers torch accelerate
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model (custom architecture, so trust_remote_code is required)
model_path = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate response
prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the BOS token, so do not add it a second time
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Expected (sampling may vary): "37" or "37 apples are left." (concise!)
```

### Advanced Usage

```python
# For math problems requiring step-by-step (if needed)
# Note: Geilim prefers concise outputs, but can show work if prompted
prompt = "Explain how you would solve: What is 15 * 23?"

# For best results with implicit reasoning
generation_config = {
    "max_new_tokens": 128,      # Keep low to encourage conciseness
    "temperature": 0.7,         # Moderate sampling
    "do_sample": True,
    "top_p": 0.9,
    "repetition_penalty": 1.1,  # Prevent loops
}

# Reuses `tokenizer` and `model` from Basic Usage
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

---

## 🎓 Training Details

### Dataset

- **Mixed-Benchmark-Dataset** (composite reasoning benchmarks)
  - 25% GSM8K (math reasoning)
  - 30% HellaSwag (commonsense)
  - 20% ARC (science QA)
  - 10% OpenHermes (high-quality responses)
  - 15% Capybara (multi-turn conversations)

### Training Configuration

- **Framework**: TRL SFTTrainer with packing
- **Epochs**: 2
- **Batch Size**: Effective 8 (per_device=2, grad_accum=4)
- **Learning Rate**: 2e-4 with 10% warmup
- **Precision**: bfloat16 with gradient checkpointing
- **Optimizer**: AdamW (weight_decay=0.1, max_grad_norm=1.0)

### Training Philosophy

Unlike CoT models trained on verbose reasoning chains, Geilim is trained on **answer-focused data** where:

- The training signal comes from correct final answers
- Reasoning quality is learned implicitly through ASPP+π-flow gradients
- The model learns to converge internally rather than generate external reasoning

---

## 📈 Evaluation

### Reasoning Quality Tests

Geilim is evaluated on:

1. **Math reasoning** (GSM8K-style arithmetic)
2. **Commonsense reasoning** (HellaSwag, PIQA)
3. **Logic puzzles** (multi-hop deduction)
4. **Reading comprehension** (information tracking)
5. **Causal reasoning** (cause-effect relationships)

### Key Metrics

- **Answer correctness** (primary goal)
- **Response conciseness** (< 150 chars = concise)
- **Reasoning traces** (should be absent from output, present in hidden states)

---

## 🎯 Use Cases

### Ideal For:

- **Production APIs**: Low latency, low token cost
- **Real-time applications**: Minimal generation overhead
- **Cost-sensitive deployments**: Pay only for the answer, not the reasoning
- **User-facing chat**: Clean outputs without technical reasoning traces
- **Mobile/edge devices**: Smaller token budgets

### Not Ideal For:

- **Educational use cases**: When you want to show reasoning steps to users
- **Debugging/verification**: When explicit reasoning helps validate answers
- **Research**: When analyzing reasoning chains is the goal

---

## 🆚 Comparison Table

| Feature | Geilim-1B-Instruct | DeepSeek R1 | Llama-3.2-1B |
|---------|-----------|-------------|--------------|
| **Model Size** | 1.4B | 1.5B (R1-Distill-Qwen variant) | 1.26B |
| **Reasoning Type** | Internal (ASPP+π-flow) | External (CoT) | Limited |
| **Output Style** | Concise answers | Verbose `<think>` tags | Direct answers |
| **Latency** | Low | High (many tokens) | Low |
| **Cost per query** | Low | High | Low |
| **Reasoning depth** | Deep (hidden states) | Deep (explicit) | Shallow |
| **Token efficiency** | High | Low | Medium |

---

## 📚 Technical References

### Core Papers & Concepts

- **Union-Find Data Structure**: Parent-only connections for efficient causal propagation
- **Probability Flow ODEs**: Continuous refinement in probability space (inspired by diffusion models)
- **Hybrid Architectures**: Combining structured (graph) and unstructured (attention) reasoning

### Related Work

- DeepSeek R1: External reasoning chains
- o1 series: Long-form CoT reasoning
- SmolLM2: Efficient small language models
- Graph Neural Networks: Structured message passing

---

## 🔧 Development

### Custom Model Registration

- **Model type**: `asterisk` (registered with the Hugging Face Auto classes)
- **Config class**: `AsteriskConfig` (extends LlamaConfig)
- **Model class**: `AsteriskForCausalLM` (extends LlamaForCausalLM)
- **Loading**: Requires `trust_remote_code=True`

---

## 🌟 Key Takeaways

1. **No verbose CoT**: Geilim performs reasoning internally and outputs concisely
2. **ASPP+π-flow**: Causal graph structure + probability flow refinement
3. **Deep causal understanding**: Reasoning happens in hidden states, not generated text
4. **Production-oriented**: Low latency, low cost, clean outputs
5. **Same reasoning depth**: Designed to match CoT models without the verbosity

---

## 📝 Citation

If you use Geilim-1B-Instruct in your research or applications, please cite:

```bibtex
@misc{geilim2026,
  title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
  author={NoesisLab},
  year={2026},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
}
```

---

## 🤝 Acknowledgments

- **Base Model**: Llama-3.2-1B-Instruct by Meta
- **Training Framework**: TRL by Hugging Face
- **Inspiration**: DeepSeek R1 (for demonstrating the value of reasoning), while pursuing conciseness

---

## 📄 License

Llama 3.2 Community License

---

## 🔗 Links

- **Model Hub**: https://huggingface.co/NoesisLab/Geilim-1B-Instruct

---

**Built with ❤️ for the era of efficient reasoning models.**

*Geilim (忌廉) - Cantonese for "cream" - smooth, concise, and rich in substance.*
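
---

## 🧪 Appendix: Checking the Output-Side Metrics

The two output-side criteria from the Key Metrics list (a response under 150 characters counts as concise, and reasoning traces should be absent from the output) are simple enough to check mechanically. The helper below is a hypothetical sketch, not part of the released evaluation code. The name `check_response` is made up for illustration, and treating a literal `<think>` marker as the sign of a reasoning trace is an assumption.

```python
def check_response(response: str, max_chars: int = 150) -> dict:
    """Apply the conciseness and no-trace criteria to one model response."""
    text = response.strip()
    return {
        "chars": len(text),
        "concise": len(text) < max_chars,        # "< 150 chars = concise"
        "has_trace": "<think>" in text.lower(),  # assumed marker of an external trace
    }


print(check_response("37 apples are left."))
# {'chars': 19, 'concise': True, 'has_trace': False}
```

Answer correctness, the primary metric, still has to be scored against reference answers. This helper only covers the form of the output.
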