---
library_name: transformers
language:
- en
tags:
- reasoning
- implicit-reasoning
- chain-of-thought
- llama
- asterisk
- aspp
- pi-flow
- deep-reasoning
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
model_name: Geilim-1B-Instruct
datasets:
- gsm8k
- hellaswag
- ai2_arc
pipeline_tag: text-generation
inference: true
---

# Geilim-1B-Instruct (忌廉)

> **Deep Causal Internal Reasoning**
> No verbose CoT, no `<think>` tags, just concise answers powered by implicit reasoning.

---

## 💡 Introduction

Recent advances in reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:

**Problems with External CoT:**
1. **Verbosity Tax**: Models generate hundreds of tokens in `<think>` tags before answering, increasing latency and cost
2. **Autoregressive Dependency**: Models must "see" their reasoning to follow it, forcing sequential token generation
3. **Token Inefficiency**: Users pay for reasoning traces they often don't need, since only the final answer matters
4. **Production Overhead**: Verbose outputs are impractical for real-time APIs and edge deployment

**Our Insight**: What if reasoning could happen *internally* in the model's hidden states, without generating verbose traces?

**Geilim-1B-Instruct** addresses these limitations through a hybrid architecture combining:
- **ASPP (Adjacency-Structured Parallel Propagation)**: Graph-based causal chains for structured reasoning
- **π-flow (Probability Flow Dynamics)**: Internal refinement in probability space without token generation
- **Hybrid Gating**: Learnable balance between structured and attention-based processing

The result: deep reasoning capability with concise outputs, the best of both worlds.

---

## 🎯 Core Value Proposition

**Geilim-1B-Instruct is the anti-verbose reasoning model.**

| Model Type | Reasoning Approach | Output Style |
|------------|-------------------|--------------|
| **Baseline** (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
| **CoT Models** (DeepSeek R1, o1) | External reasoning chains | Verbose `<think>` tags, long outputs |
| **Geilim-1B-Instruct** | **Internal reasoning** | **Concise answers, reasoning in hidden states** |

**Key Differentiator**: Geilim performs deep causal reasoning **internally** through the ASPP+π-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.

---

## πŸ—οΈ Architecture Overview

Geilim-1B-Instruct combines three key components for implicit reasoning:

### 1. **ASPP Operator** (Adjacency-Structured Parallel Propagation)
- **Union-Find graph structure**: Linear causal chain where each token only connects to its parent
- **Iterative message passing**: `h_i^(t+1) = φ(h_i^(t), h_parent[i])`
- **K-step evolution**: Adaptive 2-8 steps of causal propagation
- **Complexity**: O(n) - efficient linear-time reasoning

**Why it matters**: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.
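
The parent-only propagation described above can be sketched in a few lines of plain Python. Note this is an illustrative toy, not the model's actual operator: `phi` here is a hypothetical stand-in (a simple average with the parent state), and states are scalars rather than hidden vectors.

```python
# Toy sketch of ASPP-style parallel message passing along a parent chain.
# phi (the learned update in the real model) is mocked as an average so
# the flow of information from parents to children is easy to inspect.
def aspp_propagate(h, parent, num_steps):
    """One-parent message passing: h_i <- phi(h_i, h_parent[i])."""
    for _ in range(num_steps):
        # Build the new states from the old ones: all positions update
        # in parallel, matching the "parallel propagation" in the name.
        h = [
            h[i] if parent[i] is None          # roots keep their state
            else 0.5 * (h[i] + h[parent[i]])   # phi: average with parent
            for i in range(len(h))
        ]
    return h

# Linear causal chain: each token's parent is the previous token.
states = [1.0, 0.0, 0.0, 0.0]
parents = [None, 0, 1, 2]
print(aspp_propagate(states, parents, num_steps=2))  # -> [1.0, 0.75, 0.25, 0.0]
```

With each extra step, information from the chain's root reaches one position further, which is why the step count is adaptive (2-8) rather than fixed.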

### 2. **π-flow** (Probability Flow Dynamics)
- **Velocity field learning**: `h' = h + α * v(h)` where `v(h)` is a learned refinement
- **Multi-step refinement**: Iterates in probability space to converge on the correct answer
- **Gated application**: Model learns when to refine (complex questions) vs when to skip (simple questions)
- **Internal convergence**: Reasoning happens in hidden states, not in generated text

**Why it matters**: π-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.

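The update rule `h' = h + α * v(h)` can be sketched directly. In this toy, the learned velocity field `v` is replaced with a hypothetical stand-in that pulls a scalar state toward a target, just to make the multi-step convergence visible:

```python
# Toy sketch of pi-flow refinement: h' = h + alpha * v(h), iterated.
# In the real model v is learned and h is a hidden-state vector; here
# both are simplified so the convergence behavior is easy to check.
def pi_flow_refine(h, v, alpha=0.5, steps=2):
    """Refine state h internally instead of generating reasoning tokens."""
    for _ in range(steps):
        h = h + alpha * v(h)
    return h

# Mock velocity field: points from h toward a "correct answer" state at 1.0.
velocity = lambda h: 1.0 - h
print(pi_flow_refine(0.0, velocity, alpha=0.5, steps=2))  # -> 0.75
```

Each step halves the remaining distance to the target, so more steps mean tighter convergence without emitting a single token.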
### 3. **Hybrid Gating Mechanism**
```
output = gate * ASPP(x) + (1-gate) * Attention(x)
```
- Combines structured causal reasoning (ASPP) with flexible attention
- Learnable balance between graph-based and sequence-based processing
- Applied to all 16 layers of the base model (Llama-3.2-1B)
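
The gating formula above amounts to a per-element blend of the two branches. A minimal sketch, with the gate as a constant (in the real model it would be a learned sigmoid) and `aspp_out` / `attn_out` as hypothetical stand-ins for the branch outputs:

```python
# Toy sketch of the hybrid gate: output = gate * ASPP(x) + (1-gate) * Attention(x).
# gate is fixed here; in the model it is learned per layer.
def hybrid_gate(aspp_out, attn_out, gate):
    """Blend structured (ASPP) and attention branch outputs element-wise."""
    return [gate * a + (1.0 - gate) * b for a, b in zip(aspp_out, attn_out)]

print(hybrid_gate([1.0, 1.0], [0.0, 2.0], gate=0.25))  # -> [0.25, 1.75]
```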

---

## 🧠 Why π-flow Eliminates Verbosity

### The Problem with Traditional CoT

**External Reasoning Models** (DeepSeek R1, o1-style):
```
User: What is 15 * 8?

Model: <think>
Let me break this down step by step:
1. First, I'll multiply 15 by 8
2. 15 * 8 = 15 * (10 - 2)
3. Using distributive property: 15*10 - 15*2
4. 150 - 30 = 120
Therefore, the answer is 120.
</think>

The answer is 120.
```
- **Output**: 250+ characters
- **Latency**: High (many tokens to generate)
- **Cost**: Expensive (charged per token)

### Geilim's Internal Reasoning

**Geilim-1B-Instruct** (ASPP+π-flow):
```
User: What is 15 * 8?

Model: 120
```
- **Output**: 3 characters
- **Latency**: Low (minimal generation)
- **Cost**: Minimal
- **Reasoning**: Happened internally through:
  1. ASPP causal chain propagating arithmetic relationships
  2. π-flow refining probability distribution across answer space
  3. Convergence to correct answer in hidden states

---

## 🔬 Technical Mechanism

### How π-flow Achieves Internal Reasoning

1. **Probability Space Operations**
   - Instead of generating tokens to explore answers, π-flow refines probability distributions directly
   - `v(h)`: Learned velocity field that corrects the model's initial judgment
   - Multi-step: `h^(0) → h^(1) → h^(2)` (2 refinement steps)

2. **Convergence Without Output**
   - Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
   - π-flow breaks this: reasoning occurs in parallel across all positions simultaneously
   - The model converges internally before generating any output token

3. **Adaptive Complexity**
   - `pi_flow_use_gate=True`: Model learns when refinement is needed
   - Simple questions: Direct output (gate ≈ 0, skip refinement)
   - Complex questions: Internal multi-step refinement (gate ≈ 1, apply π-flow)
   - User always sees concise output regardless

4. **Synergy with ASPP**
   - ASPP provides causal structure (parent-child dependencies)
   - π-flow refines along these dependencies
   - **Result**: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding

---

## 📊 Configuration

### Model Architecture
- **Base Model**: Llama-3.2-1B-Instruct (1.26B params)
- **Total Parameters**: ~1.4B (140M additional ASPP+π-flow params)
- **Hybrid Layers**: All 16 layers (universal reasoning capability)

### ASPP Settings
```yaml
aspp_hidden_dim: 512         # vs 2048 model hidden_size (reduce overfitting)
aspp_num_steps: 2-8          # learnable via sigmoid gating
aspp_dropout: 0.15
aspp_num_neighbors: 1        # Union-Find: parent-only connections
```

### π-flow Settings
```yaml
pi_flow: True                # Enable probability flow refinement
pi_flow_steps: 2             # 2-step refinement
pi_flow_scale: 0.5           # Moderate refinement strength
pi_flow_use_gate: True       # Adaptive gating
```

---

## 🚀 Quick Start

### Installation
```bash
pip install transformers torch
```

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_path = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate response
prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
messages = [{"role": "user", "content": prompt}]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)  # Expected: "37" or "37 apples are left." (concise!)
```

### Advanced Usage
```python
# For math problems requiring step-by-step (if needed)
# Note: Geilim prefers concise outputs, but can show work if prompted
prompt = "Explain how you would solve: What is 15 * 23?"

# For best results with implicit reasoning
generation_config = {
    "max_new_tokens": 128,        # Keep low to encourage conciseness
    "temperature": 0.7,           # Moderate sampling
    "do_sample": True,
    "top_p": 0.9,
    "repetition_penalty": 1.1,    # Prevent loops
}

# Apply the config (reusing `inputs` from Basic Usage above):
# outputs = model.generate(**inputs, **generation_config)
```

---

## 🎓 Training Details

### Dataset
- **Mixed-Benchmark-Dataset** (composite reasoning benchmarks)
  - 25% GSM8K (math reasoning)
  - 30% HellaSwag (commonsense)
  - 20% ARC (science QA)
  - 10% OpenHermes (high-quality responses)
  - 15% Capybara (multi-turn conversations)

### Training Configuration
- **Framework**: TRL SFTTrainer with packing
- **Epochs**: 2
- **Batch Size**: Effective 8 (per_device=2, grad_accum=4)
- **Learning Rate**: 2e-4 with 10% warmup
- **Precision**: bfloat16 with gradient checkpointing
- **Optimizer**: AdamW (weight_decay=0.1, max_grad_norm=1.0)
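
The hyperparameters above map onto TRL's `SFTConfig` roughly as follows. The exact training script for Geilim is not published, so treat this as an illustrative reconstruction, not the actual configuration:

```python
# Hypothetical reconstruction of the training setup using TRL's SFTConfig.
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="geilim-sft",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=2e-4,
    warmup_ratio=0.1,                # 10% warmup
    weight_decay=0.1,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
    packing=True,                    # SFTTrainer sequence packing
)
# trainer = SFTTrainer(model=model, args=config, train_dataset=dataset)
# trainer.train()
```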

### Training Philosophy
Unlike CoT models trained on verbose reasoning chains, Geilim is trained on **answer-focused data** where:
- Correct answers are rewarded
- Reasoning quality is learned implicitly through ASPP+π-flow gradients
- The model learns to converge internally rather than generate external reasoning

---

## 📈 Evaluation

### Reasoning Quality Tests
Geilim is evaluated on:
1. **Math reasoning** (GSM8K-style arithmetic)
2. **Commonsense reasoning** (HellaSwag, PIQA)
3. **Logic puzzles** (multi-hop deduction)
4. **Reading comprehension** (information tracking)
5. **Causal reasoning** (cause-effect relationships)

### Key Metrics
- **Answer correctness** (primary goal)
- **Response conciseness** (< 150 chars = concise)
- **Reasoning traces** (should be absent from output, present in hidden states)
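
The conciseness and no-trace checks above are simple to automate. A minimal sketch (`is_concise` is a hypothetical helper, not part of the released evaluation code):

```python
# Toy metric matching the criteria above: under 150 characters and
# no explicit reasoning trace leaking into the output.
def is_concise(response, limit=150):
    """True if the answer is short and carries no explicit CoT trace."""
    return len(response) < limit and "<think>" not in response

print(is_concise("37 apples are left."))                 # short, no trace
print(is_concise("<think>15*8 = 120</think>The answer is 120."))  # trace leaked
```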

---

## 🎯 Use Cases

### Ideal For:
- **Production APIs**: Low latency, low token cost
- **Real-time applications**: Minimal generation overhead
- **Cost-sensitive deployments**: Pay only for the answer, not the reasoning
- **User-facing chat**: Clean outputs without technical reasoning traces
- **Mobile/edge devices**: Smaller token budgets

### Not Ideal For:
- **Educational use cases**: When you want to show reasoning steps to users
- **Debugging/verification**: When explicit reasoning helps validate answers
- **Research**: When analyzing reasoning chains is the goal

---

## 🆚 Comparison Table

| Feature | Geilim-1B-Instruct | DeepSeek-R1-Distill (1.5B) | Llama-3.2-1B |
|---------|-----------|-------------|--------------|
| **Model Size** | 1.4B | 1.5B | 1.26B |
| **Reasoning Type** | Internal (ASPP+π-flow) | External (CoT) | Limited |
| **Output Style** | Concise answers | Verbose `<think>` tags | Direct answers |
| **Latency** | Low | High (many tokens) | Low |
| **Cost per query** | Low | High | Low |
| **Reasoning depth** | Deep (hidden states) | Deep (explicit) | Shallow |
| **Token efficiency** | High | Low | Medium |

---

## 📚 Technical References

### Core Papers & Concepts
- **Union-Find Data Structure**: Parent-only connections for efficient causal propagation
- **Probability Flow ODEs**: Continuous refinement in probability space (inspired by diffusion models)
- **Hybrid Architectures**: Combining structured (graph) and unstructured (attention) reasoning

### Related Work
- DeepSeek R1: External reasoning chains
- o1 series: Long-form CoT reasoning
- SmolLM2: Efficient small language models
- Graph Neural Networks: Structured message passing

---

## 🔧 Development

### Custom Model Registration
- **Model type**: `asterisk` (registered with HuggingFace AutoModel)
- **Config class**: `AsteriskConfig` (extends LlamaConfig)
- **Model class**: `AsteriskForCausalLM` (extends LlamaForCausalLM)
- **Loading**: Requires `trust_remote_code=True`


---

## 🌟 Key Takeaways

1. **No verbose CoT**: Geilim performs reasoning internally, outputs concisely
2. **ASPP+π-flow**: Causal graph structure + probability flow refinement
3. **Deep causal understanding**: Reasoning happens in hidden states, not generated text
4. **Production-ready**: Low latency, low cost, clean outputs
5. **Comparable reasoning depth**: Aims to match CoT models without the verbosity

---

## πŸ“ Citation

If you use Geilim-1B-Instruct in your research or applications, please cite:

```bibtex
@misc{geilim2026,
  title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
  author={NoesisLab},
  year={2026},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
}
```

---

## 🤝 Acknowledgments

- **Base Model**: Llama-3.2-1B-Instruct by Meta
- **Training Framework**: TRL by HuggingFace
- **Inspiration**: DeepSeek R1, for demonstrating the value of reasoning models; Geilim pursues the same goal with concise outputs

---

## 📄 License

Llama 3.2 Community License

---

## 🔗 Links

- **Model Hub**: https://huggingface.co/NoesisLab/Geilim-1B-Instruct
---

**Built with ❤️ for the era of efficient reasoning models.**

*Geilim (忌廉), Cantonese for "cream": smooth, concise, and rich in substance.*