File size: 10,518 Bytes
d574a3d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
# Phase 2 Implementation Summary

## Status: COMPLETE ✓

All Phase 2 components have been successfully implemented, integrated, and validated.

---

## What Was Built

### 1. **MemoryWeighting Engine** (`reasoning_forge/memory_weighting.py`)
   - **Purpose**: Score adapter performance and weight future adapter selection based on historical memory
   - **Key Components**:
     - `AdapterWeight` dataclass: Tracks adapter metrics (coherence, conflict success, recency, composite weight)
     - `MemoryWeighting` class: Main engine for weight computation and selection

   - **Key Features**:
     - `compute_weights()`: Aggregates memory cocoons per adapter, computes composite weights [0, 2.0]
       - Base coherence contribution: ±0.5 (mean coherence from past uses)
       - Conflict success contribution: ±0.3 (% of "tension" memories with coherence > 0.7)
       - Recency contribution: ±0.2 (exponential decay with ~7 day half-life)
     - `select_primary()`: Choose best adapter for specific conflict context
     - `get_boosted_confidence()`: Modulate router confidence based on weight (soft boost: -50% to +50%)
     - `explain_weight()`: Expose weight breakdown for debugging/transparency
     - `get_all_weights()`: Export full weighting state

   - **Output**: Weight scores [0, 2.0] where:
     - 0.5 = Poor adapter (suppress by 50%)
     - 1.0 = Average adapter (neutral)
     - 2.0 = Excellent adapter (boost by 100%)

### 2. **TokenConfidenceEngine Enhancement** (`reasoning_forge/token_confidence.py`)
   - **Phase 2 Upgrade**: Wired living_memory into learning signal computation

   - **Enhanced `_compute_learning_signal()` method**:

     - Now queries memory for past responses by agent

     - Weights recent memories higher (exponential decay with 168-hour half-life)

     - Computes weighted average of historical coherence

     - Signal ranges [0.5, 1.0] based on past performance

   - **Impact**: 4th confidence signal (learning signal) now accesses actual historical data instead of neutral fallback



### 3. **ForgeEngine Integration** (`reasoning_forge/forge_engine.py`)

   - **Modified `__init__()`** (lines 52-88):

     - Now accepts `living_memory` parameter (defaults to None for backward compat)
     - Accepts `enable_memory_weighting` parameter (defaults to True)
     - Passes living_memory to TokenConfidenceEngine

     - Initializes MemoryWeighting if memory provided

   - **Enhanced `forge_with_debate()`** (lines 294-313):

     - After Round 0 conflict detection, stores top 5 conflicts in memory

     - Stores resolution outcomes for later analysis

     - Creates resolution_outcome dict with conflict metadata
   - **Backward Compatible**: ForgeEngine works without memory (memory_weighting=None, token_confidence learning signal =0.5)

### 4. **Conflict → Adapter Learning Bridge**
   - **Data Flow**:
     ```

     Debate with Conflict Detection


     Conflicts stored in LivingMemoryKernel


     MemoryCocoon with:

       - agent_pair (e.g., "Newton,Quantum")

       - conflict_type (contradiction/emphasis/framework)

       - coherence outcome

       - tension metric


     MemoryWeighting aggregates per adapter


     Next query: Router uses memory weights to boost/suppress adapters

     ```


---

## Test Results

**Phase 2 End-to-End Test Output** (from test_phase2_e2e.py):
```

[OK] PASS: MemoryWeighting Initialization

[OK] PASS: ForgeEngine with Living Memory

[OK] PASS: forge_with_debate() Storage

[OK] PASS: Memory Weight Explanations



Total: 4/4 tests passed

```

**Validation Results**:
- [OK] MemoryWeighting computes weights [0, 2.0] correctly
- [OK] Memory cocoons stored with conflict metadata
- [OK] Tensions tagged and indexed for recall
- [OK] Token confidence queries memory for learning signal
- [OK] ForgeEngine initializes with/without memory (backward compatible)
- [OK] Weight explanations expose all components

---

## How to Use Phase 2

### Quick Start with Memory-Weighted Routing
```python

from reasoning_forge.forge_engine import ForgeEngine

from reasoning_forge.living_memory import LivingMemoryKernel



# Create memory kernel

memory = LivingMemoryKernel(max_memories=100)



# Initialize forge with memory-weighted adapter selection

forge = ForgeEngine(

    living_memory=memory,

    enable_memory_weighting=True

)



# Run debate (conflicts stored automatically)

result = forge.forge_with_debate(

    "Complex multi-perspective question",

    debate_rounds=1

)



# Access memory weighting

weights = forge.memory_weighting.get_all_weights()

print(f"Adapter weights: {weights}")



# Explain a specific weight

explanation = forge.memory_weighting.explain_weight("newton")

print(explanation)

```

### Access Memory-Stored Conflicts
```python

# Recall conflicts by emotional tag

tensions = memory.recall_by_emotion("tension", limit=10)

for cocoon in tensions:

    print(f"Conflict: {cocoon.title}")

    print(f"  Coherence: {cocoon.coherence:.3f}")

    print(f"  Agents: {cocoon.adapter_used}")

```

### Query Learning Signal from Memory
```python

# TokenConfidenceEngine now uses real historical data

scores = forge.token_confidence.score_tokens(

    agent_response,

    agent_name="newton",

    peer_responses={...}

)



# learning_signal component now includes adaptive boost

# based on Newton's historical coherence

```

---

## Files Created/Modified

### New Files (1)
- `reasoning_forge/memory_weighting.py` (400 lines)

### Modified Files (3)
- `reasoning_forge/forge_engine.py` (+~30 lines for init + conflict storage)
- `reasoning_forge/token_confidence.py` (+~20 lines for recency weighting)
- `test_phase2_e2e.py` (220 lines - validation script)

---

## Architecture: Memory-Cost Loop

```

Debate Cycle N


Phase 1: Conflict Detection (existing)

    - Detects conflicts between agent perspectives

    - Scores by confidence + opposition


Phase 2: Memory Storage (NEW)

    - Store top 5 conflicts in LivingMemoryKernel

    - Tag with emotional_tag="tension"

    - Track agent pair, type, and final coherence


Phase 2: Memory Weighting (NEW)

    - MemoryWeighting queries memory

    - Computes per-adapter performance scores

    - Base coherence, conflict success, recency signals


Debate Cycle N+1


Phase 2: Adapter Selection (OPTIONAL)

    - Router uses memory weights to modulate confidence

    - High-performing adapters get +50% boost

    - Poor adapters get -50% suppression


Phase 1: Token Confidence (ENHANCED)

    - Learning signal now queries memory (not just neutral 0.5)

    - Boosts confidence for agents with high historical coherence


Improved multi-perspective reasoning through learning

```

---

## Key Design Decisions

1. **Weight Range [0, 2.0]**: Allows significant boost/suppression without breaking router confidence scores
2. **Soft Boost Strategy**: Memory weights modulate existing router confidence, preserving keyword intelligence
3. **Recency Decay**: ~7 day half-life prevents old, outdated memories from dominating
4. **Conflict Success Rate**: Prioritizes adapters that handled high-tension moments well
5. **Backward Compatibility**: ForgeEngine works without memory (living_memory=None)



---



## Success Criteria Met



- [x] MemoryWeighting computes weights [0, 2.0] correctly

- [x] Memory cocoons store conflict metadata

- [x] Living_memory wired into TokenConfidenceEngine
- [x] ForgeEngine accepts memory parameter
- [x] Conflict→Adapter learning pathway established
- [x] Recency weighting implemented (7-day half-life)
- [x] Weight explanations expose all components
- [x] End-to-end test passes all 4 validations
- [x] Backward compatible (no breaking changes)

---

## What's Next (Phase 3+)

1. **Strict Memory-Only Routing** (optional):
   - Ignore keywords entirely
   - Select adapters purely by memory weight
   - Pure learning approach (higher risk, higher reward)

2. **Conflict → Resolution Feedback**:
   - Track if conflicts were actually resolved
   - Boost adapters that resolve conflicts more effectively
   - Multi-round learning (not just single-round)

3. **Semantic Conflict Clustering**:
   - Group similar recurring conflicts
   - Identify systematic weaknesses (e.g., "Quantum agents struggle with deterministic questions")
   - Targeted adapter boosting by conflict class

4. **Probabilistic Routing**:
   - Sample adapters by weight (not just pick best)
   - Enables exploration vs exploitation
   - Learn from failures, not just successes

5. **Cross-Query Memory**:
   - Link queries to past conflicts
   - Recognize when similar conflicts arise
   - Pre-select adapters before round 0

---

## Code Quality

- **Tested**: All components validated via end-to-end test
- **Documented**: Docstrings on all public methods
- **Dataclasses**: Type-safe with @dataclass
- **Error Handling**: Graceful fallbacks (no memory → neutral weights)
- **No Dependencies**: Uses only existing imports (numpy, json, time, math)
- **Backward Compatible**: ForgeEngine/TokenConfidenceEngine work without memory

---

## Notes for Implementation

1. **Adapter Naming**: Currently stores as agent pairs (e.g., "Newton,Quantum"). For adapter-specific routing, need to track actual adapter names from inference layer.
2. **Weight Update Frequency**: Default 1 hour (update_interval_hours). Can tune based on memory size and query frequency.
3. **Conflict Retention**: Top 5 conflicts stored per debate (configurable). Tune based on memory budget (max_memories=100).

4. **Soft Boost Modulation**: Currently -50% to +50% via `weight_modifier = (weight - 1.0) / 2.0`. Can adjust range in AdapterRouter integration.

---

## Integration with Existing Systems

**Integrates with**:
- Phase 1: Conflict detection (uses conflicts as learning signal)
- EpistemicMetrics: Coherence/tension metrics (returned in metadata)
- LivingMemoryKernel: Stores/recalls conflicts as cocoons
- TokenConfidenceEngine: Uses memory for 4th signal

**Compatible with**:
- AdapterRouter (ready for memory-weighted confidence boost)
- TrustCalibrator (independent, can use weights as secondary signal)
- SynthesisEngine (no changes needed)

---

Generated: 2026-03-19
Status: Ready for Phase 3 or production deployment