TMCRA Training Notes

This document summarizes the training direction behind the graph scorer package included in this release.

Training Goal

TMCRA trains graph-scoring components for long-memory retrieval. The goal is to help the runtime decide which memory nodes and graph paths should be surfaced to an answer model for a given user query.

The trained model is not intended to replace the answer LLM. It is responsible for memory selection:

identify relevant memory nodes
score graph paths between related memory events
preserve useful cross-turn and cross-session links
reduce noisy or stale evidence before answer generation
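
As a rough illustration of the selection role listed above, the sketch below filters out stale or low-scoring candidates and keeps the highest-scoring remainder. The `Candidate` type, its fields, the `select_memory` function, and the threshold values are all hypothetical and are not part of the TMCRA package; in the real system the scores come from learned models rather than hand-set numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate memory node; not a TMCRA type."""
    node_id: str
    score: float     # stand-in for a learned node-scorer output
    is_stale: bool   # stand-in for a staleness/currentness signal


def select_memory(candidates, top_k=2, min_score=0.5):
    """Drop stale or low-scoring candidates, then keep the top_k by score."""
    kept = [c for c in candidates if not c.is_stale and c.score >= min_score]
    kept.sort(key=lambda c: c.score, reverse=True)
    return [c.node_id for c in kept[:top_k]]


candidates = [
    Candidate("n1", 0.91, False),
    Candidate("n2", 0.88, True),   # high score but stale, so filtered out
    Candidate("n3", 0.40, False),  # below the assumed threshold
    Candidate("n4", 0.75, False),
]
selected = select_memory(candidates)
print(selected)
```

Only the surviving node ids would be passed on to the answer model; the answer LLM itself is untouched by this step.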

Model Components

The released model directory contains two main runtime scorers:

node_scorer.pt: scores candidate memory nodes.
path_scorer.pt: scores graph paths and tunnel links between memory nodes.

The training output also includes:

best checkpoints
last checkpoints
epoch and step checkpoints
training summary and logs
export manifest

Training Data Direction

Training data is built around dialogue-memory behavior rather than isolated QA pairs. Samples are designed to teach the model how memory should connect across turns and sessions.

The major training directions include:

direct user facts
assistant-provided details
preference/profile signals
temporal state changes
old-value vs current-value selection
cross-session event links
multi-evidence aggregation
evidence-positive vs noise/negative memory separation
unit-to-unit coverage for count/sum/compare tasks
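
To make the old-value vs current-value direction concrete, here is a minimal sketch of how a "current" label could be derived for toy records. The `FactRecord` type, the `session` ordering, and the `current_values` helper are assumptions made for illustration only; the trained scorers are meant to learn this distinction from data rather than apply a fixed rule like this one.

```python
from dataclasses import dataclass


@dataclass
class FactRecord:
    """Hypothetical memory record holding one attribute value."""
    attribute: str
    value: str
    session: int   # larger means more recent (assumed ordering)


def current_values(records):
    """Keep, per attribute, the value from the most recent session."""
    latest = {}
    for r in records:
        if r.attribute not in latest or r.session > latest[r.attribute].session:
            latest[r.attribute] = r
    return {attr: rec.value for attr, rec in latest.items()}


records = [
    FactRecord("city", "Berlin", session=1),
    FactRecord("city", "Lisbon", session=3),   # supersedes Berlin
    FactRecord("employer", "Acme", session=2),
]
current = current_values(records)
print(current)
```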

Graph Memory Supervision

Each training example is converted into graph-oriented supervision. Instead of only asking whether a text chunk is relevant, TMCRA trains on the following signals:

memory node relevance
event-unit relevance
path usefulness
tunnel/link usefulness
evidence role
currentness and temporal state
whether a candidate should be injected into answer context
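
One way to picture the supervision signals above is as a single label record per candidate. The sketch below is an assumption about shape only: the `GraphSupervision` class, its field names, and the example `evidence_role` values are hypothetical and do not describe the actual TMCRA training schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class GraphSupervision:
    """Hypothetical per-candidate label record; field names are illustrative."""
    node_relevant: bool   # memory node relevance
    unit_relevant: bool   # event-unit relevance
    path_useful: bool     # path usefulness
    link_useful: bool     # tunnel/link usefulness
    evidence_role: str    # e.g. "support" or "noise" (assumed label set)
    is_current: bool      # currentness and temporal state
    inject: bool          # should this candidate enter the answer context?


label = GraphSupervision(
    node_relevant=True,
    unit_relevant=True,
    path_useful=True,
    link_useful=False,
    evidence_role="support",
    is_current=True,
    inject=True,
)
targets = asdict(label)
print(targets)
```

Keeping the signals as separate fields, rather than one relevance flag, mirrors the point that a candidate can be relevant yet outdated, or relevant yet not worth injecting.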

This lets the scorers learn memory structure rather than lexical similarity alone.

Writer and Scorer Separation

TMCRA separates memory writing from graph scoring.

The writer extracts candidate memory records from dialogue. The graph model then learns how those records should be selected and connected during retrieval.

This separation is important because a long-memory system needs two different abilities:

write useful memory units from conversation
retrieve and connect the right units later under noise
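
The separation described above can be sketched as two independent interfaces that only meet at retrieval time. Everything here is hypothetical: `MemoryWriter`, `GraphScorer`, the toy implementations, and `retrieve` are illustrative names, and the word-overlap scorer is a deliberately simple placeholder for a learned graph model. The sketch assumes Python 3.9 or later.

```python
from typing import Protocol


class MemoryWriter(Protocol):
    """Hypothetical interface: turns dialogue into memory records."""
    def write(self, dialogue: list[str]) -> list[str]: ...


class GraphScorer(Protocol):
    """Hypothetical interface: scores memory records for a query."""
    def score(self, query: str, records: list[str]) -> list[float]: ...


class UserTurnWriter:
    """Toy writer: keeps only the turns spoken by the user."""
    def write(self, dialogue):
        prefix = "user: "
        return [t.removeprefix(prefix) for t in dialogue if t.startswith(prefix)]


class OverlapScorer:
    """Toy scorer: word overlap, a placeholder for a learned model."""
    def score(self, query, records):
        q = set(query.lower().split())
        return [float(len(q & set(r.lower().split()))) for r in records]


def retrieve(writer: MemoryWriter, scorer: GraphScorer, dialogue, query):
    """Write records first, then score them; return the best-scoring record."""
    records = writer.write(dialogue)
    scores = scorer.score(query, records)
    best = max(range(len(records)), key=scores.__getitem__)
    return records[best]


dialogue = [
    "user: my dog is named Biscuit",
    "assistant: noted",
    "user: I live in Oslo",
]
result = retrieve(UserTurnWriter(), OverlapScorer(), dialogue, "what is my dog named")
print(result)
```

Because the two roles share only the record list, either side could be swapped or retrained without changing the other.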

Training Output Included

The packaged model output is located at:

models/action_frame_tunnel_graph548_tunnel_fusion_train_20260524_042557/

Runtime files:

node_scorer.pt
path_scorer.pt
export_manifest.json
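
Before loading the scorers, it can be useful to confirm that the three runtime files listed above are present. The helper below is a minimal sketch, not part of the release: `missing_runtime_files` is a hypothetical name, and it assumes the runtime files sit directly inside the model directory.

```python
from pathlib import Path
import tempfile

# File names taken from the runtime file list in this document.
RUNTIME_FILES = ("node_scorer.pt", "path_scorer.pt", "export_manifest.json")


def missing_runtime_files(model_dir):
    """Return the runtime file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in RUNTIME_FILES if not (root / name).is_file()]


# Demo against a temporary directory holding only one of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "node_scorer.pt").touch()
    missing = missing_runtime_files(tmp)
print(missing)
```

An empty list means all three runtime files were found in the given directory.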

Full training trace:

checkpoints/
node_scorer_best.pt
path_scorer_best.pt
node_scorer_last.pt
path_scorer_last.pt
train_summary.json
train.log
training_issues.jsonl

Current Training Lessons

The current baseline shows that TMCRA has strong single-session fact recall and assistant-detail recall. It also has working temporal and preference layers.

The main remaining training targets are:

stronger multi-session aggregation
better unit coverage for count/sum/compare questions
deeper temporal graph planning
query-graph to memory-graph matching
more stable preference abstraction under indirect user requests

These directions are the next steps toward turning TMCRA from a working long-memory runtime into a stronger general-purpose agent-memory layer.