---
license: mit
library_name: pytorch
tags:
  - graph-neural-network
  - language-model
  - causal-lm
  - experimental
  - token-graph
pipeline_tag: text-generation
---

# TMCRA TokenGraph-LLM Stage C

TMCRA TokenGraph-LLM is an experimental graph-native autoregressive language model prototype. It is not a Transformer wrapper and does not call an external LLM at inference time. Text is generated from token-level graph encoding, learned edge gates, graph message passing, and a dynamic graph causal decoder.

This Hugging Face repository hosts model artifacts. Full source code, training scripts, graph builders, and documentation are published in the GitHub repository:

[https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM](https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM)

## Current Model

Current default checkpoint:

- release line: `v0.2.0-stagec`
- package: `tmcra_tokengraph_stagec_model_package_20260606.zip`
- checkpoint inside package: `checkpoint/token_graph_dynamic_decoder_v3.pt`
- parameters: `114,615,372`
- shape: `dim=512`, `graph_layers=8`, `decoder_layers=10`
- embeddings: untied
- precision during training: `bf16`
- effective training samples: about `1.03M`
- training steps: `62,000`
- SHA256: `cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33`
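
A downloaded artifact can be checked against the digest above with a few lines of standard-library Python. This is a minimal sketch: the card does not say which file the digest covers, so the assumption here is that it is the package zip. If the check fails on the zip, try the checkpoint file inside it. The function names are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# Digest published in this model card.
EXPECTED_SHA256 = "cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected one (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Usage, assuming the zip sits in the working directory: `matches_published_digest("tmcra_tokengraph_stagec_model_package_20260606.zip")`.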

Legacy v0.1 files may still be present in this repository for historical comparison. The Stage C package is the current recommended artifact.

## Package Contents

The Stage C zip contains:

```text
checkpoint/token_graph_dynamic_decoder_v3.pt
dataset_metadata/tokenizer.json
dataset_metadata/manifest.json
training_summary_stagec_public.json
docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md
docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md
docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md
MODEL_CARD.md
PACKAGE_MANIFEST.md
SHA256SUMS.txt
```
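
To confirm that a downloaded zip has the layout listed above, one option is to compare its member names against that list. The sketch below uses only the standard library. The helper name is hypothetical, and it assumes the archive may wrap everything in a top-level folder, which the card does not state.

```python
import zipfile

# Member paths as listed in this model card.
EXPECTED_MEMBERS = [
    "checkpoint/token_graph_dynamic_decoder_v3.pt",
    "dataset_metadata/tokenizer.json",
    "dataset_metadata/manifest.json",
    "training_summary_stagec_public.json",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md",
    "docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md",
    "MODEL_CARD.md",
    "PACKAGE_MANIFEST.md",
    "SHA256SUMS.txt",
]


def missing_members(zip_path, expected=EXPECTED_MEMBERS):
    """Return the expected paths that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    # Match exact paths, or paths nested under a wrapping folder (assumption).
    return [
        member
        for member in expected
        if not any(n == member or n.endswith("/" + member) for n in names)
    ]
```

An empty list means every listed file was found. The suffix match is deliberately loose, so treat a pass as a sanity check rather than a strict layout validation.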

## Full-Chain Training Code

The GitHub source repository now includes the full-chain Stage C training path:

- open-corpus schema2 conversion scripts;
- optional semantic teacher annotation through OpenAI-compatible or local Hugging Face models;
- token-level reasoning graph builders;
- `simple_plus_causal_target` graph mode;
- Stage C training and checkpoint continuation;
- graph ablation and token attribution evaluation.

Start from:

```text
docs/FULL_CHAIN_TRAINING.md
docs/FULL_CHAIN_TRAINING_ZH.md
scripts/run_stagec_full_chain_template.sh
scripts/run_stagec_sharded_training_template.sh
```

## How Next-Token Generation Works

Stage C predicts the next token through a graph-native causal path:

```mermaid
flowchart TD
    A["Text / prompt / source segments / target text"] --> B["Tokenizer"]
    B --> C["Token Graph Builder"]
    C --> D["Token nodes"]
    C --> E["Typed candidate edges"]
    D --> G["TokenGraphEncoderV3"]
    E --> G
    G --> H["Encoded context graph states"]
    H --> I["Dynamic Token Graph Decoder"]
    I --> J["Generated token node"]
    J --> I
    I --> K["Next-token distribution"]
```

```text
schema2 text
  -> token graph nodes and typed candidate edges
  -> learned edge-gated graph propagation
  -> dynamic generated-token graph nodes
  -> prefix-edge + context-edge gated decoding
  -> vocabulary logits
```

The graph builder proposes token nodes and typed candidate edges. The model then learns edge gates, propagates messages through the token graph, scores context nodes, and decodes each generated token as a dynamic graph node. The decoder combines a learned prefix message from previous generated-token nodes with a learned context message from encoded graph nodes, then maps the updated graph-decoder state to next-token logits. This keeps the main objective as next-token prediction while making generation depend on typed graph structure rather than Transformer self-attention.
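
The idea of learned edge gates over typed candidate edges can be illustrated with a toy propagation step in plain Python. This is a sketch under stated assumptions, not the `TokenGraphEncoderV3` implementation: the gate formula, the per-type bias table, the residual update, and the edge type name `adjacent` in the usage note are all invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_propagation_step(states, edges, type_bias):
    """One round of gated message passing over typed candidate edges.

    states:    list of node vectors (lists of floats)
    edges:     list of (src, dst, edge_type) tuples proposed by a graph builder
    type_bias: dict edge_type -> scalar, standing in for learned gate parameters
    """
    dim = len(states[0])
    incoming = [[0.0] * dim for _ in states]
    gate_total = [0.0] * len(states)
    for src, dst, edge_type in edges:
        # Gate in (0, 1) that depends on both endpoints and the edge type.
        gate = sigmoid(dot(states[src], states[dst]) + type_bias[edge_type])
        for k in range(dim):
            incoming[dst][k] += gate * states[src][k]
        gate_total[dst] += gate
    updated = []
    for node, state in enumerate(states):
        if gate_total[node] == 0.0:
            updated.append(list(state))  # no incoming edges: state is unchanged
            continue
        # Residual update with the gate-weighted mean of the source states.
        updated.append(
            [s + m / gate_total[node] for s, m in zip(state, incoming[node])]
        )
    return updated
```

For example, `gated_propagation_step([[1.0, 0.0], [0.0, 1.0]], [(0, 1, "adjacent")], {"adjacent": 0.0})` updates only node 1, because it is the only node with an incoming edge. The cost of one step is proportional to the number of edges rather than to the square of the number of nodes.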

Single-step decoding:

```mermaid
flowchart LR
    A["Encoded context graph<br/>N nodes"] --> D["Context gate"]
    B["Generated prefix nodes<br/>window W"] --> C["Prefix gate"]
    C --> E["Generated-token node t"]
    D --> E
    E --> F["Graph decoder update"]
    F --> G["Vocabulary logits"]
    G --> H["next token"]
    H --> I["Append as graph node"]
```
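
The single-step flow in the diagram can be mimicked with a small greedy decoding loop. The sketch below is a toy stand-in, not the Stage C decoder: the sigmoid gates, the additive state update, the dot-product logits, and all function and parameter names are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_mean(query, nodes):
    """Gate each node against the query, then return the gate-weighted mean."""
    gates = [sigmoid(dot(query, node)) for node in nodes]
    total = sum(gates)  # strictly positive, since every gate is in (0, 1)
    dim = len(query)
    return [sum(g * n[k] for g, n in zip(gates, nodes)) / total for k in range(dim)]


def decode_step(context_states, prefix_states, output_embeddings, window):
    """Return (next_token, logits) for one generated-token node.

    window must be >= 1; only the last `window` prefix nodes are visited.
    """
    query = prefix_states[-1]
    prefix_msg = gated_mean(query, prefix_states[-window:])  # prefix gate, W nodes
    context_msg = gated_mean(query, context_states)          # context gate, N nodes
    state = [q + p + c for q, p, c in zip(query, prefix_msg, context_msg)]
    logits = [dot(state, emb) for emb in output_embeddings]
    next_token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
    return next_token, logits


def generate(context_states, start_state, input_embeddings, output_embeddings,
             window, steps):
    """Greedy loop: each chosen token is appended as a new prefix graph node."""
    prefix_states = [list(start_state)]
    tokens = []
    for _ in range(steps):
        token, _ = decode_step(context_states, prefix_states,
                               output_embeddings, window)
        tokens.append(token)
        prefix_states.append(list(input_embeddings[token]))
    return tokens
```

Each step visits at most `window` prefix nodes and all context nodes. Separate `input_embeddings` and `output_embeddings` tables mirror the untied embeddings noted earlier in this card.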

## Complexity Growth

For a sequence of `n` tokens with hidden dimension `d`, dense Transformer self-attention grows per layer roughly as:

```text
O(n^2 * d)
```

Stage C replaces sequence-wide all-token attention with graph candidate edges and dynamic graph decoding:

```text
Graph encoder:        O(L_g * (N + E) * d)
Dynamic prefix path:  O(L_d * T * W * d)
Context tunnel path:  O(L_d * T * N * d)
```

where `N` is the number of context graph nodes, `E` the number of candidate edges, `T` the generated length, `W` the bounded generated-prefix window, `d` the hidden dimension, and `L_g` and `L_d` the numbers of graph-encoder and decoder layers. The current context tunnel still scans every encoded context node at each generation step, so generation is not constant-time. The accurate claim is that Stage C replaces dense sequence-wide self-attention with sparse typed graph propagation plus explicit context tunneling.
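
To see which term dominates for a given workload, the three expressions can be evaluated directly. The sketch below drops all constant factors, so the numbers are order-of-magnitude operation counts only. The function names are illustrative. In the usage note, `d`, `L_g`, and `L_d` come from the checkpoint shape listed above, while the values of `N`, `E`, `T`, and `W` are hypothetical.

```python
def dense_attention_cost(n, d, layers=1):
    """O(n^2 * d) per layer, with constants dropped."""
    return layers * n * n * d


def stage_c_cost(N, E, T, W, d, L_g, L_d):
    """Evaluate the three Stage C terms, with constants dropped."""
    terms = {
        "graph_encoder": L_g * (N + E) * d,  # sparse propagation over nodes and edges
        "prefix_path": L_d * T * W * d,      # bounded window over generated tokens
        "context_tunnel": L_d * T * N * d,   # every step scans all context nodes
    }
    terms["total"] = sum(terms.values())
    return terms
```

For example, `stage_c_cost(N=512, E=4096, T=128, W=32, d=512, L_g=8, L_d=10)` shows the context tunnel term as the largest of the three, which matches the caveat that the tunnel still scans all encoded context nodes.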

## Current Capability Boundary

Stage C can generate early English story-style continuations and shows measurable dependence on typed graph edges. It is not a reliable production LLM. Current weak areas include exact factual QA, numeric reasoning, robust instruction following, grammar stability, multilingual generation, and long-range concept binding.

Smoke results:

| evaluation | result |
|---|---:|
| Stage C normal total loss | 6.512117 |
| Stage C normal LM loss | 4.641285 |
| no_edges total loss | 8.310654 |
| shuffle_edges total loss | 7.702783 |
| TinyStories avg words | 73.88 |
| BLiMP smoke | 59%-64% |

These are smoke tests, not leaderboard claims.
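
The edge-dependence claim can be read off the table by comparing each ablation against the normal run. The snippet below does only that arithmetic on the total-loss figures reported above. The helper name is illustrative.

```python
# Total losses copied from the smoke results table.
SMOKE_TOTAL_LOSS = {
    "normal": 6.512117,
    "no_edges": 8.310654,
    "shuffle_edges": 7.702783,
}


def ablation_deltas(losses, baseline="normal"):
    """Absolute and relative loss increase of each ablation over the baseline."""
    base = losses[baseline]
    return {
        name: {"absolute": value - base, "relative": (value - base) / base}
        for name, value in losses.items()
        if name != baseline
    }
```

Removing edges raises total loss by about 1.80 (roughly 28%), and shuffling them raises it by about 1.19 (roughly 18%).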

## License

MIT.