---
license: mit
library_name: pytorch
tags:
- graph-neural-network
- language-model
- causal-lm
- experimental
- token-graph
pipeline_tag: text-generation
---
# TMCRA TokenGraph-LLM Stage C
TMCRA TokenGraph-LLM is an experimental graph-native autoregressive language model prototype. It is not a Transformer wrapper and does not call an external LLM at inference time. Text is generated from token-level graph encoding, learned edge gates, graph message passing, and a dynamic graph causal decoder.
This Hugging Face repository hosts model artifacts. Full source code, training scripts, graph builders, and documentation are published in the GitHub repository:
[https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM](https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM)
## Current Model
Current default checkpoint:
- release line: `v0.2.0-stagec`
- package: `tmcra_tokengraph_stagec_model_package_20260606.zip`
- checkpoint inside package: `checkpoint/token_graph_dynamic_decoder_v3.pt`
- parameters: `114,615,372`
- shape: `dim=512`, `graph_layers=8`, `decoder_layers=10`
- embeddings: untied
- precision during training: `bf16`
- effective training samples: about `1.03M`
- training steps: `62,000`
- SHA256: `cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33`
Legacy v0.1 files may still be present in this repository for historical comparison. The Stage C package is the current recommended artifact.
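A download can be checked against the SHA256 listed above. The following is a minimal sketch using only the Python standard library. It assumes the listed hash refers to the zip package rather than to the checkpoint file inside it; if the hash does not match the zip, compare against `SHA256SUMS.txt` in the package. The helper names `sha256_of_file` and `matches_expected` are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# Hash copied from the "Current Model" list. Assumption: it covers the zip package.
EXPECTED_SHA256 = "cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hex string, ignoring case."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    package = Path("tmcra_tokengraph_stagec_model_package_20260606.zip")
    if package.exists():
        print("match" if matches_expected(package) else "MISMATCH")
    else:
        print(f"{package} not found; download it first")
```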
## Package Contents
The Stage C zip contains:
```text
checkpoint/token_graph_dynamic_decoder_v3.pt
dataset_metadata/tokenizer.json
dataset_metadata/manifest.json
training_summary_stagec_public.json
docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md
docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md
docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md
MODEL_CARD.md
PACKAGE_MANIFEST.md
SHA256SUMS.txt
```
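To confirm that a downloaded package is complete, the listing above can be compared with the archive's member names. This is a minimal sketch using the standard `zipfile` module. It assumes the files sit at the archive root with exactly the paths shown; if the zip wraps them in a top-level folder, the names need that prefix. `missing_members` is a hypothetical helper, not a repository function.

```python
import zipfile

# Paths copied from the package listing above.
EXPECTED_MEMBERS = [
    "checkpoint/token_graph_dynamic_decoder_v3.pt",
    "dataset_metadata/tokenizer.json",
    "dataset_metadata/manifest.json",
    "training_summary_stagec_public.json",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md",
    "docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md",
    "MODEL_CARD.md",
    "PACKAGE_MANIFEST.md",
    "SHA256SUMS.txt",
]


def missing_members(zip_path, expected=EXPECTED_MEMBERS):
    """Return the expected paths that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]
```

An empty return value means every listed file was found.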
## Full-Chain Training Code
The GitHub source repository now includes the full-chain Stage C training path:
- open-corpus schema2 conversion scripts;
- optional semantic teacher annotation through OpenAI-compatible or local Hugging Face models;
- token-level reasoning graph builders;
- `simple_plus_causal_target` graph mode;
- Stage C training and checkpoint continuation;
- graph ablation and token attribution evaluation.
Start from:
```text
docs/FULL_CHAIN_TRAINING.md
docs/FULL_CHAIN_TRAINING_ZH.md
scripts/run_stagec_full_chain_template.sh
scripts/run_stagec_sharded_training_template.sh
```
## How Next-Token Generation Works
Stage C predicts the next token through a graph-native causal path:
```mermaid
flowchart TD
A["Text / prompt / source segments / target text"] --> B["Tokenizer"]
B --> C["Token Graph Builder"]
C --> D["Token nodes"]
C --> E["Typed candidate edges"]
D --> G["TokenGraphEncoderV3"]
E --> G
G --> H["Encoded context graph states"]
H --> I["Dynamic Token Graph Decoder"]
I --> J["Generated token node"]
J --> I
I --> K["Next-token distribution"]
```
The same path as a text outline:
```text
schema2 text
-> token graph nodes and typed candidate edges
-> learned edge-gated graph propagation
-> dynamic generated-token graph nodes
-> prefix-edge + context-edge gated decoding
-> vocabulary logits
```
The graph builder proposes token nodes and typed candidate edges. The model then learns edge gates, propagates messages through the token graph, scores context nodes, and decodes each generated token as a dynamic graph node. The decoder combines a learned prefix message from previous generated-token nodes with a learned context message from encoded graph nodes, then maps the updated graph-decoder state to next-token logits. This keeps the main objective as next-token prediction while making generation depend on typed graph structure rather than Transformer self-attention.
Single-step decoding:
```mermaid
flowchart LR
A["Encoded context graph<br/>N nodes"] --> D["Context gate"]
B["Generated prefix nodes<br/>window W"] --> C["Prefix gate"]
C --> E["Generated-token node t"]
D --> E
E --> F["Graph decoder update"]
F --> G["Vocabulary logits"]
G --> H["next token"]
H --> I["Append as graph node"]
```
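The single-step flow above can be illustrated with a toy, pure-Python sketch. It is not the Stage C implementation: the sigmoid dot-product gate, the gate-weighted average, the `tanh` update, and all function names are assumptions chosen to keep the example small. It only shows the shape of the computation: a prefix message from the last `window` generated nodes, a context message from all encoded context nodes, a state update, vocabulary logits, and the new state appended as a graph node.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_message(query, nodes):
    """Gate-weighted average of node states, with gate = sigmoid(query . node)."""
    if not nodes:
        return [0.0] * len(query)
    gates = [1.0 / (1.0 + math.exp(-dot(query, node))) for node in nodes]
    total = sum(gates)  # always > 0 because every sigmoid gate is positive
    return [sum(g * node[i] for g, node in zip(gates, nodes)) / total
            for i in range(len(query))]


def decode_step(state, prefix_nodes, context_nodes, output_embeddings, window):
    """One toy step: prefix message + context message -> new state -> logits."""
    recent = prefix_nodes[-window:] if window > 0 else []
    prefix_msg = gated_message(state, recent)          # cost grows with W
    context_msg = gated_message(state, context_nodes)  # cost grows with N
    new_state = [math.tanh(s + p + c)
                 for s, p, c in zip(state, prefix_msg, context_msg)]
    logits = [dot(new_state, emb) for emb in output_embeddings]
    return new_state, logits


def greedy_generate(context_nodes, output_embeddings, steps, window):
    """Greedy loop: pick the argmax token, then append the state as a prefix node."""
    state = [0.0] * len(output_embeddings[0])
    prefix_nodes, tokens = [], []
    for _ in range(steps):
        state, logits = decode_step(
            state, prefix_nodes, context_nodes, output_embeddings, window)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
        prefix_nodes.append(state)
    return tokens


if __name__ == "__main__":
    context = [[0.5, -0.2], [0.1, 0.3]]             # two toy context nodes, dim 2
    vocab = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # three toy output embeddings
    print(greedy_generate(context, vocab, steps=4, window=2))
```

The context message scans every context node on each step, while the prefix message looks only at a bounded window.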
## Complexity Growth
Dense Transformer self-attention grows roughly as:
```text
O(n^2 * d)
```
Stage C replaces sequence-wide all-token attention with graph candidate edges and dynamic graph decoding:
```text
Graph encoder: O(L_g * (N + E) * d)
Dynamic prefix path: O(L_d * T * W * d)
Context tunnel path: O(L_d * T * N * d)
```
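where `n` is the sequence length, `d` is the hidden dimension, `L_g` and `L_d` are the graph-encoder and decoder layer counts, `N` is the number of context graph nodes, `E` is the number of candidate edges, `T` is the generated length, and `W` is the bounded generated-prefix window. The current context tunnel still scans every encoded context node at each step, so generation is not constant-time. The accurate claim is that Stage C replaces dense sequence-wide self-attention with sparse typed graph propagation plus explicit context tunneling.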
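To get a feel for how the three terms scale, they can be evaluated for concrete sizes. The sketch below plugs in the published shape (`dim=512`, `graph_layers=8`, `decoder_layers=10`), assuming `L_g` and `L_d` correspond to those two layer counts. The values used for `N`, `E`, `T`, and `W` are hypothetical, and the results are unit-less growth terms that ignore constant factors, not measured FLOPs.

```python
def stagec_cost_terms(N, E, T, W, d=512, graph_layers=8, decoder_layers=10):
    """Evaluate the three big-O terms above with constant factors dropped."""
    return {
        "graph_encoder": graph_layers * (N + E) * d,
        "prefix_path": decoder_layers * T * W * d,
        "context_tunnel": decoder_layers * T * N * d,
    }


def dense_attention_term(n, d=512):
    """Per-layer n^2 * d growth term for dense self-attention."""
    return n * n * d


if __name__ == "__main__":
    # Hypothetical sizes: 256 context nodes, 1024 edges, 64 generated tokens, window 16.
    for name, value in stagec_cost_terms(N=256, E=1024, T=64, W=16).items():
        print(f"{name}: {value:,}")
```

With these example sizes the context tunnel term dominates, which matches the note that the tunnel still scans all encoded context nodes.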
## Current Capability Boundary
Stage C can generate early-stage English story-style continuations and shows measurable dependence on typed graph edges: in the smoke results below, removing or shuffling edges raises total loss. It is not a reliable production LLM. Current weak areas include exact factual QA, numeric reasoning, robust instruction following, grammar stability, multilingual generation, and long-range concept binding.
Smoke results:
| evaluation | result |
|---|---:|
| Stage C normal total loss | 6.512117 |
| Stage C normal LM loss | 4.641285 |
| no_edges total loss | 8.310654 |
| shuffle_edges total loss | 7.702783 |
| TinyStories avg words | 73.88 |
| BLiMP smoke | 59%-64% |
These are smoke tests, not leaderboard claims.
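The edge-ablation rows can be read as loss increases over the normal run. The short sketch below does that arithmetic using only the totals from the table; the helper name `loss_increase` is illustrative.

```python
NORMAL_TOTAL = 6.512117  # "Stage C normal total loss" from the table
ABLATIONS = {"no_edges": 8.310654, "shuffle_edges": 7.702783}


def loss_increase(ablated, baseline=NORMAL_TOTAL):
    """Return (absolute, relative) increase of an ablated loss over the baseline."""
    delta = ablated - baseline
    return delta, delta / baseline


if __name__ == "__main__":
    for name, loss in ABLATIONS.items():
        delta, rel = loss_increase(loss)
        print(f"{name}: +{delta:.6f} ({rel:.1%})")
```

Removing edges raises total loss by about 1.80 (roughly 28%), and shuffling them by about 1.19 (roughly 18%).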
## License
MIT.