	---
	license: mit
	library_name: pytorch
	tags:
	- graph-neural-network
	- language-model
	- causal-lm
	- experimental
	- token-graph
	pipeline_tag: text-generation
	---

	# TMCRA TokenGraph-LLM Stage C

	TMCRA TokenGraph-LLM is an experimental graph-native autoregressive language model prototype. It is not a Transformer wrapper and does not call an external LLM at inference time. Text is generated from token-level graph encoding, learned edge gates, graph message passing, and a dynamic graph causal decoder.

	This Hugging Face repository hosts model artifacts. Full source code, training scripts, graph builders, and documentation are published in the GitHub repository:

	[https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM](https://github.com/reshuibuduo/TMCRA-TokenGraph-LLM)

	## Current Model

	Current default checkpoint:

	- release line: `v0.2.0-stagec`
	- package: `tmcra_tokengraph_stagec_model_package_20260606.zip`
	- checkpoint inside package: `checkpoint/token_graph_dynamic_decoder_v3.pt`
	- parameters: `114,615,372`
	- shape: `dim=512`, `graph_layers=8`, `decoder_layers=10`
	- embeddings: untied
	- precision during training: `bf16`
	- effective training samples: about `1.03M`
	- training steps: `62,000`
	- SHA256: `cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33`

	Legacy v0.1 files may still be present in this repository for historical comparison. The Stage C package is the current recommended artifact.
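
The SHA256 listed above can be checked locally after download. The following is a minimal sketch using only the Python standard library. It assumes the digest refers to the package zip; the card does not say whether it covers the zip or the checkpoint inside it, so check `SHA256SUMS.txt` if the comparison fails. The function names are illustrative and not part of the repository.

```python
import hashlib

# Digest copied from the model card. Assumption: it refers to the package zip.
EXPECTED_SHA256 = "cc23285628eaed47c20009b6be6b5eb0600ded57ac2e09519370d97158fecd33"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so a large archive never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_package("tmcra_tokengraph_stagec_model_package_20260606.zip")` returns `True` only if the downloaded file matches the published digest.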

	## Package Contents

	The Stage C zip contains:

	```text
	checkpoint/token_graph_dynamic_decoder_v3.pt
	dataset_metadata/tokenizer.json
	dataset_metadata/manifest.json
	training_summary_stagec_public.json
	docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md
	docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md
	docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md
	MODEL_CARD.md
	PACKAGE_MANIFEST.md
	SHA256SUMS.txt
	```
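
A quick way to confirm that a downloaded package is complete is to compare its entries against the listing above. This sketch assumes the entries sit at the archive root with no top-level folder prefix, which the card does not state. `missing_members` is a hypothetical helper, not a function from the repository.

```python
import zipfile

# File list copied from the model card. Assumption: paths are relative to the zip root.
EXPECTED_MEMBERS = [
    "checkpoint/token_graph_dynamic_decoder_v3.pt",
    "dataset_metadata/tokenizer.json",
    "dataset_metadata/manifest.json",
    "training_summary_stagec_public.json",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW.md",
    "docs/TMCRA_TOKENGRAPH_STAGEC_TECHNICAL_OVERVIEW_ZH.md",
    "docs/STAGEC_DETAILED_BENCHMARK_SMOKE_20260606.md",
    "MODEL_CARD.md",
    "PACKAGE_MANIFEST.md",
    "SHA256SUMS.txt",
]


def missing_members(zip_path, expected=EXPECTED_MEMBERS):
    """Return the expected entries that are absent from the archive, in listing order."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]
```

An empty list means every documented file is present. If the archive wraps its contents in a top-level directory, strip that prefix before comparing.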

	## Full-Chain Training Code

	The GitHub source repository now includes the full-chain Stage C training path:

	- open-corpus schema2 conversion scripts;
	- optional semantic teacher annotation through OpenAI-compatible or local Hugging Face models;
	- token-level reasoning graph builders;
	- `simple_plus_causal_target` graph mode;
	- Stage C training and checkpoint continuation;
	- graph ablation and token attribution evaluation.

	Start from:

	```text
	docs/FULL_CHAIN_TRAINING.md
	docs/FULL_CHAIN_TRAINING_ZH.md
	scripts/run_stagec_full_chain_template.sh
	scripts/run_stagec_sharded_training_template.sh
	```

	## How Next-Token Generation Works

	Stage C predicts the next token through a graph-native causal path:

	```mermaid
	flowchart TD
	A["Text / prompt / source segments / target text"] --> B["Tokenizer"]
	B --> C["Token Graph Builder"]
	C --> D["Token nodes"]
	C --> E["Typed candidate edges"]
	D --> G["TokenGraphEncoderV3"]
	E --> G
	G --> H["Encoded context graph states"]
	H --> I["Dynamic Token Graph Decoder"]
	I --> J["Generated token node"]
	J --> I
	I --> K["Next-token distribution"]
	```

	```text
	schema2 text
	-> token graph nodes and typed candidate edges
	-> learned edge-gated graph propagation
	-> dynamic generated-token graph nodes
	-> prefix-edge + context-edge gated decoding
	-> vocabulary logits
	```

	The graph builder proposes token nodes and typed candidate edges. The model then learns edge gates, propagates messages through the token graph, scores context nodes, and decodes each generated token as a dynamic graph node. The decoder combines a learned prefix message from previous generated-token nodes with a learned context message from encoded graph nodes, then maps the updated graph-decoder state to next-token logits. This keeps the main objective as next-token prediction while making generation depend on typed graph structure rather than Transformer self-attention.
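
To make the idea of learned edge gates concrete, here is a toy, dependency-free sketch of one round of edge-gated message passing. It is not the `TokenGraphEncoderV3` implementation. The gate formula (a sigmoid over a per-type bias plus a scaled dot product), the residual update, and all names are assumptions chosen for illustration. In the real model these quantities would be learned parameters.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_propagation_step(states, edges, type_bias):
    """One round of edge-gated message passing over typed candidate edges.

    states:    list of node vectors (lists of floats), one per token node
    edges:     list of (src, dst, edge_type) candidate edges from the graph builder
    type_bias: dict mapping edge_type to a scalar (stand-in for learned gate weights)
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    incoming = [[0.0] * dim for _ in states]
    gate_sum = [0.0] * len(states)
    for src, dst, edge_type in edges:
        # The gate decides how much of the source state flows along this edge.
        gate = sigmoid(type_bias[edge_type] + dot(states[src], states[dst]) / scale)
        for k in range(dim):
            incoming[dst][k] += gate * states[src][k]
        gate_sum[dst] += gate
    updated = []
    for node, state in enumerate(states):
        if gate_sum[node] == 0.0:
            updated.append(list(state))  # no incoming messages: state is unchanged
        else:
            # Residual update with the gate-normalised incoming message.
            updated.append([state[k] + incoming[node][k] / gate_sum[node] for k in range(dim)])
    return updated
```

Each round touches every edge and every node once, which is where an `(N + E) * d` cost per graph layer comes from. Passing an empty edge list leaves all states unchanged, a rough analogue of a no-edges ablation.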

	Single-step decoding:

	```mermaid
	flowchart LR
	A["Encoded context graph<br/>N nodes"] --> D["Context gate"]
	B["Generated prefix nodes<br/>window W"] --> C["Prefix gate"]
	C --> E["Generated-token node t"]
	D --> E
	E --> F["Graph decoder update"]
	F --> G["Vocabulary logits"]
	G --> H["next token"]
	H --> I["Append as graph node"]
	```
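
The single-step flow in the diagram can be sketched as a greedy decoding loop. This is a toy under stated assumptions, not the Stage C decoder. The gating function, the additive state update, and the names `gated_message`, `decode_step`, and `greedy_generate` are all hypothetical. Separate `in_embed` and `out_embed` tables mirror the untied embeddings noted in the card.

```python
import math


def _sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_message(query, nodes):
    """Gate each node against the query and return the gate-weighted mean vector."""
    dim = len(query)
    if not nodes:
        return [0.0] * dim
    scale = math.sqrt(dim)
    gates = [_sigmoid(_dot(query, node) / scale) for node in nodes]
    total = sum(gates)
    if total == 0.0:
        return [0.0] * dim
    return [sum(g * node[k] for g, node in zip(gates, nodes)) / total for k in range(dim)]


def decode_step(context, prefix, query, out_embed, window):
    """Combine a prefix message (last `window` generated nodes) with a context message."""
    recent = prefix[-window:] if window > 0 else []
    prefix_msg = gated_message(query, recent)
    context_msg = gated_message(query, context)  # scans all N context nodes
    state = [q + p + c for q, p, c in zip(query, prefix_msg, context_msg)]
    logits = [_dot(state, row) for row in out_embed]
    return logits, state


def greedy_generate(context, in_embed, out_embed, start_token, steps, window):
    """Greedy loop: each decoded state is appended as a new generated-token node."""
    prefix, tokens = [], [start_token]
    for _ in range(steps):
        query = in_embed[tokens[-1]]
        logits, state = decode_step(context, prefix, query, out_embed, window)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
        prefix.append(state)
    return tokens
```

In this sketch the prefix path costs at most `W` gate evaluations per step while the context path costs `N`, which corresponds to the `T * W` and `T * N` terms in the complexity section.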

	## Complexity Growth

	Per layer, dense Transformer self-attention over a sequence of `n` tokens with hidden width `d` grows roughly as:

	```text
	O(n^2 * d)
	```

	Stage C replaces sequence-wide all-token attention with graph candidate edges and dynamic graph decoding:

	```text
	Graph encoder: O(L_g * (N + E) * d)
	Dynamic prefix path: O(L_d * T * W * d)
	Context tunnel path: O(L_d * T * N * d)
	```

	where `N` is the number of context graph nodes, `E` is the number of candidate edges, `T` is the generated length, `W` is the bounded generated-prefix window, `L_g` and `L_d` are the graph-encoder and decoder layer counts, and `d` is the hidden dimension. The current context tunnel still scans every encoded context node at each decoding step, so generation is not constant-time. The accurate claim is narrower: dense sequence-wide self-attention is replaced by sparse typed graph propagation plus explicit context tunneling.
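
To see which term dominates, the expressions above can be evaluated directly. The sketch below plugs in the published shape (`dim=512`, `graph_layers=8`, `decoder_layers=10`). The values of `N`, `E`, `T`, and `W` in the usage note are made-up illustrations, not measurements from the model. The results are asymptotic term counts with constants dropped, not FLOP estimates.

```python
def dense_attention_cost(n, d, layers=1):
    """Term count for dense self-attention: layers * n^2 * d."""
    return layers * n * n * d


def stage_c_cost(N, E, T, W, d=512, graph_layers=8, decoder_layers=10):
    """Evaluate the three Stage C terms from the model card.

    N: context graph nodes, E: candidate edges,
    T: generated length,    W: generated-prefix window.
    """
    encoder = graph_layers * (N + E) * d
    prefix = decoder_layers * T * W * d
    context = decoder_layers * T * N * d
    return {
        "encoder": encoder,
        "prefix": prefix,
        "context": context,
        "total": encoder + prefix + context,
    }
```

For example, `stage_c_cost(N=1024, E=8192, T=256, W=64)` gives a context term of 1,342,177,280 out of a total of 1,463,812,096, so in this hypothetical setting the context tunnel accounts for most of the cost. That is consistent with the caveat that it still scans all encoded context nodes.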

	## Current Capability Boundary

	Stage C can generate early-stage English story-style continuations and shows measurable dependence on typed graph edges. It is not a reliable production LLM. Current weak areas include exact factual QA, numeric reasoning, robust instruction following, grammar stability, multilingual generation, and long-range concept binding.

	Smoke results:

	| evaluation | result |
	|---|---:|
	| Stage C normal total loss | 6.512117 |
	| Stage C normal LM loss | 4.641285 |
	| no_edges total loss | 8.310654 |
	| shuffle_edges total loss | 7.702783 |
	| TinyStories avg words | 73.88 |
	| BLiMP smoke | 59%-64% |

	These are smoke tests, not leaderboard claims.
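
The edge-dependence claim can be read off the total-loss rows by comparing each ablation to the normal run. The sketch below does only that arithmetic on the numbers published in the table. The dictionary keys and the `ablation_deltas` helper are illustrative names, and the card does not describe the exact ablation procedure.

```python
# Total-loss values copied from the smoke-results table.
SMOKE_TOTAL_LOSS = {
    "normal": 6.512117,
    "no_edges": 8.310654,
    "shuffle_edges": 7.702783,
}


def ablation_deltas(losses, baseline="normal"):
    """Absolute and relative loss increase of each ablation over the baseline run."""
    base = losses[baseline]
    return {
        name: {"delta": value - base, "relative": (value - base) / base}
        for name, value in losses.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, row in ablation_deltas(SMOKE_TOTAL_LOSS).items():
        print(f"{name}: +{row['delta']:.6f} ({row['relative']:.1%})")
```

Removing edges raises total loss by about 1.80 (roughly 28%), and shuffling them raises it by about 1.19 (roughly 18%). Both ablations hurt, and dropping edges entirely hurts more than scrambling them.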

	## License

	MIT.