---
base_model: t5-small
tags: [trm, act, recursive, text-generation, wikitext]
metrics: [loss, lm_loss, ponder_loss, perplexity_lm]
---
# TRM-Text1 (ACT)
**TRM-Text1 (ACT)** is a causal language model based on a **Tiny Recursive Reasoning Model (TRM)** with **Adaptive Computation Time (ACT)**, which lets each token spend a variable amount of recursion depth; a minimal sketch of the halting loop follows the spec list below.
- **Architecture:** TRM (causal) + ACT halting
- **Training Data:** wikitext-103-raw-v1
- **Tokenizer:** t5-small (SentencePiece)
- **Vocab Size:** 32100
- **Objective:** Causal Language Modeling (next-token prediction)
- **Sequence Length:** 1024
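The card does not spell out the ACT mechanism, so the following is only a minimal sketch of per-token adaptive halting in the style of Graves-style ACT, assuming a sigmoid halting head and a cumulative-probability threshold. All names (`ACTHalting`, `step_fn`, `max_steps`, `eps`) are illustrative, not this model's actual code.

```python
import torch
import torch.nn as nn

class ACTHalting(nn.Module):
    """Illustrative per-token ACT loop: each recursion step emits a halting
    probability; a token stops once its cumulative halting mass reaches
    1 - eps, and its output is the halting-weighted mixture of step states."""

    def __init__(self, d_model: int, max_steps: int = 8, eps: float = 0.01):
        super().__init__()
        self.halt = nn.Linear(d_model, 1)
        self.max_steps = max_steps
        self.eps = eps

    def forward(self, h, step_fn):
        # h: (batch, seq, d_model); step_fn applies one shared recursion step.
        b, s, _ = h.shape
        cum_p = h.new_zeros(b, s)       # accumulated halting mass per token
        remainder = h.new_ones(b, s)    # probability mass left to assign
        out = torch.zeros_like(h)
        ponder = h.new_zeros(b, s)      # steps taken per token (ponder cost)
        for _ in range(self.max_steps):
            h = step_fn(h)
            p = torch.sigmoid(self.halt(h)).squeeze(-1)
            running = cum_p < 1.0 - self.eps
            # Tokens crossing the threshold spend their remaining mass.
            use = torch.where(cum_p + p >= 1.0 - self.eps, remainder, p)
            use = use * running.float()
            out = out + use.unsqueeze(-1) * h
            cum_p = cum_p + use
            remainder = remainder - use
            ponder = ponder + running.float()
        # The mean ponder cost can serve as a "ponder loss" regularizer.
        return out, ponder.mean()
```

Under this reading, the reported ponder loss is a penalty of this kind, encouraging tokens to halt after fewer recursion steps.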
Note: This model uses the T5 SentencePiece tokenizer, so the WikiText-103 perplexities reported here are not directly comparable to perplexities from GPT-2 BPE-based models.
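To see why the tokenizer matters, the snippet below (using the standard `transformers` tokenizers for `t5-small` and `gpt2`) shows that the two vocabularies segment the same text into different numbers of tokens, so the per-token cross-entropies, and hence perplexities, live on different scales:

```python
from transformers import AutoTokenizer

text = "The quick brown fox jumps over the lazy dog."
t5 = AutoTokenizer.from_pretrained("t5-small")  # SentencePiece, vocab 32100
gpt2 = AutoTokenizer.from_pretrained("gpt2")    # BPE, vocab 50257

# Different segmentations => per-token losses are not on the same scale.
print("t5-small tokens:", len(t5(text).input_ids))
print("gpt2 tokens:    ", len(gpt2(text).input_ids))
```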
### Latest Performance (Epoch 1)
- **Validation Loss:** 4.8248
- **Validation LM Loss:** 4.8149
- **Validation Ponder Loss:** 1.0064
- **Validation Perplexity (LM-only):** 123.34
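LM-only perplexity is simply the exponential of the mean next-token cross-entropy, which is how the value above follows from the validation LM loss:

```python
import math

val_lm_loss = 4.8149
print(math.exp(val_lm_loss))  # ~123.34, the reported LM-only perplexity
```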