Viharikvs committed
Commit 90fc620 · verified · 1 Parent(s): e44c389

Model card updated after epoch 0

Files changed (1)
  1. README.md +24 -0
README.md ADDED
@@ -0,0 +1,24 @@
---
base_model: t5-small
tags: [trm, act, recursive, text-generation, wikitext]
metrics: [loss, lm_loss, ponder_loss, perplexity_lm]
---
# TRM-Text1 (ACT)

**TRM-Text1 (ACT)** is a causal language model based on a **Tiny Recursive Reasoning Model (TRM)** with **Adaptive Computation Time (ACT)** for per-token variable depth.

- **Architecture:** TRM (causal) + ACT halting
- **Training Data:** wikitext-103-raw-v1
- **Tokenizer:** t5-small (SentencePiece; see the loading sketch below)
- **Vocab Size:** 32100
- **Objective:** Causal language modeling (next-token prediction)
- **Seq Len:** 1024
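As referenced in the Tokenizer bullet above, the sketch below loads the stock Hugging Face `t5-small` tokenizer, assumed here to be the same SentencePiece tokenizer this model reuses, and checks the reported vocabulary size. It is an illustrative snippet, not part of this repository's code.

```python
from transformers import AutoTokenizer

# Assumption: the "t5-small (SentencePiece)" tokenizer named on this card is the
# stock Hugging Face "t5-small" tokenizer.
tok = AutoTokenizer.from_pretrained("t5-small")
print(tok.vocab_size)  # 32100, matching the "Vocab Size" listed above

# Encode a short string to see the SentencePiece segmentation the model is trained on.
ids = tok("Adaptive Computation Time").input_ids
print(ids)  # a list of token ids, typically ending with T5's </s> (EOS) token
```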
Note: This model uses the T5 SentencePiece tokenizer, so perplexity numbers on WikiText-103 reported here are not directly comparable to GPT-2 BPE-based perplexities.

### Latest Performance (Epoch 0)

- **Validation Loss:** 4.8829
- **Validation LM Loss:** 4.8728
- **Validation Ponder Loss:** 1.0091
- **Validation Perplexity (LM-only):** 130.69
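For reference, the LM-only perplexity is simply the exponential of the LM loss (in nats per token), which is also why the tokenizer note above matters: a different tokenization changes the per-token loss and therefore the perplexity. The check below reproduces the reported figures; the 0.01 ponder-loss weight is only inferred from these numbers, not documented on this card.

```python
import math

lm_loss = 4.8728                    # reported validation LM loss (nats per token)
print(round(math.exp(lm_loss), 2))  # 130.69, the reported LM-only perplexity

# The reported total is consistent with a small ponder-loss weight
# (an inference from the numbers above, not a documented hyperparameter):
print(round(4.8728 + 0.01 * 1.0091, 4))  # 4.8829, the reported validation loss
```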