TRM-textv3.5
TRM-textv3.5 is an experimental recursive Transformer language model derived from summerMC/TRM-textv3.
This checkpoint was improved through a staged rehabilitation and curriculum SFT process focused on reducing repetition collapse and improving short instruction-following behavior.
Model
- Base: summerMC/TRM-textv3
- Architecture: TRM / recursive Transformer style causal language model
- Task: causal language modeling and instruction-style text generation
- Context length: 512 tokens
- Precision used during training: bfloat16
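With a 512-token context, the prompt and the requested new tokens presumably have to share the same window. The sketch below trims a prompt to fit, under that assumption. The helper name `truncate_prompt_ids` is hypothetical and not part of the model's code, and keeping the most recent tokens is only one possible truncation policy.

```python
CONTEXT_LENGTH = 512  # context length stated in the model card


def truncate_prompt_ids(input_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + new tokens fit the window.

    Assumption: the window is shared by prompt and generated tokens.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    return list(input_ids[-budget:])
```

For example, with `max_new_tokens=128` the prompt budget is 384 tokens, and shorter prompts pass through unchanged.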
Training Pipeline
The model was improved using the following process:
TRM-textv3-grpo
→ hard rescue SFT
→ small curriculum SFT
→ TRM-textv3.5
The curriculum stage used response-only supervised fine-tuning with short QA, explanation, translation, and code-generation style examples.
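Response-only supervision is commonly implemented by masking the prompt positions in the label sequence so that only response tokens contribute to the loss. The following is a minimal sketch of that idea, not the training code used for this checkpoint. The function name `build_response_only_labels` is hypothetical; `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by PyTorch cross-entropy


def build_response_only_labels(prompt_ids, response_ids):
    """Return (input_ids, labels) where prompt positions are excluded from the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    # Mask every prompt token; supervise only the response tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels
```

The two lists have equal length, so they can be passed as `input_ids` and `labels` to a causal language model that shifts labels internally.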
A repetition unlikelihood term was used during rescue/curriculum training to reduce repeated-token degeneration such as:
common common common
learning learning learning
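The card does not give the exact form of the repetition unlikelihood term. As an illustration only, the sketch below assumes the common token-level formulation: at each step, tokens that already appeared earlier in the target (other than the current gold token) are treated as negatives, and the loss is the mean of -log(1 - p) over those negatives. The function name and the plain-list inputs are hypothetical.

```python
import math


def repetition_unlikelihood(step_probs, target_ids, eps=1e-8):
    """Mean of -log(1 - p) over previously seen tokens at each step.

    step_probs[t][v] is the model probability of token v at step t.
    target_ids[t] is the gold token at step t.
    """
    total = 0.0
    count = 0
    for t, probs in enumerate(step_probs):
        # Negatives: tokens seen earlier in the target, excluding the gold token.
        negatives = set(target_ids[:t]) - {target_ids[t]}
        for tok in negatives:
            total += -math.log(max(1.0 - probs[tok], eps))
            count += 1
    return total / count if count else 0.0
```

In training, a term like this would typically be added to the usual cross-entropy loss with a small weight, so that probability mass on already-emitted tokens is penalized without replacing the likelihood objective.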
Intended Use
This model is intended for research experiments on:
- recursive Transformer language modeling
- small language model rehabilitation
- instruction tuning
- output-head collapse recovery
- repetition-collapse mitigation
- experimental LLM miniaturization
Example
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "summerMC/TRM-textv3.5"

# trust_remote_code=True is passed because the model is loaded with code from the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "User: What is Python?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with repetition controls.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
        repetition_penalty=1.25,
        no_repeat_ngram_size=4,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Recommended Generation Settings
do_sample = False
repetition_penalty = 1.2
no_repeat_ngram_size = 4
max_new_tokens = 64 to 128
For sampling:
do_sample = True
temperature = 0.6
top_p = 0.9
repetition_penalty = 1.25
no_repeat_ngram_size = 4
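The settings above can be kept as two presets and passed to `model.generate` as keyword arguments. This is a small sketch. The names `GREEDY_SETTINGS`, `SAMPLING_SETTINGS`, and `generation_kwargs` are hypothetical, and reusing `max_new_tokens` for the sampling preset is an assumption, since the card lists it only with the greedy settings.

```python
GREEDY_SETTINGS = {
    "do_sample": False,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
    "max_new_tokens": 128,  # the card recommends 64 to 128
}

SAMPLING_SETTINGS = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.25,
    "no_repeat_ngram_size": 4,
    "max_new_tokens": 128,  # assumption: not stated for sampling in the card
}


def generation_kwargs(sample=False, **overrides):
    """Return a copy of the chosen preset with optional per-call overrides."""
    settings = dict(SAMPLING_SETTINGS if sample else GREEDY_SETTINGS)
    settings.update(overrides)
    return settings
```

A call would then look like `model.generate(**inputs, **generation_kwargs(sample=True, max_new_tokens=64))`.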
Limitations
This is an experimental research checkpoint.
Known limitations:
- weak factual knowledge
- limited reasoning ability
- unstable long-form generation
- possible repetition under poor decoding settings
- may produce incorrect code
- not suitable for production use
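Since repetition can still appear under poor decoding settings, a quick check on generated text can help flag collapsed outputs during experiments. The sketch below is a simple heuristic and not an evaluation used for this checkpoint. The function name `repeated_ngram_ratio` and the whitespace tokenization are assumptions made for illustration.

```python
def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams that duplicate an earlier n-gram (0.0 means no repeats)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words
    return 1.0 - len(set(ngrams)) / len(ngrams)
```

Values near 0 indicate varied text, while values approaching 1 indicate degenerate loops such as `common common common`. Any threshold for flagging outputs would need to be chosen empirically.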
Notes
This model is part of the summerAI TRM research line investigating whether recursive/shared-block Transformer models can be made more language-model-like through staged rehabilitation, curriculum SFT, and output-distribution repair.