summerMC/TRM-textv3
TRM-textv3 is a lightweight (~84M parameters) custom Transformer model, trained through a pipeline of distillation, SFT, and DPO for efficient conversational use.
Model Specifications
| Attribute | Detail |
|---|---|
| Model Type | Causal Language Model |
| Architecture | trm_text_ism (Single-layer Efficiency) |
| Parameters | 84,312,320 (~84M) |
| Vocabulary Size | 50,259 |
| Sequence Length | 512 tokens |
| Precision | bfloat16 |
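
As a rough illustration of what these figures mean for deployment, the sketch below estimates the raw weight footprint from the parameter count and precision in the table. It counts weights only; activations, the KV cache, and framework overhead are not included, and the helper name `weight_bytes` is introduced here purely for illustration.

```python
# Figures taken from the specification table above.
NUM_PARAMETERS = 84_312_320
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits
BYTES_PER_FP32 = 4  # float32, for comparison


def weight_bytes(num_parameters: int, bytes_per_value: int) -> int:
    """Return the raw size of the weights in bytes (weights only)."""
    return num_parameters * bytes_per_value


bf16_bytes = weight_bytes(NUM_PARAMETERS, BYTES_PER_BF16)
fp32_bytes = weight_bytes(NUM_PARAMETERS, BYTES_PER_FP32)

print(f"bfloat16 weights: {bf16_bytes / 2**20:.1f} MiB")
print(f"float32 weights:  {fp32_bytes / 2**20:.1f} MiB")
```

In bfloat16 the weights come to roughly 161 MiB, about half of what the same parameters occupy in float32.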
Training Paradigm
- Distillation: Knowledge extraction from high-capacity teacher models to define the foundation.
- SFT (Supervised Fine-Tuning): Adapted using the OpenHermes-2.5 dataset for refined chat-based interaction.
- DPO (Direct Preference Optimization): Final alignment stage to enhance response coherence and mitigate hallucinations.
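
To make the DPO stage more concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code used for this model: the function names, the `beta` value of 0.1, and the example log-probabilities are all assumptions chosen for illustration. Each argument is the summed log-probability of a full response under either the policy being trained or the frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical value; the card does not state one
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return softplus(-logits)


# A policy identical to the reference sits at log(2).
baseline = dpo_loss(-12.0, -12.0, -12.0, -12.0)
# A policy that favors the chosen response scores lower.
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)
print(baseline, improved)
```

The loss falls below log(2) only when the policy raises the chosen response relative to the rejected one by more than the reference model does, which is the preference signal this stage optimizes.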
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "summerMC/TRM-textv3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
prompt = "User: Hello! How can you help me?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.4)  # temperature only takes effect when sampling is enabled
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
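
The example above uses a plain `User: ... / Assistant:` prompt and prints the prompt together with the completion. The helpers below are a small sketch of how one might build that prompt for several turns, budget tokens against the 512-token sequence length, and trim the echoed prompt from the decoded text. The multi-turn layout is an assumption extrapolated from the single-turn example, and `build_prompt`, `prompt_token_budget`, and `extract_reply` are hypothetical names, not part of the model's API.

```python
def build_prompt(turns, user_tag="User", assistant_tag="Assistant"):
    """Join (role, text) turns into a prompt ending with an open assistant turn.

    The multi-turn layout is assumed, not documented by the model card.
    """
    lines = []
    for role, text in turns:
        if role == "user":
            tag = user_tag
        elif role == "assistant":
            tag = assistant_tag
        else:
            raise ValueError(f"unknown role: {role!r}")
        lines.append(f"{tag}: {text.strip()}")
    lines.append(f"{assistant_tag}:")
    return "\n".join(lines)


def prompt_token_budget(max_length=512, max_new_tokens=64):
    """Tokens left for the prompt once room for the reply is reserved."""
    return max_length - max_new_tokens


def extract_reply(decoded, prompt, user_tag="User"):
    """Drop the echoed prompt and anything after a follow-up user turn."""
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return reply.split(f"\n{user_tag}:")[0].strip()


prompt = build_prompt([("user", "Hello! How can you help me?")])
print(repr(prompt))
print(prompt_token_budget())
```

With the defaults shown, 448 tokens remain for the prompt; longer conversations would need their oldest turns dropped before tokenization.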
Limitations & Notes
- Primary Language: English. While it has some exposure to Japanese, complex multi-turn dialogue in Japanese is considered experimental.
- Scale: At only 84M parameters, it is best suited for narrative assistance, simple logic tasks, and edge-device deployment where latency is critical.
- Safety: Always apply a safety layer when deploying in production environments.