Cosmos-T2-80M-Test
Universal Kaggle-ready training notebook for the Cosmos-T2 series.
This card was generated by the training notebook, and the final metrics were filled in after the Kaggle training run. The notebook is designed to fit Kaggle's 2x T4 GPU setup. The goal is a reusable training recipe, not a production assistant.
Model Details
| Property | Value |
|---|---|
| Model class | `CosmosT2_LLM` |
| Architecture | Decoder-only Transformer with RoPE, RMSNorm, SwiGLU, GQA, and a configurable Engram memory path |
| Parameters | ~87.60 M |
| Layers | 12 |
| Attention heads | 8 |
| KV heads | 2 |
| d_model | 384 |
| FFN hidden | 1536 |
| Positional encoding | RoPE (rope_base=10000) |
| Normalization | RMSNorm |
| MLP | SwiGLU |
| Memory | Engram (use_engram=True, every 2 blocks) |
| Context length | 1028 |
| Training block size | 1028 |
| Tokenizer | Qwen/Qwen2.5-0.5B |
| Dataset | wop/XXXXXL-chain-of-thought |
| License | Apache-2.0 |
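As a sanity check on the table, the transformer part of the parameter count can be estimated from the listed dimensions. This is a rough sketch, not the notebook's accounting: it assumes tied input/output embeddings, bias-free linear layers, one RMSNorm weight vector per sub-block, and a vocabulary of 151,936 (the embedding size in the Qwen2.5-0.5B config; the vocab size actually used by `CosmosT2_LLM` is an assumption). Engram parameters are not modeled, so the estimate should land below the reported ~87.60 M.

```python
def estimate_params(
    vocab_size: int = 151_936,      # assumption: Qwen2.5-0.5B embedding size
    d_model: int = 384,
    n_layers: int = 12,
    n_heads: int = 8,
    n_kv_heads: int = 2,
    ffn_hidden: int = 1536,
    tied_embeddings: bool = True,   # assumption: LM head shares the embedding matrix
) -> int:
    """Rough parameter count for a GQA + SwiGLU decoder, excluding Engram."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim

    attn = d_model * d_model        # Q projection
    attn += 2 * d_model * kv_dim    # K and V projections (only n_kv_heads heads)
    attn += d_model * d_model       # output projection
    mlp = 3 * d_model * ffn_hidden  # SwiGLU: gate, up, and down projections
    norms = 2 * d_model             # pre-attention and pre-MLP RMSNorm weights
    per_layer = attn + mlp + norms

    embed = vocab_size * d_model
    lm_head = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model
    return embed + n_layers * per_layer + final_norm + lm_head


total = estimate_params()
print(f"~{total / 1e6:.2f} M parameters before Engram")
```

Under these assumptions the estimate is about 84.01 M, so the remaining ~3.6 M of the reported figure would come from Engram and any components not modeled here. With untied embeddings the count would exceed 140 M, which suggests (but does not confirm) that the embeddings are tied.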
Why these choices
- RoPE keeps positional handling compact and avoids learned absolute embeddings.
- RMSNorm drops LayerNorm's mean subtraction and bias, which makes it cheaper, and it is stable in practice for small decoder-only models like this one.
- SwiGLU usually gives a better quality/compute tradeoff than a plain GELU MLP.
- GQA (8 query heads sharing 2 KV heads) shrinks the KV cache by 4x while keeping multi-head query capacity.
- Engram gives the stack a lightweight explicit memory path for repeated reasoning patterns.
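To make these choices concrete, here is a minimal PyTorch sketch of RoPE, RMSNorm, SwiGLU, and grouped-query attention using the dimensions from the Model Details table. It illustrates the standard techniques and is not the `CosmosT2_LLM` source: the class and function names, the rotate-half RoPE layout, the bias-free layers, and the `eps` value are assumptions. Engram is omitted because this card does not describe its design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale features by their root-mean-square; no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, hidden, bias=False)
        self.up = nn.Linear(d_model, hidden, bias=False)
        self.down = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs (i, i + D/2) by position-dependent angles. x: (B, H, T, D), D even."""
    _, _, T, D = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, D, 2, dtype=torch.float32, device=x.device) / D))
    angles = torch.arange(T, dtype=torch.float32, device=x.device)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()  # (T, D/2), broadcast over batch and heads
    x1, x2 = x[..., : D // 2], x[..., D // 2 :]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


def gqa_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: (B, n_heads, T, hd); k, v: (B, n_kv_heads, T, hd).

    Each KV head serves n_heads // n_kv_heads query heads, so only the
    smaller k/v tensors would need to be cached during generation.
    """
    group = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


# Shapes from the Model Details table: d_model=384, 8 query heads, 2 KV heads
B, T, d_model, n_heads, n_kv = 1, 16, 384, 8, 2
head_dim = d_model // n_heads  # 48
q = apply_rope(torch.randn(B, n_heads, T, head_dim))
k = apply_rope(torch.randn(B, n_kv, T, head_dim))
v = torch.randn(B, n_kv, T, head_dim)
out = gqa_attention(q, k, v)
print(out.shape)
```

Because keys and values exist for only 2 of the 8 heads, a KV cache built from `k` and `v` holds a quarter of what full multi-head attention would store.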
Training Summary
| Metric | Value |
|---|---|
| Rows used | 1000 |
| Approx. packed tokens | 177,844 |
| Epochs | 50 |
| Batch size | 6 |
| Peak LR | 3.00e-04 |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
| Wall-clock time | 14m 14s |
| Final training loss | 0.0522 |
| Final training perplexity | 1.05 |
| Final validation loss | 4.2545 |
| Final validation perplexity | 70.43 |
| Best validation loss | 3.1329 |
| Best epoch | 8 |
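Perplexity in the table is the exponential of the mean per-token cross-entropy. The snippet below recomputes it from the reported losses, assuming natural-log (nats) cross-entropy, which is consistent with the reported perplexities. The `reported` dictionary simply copies the table values.

```python
import math

# Loss values copied from the Training Summary table (mean token cross-entropy, nats)
reported = {
    "final_train": 0.0522,
    "final_val": 4.2545,
    "best_val": 3.1329,
}


def perplexity(loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy)."""
    return math.exp(loss)


for name, loss in reported.items():
    print(f"{name:12s} loss={loss:.4f} ppl={perplexity(loss):.2f}")

# A large train/validation gap is a rough overfitting signal
gap = reported["final_val"] - reported["final_train"]
print(f"val - train loss gap: {gap:.2f} nats")
```

The best validation perplexity, which the table does not list, works out to roughly 22.9 under the same assumption. The large gap between final training and validation loss, together with the best validation loss occurring at epoch 8 of 50, is consistent with the overfitting risk noted under Limitations.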
Loss and perplexity
The notebook displays live loss and perplexity plots every 20 epochs; the plots are not saved to disk, so none are included in this card.
How to Use
Quick start
```python
import torch
from transformers import AutoTokenizer

from app import CosmosT2_LLM  # model class defined in app.py, which must be on your Python path

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# "$CHECKPOINT_NAME" is a placeholder: point it at this model's checkpoint file
ckpt = torch.load("$CHECKPOINT_NAME", map_location="cpu")
model = CosmosT2_LLM(**ckpt["config"])  # rebuild the architecture from the saved config
model.load_state_dict(ckpt["model_state"])
model.eval()

prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "Enable thinking features: INTUITION, COLD START, HOT START"},
        {"role": "user", "content": "What is 12 * 7?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
# The chat template already inserts the special tokens, so don't add them again
ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids

with torch.no_grad():
    out = model.generate(ids, max_new_tokens=120, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```
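The `temperature=0.8, top_k=50` arguments in the quick start point to standard temperature plus top-k sampling. The sketch below shows what such a decoding loop typically looks like; it is not the `CosmosT2_LLM.generate` implementation, and `sample_next_token` and `generate_sketch` are hypothetical names. It assumes the model's forward pass returns logits shaped `(batch, seq, vocab)` and crops the running context to the 1028-token block size from the Model Details table.

```python
import torch


def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 50) -> torch.Tensor:
    """Sample one token id per row from logits of shape (batch, vocab).

    Dividing by the temperature sharpens (<1) or flattens (>1) the distribution;
    every token outside the top_k highest logits is then masked out.
    """
    logits = logits / max(temperature, 1e-5)
    if top_k is not None and top_k > 0:
        k = min(top_k, logits.size(-1))
        kth_best = torch.topk(logits, k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # shape (batch, 1)


@torch.no_grad()
def generate_sketch(model, ids, max_new_tokens=120, temperature=0.8, top_k=50, block_size=1028):
    """Autoregressive loop; assumes model(ids) returns logits of shape (batch, seq, vocab)."""
    for _ in range(max_new_tokens):
        logits = model(ids[:, -block_size:])  # never feed more than block_size tokens
        next_id = sample_next_token(logits[:, -1, :], temperature, top_k)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```

Setting `top_k=1` reduces this to greedy decoding, which can help when checking whether a short arithmetic answer is stable across runs.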
Prompt format
Use the Qwen2.5 chat template. The default system prompt is:

`Enable thinking features: INTUITION, COLD START, HOT START`

The model then emits a `<think>` block followed by an answer when it has enough signal.
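For debugging prompts it can help to see what the chat template produces and to separate the reasoning from the answer. The sketch below approximates the ChatML layout (`<|im_start|>` / `<|im_end|>`) that the Qwen2.5 template uses when an explicit system message is given; `render_chatml` and `split_think` are hypothetical helpers, and `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
import re

SYSTEM_PROMPT = "Enable thinking features: INTUITION, COLD START, HOT START"


def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the Qwen2.5 chat layout for inspection (illustration only)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "".join(parts)


def split_think(completion: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer); reasoning is '' when no <think> block exists."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    return match.group(1).strip(), completion[match.end():].strip()


prompt = render_chatml(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is 12 * 7?"},
    ]
)
print(prompt)
```

`split_think` tolerates outputs with no `<think>` block, which matters for a small model that may not always follow the format.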
Limitations
- The model is intentionally small and is still a research/demo artifact.
- Training on chain-of-thought data can overfit quickly if the corpus is tiny.
- Long-context behavior is limited by the configured block size of 1028 tokens.
- The model is not safety-aligned and should not be exposed as a public assistant without additional work.
Intended Use
- Research into small-scale pretraining and reasoning-style formatting
- Educational demos for decoder-only Transformer training
- Hugging Face Spaces or local inference demos
- Not for production use
Cosmos-T2 Series
This notebook is designed to train future Cosmos-T2 variants by changing only the config block at the top.
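Because the quick start rebuilds the model with `CosmosT2_LLM(**ckpt["config"])`, the config is presumably a flat set of keyword arguments. A frozen dataclass is one way to keep such a config block in one place, validate it, and derive variants from it. This is a sketch: apart from `use_engram` and `rope_base`, which appear in the Model Details table, the field names, the validation rules, and the `wider` variant are hypothetical and not taken from the notebook.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class CosmosT2Config:
    # Defaults mirror the Model Details table; field names are hypothetical
    # except use_engram and rope_base.
    n_layers: int = 12
    n_heads: int = 8
    n_kv_heads: int = 2
    d_model: int = 384
    ffn_hidden: int = 1536
    rope_base: float = 10000.0
    use_engram: bool = True
    engram_every: int = 2
    block_size: int = 1028

    def __post_init__(self):
        # Constraints any variant must satisfy for GQA and RoPE to work
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        if self.n_heads % self.n_kv_heads:
            raise ValueError("n_heads must be divisible by n_kv_heads for GQA")
        if (self.d_model // self.n_heads) % 2:
            raise ValueError("head_dim must be even for RoPE")


base = CosmosT2Config()
# Hypothetical larger variant: only the config changes, the training code stays the same
wider = replace(base, d_model=512, ffn_hidden=2048, n_layers=16)
print(asdict(wider))
```

`dataclasses.replace` re-runs `__post_init__`, so an invalid variant fails immediately instead of partway through a Kaggle session.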
Citation
```bibtex
@misc{cosmos-t2-80m,
  author    = {wop},
  title     = {Cosmos-T2-80M: A small from-scratch chain-of-thought Transformer},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/wop/Cosmos-T2-80M}
}
```
Acknowledgements
- Tokenizer from Qwen2.5 by Alibaba Cloud
- Training data from wop/XXXXXL-chain-of-thought
- Trained on Kaggle T4 GPUs