TralalabsLM-160M-Test

TralalabsLM-160M-Test is a small experimental causal language model trained as a smoke-test checkpoint for the TralalabsLM model family.

This is not a production model. It was trained in a very short test run to verify that the training script, Modal GPU environment, FineWeb-Edu streaming pipeline, Qwen2.5 tokenizer, checkpoint saving, and download workflow all work correctly.

Model Details

| Field | Value |
|---|---|
| Model name | TralalabsLM-160M-Test |
| Model family | TralalabsLM |
| Model type | Causal Language Model |
| Architecture | GPT-2 style decoder-only Transformer |
| Parameters | 157,459,840 |
| Rounded size | 160M |
| Context length | 2048 tokens |
| Tokenizer | Qwen/Qwen2.5-0.5B tokenizer |
| Training dataset | HuggingFaceFW/fineweb-edu |
| Training tokens | 1,048,576 |
| Optimizer updates | 4 |
| Framework | PyTorch + Transformers |
| Training platform | Modal.com |
| GPU used | NVIDIA L40S |

Intended Use

This checkpoint is intended for:

  • verifying a training pipeline
  • testing model loading with transformers
  • checking tokenizer compatibility
  • debugging text generation
  • validating upload/download workflow
  • serving as a tiny TralalabsLM test artifact

It is not intended for serious assistant use, factual answering, coding help, production deployment, safety-critical use, or benchmark comparisons.

Training Data

This model was trained on a very small streamed sample from:

HuggingFaceFW/fineweb-edu

FineWeb-Edu is a web dataset of educational-quality pages, filtered from FineWeb, which is itself derived from Common Crawl. The dataset is released under the Open Data Commons Attribution License (ODC-By) v1.0, and its use is also subject to Common Crawl's Terms of Use.

Because this model only saw 1,048,576 tokens, it should be treated as a technical smoke-test checkpoint, not a meaningful pretrained language model.
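
The token count can be cross-checked against the batch settings listed under Training Configuration. The following is a minimal sketch, assuming every micro-batch was packed to the full 2048-token context with no padding (the card does not state this explicitly); the variable names are illustrative.

```python
# Values copied from the Training Configuration section of this card.
micro_batch_size = 8
gradient_accumulation_steps = 16
context_length = 2048
update_steps = 4

# Assumption: every sequence is packed to the full context length (no padding).
tokens_per_update = micro_batch_size * gradient_accumulation_steps * context_length
tokens_seen = tokens_per_update * update_steps

print(f"tokens per optimizer update: {tokens_per_update:,}")
print(f"tokens seen in total:        {tokens_seen:,}")
```

Under that assumption the script reports 262,144 tokens per update and 1,048,576 tokens in total, which matches the figure above.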

Tokenizer

This model uses the tokenizer from:

Qwen/Qwen2.5-0.5B

Qwen2.5 uses a unified vocabulary with 151,665 total tokens, including control tokens for general text, chat, tool use, vision, and coding.

How to Load

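from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tralalabs/TralalabsLM-160M-Test"

# trust_remote_code=True lets Transformers run custom code shipped in the repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Tokenize the prompt into PyTorch tensors.
prompt = "The future of small language models is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 80 new tokens with temperature and nucleus (top-p) sampling.
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)

# Decode the full sequence (prompt plus continuation) back to text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))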
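
Since this checkpoint exists to validate loading, a quick sanity check after `from_pretrained` can be useful. The helper below is a sketch, not part of the release: `smoke_check` is a hypothetical name, and it assumes the parameter count on this card was obtained by summing `numel()` over `model.parameters()`.

```python
def smoke_check(model, tokenizer, expected_params=157_459_840):
    """Return basic load-time facts about a model/tokenizer pair.

    Hypothetical helper: it relies only on `model.parameters()`,
    `model.get_input_embeddings()` and `len(tokenizer)`.
    """
    # Shared (tied) tensors are yielded only once by `parameters()`.
    n_params = sum(p.numel() for p in model.parameters())
    # Number of rows in the input embedding table.
    embedding_rows = model.get_input_embeddings().weight.shape[0]
    vocab_size = len(tokenizer)
    return {
        "n_params": n_params,
        "params_match_card": n_params == expected_params,
        "embedding_rows": embedding_rows,
        "tokenizer_size": vocab_size,
        # Every token id the tokenizer can emit must index a valid row.
        "tokenizer_fits": vocab_size <= embedding_rows,
    }
```

Calling `smoke_check(model, tokenizer)` with the objects loaded above returns a dictionary. A `False` value for `params_match_card` or `tokenizer_fits` suggests that the download or the tokenizer pairing went wrong.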

Expected Quality

This model is expected to produce low-quality, unstable, or repetitive text because it was trained on only 1,048,576 tokens (about 1M).

The purpose of this release is pipeline validation, not final model quality.

Expected behavior:

  • may repeat text
  • may output broken grammar
  • may hallucinate
  • may fail at instructions
  • may produce nonsense
  • may show weak world knowledge
  • may not follow prompts reliably
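
When debugging generation, it can help to put a number on the repetition mentioned above. One simple measure is the fraction of repeated word n-grams in the output. The sketch below is illustrative only: the name `repeated_ngram_fraction` and the whitespace tokenization are assumptions, not part of this release.

```python
def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams that duplicate an earlier n-gram (0.0 to 1.0)."""
    words = text.split()  # crude whitespace tokenization, enough for a smoke test
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words: nothing to compare
    return 1.0 - len(set(ngrams)) / len(ngrams)
```

Values near 0 indicate little repetition, while values approaching 1 indicate output that is mostly looping. Applying the function to the decoded generation from the loading snippet gives a quick per-sample score.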

Limitations

This model should not be used for:

  • medical, legal, financial, or safety-critical advice
  • factual answering
  • public chatbot deployment
  • moderation
  • autonomous agents
  • high-trust tasks
  • benchmark claims

The model may reproduce biases, toxicity, or unsafe patterns present in web-scale training data.

Training Configuration

Approximate configuration used for the smoke-test run:

n_embd: 640
n_layer: 12
n_head: 10
context_length: 2048
micro_batch_size: 8
gradient_accumulation_steps: 16
learning_rate: 3e-4
weight_decay: 0.10
dataset: HuggingFaceFW/fineweb-edu
tokenizer: Qwen/Qwen2.5-0.5B
tokens_seen: 1,048,576
update_steps: 4
parameters: 157,459,840
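
The parameter count can be reproduced from these settings. The sketch below assumes a standard GPT-2 layout (learned position embeddings, biases on all linear layers, a 4x MLP expansion, and output weights tied to the input embedding) and an embedding table of 151,665 rows, matching the Qwen2.5 vocabulary size quoted in the Tokenizer section. None of these details is stated explicitly in this card, so treat them as assumptions; `gpt2_param_count` is a hypothetical helper name.

```python
def gpt2_param_count(vocab_size, n_ctx, n_embd, n_layer):
    """Count parameters of a GPT-2 style decoder with tied input/output embeddings."""
    d = n_embd
    token_emb = vocab_size * d                    # wte, also reused as the output head
    pos_emb = n_ctx * d                           # wpe, learned absolute positions
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up- and down-projection
    layer_norms = 2 * (2 * d)                     # two LayerNorms per block (weight + bias)
    block = attn + mlp + layer_norms
    final_ln = 2 * d                              # final LayerNorm (weight + bias)
    return token_emb + pos_emb + n_layer * block + final_ln

# Assumed vocab_size: the Qwen2.5 tokenizer size quoted in this card.
total = gpt2_param_count(vocab_size=151_665, n_ctx=2048, n_embd=640, n_layer=12)
print(f"{total:,}")  # 157,459,840
```

Under these assumptions the total matches the reported 157,459,840. `n_head` does not appear in the function because the number of attention heads does not change the parameter count in this layout.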

Environmental Impact

This was a short smoke-test run on a single Modal L40S GPU. No full-scale training run was performed for this checkpoint.

License

This model is released under the Apache License 2.0.

Attribution

Training data:

HuggingFaceFW/fineweb-edu

Tokenizer:

Qwen/Qwen2.5-0.5B

Citation

If you use this model, cite it as:

@misc{tralalabslm160mtest,
  title = {TralalabsLM-160M-Test},
  author = {Tralalabs},
  year = {2026},
  url = {https://huggingface.co/Tralalabs/TralalabsLM-160M-Test}
}

Status

This is an early experimental checkpoint.

Recommended next model: TralalabsLM-160M-Base, trained on a much larger token budget.
