# Open LM 3B – Knowledge Cutoff May 2013

This is a Hugging Face-format conversion of the Apple Open LM 3B "oracle" model from the TiC-LM (Time-Continual Language Modeling) project, trained with a knowledge cutoff of May 2013.

## Model Details

| Property | Value |
|---|---|
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Parameters | ~2.7B |
| Training tokens | 220B |
| Knowledge cutoff | May 2013 |
| Vocab size | 50,432 |
| Context length | 2,048 |
| Original format | Apple Open LM |
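As a sanity check, the ~2.7B figure is consistent with a standard LLaMA-style parameter budget. The arithmetic below uses hypothetical dimensions (hidden size 2560, 32 layers, SwiGLU width 6912, tied embeddings) that are typical for a 3B-class model but are not taken from this checkpoint, whose actual dimensions are auto-detected on load:

```python
# Hypothetical dimensions (NOT read from the checkpoint) for a rough
# parameter count of a LLaMA-style model with the table's vocab size.
d_model, n_layers, vocab = 2560, 32, 50_432
ffn_hidden = 6912  # ~(2/3) * 4 * d_model, rounded up to a multiple of 256

attn = 4 * d_model * d_model      # Q, K, V, O projections
ffn = 3 * d_model * ffn_hidden    # SwiGLU: gate, up, down matrices
per_layer = attn + ffn            # norm weights add a negligible amount
embed = vocab * d_model           # tied input/output embedding

total = n_layers * per_layer + embed
print(f"{total / 1e9:.2f}B parameters")  # → 2.67B parameters
```

With an untied output head the embedding term doubles, landing near 2.8B; either way the count rounds to the table's ~2.7B.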

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-3b-201305",
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # required: loads the custom OpenLMForCausalLM class
)
# The checkpoint ships without a tokenizer; it was trained with the
# GPT-NeoX tokenizer (vocab size 50,432).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "The history of the Olympic Games"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Conversion Notes

- Converted from the original Open LM `.pt` checkpoint to a custom `OpenLMForCausalLM` format.
- Uses LayerNorm (not RMSNorm) to match the original Open LM training.
- Includes QK norm (LayerNorm on the Q and K projections before attention).
- Architecture dimensions are auto-detected from checkpoint weights.
- Requires `trust_remote_code=True` when loading.
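The QK-norm detail above can be sketched in plain PyTorch. This is a minimal illustration, not the converted model's actual code: the class name and dimensions are made up, and it assumes (as in Open LM) that the LayerNorm is applied to the full Q and K projections before splitting into heads:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Causal multi-head attention with LayerNorm on Q and K (illustrative)."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # QK norm: LayerNorm over the full projection, before the head split
        self.q_norm = nn.LayerNorm(dim)
        self.k_norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = self.q_norm(q), self.k_norm(k)  # normalize Q and K only
        # (B, T, C) -> (B, n_heads, T, head_dim)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, C))

x = torch.randn(2, 8, 64)
attn = QKNormAttention(dim=64, n_heads=4)
print(attn(x).shape)  # torch.Size([2, 8, 64])
```

Normalizing Q and K bounds the logit scale of the attention scores, which is commonly used to stabilize pre-norm transformer training.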

## Citation

```bibtex
@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}
```