Open LM 3B — Knowledge Cutoff January 2017

This is a Hugging Face-format conversion of the Apple Open LM 3B oracle model from the TiC-LM (Time-Continual Language Modeling) project, trained with a knowledge cutoff of January 2017.

Model Details

Property	Value
Architecture	LLaMA-style (pre-norm, SwiGLU, RoPE)
Parameters	~2.7B
Training tokens	220B
Knowledge cutoff	January 2017
Vocab size	50,432
Context length	2,048
Original format	Apple Open LM
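As a rough check on the parameter count in the table, the weights of a LLaMA-style decoder with SwiGLU can be tallied from a few dimensions. The sketch below is a back-of-the-envelope estimate, not something read from this checkpoint. Only the vocabulary size comes from the table. `d_model`, `n_layers`, and `ffn_hidden` are hypothetical placeholder values, and the helper `estimate_params` is illustrative. Norm weights and biases are ignored because they are tiny next to the projection matrices.

```python
def estimate_params(d_model: int, n_layers: int, vocab_size: int,
                    ffn_hidden: int, tied_embeddings: bool = False) -> int:
    """Rough weight count for a LLaMA-style decoder (norms and biases ignored)."""
    embed = vocab_size * d_model                      # input token embeddings
    lm_head = 0 if tied_embeddings else vocab_size * d_model  # output projection
    attn = 4 * d_model * d_model                      # Q, K, V and output projections
    mlp = 3 * d_model * ffn_hidden                    # SwiGLU: gate, up and down matrices
    return embed + lm_head + n_layers * (attn + mlp)


# Hypothetical 3B-class dimensions, NOT read from this checkpoint.
total = estimate_params(d_model=2560, n_layers=32, vocab_size=50_432, ffn_hidden=6912)
print(f"{total / 1e9:.2f}B")  # prints 2.80B
```

With these placeholder dimensions the estimate lands in the same range as the ~2.7B listed above. The exact figure depends on the checkpoint's real dimensions and on whether its input and output embeddings are tied.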

Usage

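from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code=True is required because the repo ships a custom OpenLMForCausalLM class.
# device_map="auto" needs the accelerate package installed.
# The checkpoint stores F32 weights; loading in bfloat16 roughly halves memory use.
model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-3b-201701",
    dtype=torch.bfloat16,  # older transformers releases call this argument torch_dtype
    device_map="auto",
    trust_remote_code=True,
)
# The model uses the GPT-NeoX-20B tokenizer, loaded from its own repository.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")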

Conversion Notes

Converted from the original Open LM .pt checkpoint into a custom OpenLMForCausalLM model class.
Uses LayerNorm (not RMSNorm) to match the original Open LM training.
Includes QK norm (LayerNorm applied to the query and key projections before attention).
Architecture dimensions are auto-detected from checkpoint weights.
Requires trust_remote_code=True when loading.
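The two normalization notes above can be illustrated in a few lines. LayerNorm centers a vector before rescaling it, while RMSNorm (the usual choice in LLaMA-style models) only rescales. The two therefore produce different outputs for the same weights. With QK norm, the query and key vectors are normalized before their dot product, so in this sketch the attention logit does not grow with the raw magnitude of the projections. This is a minimal single-head sketch in plain Python, not the repository's OpenLMForCausalLM code. The function names are illustrative, learned gain and bias parameters are omitted, and the sketch sidesteps whether the original normalizes over the full projection or per head.

```python
import math

# Plain-Python sketch; learned gain and bias parameters are omitted for clarity.

def layer_norm(x, eps=1e-5):
    """LayerNorm: subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm, for contrast: divide by the root mean square, no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def qk_norm_logit(q, k):
    """Single-head attention logit with LayerNorm applied to q and k first."""
    q_hat, k_hat = layer_norm(q), layer_norm(k)
    return sum(a * b for a, b in zip(q_hat, k_hat)) / math.sqrt(len(q))


q = [1.0, 2.0, 3.0, 4.0]
k_large = [100.0 * v for v in q]  # same direction, 100x the magnitude
print(qk_norm_logit(q, q), qk_norm_logit(q, k_large))  # both close to 2.0
```

Scaling the key by 100 leaves the logit essentially unchanged, because LayerNorm removes the magnitude before the dot product is taken.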
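Auto-detecting architecture dimensions from checkpoint weights amounts to reading them off tensor shapes. The following is a minimal sketch, assuming a flat mapping from parameter name to shape tuple. The key names (`tok_embeddings.weight`, `layers.<i>.feed_forward.w1.weight`) and the helper `infer_dims` are hypothetical and will not necessarily match the names in the original Open LM checkpoint.

```python
import re

# Hypothetical parameter names; a real checkpoint may use different keys.
EMBED_KEY = "tok_embeddings.weight"
FFN_GATE_KEY = "layers.0.feed_forward.w1.weight"
LAYER_RE = re.compile(r"layers\.(\d+)\.")


def infer_dims(shapes):
    """Infer model dimensions from a {parameter name: shape tuple} mapping."""
    # The embedding matrix is (vocab_size, d_model).
    vocab_size, d_model = shapes[EMBED_KEY]
    # Count transformer blocks from the distinct indices in "layers.<i>." names.
    layer_ids = set()
    for name in shapes:
        match = LAYER_RE.match(name)
        if match:
            layer_ids.add(int(match.group(1)))
    # The SwiGLU gate projection is (ffn_hidden, d_model).
    ffn_hidden = shapes[FFN_GATE_KEY][0]
    return {"vocab_size": vocab_size, "d_model": d_model,
            "n_layers": len(layer_ids), "ffn_hidden": ffn_hidden}


# Toy shapes standing in for a real state dict.
toy = {
    "tok_embeddings.weight": (100, 16),
    "layers.0.attention.wq.weight": (16, 16),
    "layers.0.feed_forward.w1.weight": (48, 16),
    "layers.1.attention.wq.weight": (16, 16),
    "layers.1.feed_forward.w1.weight": (48, 16),
    "output.weight": (100, 16),
}
print(infer_dims(toy))  # {'vocab_size': 100, 'd_model': 16, 'n_layers': 2, 'ffn_hidden': 48}
```

The number of attention heads cannot be recovered this way. In standard multi-head attention the query projection is d_model x d_model for any head count, so a converter has to take that value from a config or a known default. With a real checkpoint, the mapping could be built as `{name: tuple(t.shape) for name, t in state_dict.items()}` once the state dict is loaded.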

Citation

@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}

