# open-lm-1b-202407

Paper: [arXiv:2410.14660](https://arxiv.org/abs/2410.14660)
This is a Hugging Face-format conversion of the Apple Open LM 1B oracle model from the TiC-LM (Time-Continual Language Modeling) project, trained with a knowledge cutoff of July 2024.
| Property | Value |
|---|---|
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Parameters | ~1.4B |
| Training tokens | 220B |
| Knowledge cutoff | July 2024 |
| Vocab size | 50,432 |
| Context length | 2,048 |
| Original format | Apple Open LM |
Load the model with Hugging Face Transformers (the custom model class requires `trust_remote_code=True`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-1b-202407",
    dtype=torch.bfloat16,    # use torch.float32 on hardware without bf16 support
    device_map="auto",
    trust_remote_code=True,  # needed for the custom OpenLMForCausalLM class
)

# The model uses the GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
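Once loaded, the model supports plain causal-LM generation. A minimal sketch (the helper name and greedy decoding settings are illustrative, not part of the released code):

```python
def generate_continuation(model, tokenizer, prompt, max_new_tokens=32):
    """Generate a continuation and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,                        # greedy decoding
        pad_token_id=tokenizer.eos_token_id,    # silence the missing-pad warning
    )
    # Slice off the prompt tokens so only the continuation is decoded.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example (requires the `model` and `tokenizer` loaded above):
# print(generate_continuation(model, tokenizer, "Paris is the capital of"))
```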
The original Apple Open LM `.pt` checkpoint was converted to a custom `OpenLMForCausalLM` implementation, which is why `trust_remote_code=True` is required when loading.

```bibtex
@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}
```