# {Model Name}

{Model Name} is a {X}B parameter language model trained on [{dataset}]({link}) as part of [{project/suite name}]({paper link}). {What makes this release distinctive -- e.g., "All intermediate training checkpoints are publicly available to support research on training dynamics, memorization, and emergent capabilities."}

{Paragraph on research motivation: what question does this model help answer? What gap does it fill in the ecosystem? See [our paper]({paper link}) for full details.}

<details>
<summary><b>Quick Start</b></summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/{model-name}",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/{model-name}")

# Perplexity on a passage
inputs = tokenizer("your text here", return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
perplexity = torch.exp(loss)
print(f"Perplexity: {perplexity.item():.2f}")

# Generation
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in fp16 requires approximately {X} GB of GPU memory. The full fp32 weights are {X} GB on disk.
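
As a rough sanity check on these figures: resident weight memory scales with parameter count, at about 2 bytes per parameter in fp16 and 4 bytes in fp32, before any activation or KV-cache overhead. A minimal sketch with an illustrative parameter count:

```python
# Back-of-envelope for the figures above: weights only, excluding activations
# and KV cache. The parameter count below is an illustrative placeholder.
num_parameters = 1_000_000_000  # substitute this model's actual parameter count

print(f"fp16 weights: ~{num_parameters * 2 / 1024**3:.1f} GB")
print(f"fp32 weights: ~{num_parameters * 4 / 1024**3:.1f} GB")
```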

</details>

## Accessing Intermediate Checkpoints

One of the key features of this release is the availability of {N} intermediate training checkpoints, from initialization (step 0) through the final training step ({step N}). These are stored as branches in this repository.

```python
from transformers import AutoModelForCausalLM

# Load the model at step 1000
model = AutoModelForCausalLM.from_pretrained("EleutherAI/{model-name}", revision="step1000")
```

Checkpoints were saved every {N} steps. The `main` branch contains the final checkpoint. {Note any exceptions or irregularities in the checkpoint schedule.}
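
If you want to enumerate the available checkpoint branches programmatically, one option is `huggingface_hub` (a minimal sketch, assuming the `step{N}` branch naming shown above):

```python
from huggingface_hub import list_repo_refs

# List every branch of the model repo; checkpoint branches follow the "step{N}" naming above
refs = list_repo_refs("EleutherAI/{model-name}")
steps = sorted(
    int(branch.name.removeprefix("step"))
    for branch in refs.branches
    if branch.name.startswith("step") and branch.name[4:].isdigit()
)
print(f"{len(steps)} checkpoints available, from step {steps[0]} to step {steps[-1]}")
```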

This makes {Model Name} suitable for research on:
- How model capabilities develop over the course of training
- Memorization and forgetting dynamics
- The effect of specific training data on model behavior
- Checkpoint-level analysis of emergent properties
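
As a minimal sketch of the first use case, the loop below loads a few checkpoints and tracks language-modeling loss on a fixed probe sentence; the revision names and probe text are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/{model-name}")
probe = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

# Track how language-modeling loss on a fixed probe evolves over training.
# The revisions listed here are illustrative; use the steps that exist for this model.
for revision in ["step0", "step1000", "step10000", "main"]:
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/{model-name}",
        revision=revision,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    inputs = probe.to(model.device)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{revision}: loss = {loss.item():.3f}")
```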

## Architecture

{Model Name} uses a {transformer variant} architecture with {N} layers, a hidden dimension of {N}, and {N} attention heads, for a total of {X}B parameters. {Any notable choices: positional encoding scheme, activation function, tied embeddings, etc. and why.}

The full architectural specification:

| Hyperparameter | Value |
|---|---|
| Parameters | {X}B |
| Layers | {N} |
| Hidden Dimension | {N} |
| Attention Heads | {N} |
| Context Length | {N} tokens |
| Vocabulary Size | {N} |
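
These values can also be read programmatically from the model config. A small sketch; the attribute names assume a GPT-NeoX-style config class in `transformers` and may differ for other architectures:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/{model-name}")

# Field names below assume a GPT-NeoX-style config class in transformers
print(config.num_hidden_layers)        # Layers
print(config.hidden_size)              # Hidden Dimension
print(config.num_attention_heads)      # Attention Heads
print(config.max_position_embeddings)  # Context Length
print(config.vocab_size)               # Vocabulary Size
```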

## Training

### Data

{Model Name} was trained on [{dataset name}]({link}), a {size in tokens}-token dataset consisting of {description}. {How the dataset was constructed, any filtering or deduplication, known characteristics.}

{Known biases or issues in the training data and their expected impact on model behavior.}

### Procedure

Training used [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) with [DeeperSpeed](https://github.com/EleutherAI/DeeperSpeed) on {N}x {GPU type} GPUs. The model was trained for {N} steps ({N} tokens) with a batch size of {N} tokens, using {optimizer} with a peak learning rate of {lr} and a {schedule} schedule.
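
For a quick consistency check, the step count, batch size, and total token count above are related by total tokens ≈ steps × tokens per batch. A sketch with illustrative numbers (not this model's actual values):

```python
# Illustrative numbers only -- substitute the actual values from the paragraph above
steps = 143_000
tokens_per_batch = 1024 * 2048  # sequences per batch x sequence length

total_tokens = steps * tokens_per_batch
print(f"~{total_tokens / 1e9:.0f}B tokens")  # ~300B with these example numbers
```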

The complete training configuration is available at [{config file}]({link}). {Any notable training decisions: why this LR, why this batch size, any restarts or interventions during training.}

## Evaluation

We evaluate {Model Name} using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
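
For reference, a sketch of invoking the harness from Python; the interface shown follows harness v0.4+, and the task names are examples rather than the benchmarks reported below:

```python
import lm_eval

# Run a small evaluation with the harness's Python API (v0.4+ interface).
# The tasks listed here are examples only, not necessarily the benchmarks in the table below.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/{model-name},dtype=float16",
    tasks=["lambada_openai", "piqa"],
    batch_size=8,
)
print(results["results"])
```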

| Benchmark | Score | What it measures |
|---|---|---|
| {name} | {score} | {description} |

{Commentary on results: how does this compare to models of similar size? Any surprising results? Caveats about particular benchmarks?}

## Limitations and Intended Use

{Model Name} is a raw language model released for **research purposes**. It has not been fine-tuned for instruction following, safety, or any particular downstream task.

**This model will produce biased, offensive, and factually incorrect text.** It reflects the biases present in its training data. Do not rely on it for factual accuracy or use it in any setting where its outputs could cause harm.

Intended research applications include {list of 2-3 specific research use cases this model is well-suited for}.

## Reproducing This Model

{Model Name} is fully reproducible. {Description of what "reproducible" means here: same data order, same config, same results up to hardware nondeterminism.}

1. Clone [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) at `{version}`
2. {Data setup -- link to data if preprocessed, or preprocessing instructions}
3. {Config and launch instructions}

{Any known reproduction issues or tips.}

## Citation

If you use this model in your research, please cite:

```bibtex
@article{...}
```

## About EleutherAI

[EleutherAI](https://eleuther.ai) is a grassroots research collective focused on open-source AI research. Find us on [Discord](https://discord.gg/eleutherai) or [GitHub](https://github.com/EleutherAI).

**Related resources:**
- [{Paper title}]({link})
- [{Training data}]({link})
- [{Code}]({link})
- [{Related models}]({links})