# {Model Name}

{Model Name} is a {X}B parameter language model trained on [{dataset}]({link}) as part of [{project/suite name}]({paper link}). {What makes this release distinctive -- e.g., "All intermediate training checkpoints are publicly available to support research on training dynamics, memorization, and emergent capabilities."}

{Paragraph on research motivation: what question does this model help answer? What gap does it fill in the ecosystem? See [our paper]({paper link}) for full details.}

<details>
<summary><b>Quick Start</b></summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/{model-name}",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/{model-name}")

# Perplexity on a passage
inputs = tokenizer("your text here", return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
perplexity = torch.exp(loss)
print(f"Perplexity: {perplexity.item():.2f}")

# Generation
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in fp16 requires approximately {X} GB of GPU memory. The full fp32 weights are {X} GB on disk.
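
As a rough sanity check on these figures: resident weight memory scales with parameter count, at about 2 bytes per parameter in fp16 and 4 bytes in fp32, before any activation or KV-cache overhead. A minimal sketch with an illustrative parameter count:

```python
# Back-of-envelope for the figures above: weights only, excluding activations
# and KV cache. The parameter count below is an illustrative placeholder.
num_parameters = 1_000_000_000  # substitute this model's actual parameter count

print(f"fp16 weights: ~{num_parameters * 2 / 1024**3:.1f} GB")
print(f"fp32 weights: ~{num_parameters * 4 / 1024**3:.1f} GB")
```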

</details>

## Accessing Intermediate Checkpoints

One of the key features of this release is the availability of {N} intermediate training checkpoints, from initialization (step 0) through the final training step ({step N}). These are stored as branches in this repository.

```python
from transformers import AutoModelForCausalLM

# Load the model at step 1000
model = AutoModelForCausalLM.from_pretrained("EleutherAI/{model-name}", revision="step1000")
```

Checkpoints were saved every {N} steps. The `main` branch contains the final checkpoint. {Note any exceptions or irregularities in the checkpoint schedule.}
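
If you want to enumerate the available checkpoint branches programmatically, one option is `huggingface_hub` (a minimal sketch, assuming the `step{N}` branch naming shown above):

```python
from huggingface_hub import list_repo_refs

# List every branch of the model repo; checkpoint branches follow the "step{N}" naming above
refs = list_repo_refs("EleutherAI/{model-name}")
steps = sorted(
    int(branch.name.removeprefix("step"))
    for branch in refs.branches
    if branch.name.startswith("step") and branch.name[4:].isdigit()
)
print(f"{len(steps)} checkpoints available, from step {steps[0]} to step {steps[-1]}")
```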

This makes {Model Name} suitable for research on:
- How model capabilities develop over the course of training
- Memorization and forgetting dynamics
- The effect of specific training data on model behavior
- Checkpoint-level analysis of emergent properties
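
As a minimal sketch of the first use case, the loop below loads a few checkpoints and tracks language-modeling loss on a fixed probe sentence; the revision names and probe text are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/{model-name}")
probe = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

# Track how language-modeling loss on a fixed probe evolves over training.
# The revisions listed here are illustrative; use the steps that exist for this model.
for revision in ["step0", "step1000", "step10000", "main"]:
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/{model-name}",
        revision=revision,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    inputs = probe.to(model.device)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{revision}: loss = {loss.item():.3f}")
```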

## Architecture

{Model Name} uses a {transformer variant} architecture with {N} layers, a hidden dimension of {N}, and {N} attention heads, for a total of {X}B parameters. {Any notable choices: positional encoding scheme, activation function, tied embeddings, etc. and why.}

The full architectural specification:

| Hyperparameter | Value |
|---|---|
| Parameters | {X}B |
| Layers | {N} |
| Hidden Dimension | {N} |
| Attention Heads | {N} |
| Context Length | {N} tokens |
| Vocabulary Size | {N} |
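
These values can also be read programmatically from the model config. A small sketch; the attribute names assume a GPT-NeoX-style config class in `transformers` and may differ for other architectures:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/{model-name}")

# Field names below assume a GPT-NeoX-style config class in transformers
print(config.num_hidden_layers)        # Layers
print(config.hidden_size)              # Hidden Dimension
print(config.num_attention_heads)      # Attention Heads
print(config.max_position_embeddings)  # Context Length
print(config.vocab_size)               # Vocabulary Size
```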

## Training

### Data

{Model Name} was trained on [{dataset name}]({link}), a {size in tokens}-token dataset consisting of {description}. {How the dataset was constructed, any filtering or deduplication, known characteristics.}

{Known biases or issues in the training data and their expected impact on model behavior.}

### Procedure

Training used [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) with [DeeperSpeed](https://github.com/EleutherAI/DeeperSpeed) on {N}x {GPU type} GPUs. The model was trained for {N} steps ({N} tokens) with a batch size of {N} tokens, using {optimizer} with a peak learning rate of {lr} and a {schedule} schedule.
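
For a quick consistency check, the step count, batch size, and total token count above are related by total tokens ≈ steps × tokens per batch. A sketch with illustrative numbers (not this model's actual values):

```python
# Illustrative numbers only -- substitute the actual values from the paragraph above
steps = 143_000
tokens_per_batch = 1024 * 2048  # sequences per batch x sequence length

total_tokens = steps * tokens_per_batch
print(f"~{total_tokens / 1e9:.0f}B tokens")  # ~300B with these example numbers
```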

The complete training configuration is available at [{config file}]({link}). {Any notable training decisions: why this LR, why this batch size, any restarts or interventions during training.}

## Evaluation

We evaluate {Model Name} using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
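
For reference, a sketch of invoking the harness from Python; the interface shown follows harness v0.4+, and the task names are examples rather than the benchmarks reported below:

```python
import lm_eval

# Run a small evaluation with the harness's Python API (v0.4+ interface).
# The tasks listed here are examples only, not necessarily the benchmarks in the table below.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/{model-name},dtype=float16",
    tasks=["lambada_openai", "piqa"],
    batch_size=8,
)
print(results["results"])
```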

| Benchmark | Score | What it measures |
|---|---|---|
| {name} | {score} | {description} |

{Commentary on results: how does this compare to models of similar size? Any surprising results? Caveats about particular benchmarks?}

## Limitations and Intended Use

{Model Name} is a raw language model released for **research purposes**. It has not been fine-tuned for instruction following, safety, or any particular downstream task.

**This model will produce biased, offensive, and factually incorrect text.** It reflects the biases present in its training data. Do not rely on it for factual accuracy or use it in any setting where its outputs could cause harm.

Intended research applications include {list of 2-3 specific research use cases this model is well-suited for}.

## Reproducing This Model

{Model Name} is fully reproducible. {Description of what "reproducible" means here: same data order, same config, same results up to hardware nondeterminism.}

1. Clone [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) at `{version}`
2. {Data setup -- link to data if preprocessed, or preprocessing instructions}
3. {Config and launch instructions}

{Any known reproduction issues or tips.}

## Citation

If you use this model in your research, please cite:

```bibtex
@article{...}
```

## About EleutherAI

[EleutherAI](https://eleuther.ai) is a grassroots research collective focused on open-source AI research. Find us on [Discord](https://discord.gg/eleutherai) or [GitHub](https://github.com/EleutherAI).

**Related resources:**
- [{Paper title}]({link})
- [{Training data}]({link})
- [{Code}]({link})
- [{Related models}]({links})