m-mia/pythia-12b-deduped
This is an unofficial safetensors mirror of EleutherAI/pythia-12b-deduped.
EleutherAI had not uploaded safetensors shards to the original repo as of 2026-05-28;
only pytorch_model-*.bin files exist there. This made multi-process inference loads
bottleneck on disk: each process has to unpickle the .bin shards itself, copying every
tensor into its own private memory, so the processes get little benefit from the shared
OS page cache. Loading pythia-12b-deduped into 8 sweep processes from .bin took ~50 min
on local NVMe; safetensors files can be memory-mapped without copying (zero-copy mmap),
which drops that to ~5 min.
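This is a minimal, stdlib-only sketch of why the safetensors format can be memory-mapped. It relies on only the parts of the format it needs: an 8-byte little-endian header length, a JSON header that maps each tensor name to its dtype, shape and byte offsets, and then the raw tensor bytes. Every tensor is a plain byte range, so a reader can mmap the file and slice a tensor straight out of the page cache that all processes share. A pickled .bin, by contrast, has to be deserialized separately in each process. The helpers write_safetensors and read_tensor are hypothetical illustrations and are not the safetensors library's API. They handle only little-endian float32, and they skip validation and the optional __metadata__ entry.

```python
import json
import mmap
import struct


def write_safetensors(path, tensors):
    """Hypothetical helper: write {name: (shape, float values)} as little-endian F32 in the safetensors layout."""
    header, payload = {}, bytearray()
    for name, (shape, values) in tensors.items():
        start = len(payload)
        payload += struct.pack(f"<{len(values)}f", *values)
        # data_offsets are relative to the first byte after the JSON header
        header[name] = {"dtype": "F32", "shape": list(shape), "data_offsets": [start, len(payload)]}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
        f.write(header_bytes)
        f.write(payload)


def read_tensor(path, name):
    """Hypothetical helper: mmap the file and view one tensor's bytes in place."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (header_len,) = struct.unpack_from("<Q", mm, 0)
        entry = json.loads(mm[8:8 + header_len])[name]
        start, end = entry["data_offsets"]
        base = 8 + header_len
        # Slicing a memoryview of the mmap does not copy; the bytes stay in the page cache.
        # cast("f") uses native byte order, so this assumes a little-endian host.
        with memoryview(mm) as buf, buf[base + start:base + end] as raw, raw.cast("f") as floats:
            values = floats.tolist()  # copy out only so the result outlives the mapping
    return entry["shape"], values
```

The real safetensors package exposes the same pattern through safe_open, which reads tensors lazily from a memory-mapped file.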
Origin
Weights are bit-identical to EleutherAI/pythia-12b-deduped — converted with:
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-12b-deduped", torch_dtype=torch.float16)
model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="10GB")  # out_dir: local output directory
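The max_shard_size="10GB" argument makes save_pretrained split the state dict across several safetensors files. What follows is a simplified, hypothetical approximation of that behavior and not transformers' actual code. It assumes "GB" means 10^9 bytes and "GiB" means 2^30 bytes. It packs tensors greedily in state-dict order and starts a new shard whenever the next tensor would push the current shard over the limit. The names parse_size and plan_shards are illustrative only.

```python
def parse_size(size):
    """Parse '10GB'-style strings, assuming decimal GB/MB/KB and binary GiB/MiB/KiB."""
    units = [("GIB", 2**30), ("MIB", 2**20), ("KIB", 2**10), ("GB", 10**9), ("MB", 10**6), ("KB", 10**3)]
    text = size.strip().upper()
    for suffix, multiplier in units:
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * multiplier)
    return int(text)  # bare byte count


def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily pack (name, nbytes) pairs, in order, into shards of at most max_shard_bytes.

    A single tensor larger than the limit still gets a shard of its own.
    """
    shards, current, current_bytes = [], [], 0
    for name, nbytes in tensor_sizes:
        # Close the current shard if this tensor would overflow it.
        if current and current_bytes + nbytes > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        shards.append(current)
    return shards
```

A model with roughly 12B parameters stored in float16 (2 bytes each) is about 24 GB, so a 10 GB limit would give roughly three shards. The repo's file list shows the actual split.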
All credit and license attribution belong to EleutherAI. See the upstream model card for training details, evaluation, and citation. License (Apache-2.0) follows the upstream repo.
Citation
Cite the original Pythia paper:
@misc{biderman2023pythia,
  title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
  author={Stella Biderman and Hailey Schoelkopf and others},
  year={2023},
  eprint={2304.01373},
  archivePrefix={arXiv},
}