m-mia/pythia-12b-deduped

This is an unofficial safetensors mirror of EleutherAI/pythia-12b-deduped.

As of 2026-05-28, EleutherAI had not uploaded safetensors shards to the original repo; only pytorch_model-*.bin files exist there. This made multi-process inference loads bottleneck on disk: every process has to unpickle the .bin shards into its own private memory, so the work is repeated per process and benefits little from the OS page cache. Loading pythia-12b-deduped from .bin in 8 concurrent sweep processes took ~50 min on local NVMe; with safetensors, whose zero-copy mmap loading lets processes share the same cached pages, that drops to ~5 min.
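
To illustrate why safetensors suits mmap-based loading, here is a minimal standard-library sketch of the file layout: an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. The helper names (write_tiny_safetensors, read_header, mmap_tensor_bytes) and the two-tensor demo file are hypothetical; this is a simplified illustration, not how the safetensors library itself is implemented.

```python
import json
import mmap
import os
import struct
import tempfile


def write_tiny_safetensors(path, tensors):
    """Write a minimal safetensors-style file (demo only).

    tensors maps name -> (dtype string, shape list, raw little-endian bytes).
    """
    header, payload, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        payload += raw
        offset += len(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON header
        f.write(payload)                               # raw tensor bytes


def read_header(path):
    """Return (header dict, absolute offset where tensor data starts)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header, 8 + header_len


def mmap_tensor_bytes(path, name):
    """Map the file read-only and return (mapping, view of one tensor's bytes).

    The view is a slice of the mapping, so no tensor data is copied; pages are
    served from the OS page cache and can be shared by other processes.
    """
    header, data_start = read_header(path)
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapped, memoryview(mapped)[data_start + begin : data_start + end]


# Demo: two tiny F16 tensors (the bytes encode [1.0, 2.0] and [-0.0]).
demo_path = os.path.join(tempfile.mkdtemp(), "tiny.safetensors")
write_tiny_safetensors(
    demo_path,
    {
        "a": ("F16", [2], b"\x00\x3c\x00\x40"),
        "b": ("F16", [1], b"\x00\x80"),
    },
)
header, data_start = read_header(demo_path)
mapped, view = mmap_tensor_bytes(demo_path, "b")
print(header["b"], bytes(view))
```

Because the header alone locates every tensor, a loader can map the file and hand out views instead of deserializing objects, which is what a pickle-based .bin file requires.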

Origin

Weights are bit-identical to EleutherAI/pythia-12b-deduped. They were converted with:

import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-12b-deduped", torch_dtype=torch.float16)
model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="10GB")  # out_dir: local output directory
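
A bit-identity claim is best spot-checked on raw tensor bytes rather than on float values, because a float comparison treats +0.0 and -0.0 as equal. The sketch below is a hypothetical helper, not the procedure used for this mirror. It assumes each checkpoint has already been reduced to a mapping from tensor name to raw bytes; extracting those bytes from the .bin and safetensors files is left out.

```python
import hashlib
from typing import Dict, List, Mapping


def digest_tensors(tensors: Mapping[str, bytes]) -> Dict[str, str]:
    """Map each tensor name to the SHA-256 hex digest of its raw bytes."""
    return {name: hashlib.sha256(raw).hexdigest() for name, raw in tensors.items()}


def diff_digests(reference: Mapping[str, str], candidate: Mapping[str, str]) -> List[str]:
    """Return sorted names that are missing on either side or whose digests differ."""
    names = sorted(set(reference) | set(candidate))
    return [name for name in names if reference.get(name) != candidate.get(name)]


# Demo with F16 byte patterns: 1.0 is 00 3c, +0.0 is 00 00, -0.0 is 00 80.
reference = digest_tensors({"w": b"\x00\x3c", "b": b"\x00\x00"})
identical = digest_tensors({"w": b"\x00\x3c", "b": b"\x00\x00"})
sign_flipped = digest_tensors({"w": b"\x00\x3c", "b": b"\x00\x80"})

print(diff_digests(reference, identical))
print(diff_digests(reference, sign_flipped))
```

An empty list means every tensor matched byte for byte; any name in the list points at a tensor to inspect.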

All credit and license attribution belong to EleutherAI. See the upstream model card for training details, evaluation, and citation. License (Apache-2.0) follows the upstream repo.

Citation

Cite the original Pythia paper:

@misc{biderman2023pythia,
    title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
    author={Stella Biderman and Hailey Schoelkopf and others},
    year={2023},
    eprint={2304.01373},
    archivePrefix={arXiv},
}


Safetensors model size: 12B params. Tensor type: F16.
