CPT merged full models — run `t3pilot` (t3 cross-lingual experiment)

Each model is a standalone full checkpoint: the base model (meta-llama/Llama-3.1-8B) with its trained LoRA adapter merged in. Training settings: r=64, lr=5e-5, a mixed data stream with 30% English, 2 epochs, embeddings and lm_head frozen. Load a checkpoint directly with `AutoModelForCausalLM.from_pretrained`; PEFT is not required. Per-language eval losses are recorded in `manifest.json`.

Load

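from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "the-cramer-project/cpt-models-t3"  # Hub repository that holds all merged models
sub = "Llama-3.1-8B/FT-KY"                # subfolder of the model to load (see Models)
# The adapter is already merged, so this loads as a plain causal LM.
model = AutoModelForCausalLM.from_pretrained(mid, subfolder=sub, torch_dtype="bfloat16")
tok   = AutoTokenizer.from_pretrained(mid, subfolder=sub)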
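
To switch between the two checkpoints without editing the subfolder string by hand, the loading step can be wrapped in a small helper. This is a minimal sketch, not part of the released code: the short codes `ky` and `kz`, the names `SUBFOLDERS`, `subfolder_for`, and `generate`, and the default `max_new_tokens=50` are assumptions made for illustration. Only the repository id and subfolder names come from this card.

```python
REPO_ID = "the-cramer-project/cpt-models-t3"

# Hypothetical short codes chosen for this sketch; the subfolder names
# themselves are the ones listed in the Models table.
SUBFOLDERS = {
    "ky": "Llama-3.1-8B/FT-KY",  # Kyrgyz
    "kz": "Llama-3.1-8B/FT-KZ",  # Kazakh
}


def subfolder_for(lang: str) -> str:
    """Map a short language code to the repository subfolder."""
    try:
        return SUBFOLDERS[lang.lower()]
    except KeyError:
        known = ", ".join(sorted(SUBFOLDERS))
        raise ValueError(f"unknown language {lang!r}; expected one of: {known}") from None


def generate(lang: str, prompt: str, max_new_tokens: int = 50) -> str:
    """Load the merged model for `lang` and continue `prompt`.

    Imports are deferred so the mapping above can be used without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    sub = subfolder_for(lang)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID, subfolder=sub, torch_dtype=torch.bfloat16
    )
    tok = AutoTokenizer.from_pretrained(REPO_ID, subfolder=sub)

    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(output_ids[0], skip_special_tokens=True)
```

Note that `generate` reloads the 8B model on every call, which keeps the sketch short but is slow; for repeated use, load the model and tokenizer once and reuse them.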

Models

| Subfolder | Base | Language | LoRA r | LR | Target eval loss |
|---|---|---|---|---|---|
| `Llama-3.1-8B/FT-KY` | meta-llama/Llama-3.1-8B | Kyrgyz | 64 | 5e-05 | 1.021923542022705 |
| `Llama-3.1-8B/FT-KZ` | meta-llama/Llama-3.1-8B | Kazakh | 64 | 5e-05 | 1.0028022527694702 |
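
The eval losses can be easier to compare as perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal LM training with Transformers, but not stated on this card), in which case perplexity is `exp(loss)`. The names `EVAL_LOSS`, `perplexity`, and `PERPLEXITY` are illustrative.

```python
import math

# Target-language eval losses copied from the Models table.
EVAL_LOSS = {
    "Llama-3.1-8B/FT-KY": 1.021923542022705,
    "Llama-3.1-8B/FT-KZ": 1.0028022527694702,
}


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


PERPLEXITY = {sub: perplexity(loss) for sub, loss in EVAL_LOSS.items()}

for sub, loss in EVAL_LOSS.items():
    print(f"{sub}: loss={loss:.4f} perplexity={PERPLEXITY[sub]:.2f}")
```

Under that assumption the two models come out at roughly 2.78 (Kyrgyz) and 2.73 (Kazakh). The two values are measured on different languages and eval sets, so they are not a direct ranking of the models.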
