CPT merged full models: run t3pilot (t3 cross-lingual experiment)

Each subfolder is a standalone full model: the base (meta-llama/Llama-3.1-8B) with the trained LoRA adapter merged in. Training settings: LoRA r=64, lr=5e-5, 30% English in the mixed stream, 2 epochs, embeddings and lm_head frozen. Load directly with AutoModelForCausalLM.from_pretrained; PEFT is not required. Per-language eval losses are in manifest.json.
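To inspect the per-language eval losses, a minimal sketch along these lines may help. It assumes manifest.json sits at the repository root and makes no assumption about its schema; `load_manifest` and `fetch_manifest` are hypothetical helper names, not part of this repository.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Parse a local manifest.json and return its contents unchanged."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fetch_manifest(repo_id="the-cramer-project/cpt-models-t3",
                   filename="manifest.json"):
    """Download the manifest from the Hub and parse it.

    Assumption: the file lives at the repo root. If it is stored elsewhere,
    pass the matching `subfolder=` argument to hf_hub_download.
    """
    # Imported lazily so load_manifest works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return load_manifest(local_path)
```

Calling `print(json.dumps(fetch_manifest(), indent=2, ensure_ascii=False))` pretty-prints whatever the file contains, which avoids guessing at key names.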

Load

from transformers import AutoModelForCausalLM, AutoTokenizer
mid = "the-cramer-project/cpt-models-t3"
sub = "Llama-3.1-8B/FT-KY"
model = AutoModelForCausalLM.from_pretrained(mid, subfolder=sub, torch_dtype="bfloat16")
tok   = AutoTokenizer.from_pretrained(mid, subfolder=sub)

Models

| Subfolder | Base | Language | LoRA r | LR | Target eval loss |
|---|---|---|---|---|---|
| Llama-3.1-8B/FT-KY | meta-llama/Llama-3.1-8B | Kyrgyz | 64 | 5e-05 | 1.021923542022705 |
| Llama-3.1-8B/FT-KZ | meta-llama/Llama-3.1-8B | Kazakh | 64 | 5e-05 | 1.0028022527694702 |
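
The eval losses can be read as perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal LM training but is not confirmed by this card. Under that assumption, perplexity is exp(loss).

```python
import math

# Target eval losses copied from the Models table.
EVAL_LOSS = {
    "Llama-3.1-8B/FT-KY": 1.021923542022705,
    "Llama-3.1-8B/FT-KZ": 1.0028022527694702,
}


def perplexity(loss):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


for name, loss in EVAL_LOSS.items():
    print(f"{name}: loss={loss:.4f} perplexity={perplexity(loss):.2f}")
```

The two models are evaluated on different languages, so the resulting values are not directly comparable with each other. Each is best compared against the base model on the same eval set.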