Language Decoded LoRA – Condition 1: English Code
A LoRA adapter trained on English Python code from The Stack Dedup (5k-example subset). Tests whether code fine-tuning improves multilingual reasoning (replicating Aryabumi et al., 2024).
Part of the Language Decoded project (Cohere's Tiny Aya Expedition).
For full experiment details, see the Language Decoded LoRA hub.
Training Data
legesher/language-decoded-data / condition-1-en-5k
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained("CohereLabs/tiny-aya-base")
tokenizer = AutoTokenizer.from_pretrained("CohereLabs/tiny-aya-base")

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "legesher/language-decoded-lora-condition-1-en-5k")
Citation
@misc{language-decoded-2026,
  title={Language Decoded: Investigating Language-Dependent vs. Structure-Dependent Reasoning Benefits of Code},
  author={Madison Edgar and Saad Ahmed Bazaz and Tom Sherborne and Rashik Shahjahan and Khojasteh Mirza and Sarah Jawaid and Rafay Mustafa and Sohaib Ahmed Bazaz},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/legesher/language-decoded-lora}
}
License
Apache 2.0
Base model
CohereLabs/tiny-aya-base