Language Decoded LoRA – Condition 1: English Code
A LoRA adapter trained on English Python code from The Stack Dedup (5k-example subset). Tests whether code fine-tuning improves multilingual reasoning (replicating Aryabumi et al., 2024).
Part of the Language Decoded project (Cohere's Tiny Aya Expedition).
For full experiment details, see the Language Decoded LoRA hub.
Training Data
legesher/language-decoded-data / condition-1-en-5k
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained("CohereLabs/tiny-aya-base")
tokenizer = AutoTokenizer.from_pretrained("CohereLabs/tiny-aya-base")

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "legesher/language-decoded-lora-condition-1-en-5k")
Citation
@misc{language-decoded-2026,
  title={Language Decoded: Investigating Language-Dependent vs. Structure-Dependent Reasoning Benefits of Code},
  author={Madison Edgar and Saad Ahmed Bazaz and Tom Sherborne and Rashik Shahjahan and Khojasteh Mirza and Sarah Jawaid and Rafay Mustafa and Sohaib Ahmed Bazaz},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/legesher/language-decoded-lora}
}
License
Apache 2.0
Base model
CohereLabs/tiny-aya-base