---
license: apache-2.0
language:
  - en
  - zh
  - es
  - ur
tags:
  - lora
  - aya
  - tiny-aya
  - multilingual
  - code
  - legesher
  - tiny-aya-expedition
  - language-decoded
  - unsloth
library_name: transformers
base_model:
  - CohereLabs/tiny-aya-base
pipeline_tag: text-generation
---

# Language Decoded LoRA

QLoRA adapters for the **Language Decoded** project (part of [Cohere's Tiny Aya Expedition](https://aya.for.ai)) — one adapter per multilingual code-training condition.

## Research Question

> Does fine-tuning on non-English code improve multilingual reasoning — and is the benefit language-dependent or structure-dependent?

## Base Model

All adapters are trained on [CohereLabs/tiny-aya-base](https://huggingface.co/CohereLabs/tiny-aya-base) (3.35B parameters).

## Model Structure

This repo is the canonical hub for all Language Decoded LoRA adapters, organized by experimental condition:

| Subdirectory          | Condition   | Training Data                                         |
| --------------------- | ----------- | ----------------------------------------------------- |
| `condition-1-en-32k/` | Condition 1 | English Python from The Stack Dedup (full 32k corpus) |
| `condition-1-en-5k/`  | Condition 1 | English Python from The Stack Dedup (5k subset)       |
| `condition-2-zh-5k/`  | Condition 2 | Chinese keyword-swapped Python (Legesher-transpiled)  |
| `condition-2-es-5k/`  | Condition 2 | Spanish keyword-swapped Python (Legesher-transpiled)  |
| `condition-2-ur-5k/`  | Condition 2 | Urdu keyword-swapped Python (Legesher-transpiled)     |
| `condition-3-zh-5k/`  | Condition 3 | Transpiled + native Chinese code (blended)            |
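Condition 2's "keyword-swapped" data replaces Python keywords with target-language equivalents while leaving identifiers, strings, and program structure intact. The sketch below illustrates the idea with a toy stdlib-only swapper; the mapping is hypothetical and Legesher's actual transpilation is more involved:

```python
# Toy sketch of keyword-swap "transpilation" (NOT Legesher's actual
# implementation): replace Python keywords with target-language
# equivalents token-by-token, leaving identifiers and strings untouched.
import io
import tokenize

# Hypothetical Chinese keyword mapping, for illustration only
ZH_KEYWORDS = {"def": "定义", "return": "返回", "if": "如果", "else": "否则"}

def swap_keywords(source: str, mapping: dict) -> str:
    """Rewrite `source`, substituting keywords found in `mapping`."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Python's tokenizer classifies keywords as NAME tokens,
        # so we substitute only NAME tokens present in the mapping.
        if tok.type == tokenize.NAME and tok.string in mapping:
            out.append((tok.type, mapping[tok.string]))
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

src = "def add(a, b):\n    return a + b\n"
print(swap_keywords(src, ZH_KEYWORDS))
```

Because only keyword tokens change, the swapped code keeps the exact syntactic structure of the English original — which is what lets Condition 2 isolate surface language from code structure.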

### The Experimental Ladder

- **Baseline --> 1**: Does code help at all?
- **1 --> 2**: Does the language of keywords matter?
- **2 --> 3**: Does diversity of native-language sources add value beyond keyword swap?
- **3 --> 4**: Does code written in the cultural context of a language carry unique signal?

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("CohereLabs/tiny-aya-base")
tokenizer = AutoTokenizer.from_pretrained("CohereLabs/tiny-aya-base")

# Load one LoRA adapter onto the base model (e.g., Condition 1 — English code)
model = PeftModel.from_pretrained(base_model, "legesher/language-decoded-lora", subfolder="condition-1-en-5k")

# To use a different condition, pass its subfolder instead
# (e.g., Condition 2 — Chinese keyword-swapped):
# model = PeftModel.from_pretrained(base_model, "legesher/language-decoded-lora", subfolder="condition-2-zh-5k")
```

## Training Details

| Parameter          | Value                                                                                            |
| ------------------ | ------------------------------------------------------------------------------------------------ |
| Base model         | [CohereLabs/tiny-aya-base](https://huggingface.co/CohereLabs/tiny-aya-base) (3.35B params)       |
| Method             | QLoRA 4-bit (NF4), ~5.4GB VRAM                                                                   |
| Hardware           | Kaggle T4 (16GB)                                                                                 |
| Tokenizer          | CohereLabs/tiny-aya-base                                                                         |
| Transpilation tool | [Legesher](https://github.com/legesher/legesher) v0.7.3                                          |
| Training data      | [legesher/language-decoded-data](https://huggingface.co/datasets/legesher/language-decoded-data) |

### QLoRA Hyperparameters

| Parameter       | Value                                                         |
| --------------- | ------------------------------------------------------------- |
| LoRA rank (`r`) | 16                                                            |
| LoRA alpha      | 32                                                            |
| LoRA dropout    | 0.0                                                           |
| Target modules  | q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj |
| Bias            | none                                                          |
| Task type       | CAUSAL_LM                                                     |
| PEFT version    | 0.18.1                                                        |
| Quantization    | NF4 (4-bit) via Unsloth                                       |
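The hyperparameters above map onto `peft.LoraConfig` keyword arguments roughly as follows. This is a configuration sketch assembled from the table, not the project's actual training script:

```python
# LoRA settings from the table above, expressed as keyword arguments
# for peft.LoraConfig (sketch): with peft installed, the config would
# be built as LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 16,                 # LoRA rank
    "lora_alpha": 32,        # scaling factor (alpha / r = 2.0)
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "up_proj", "down_proj", "gate_proj",      # MLP projections
    ],
    "bias": "none",
    "task_type": "CAUSAL_LM",
}
```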

## Evaluation

Models are evaluated on multilingual reasoning benchmarks with dual prompts (English + language-specific):

| Benchmark | What it measures           | Examples per language |
| --------- | -------------------------- | --------------------- |
| MGSM      | Math reasoning             | 250 (full set)        |
| X-CSQA    | Commonsense reasoning      | ~1,000 (full set)     |
| XNLI      | Natural language inference | ~5,000 (full set)     |
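"Dual prompts" means each benchmark item is posed twice: once with an English instruction and once fully in the target language. A minimal sketch of how such a pair might be constructed — the templates and translations here are illustrative, not the project's actual evaluation prompts:

```python
# Illustrative dual-prompt construction for a single MGSM-style item.
# Templates and translations are hypothetical examples only.
TEMPLATES = {
    "en": "Solve the following math problem. Answer with a number.\n\n{question}",
    "zh": "解决下面的数学题。用数字回答。\n\n{question}",
}

def dual_prompts(question_en: str, question_native: str, lang: str) -> dict:
    """Return the English-instruction and native-language prompts for one item."""
    return {
        "en_prompt": TEMPLATES["en"].format(question=question_en),
        "native_prompt": TEMPLATES[lang].format(question=question_native),
    }

prompts = dual_prompts("What is 3 + 4?", "3 加 4 等于多少？", "zh")
```

Scoring each adapter on both prompt variants separates gains in the underlying reasoning from gains in understanding the target-language instruction.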

_Results will be added as evaluation completes._

## Limitations

- **Single base model**: All adapters are trained on CohereLabs/tiny-aya-base (3.35B params). Results may not generalize to larger or architecturally different models.
- **Limited training data**: Each condition uses a 5k-file subset for QLoRA fine-tuning, constrained by Kaggle T4 hardware limits.
- **Evaluation scope**: Currently evaluated on 3 benchmarks (MGSM, X-CSQA, XNLI). Other reasoning tasks may show different patterns.
- **Consumer hardware**: Training on Kaggle T4 (16GB) with 4-bit quantization introduces approximation that may affect adapter quality compared to full-precision training.

## Related Resources

- **Training data**: [legesher/language-decoded-data](https://huggingface.co/datasets/legesher/language-decoded-data)
- **Community code**: [legesher/language-decoded-community](https://huggingface.co/datasets/legesher/language-decoded-community)
- **Experiment tracking**: [legesher/language-decoded-experiments](https://huggingface.co/datasets/legesher/language-decoded-experiments)
- **Transpilation tool**: [Legesher on GitHub](https://github.com/legesher/legesher)

## Citation

```bibtex
@misc{language-decoded-2026,
  title={Language Decoded: Investigating Language-Dependent vs. Structure-Dependent Reasoning Benefits of Code},
  author={Madison Edgar and Saad Ahmed Bazaz and Tom Sherborne and Rashik Shahjahan and Khojasteh Mirza and Sarah Jawaid and Rafay Mustafa and Sohaib Ahmed Bazaz},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/legesher/language-decoded-lora}
}
```

## License

Apache 2.0