Text Generation
Transformers
Safetensors
lora
aya
tiny-aya
multilingual
code
legesher
tiny-aya-expedition
language-decoded
unsloth
arxiv:2603.11510
arxiv:2211.15533
arxiv:2510.09591
arxiv:1809.05053
arxiv:2308.16884
arxiv:2106.06937
arxiv:2210.03057
docs(readme): cond-5 refined-extractor banner + Phase 3 staleness fixes (#9)
Opened by madiedgar
README.md (changed)
@@ -25,6 +25,10 @@ pipeline_tag: text-generation
 
 QLoRA adapters fine-tuned on multilingual code conditions for the **Language Decoded** project (part of [Cohere's Tiny Aya Expedition](https://aya.for.ai)).
 
+## ⚠️ Phase 3 eval numbers — read the experiments repo before citing
+
+Original Phase 3 `_summary_*.json` files on [`legesher/language-decoded-experiments`](https://huggingface.co/datasets/legesher/language-decoded-experiments) **under-report cond-5 SIB-200 accuracy by 20–35pp** because the strict inference-time extractor refused native-script answers. Cite the `_summary_reparsed_*.json` siblings (refined extractor) instead. **Five** Phase 3 SIB-200 conclusions also flip win→loss against baseline once the extractor is corrected (`cond-2-es-5k`, `cond-2-es-20k`, `cond-2-ur-20k`, `cond-2-zh-20k`, `cond-3-zh-5k`), and `cond-2-ur-5k`'s gain deflates 4.4×. See the [banner on the experiments repo](https://huggingface.co/datasets/legesher/language-decoded-experiments) (top of the README) for the full picture.
+
 ## Research Question
 
 > Does fine-tuning on non-English code improve multilingual reasoning — and is the benefit language-dependent or structure-dependent?
@@ -35,7 +39,7 @@ All adapters are trained on [CohereLabs/tiny-aya-base](https://huggingface.co/Co
 
 ## Model Structure
 
-This repo is the canonical hub for
+This repo is the canonical hub for the trained-from-scratch LoRA adapters, organized by experimental condition:
 
 | Subdirectory | Condition | Training Data |
 | --------------------- | ----------- | ----------------------------------------------------- |
@@ -46,12 +50,15 @@ This repo is the canonical hub for all Language Decoded LoRA adapters, organized
 | `condition-2-ur-5k/` | Condition 2 | Urdu keyword-swapped Python (Legesher-transpiled) |
 | `condition-3-zh-5k/` | Condition 3 | Transpiled + native Chinese code (blended) |
 
+**Cond-5 (cross-lingual transfer)** is an evaluation pattern that re-uses condition-2 adapters with cross-language prompting — see [`phase3/conditions/condition-5-{zh,es,ur}-5k/`](https://huggingface.co/datasets/legesher/language-decoded-experiments/tree/main/phase3/conditions) on the experiments repo for the cross-lingual eval results.
+
 ### The Experimental Ladder
 
 - **Baseline --> 1**: Does code help at all?
 - **1 --> 2**: Does the language of keywords matter?
 - **2 --> 3**: Does diversity of native-language sources add value beyond keyword swap?
 - **3 --> 4**: Does code written in the cultural context of a language carry unique signal?
+- **--> 5**: Does shared script or language family create transfer effects when an adapter trained on one language is prompted in another?
 
 ## Usage
 
@@ -96,22 +103,25 @@ model = PeftModel.from_pretrained(base_model, "legesher/language-decoded-lora",
 
 ## Evaluation
 
-Models are evaluated on multilingual reasoning benchmarks with dual prompts (English + language-specific)
+Models are evaluated on multilingual reasoning benchmarks with dual prompts (English + language-specific). Phase 3 adds SIB-200 and Belebele to the Phase 2 benchmark set.
 
-| Benchmark | What it measures | Examples per language |
-| --------- | -------------------------- | --------------------- |
-| MGSM | Math reasoning |
-| X-CSQA | Commonsense reasoning | ~1,000
-| XNLI | Natural language inference | ~5,000
+| Benchmark | What it measures | Phase | Examples per language |
+| --------- | -------------------------- | ----- | --------------------- |
+| MGSM | Math reasoning | 2, 3 | 250 |
+| X-CSQA | Commonsense reasoning | 2, 3 | ~1,000 |
+| XNLI | Natural language inference | 2, 3 | ~5,000 |
+| SIB-200 | Topic classification | 3 | ~204 |
+| Belebele | Reading comprehension | 3 | ~900 |
 
-
+Eval results live at [`legesher/language-decoded-experiments`](https://huggingface.co/datasets/legesher/language-decoded-experiments). **Cite `_summary_reparsed_*.json` files for Phase 3 numbers** — see the banner above.
 
 ## Limitations
 
 - **Single base model**: All adapters are trained on CohereLabs/tiny-aya-base (3.35B params). Results may not generalize to larger or architecturally different models.
 - **Limited training data**: Each condition uses a 5k-file subset for QLoRA fine-tuning, constrained by Kaggle T4 hardware limits.
-- **Evaluation scope**: Currently evaluated on
+- **Evaluation scope**: Currently evaluated on 5 benchmarks (MGSM, X-CSQA, XNLI, SIB-200, Belebele). Other reasoning tasks may show different patterns.
 - **Consumer hardware**: Training on Kaggle T4 (16GB) with 4-bit quantization introduces approximation that may affect adapter quality compared to full-precision training.
+- **Extractor coverage**: Phase 3 inference-time extractor under-counts native-script SIB-200 answers; refined post-hoc extractor recovers them. See the banner above and [`expedition-tiny-aya/analysis/phase-3/phase3-refined-evaluation.md`](https://github.com/legesher/research/blob/main/expedition-tiny-aya/analysis/phase-3/phase3-refined-evaluation.md) on the research repo.
 
 ## Related Resources
 
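The banner added in this PR asks readers to cite the `_summary_reparsed_*.json` siblings rather than the original `_summary_*.json` files. A minimal sketch of that selection rule follows. It is not code from the experiments repo. It assumes that an original file and its reparsed sibling sit in the same directory and share the same trailing run name, which the PR text does not confirm. The function name `citable_summaries` and the prefix constants are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

# Assumed naming scheme, inferred from the two glob patterns in the banner.
ORIGINAL_PREFIX = "_summary_"
REPARSED_PREFIX = "_summary_reparsed_"


def citable_summaries(paths: Iterable[str]) -> Dict[str, str]:
    """Map each run (directory + run name) to the summary file to cite.

    A reparsed file always wins; an original file is only a fallback for
    runs that have no reparsed sibling.
    """
    chosen: Dict[str, str] = {}
    for path in paths:
        p = PurePosixPath(path)
        if p.suffix != ".json":
            continue
        stem = p.stem
        # Test the longer prefix first: every reparsed name also starts
        # with the original prefix, so the order of these checks matters.
        if stem.startswith(REPARSED_PREFIX):
            run = stem[len(REPARSED_PREFIX):]
            if run:
                chosen[str(p.with_name(run))] = path
        elif stem.startswith(ORIGINAL_PREFIX):
            run = stem[len(ORIGINAL_PREFIX):]
            if run:
                chosen.setdefault(str(p.with_name(run)), path)
    return chosen
```

The detail worth noting is that the pattern `_summary_*.json` also matches every `_summary_reparsed_*.json` file, so a plain glob over the first pattern mixes both kinds. The input could be any listing of repo-relative paths, for example the result of `huggingface_hub.list_repo_files(..., repo_type="dataset")`.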