---
base_model: unsloth/devstral-small-2507-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- lora
- sft
- transformers
- trl
- unsloth
- code
- devstral
- mistral
datasets:
- custom
model-index:
- name: devstral-finetuned-lora
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - type: pass@1
      value: 3.0
      name: pass@1
---

# Devstral Small 2507 — Fine-tuned on AI Coding Conversations

A QLoRA fine-tune of [Devstral Small 2507](https://huggingface.co/mistralai/Devstral-Small-2507) (24B) on 2,100 real AI coding assistant conversations extracted from Claude Code, Cursor, Codex CLI, and OpenCode.

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | `mistralai/Devstral-Small-2507` (24B) |
| Method | QLoRA (4-bit NF4, rank 32, alpha 32) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | 184.8M / 23.8B (0.78%) |
| Epochs | 3 |
| Batch size | 2 × 4 grad_accum = 8 effective |
| Learning rate | 2e-4 (cosine schedule) |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA L4 24 GB (GCP g2-standard-8) |
| Training time | 10.9 hours (39,402 s) |
| Final loss | 0.3618 |
| Framework | Unsloth 2026.2.1 + TRL 0.22.2 |
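
The adapter settings in the table correspond roughly to a PEFT `LoraConfig` along these lines. This is a sketch, not the actual training script; the dropout and bias values are assumptions, since the table does not state them.

```python
from peft import LoraConfig

# Reconstruction of the adapter config described in the table above.
# lora_dropout and bias are assumptions, not taken from the training run.
lora_config = LoraConfig(
    r=32,                      # LoRA rank
    lora_alpha=32,             # LoRA scaling factor (alpha)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```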

## Training Data

2,100 multi-turn coding conversations (175K+ messages before filtering) were retained after preprocessing, drawn from the following sources (raw conversation counts):

| Source | Raw conversations |
|--------|-------------------|
| Cursor (AI Service) | 2,073 |
| Cursor (Global Composer) | 1,104 |
| Codex CLI | 555 |
| Claude Code | 289 |
| OpenCode CLI | 284 |

**Preprocessing:**
- Filtered out conversations with fewer than 2 messages
- Removed tool-call-only assistant turns (<20 chars)
- Removed tool_result user messages
- Merged consecutive same-role messages
- Truncated messages longer than 8,000 characters
- Kept only conversations that start with a user message and contain at least one assistant response
- Redacted secrets (4,208 redaction markers across 91 unique secrets)
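
The filtering rules above can be sketched as a simple pipeline over `{"role", "content"}` message dicts. This is an illustration of the stated rules, not the actual preprocessing script; in particular, how tool_result messages are detected here is an assumption.

```python
def preprocess(conversation, max_chars=8000, min_assistant_chars=20):
    """Apply the filtering rules described above to one conversation.

    `conversation` is a list of {"role": str, "content": str} dicts.
    Returns the cleaned message list, or None if the conversation is dropped.
    """
    msgs = []
    for m in conversation:
        role, content = m["role"], m["content"]
        # Drop tool-call-only assistant turns (very short assistant messages).
        if role == "assistant" and len(content) < min_assistant_chars:
            continue
        # Drop tool_result user messages (marker-based detection is an assumption).
        if role == "user" and content.startswith("tool_result"):
            continue
        # Truncate overly long messages.
        content = content[:max_chars]
        # Merge consecutive same-role messages.
        if msgs and msgs[-1]["role"] == role:
            msgs[-1]["content"] += "\n" + content
        else:
            msgs.append({"role": role, "content": content})
    # Keep only conversations with >=2 messages that start with a user
    # message and contain at least one assistant response.
    if len(msgs) < 2 or msgs[0]["role"] != "user":
        return None
    if not any(m["role"] == "assistant" for m in msgs):
        return None
    return msgs
```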

## Usage

### With Unsloth (recommended for inference)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="YOUR_USERNAME/devstral-finetuned-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Write a Python LRU cache from scratch"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### With PEFT + Transformers

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Devstral-Small-2507",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/devstral-finetuned-lora")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/devstral-finetuned-lora")
```

### Convert to MLX (Apple Silicon)

```bash
# First merge the LoRA adapters into a full 16-bit model, then convert
pip install mlx-lm
python -m mlx_lm.convert --hf-path devstral-finetuned-16bit --mlx-path devstral-mlx -q --q-bits 4
python -m mlx_lm.generate --model devstral-mlx --prompt "Write a function that..."
```
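
The merge step mentioned in the comment above is not shown. A minimal sketch using PEFT's `merge_and_unload`, assuming the merged checkpoint should land in the `devstral-finetuned-16bit` directory that the convert command expects:

```python
# Sketch of the merge step: fold the LoRA deltas into the base weights
# and save a full 16-bit checkpoint for MLX conversion.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Devstral-Small-2507", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "YOUR_USERNAME/devstral-finetuned-lora")
merged = model.merge_and_unload()  # returns the base model with adapters merged in
merged.save_pretrained("devstral-finetuned-16bit")
AutoTokenizer.from_pretrained("YOUR_USERNAME/devstral-finetuned-lora").save_pretrained(
    "devstral-finetuned-16bit"
)
```

Note that merging a 24B model in bfloat16 needs roughly 48 GB of RAM, so this is typically done on a larger machine than the one used for training.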

## Evaluation

| Benchmark | Metric | Score | Notes |
|-----------|--------|-------|-------|
| HumanEval | pass@1 | 3.0% (5/164) | Low score expected — the model was fine-tuned on conversational coding (multi-turn dialogs with tool use), not bare function completion |
|
**Why the low HumanEval score?**

This model was trained on real AI coding conversations with:
- Multi-turn dialog context
- Tool calls and results
- Natural language explanations
- User-assistant interaction patterns

HumanEval tests **bare function completion** without dialog context, which is a different task. The model is optimized for conversational coding assistance, not standalone code generation.
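
For reference, with one generated sample per problem, pass@1 is simply the fraction of problems solved: 5/164 ≈ 3.05%, reported above as 3.0%. The unbiased pass@k estimator from the HumanEval paper generalizes this, as a quick check in Python shows:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased per-problem pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n samples were generated and c of them passed the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=1, per-problem pass@1 is 1.0 if solved else 0.0, so the
# benchmark score is just the solve rate: 5 of 164 problems.
results = [1] * 5 + [0] * 159                              # 5 solved, 159 unsolved
score = sum(pass_at_k(1, c, 1) for c in results) / len(results)
print(f"pass@1 = {score:.2%}")                             # pass@1 = 3.05%
```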
|
## Limitations

- Fine-tuned on a specific user's coding style and preferences
- Training data is English-only, primarily TypeScript/Python/Rust
- Not a general-purpose improvement — reflects patterns from specific coding workflows
- LoRA adapters only; requires the base Devstral Small 2507 model

|
## License

Apache 2.0 (same as the base model).

## Compute Cost

~$12 total on GCP (L4 GPU at ~$1.10/hr for ~10.9 hours).