---
base_model: unsloth/devstral-small-2507-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- lora
- sft
- transformers
- trl
- unsloth
- code
- devstral
- mistral
datasets:
- custom
model-index:
- name: devstral-finetuned-lora
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - type: pass@1
      value: 3.0
      name: pass@1
---

# Devstral Small 2507 — Fine-tuned on AI Coding Conversations

A QLoRA fine-tune of [Devstral Small 2507](https://huggingface.co/mistralai/Devstral-Small-2507) (24B) on 2,100 real AI coding assistant conversations extracted from Claude Code, Cursor, Codex CLI, and OpenCode.

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | `mistralai/Devstral-Small-2507` (24B) |
| Method | QLoRA (4-bit NF4, rank 32, alpha 32) |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Trainable params | 184.8M / 23.8B (0.78%) |
| Epochs | 3 |
| Effective batch size | 2 × 4 (gradient accumulation) = 8 |
| Learning rate | 2e-4, cosine schedule |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA L4 24 GB (GCP g2-standard-8) |
| Training time | 10.9 hours (39,402 s) |
| Final loss | 0.3618 |
| Framework | Unsloth 2026.2.1 + TRL 0.22.2 |

## Training Data

2,100 multi-turn coding conversations (175K+ messages before filtering), drawn from:

| Source | Conversations |
|--------|--------------|
| Cursor (AI Service) | 2,073 |
| Cursor (Global Composer) | 1,104 |
| Codex CLI | 555 |
| Claude Code | 289 |
| OpenCode CLI | 284 |

**Preprocessing:**

- Removed conversations with fewer than 2 messages
- Removed tool-call-only assistant turns (<20 chars)
- Removed `tool_result` user messages
- Merged consecutive same-role messages
- Truncated messages longer than 8,000 characters
- Kept only conversations that start with a user message and contain at least one assistant response
- Redacted secrets (4,208 redaction markers across 91 unique secrets)

## Usage

### With Unsloth
Unsloth is the recommended path for inference:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="YOUR_USERNAME/devstral-finetuned-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Write a Python LRU cache from scratch"}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### With PEFT + Transformers

Recent Transformers releases deprecate passing `load_in_4bit` directly to `from_pretrained`; use a `BitsAndBytesConfig` instead:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Devstral-Small-2507",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/devstral-finetuned-lora")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/devstral-finetuned-lora")
```

### Convert to MLX (Apple Silicon)

```bash
# First merge the LoRA adapters into a full 16-bit model, then convert
pip install mlx-lm
python -m mlx_lm.convert --hf-path devstral-finetuned-16bit --mlx-path devstral-mlx -q --q-bits 4
python -m mlx_lm.generate --model devstral-mlx --prompt "Write a function that..."
```

## Evaluation

| Benchmark | Metric | Score | Notes |
|-----------|--------|-------|-------|
| HumanEval | pass@1 | 3.0% (5/164) | Low score expected; the model was tuned for conversational coding (multi-turn dialogs with tool use), not bare function completion |

**Why the low HumanEval score?** This model was trained on real AI coding conversations featuring:

- Multi-turn dialog context
- Tool calls and results
- Natural language explanations
- User-assistant interaction patterns

HumanEval tests **bare function completion** without dialog context, which is a different task.
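For reference, pass@k is conventionally computed with the unbiased estimator over per-problem samples. A minimal sketch (the actual evaluation harness is not included in this repo):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n generations (c of them correct) passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one greedy sample per problem, pass@1 reduces to the raw pass rate:
results = [1] * 5 + [0] * 159  # 5 of 164 HumanEval problems solved
score = sum(pass_at_k(1, c, 1) for c in results) / len(results)  # ~0.0305, i.e. 3.0%
```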
The model is optimized for conversational coding assistance, not standalone code generation.

## Limitations

- Fine-tuned on a specific user's coding style and preferences
- Training data is English-only, primarily TypeScript/Python/Rust
- Not a general-purpose improvement; reflects patterns from specific coding workflows
- LoRA adapters only; requires the base Devstral Small 2507 model

## License

Apache 2.0 (same as the base model).

## Compute Cost

~$12 total on GCP (L4 GPU at ~$1.10/hr for ~10.9 hours).
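As a footnote, the message-level preprocessing filters described under Training Data can be sketched roughly as follows. This is an illustrative reconstruction, not the actual pipeline; field names such as `type` are assumptions:

```python
def clean_conversation(messages):
    """Sketch of the filters described under Training Data.
    Each message is a dict like {"role": "user"|"assistant", "content": str}."""
    # Drop tool_result user messages and near-empty (tool-call-only) assistant turns.
    kept = [
        m for m in messages
        if not (m["role"] == "user" and m.get("type") == "tool_result")
        and not (m["role"] == "assistant" and len(m["content"].strip()) < 20)
    ]
    # Merge consecutive same-role messages into one turn.
    merged = []
    for m in kept:
        if merged and merged[-1]["role"] == m["role"]:
            merged[-1]["content"] += "\n" + m["content"]
        else:
            merged.append({"role": m["role"], "content": m["content"]})
    # Truncate overlong messages.
    for m in merged:
        m["content"] = m["content"][:8000]
    # Keep only conversations with >= 2 messages that start with a user turn
    # and contain at least one assistant response.
    ok = (
        len(merged) >= 2
        and merged[0]["role"] == "user"
        and any(m["role"] == "assistant" for m in merged)
    )
    return merged if ok else None
```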