Qwen2.5-Coder-32B-FIM

A LoRA fine-tuned adapter for Qwen/Qwen2.5-Coder-32B, specialized for Fill-in-the-Middle (FIM) code completion on Rust codebases.

Model Details

  • Base Model: Qwen/Qwen2.5-Coder-32B
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
  • Training Data: AST-extracted code spans from the reth Rust codebase
  • Task: Fill-in-the-Middle code completion (predict missing code between prefix and suffix)

Training Configuration

Parameter            Value
LoRA Rank            64
LoRA Alpha           128
LoRA Dropout         0.05
Target Modules       q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization         4-bit (nf4, double quantization)
Learning Rate        2e-5
Epochs               3
Optimizer            paged_adamw_8bit
LR Scheduler         cosine
Max Sequence Length  4096
Precision            bfloat16
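
As a rough illustration of how the LoRA and quantization rows above could map onto PEFT and Transformers config objects, here is a minimal sketch. It is not taken from the training code: the `task_type` value and the use of bfloat16 as the 4-bit compute dtype are assumptions, and `build_configs` is a hypothetical helper name. The library imports are deferred into the function so the values can be inspected without the libraries installed.

```python
# Values copied from the Training Configuration table.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}

QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def build_configs():
    """Hypothetical helper: build the PEFT and bitsandbytes config objects."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    lora_config = LoraConfig(**LORA_KWARGS)
    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: mirrors "Precision"
        **QUANT_KWARGS,
    )
    return lora_config, quant_config


# With standard LoRA scaling, adapter updates are scaled by alpha / rank.
lora_scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
```

With rank 64 and alpha 128, the standard LoRA scaling factor works out to 2.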

FIM Format

This model uses the Qwen FIM token format. The prompt ends at <|fim_middle|>, and the model generates the completion that follows it:

<|fim_prefix|>[code before cursor]<|fim_suffix|>[code after cursor]<|fim_middle|>[generated completion]
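
To make the format concrete, the following sketch assembles a prompt from the text on either side of a cursor. The helper names `build_fim_prompt` and `split_at_cursor` are hypothetical and only illustrate the token layout shown above.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Lay out prefix and suffix in the order prefix, suffix, middle marker."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a buffer into the text before and after a character offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


# Example: the cursor sits on the empty, indented body line of a Rust function.
buffer = "fn add(a: i32, b: i32) -> i32 {\n    \n}"
cursor = buffer.index("\n}")
prefix, suffix = split_at_cursor(buffer, cursor)
prompt = build_fim_prompt(prefix, suffix)
```

The resulting `prompt` ends with the middle marker, so whatever the model generates next is the text to insert at the cursor.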

Usage

With PEFT

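import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 with double quantization, as listed under Training Configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B",
    device_map="auto",
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(base_model, "viplismism/Qwen2.5-Coder-32B-FIM")
tokenizer = AutoTokenizer.from_pretrained("viplismism/Qwen2.5-Coder-32B-FIM")

prompt = "<|fim_prefix|>fn add(a: i32, b: i32) -> i32 {\n    <|fim_suffix|>\n}<|fim_middle|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
# Decode only the newly generated tokens, i.e. the filled-in middle.
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))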

With Ollama

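After merging the adapter into the base model and converting the result to GGUF, create and run the model from a Modelfile whose FROM line points at the GGUF file: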

ollama create qwen2.5-coder-32b-fim -f Modelfile
ollama run qwen2.5-coder-32b-fim
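
The merge step mentioned above can be sketched with PEFT's `merge_and_unload`. This is an assumption about one reasonable way to do it, not the author's documented procedure: `merge_adapter` and the `merged-model` output directory are hypothetical names, and the base model is loaded in bfloat16 rather than 4-bit so that the LoRA weights fold into unquantized weights. Loading a 32B model this way needs a large amount of memory. GGUF conversion and the Modelfile itself are not shown.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Hypothetical helper: fold LoRA weights into the base model and save it."""
    # Imports are deferred so the function can be defined without the libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model unquantized; merging targets full-precision weights.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Attach the adapter, then merge it into the base weights and drop the wrappers.
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(out_dir, safe_serialization=True)
    # Save the tokenizer alongside so the directory is self-contained.
    AutoTokenizer.from_pretrained(adapter_id).save_pretrained(out_dir)


# Example call (downloads the full model, so it is left commented out):
# merge_adapter(
#     "Qwen/Qwen2.5-Coder-32B",
#     "viplismism/Qwen2.5-Coder-32B-FIM",
#     "merged-model",
# )
```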

Training Framework

This model was trained using fim-coder-model, a pipeline that:

  1. Extracts semantic code boundaries (functions, structs, impl blocks) via Rust AST parsing
  2. Generates FIM training samples with proper prefix/suffix/middle splits
  3. Fine-tunes with LoRA + 4-bit quantization for efficient multi-GPU training
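
Step 2 above can be illustrated with a small sketch that turns one extracted span into a FIM training sample. This is not the fim-coder-model implementation: `FimSample` and `make_fim_sample` are hypothetical names, the span is assumed to arrive as character offsets from an AST parser that is not shown, and whether the real pipeline appends an end-of-text token is not stated in this card.

```python
from dataclasses import dataclass


@dataclass
class FimSample:
    prefix: str
    middle: str
    suffix: str

    def to_training_text(self) -> str:
        """Serialize in Qwen FIM order: prefix, suffix, then the target middle."""
        return (
            f"<|fim_prefix|>{self.prefix}"
            f"<|fim_suffix|>{self.suffix}"
            f"<|fim_middle|>{self.middle}"
        )


def make_fim_sample(source: str, start: int, end: int) -> FimSample:
    """Cut source[start:end] out as the middle; the rest becomes prefix and suffix."""
    if not 0 <= start <= end <= len(source):
        raise ValueError("span is outside the source text")
    return FimSample(
        prefix=source[:start],
        middle=source[start:end],
        suffix=source[end:],
    )


# Example: mask the body expression of a small Rust function.
source = "fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n"
start = source.index("a + b")
end = start + len("a + b")
sample = make_fim_sample(source, start, end)
training_text = sample.to_training_text()
```

Cutting on AST boundaries rather than random offsets means the masked middle is a complete syntactic unit, which is the property the pipeline description emphasizes.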

License

MIT
