---
language: en
license: apache-2.0
tags:
- code
- python
- docstring
- mistral
- qlora
- peft
- code-generation
base_model: mistralai/Mistral-7B-v0.1
datasets:
- code_search_net
---

# mistral-7b-docstring

Mistral 7B fine-tuned with QLoRA to generate Python docstrings, using data from CodeSearchNet. On NumPy-style docstring generation it outperforms Llama 3.3 70B, a model 10x larger, on both ROUGE-L and BERTScore.

## Evaluation results

Evaluated on 100 held-out Python functions from CodeSearchNet that were not seen during training.

| Model | ROUGE-L | BERTScore F1 |
|---|---|---|
| **Mistral 7B fine-tuned (this model)** | **0.2033** | **0.7739** |
| Llama 3.3 70B via Groq | 0.1715 | 0.7594 |
| Mistral 7B base (no fine-tuning) | 0.1102 | 0.7118 |

The fine-tuned 7B model beats Llama 3.3 70B on ROUGE-L (+18.5% relative) and BERTScore F1 (+1.9% relative) while being 10x smaller and running at a fraction of the inference cost.

## How to use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL = "mistralai/Mistral-7B-v0.1"
ADAPTER = "kk014/mistral-7b-docstring"

# Load the base model in 4-bit NF4 for memory-efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the QLoRA adapter weights from this repository
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Function to document
function_code = """
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)
""".strip()

# Build the instruction prompt
prompt = (
    "You are a Python documentation expert. "
    "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
    f"### Function:\n{function_code}\n\n"
    "### Docstring:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # Mistral has no pad token; reuse EOS
    )

# Decode only the newly generated tokens. Slicing the decoded text by
# len(prompt) is fragile because decoding does not always reproduce the
# prompt character for character.
prompt_length = inputs["input_ids"].shape[1]
docstring = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(docstring)
```

## Training details

| Parameter | Value |
|---|---|
| Base model | mistralai/Mistral-7B-v0.1 |
| Dataset | CodeSearchNet (Python split) |
| Training samples | 8,000 |
| Method | QLoRA (4-bit NF4 quantisation) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Epochs | 1 |
| Batch size | 2 (effective 16 with gradient accumulation) |
| Learning rate | 2e-4 |
| Hardware | 2x NVIDIA T4 on Kaggle (free tier) |
| Training time | ~4 hours |
| Framework | Hugging Face PEFT + TRL |

## Limitations

- Trained specifically on NumPy-style docstrings; output may not follow Google or Sphinx conventions.
- Works best on standalone functions under ~50 lines.
- At very low temperatures, generated output may repeat examples.
- Evaluated only on the CodeSearchNet Python split; performance on other codebases may vary.

## Citation

If you use this model, please cite the original QLoRA paper:

```
@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and others},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
```