	---
	language: en
	license: apache-2.0
	tags:
	- code
	- python
	- docstring
	- mistral
	- qlora
	- peft
	- code-generation
	base_model: mistralai/Mistral-7B-v0.1
	datasets:
	- code_search_net
	---

	# mistral-7b-docstring

	Mistral 7B fine-tuned with QLoRA to generate Python docstrings, trained on the Python split of CodeSearchNet.

	On a 100-function held-out evaluation of NumPy-style docstring generation, it outperforms Llama 3.3 70B, a model 10x larger, on both ROUGE-L and BERTScore.

	## Evaluation results

	Evaluated on 100 held-out Python functions from CodeSearchNet (never seen during training).

	| Model | ROUGE-L | BERTScore F1 |
	|---|---|---|
	| Mistral 7B fine-tuned (this model) | 0.2033 | 0.7739 |
	| Llama 3.3 70B via Groq | 0.1715 | 0.7594 |
	| Mistral 7B base (no fine-tuning) | 0.1102 | 0.7118 |
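
For readers unfamiliar with the first metric, ROUGE-L scores a candidate against a reference by the length of their longest common subsequence (LCS). The card does not say which implementation produced the numbers above, so the following is only a simplified sketch: it assumes lowercased whitespace tokens, no stemming, and an equally weighted F1. The helper names `lcs_length` and `rouge_l_f1` and the two example sentences are illustrative, not taken from the evaluation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 (assumption: whitespace tokens, no stemming)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only; not drawn from the evaluation set.
reference = "Compute the body mass index from weight and height."
candidate = "Compute body mass index given weight and height."
print(round(rouge_l_f1(reference, candidate), 4))  # 0.8235
```

Scores from a full ROUGE library may differ slightly because of tokenisation and stemming choices.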

	The fine-tuned 7B model beats Llama 3.3 70B on ROUGE-L (+18.5% relative) and BERTScore F1 (+1.9% relative) while being 10x smaller and running at a fraction of the inference cost.
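
The percentages quoted here are relative gains, and they can be re-derived from the table. A small sketch, where the dictionary keys and the `relative_gain` helper are names chosen for illustration:

```python
# Scores copied from the evaluation table.
scores = {
    "fine_tuned_7b": {"rouge_l": 0.2033, "bertscore_f1": 0.7739},
    "llama_3_3_70b": {"rouge_l": 0.1715, "bertscore_f1": 0.7594},
    "base_7b": {"rouge_l": 0.1102, "bertscore_f1": 0.7118},
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


for baseline in ("llama_3_3_70b", "base_7b"):
    for metric in ("rouge_l", "bertscore_f1"):
        gain = relative_gain(scores["fine_tuned_7b"][metric], scores[baseline][metric])
        print(f"{metric} vs {baseline}: {gain:+.1f}%")
```

The same calculation puts the gain over the untuned base model at roughly +84.5% on ROUGE-L and +8.7% on BERTScore F1.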

	## How to use

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
	from peft import PeftModel
	import torch

	BASE_MODEL = "mistralai/Mistral-7B-v0.1"

	# Load the base model in 4-bit NF4 for memory-efficient inference
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.float16,
	)

	tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
	base_model = AutoModelForCausalLM.from_pretrained(
	    BASE_MODEL,
	    quantization_config=bnb_config,
	    device_map="auto",
	)
	# Attach the LoRA adapter weights to the quantised base model
	model = PeftModel.from_pretrained(base_model, "kk014/mistral-7b-docstring")
	model.eval()

	# Generate a docstring
	function_code = """
	def calculate_bmi(weight_kg, height_m):
	    return weight_kg / (height_m ** 2)
	""".strip()

	prompt = (
	    "You are a Python documentation expert. "
	    "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
	    f"### Function:\n{function_code}\n\n"
	    "### Docstring:"
	)

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=150,
	        temperature=0.1,
	        do_sample=True,
	        pad_token_id=tokenizer.eos_token_id,
	    )

	# Decode only the newly generated tokens rather than slicing the decoded
	# string by len(prompt), which can misalign if decoding alters whitespace
	prompt_length = inputs["input_ids"].shape[1]
	docstring = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
	print(docstring)
	```
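
Because the prompt uses `### Function:` and `### Docstring:` markers, the model may keep generating past the end of the docstring and begin a new `###` section or repeat a line. The card does not describe any post-processing, so the helper below is a hypothetical clean-up step. It assumes that cutting at the next `###` marker and dropping consecutive duplicate lines is acceptable for your use case.

```python
def trim_generated_docstring(text, stop_marker="###"):
    """Keep text before the next prompt-style marker and drop repeated lines.

    Hypothetical helper: the "###" stop marker is an assumption based on the
    prompt format, not a documented behaviour of the model.
    """
    head = text.split(stop_marker, 1)[0]
    cleaned = []
    for line in head.splitlines():
        # Skip a non-blank line that exactly repeats the previously kept line.
        if line.strip() and cleaned and line.strip() == cleaned[-1].strip():
            continue
        cleaned.append(line.rstrip())
    return "\n".join(cleaned).strip()


# Made-up raw output, for illustration only.
raw = (
    "Calculate body mass index.\n\n"
    "Parameters\n----------\n"
    "weight_kg : float\n"
    "weight_kg : float\n"
    "height_m : float\n\n"
    "### Function:\ndef other(): pass"
)
print(trim_generated_docstring(raw))
```

In the snippet above, this would be applied to `docstring` just before printing it.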

	## Training details

	| Parameter | Value |
	|---|---|
	| Base model | mistralai/Mistral-7B-v0.1 |
	| Dataset | CodeSearchNet (Python split) |
	| Training samples | 8,000 |
	| Method | QLoRA (4-bit NF4 quantisation) |
	| LoRA rank | 16 |
	| LoRA alpha | 32 |
	| Epochs | 1 |
	| Batch size | 2 (effective 16 with grad accum) |
	| Learning rate | 2e-4 |
	| Hardware | Kaggle T4 x2 (free tier) |
	| Training time | ~4 hours |
	| Framework | HuggingFace PEFT + TRL |
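
A few quantities follow directly from the table. The sketch below derives them under two assumptions that the card does not state: that "batch size 2" is the total number of samples per forward pass (not per GPU), and that the standard LoRA scaling of alpha divided by rank applies.

```python
import math

# Values copied from the training details table.
train_samples = 8_000
batch_size = 2
effective_batch_size = 16
epochs = 1
lora_rank = 16
lora_alpha = 32
training_hours = 4  # reported as "~4 hours", so treat results as rough

# Assumption: batch_size counts all samples per forward pass.
grad_accum_steps = effective_batch_size // batch_size
optimizer_steps = math.ceil(train_samples * epochs / effective_batch_size)

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_rank

# Rough wall-clock time per optimizer step.
seconds_per_step = training_hours * 3600 / optimizer_steps

print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"optimizer steps: {optimizer_steps}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"approx. seconds per optimizer step: {seconds_per_step:.1f}")
```

Under these assumptions the run comes to 8 accumulation steps per update and 500 optimizer updates in total.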

	## Limitations

	- Trained specifically on NumPy-style docstrings, so output may not follow Google or Sphinx conventions
	- Works best on standalone functions under ~50 lines
	- May repeat examples in generated output at very low temperatures
	- Evaluated on the CodeSearchNet Python split only, so performance on other codebases may vary
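
To check inputs against the length guideline before calling the model, one option is to measure each function with the standard-library `ast` module. This is a minimal sketch, not part of the model's tooling: the helper names are hypothetical, the 50-line threshold is taken from the list above, and `end_lineno` requires Python 3.8 or later.

```python
import ast


def function_line_counts(source):
    """Map each function name in `source` to its length in lines."""
    tree = ast.parse(source)
    return {
        node.name: node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def within_recommended_length(source, max_lines=50):
    """True if the source defines at least one function and none exceed max_lines."""
    counts = function_line_counts(source)
    return bool(counts) and all(n <= max_lines for n in counts.values())


sample = "def calculate_bmi(weight_kg, height_m):\n    return weight_kg / (height_m ** 2)\n"
print(function_line_counts(sample))       # {'calculate_bmi': 2}
print(within_recommended_length(sample))  # True
```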

	## Citation

	If you use this model, please cite the original QLoRA paper:

	```
	@article{dettmers2023qlora,
	  title={QLoRA: Efficient Finetuning of Quantized LLMs},
	  author={Dettmers, Tim and others},
	  journal={arXiv preprint arXiv:2305.14314},
	  year={2023}
	}
	```