---
license: mit
library_name: peft
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
  - code
  - python
  - code-explanation
  - lora
  - qlora
  - peft
language:
  - en
pipeline_tag: text-generation
---

# PyExplain — Qwen2.5-Coder-7B (LoRA adapter)

A **LoRA adapter** that fine-tunes
[`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
to **explain Python code in plain, beginner-friendly English**. It first states
the overall purpose of the code, then walks through it part by part, explaining
each programming term as it goes so that readers with no prior Python knowledge
can follow.

Part of the **PyExplain** project.
👉 Code & full pipeline: https://github.com/AyushPatel2803/PyExplain

## How to use

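```python
# Requires: pip install torch transformers peft accelerate bitsandbytes
# (4-bit loading through bitsandbytes generally needs a CUDA GPU)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "Qwen/Qwen2.5-Coder-7B-Instruct"          # base model the adapter applies to
ADAPTER = "AyushPatel28/PyExplain-qwen-coder-7b"  # this LoRA adapter

# Load the base model with 4-bit NF4 quantization to reduce GPU memory use
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
# Attach the LoRA adapter weights on top of the quantized base model
model = PeftModel.from_pretrained(model, ADAPTER)

# Build the prompt with the tokenizer's chat template
code = "def reverse(s):\n    return s[::-1]"
msgs = [{"role": "system", "content": "Explain Python code simply and accurately."},
        {"role": "user", "content": f"Explain this code:\n```python\n{code}\n```"}]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the special tokens, so don't add them twice
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
# Greedy decoding (do_sample=False) gives deterministic output
out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```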
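
To explain several snippets without repeating the prompt-building code, the steps from the usage example above can be wrapped in two small helpers. This is a sketch, not part of the released code: `SYSTEM_PROMPT`, `build_messages`, and `explain` are hypothetical names, and the system prompt is copied from the example (whether the adapter was trained with exactly this wording is an assumption).

```python
# Hypothetical helpers that factor out the prompt construction and generation
# steps from the usage example. The system prompt wording is copied from that
# example; it is assumed, not confirmed, to match the training prompt.
SYSTEM_PROMPT = "Explain Python code simply and accurately."


def build_messages(code: str) -> list[dict[str, str]]:
    """Wrap a Python snippet in the two-message chat format used above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Explain this code:\n```python\n{code}\n```"},
    ]


def explain(model, tok, code: str, max_new_tokens: int = 300) -> str:
    """Generate an explanation using an already-loaded model and tokenizer."""
    prompt = tok.apply_chat_template(build_messages(code), tokenize=False,
                                     add_generation_prompt=True)
    # The chat template already contains the special tokens
    inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated tokens, not the echoed prompt
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

With `model` and `tok` loaded as in the usage example, `explain(model, tok, "x = [n * n for n in range(5)]")` returns the explanation as a string. Keeping `build_messages` separate lets you inspect or unit-test the prompt format without loading the 7B model.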