TokenBender/code_instructions_122k_alpaca_style
How to use koushikkb12/Qwen2.5-7B-Code-LoRA with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
model = PeftModel.from_pretrained(base_model, "koushikkb12/Qwen2.5-7B-Code-LoRA")
How to use koushikkb12/Qwen2.5-7B-Code-LoRA with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="koushikkb12/Qwen2.5-7B-Code-LoRA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("koushikkb12/Qwen2.5-7B-Code-LoRA", dtype="auto")
How to use koushikkb12/Qwen2.5-7B-Code-LoRA with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "koushikkb12/Qwen2.5-7B-Code-LoRA"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "koushikkb12/Qwen2.5-7B-Code-LoRA",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Use Docker:
docker model run hf.co/koushikkb12/Qwen2.5-7B-Code-LoRA
How to use koushikkb12/Qwen2.5-7B-Code-LoRA with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "koushikkb12/Qwen2.5-7B-Code-LoRA" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "koushikkb12/Qwen2.5-7B-Code-LoRA",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "koushikkb12/Qwen2.5-7B-Code-LoRA" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "koushikkb12/Qwen2.5-7B-Code-LoRA",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use koushikkb12/Qwen2.5-7B-Code-LoRA with Docker Model Runner:
docker model run hf.co/koushikkb12/Qwen2.5-7B-Code-LoRA
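The curl requests above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes a server started as shown above is listening on the given port (8000 for vLLM, 30000 for SGLang) and replies with the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "koushikkb12/Qwen2.5-7B-Code-LoRA"


def build_chat_request(prompt, port=8000, model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, port=8000):
    """Send the request; this needs a running server on localhost:port."""
    with urllib.request.urlopen(build_chat_request(prompt, port)) as resp:
        body = json.load(resp)
    # Assumption: the server returns an OpenAI-compatible response body.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("What is the capital of France?", port=30000)` would target the SGLang server, and the default port targets vLLM.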
LoRA adapter fine-tuned on Qwen/Qwen2.5-7B for Python code generation.
| Detail | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B (7.6B params) |
| Method | LoRA (rank 128, alpha 64) |
| Trainable Params | 323M / 7.9B (4.07%) |
| Dataset | 122K code instructions (Alpaca format) |
| Train / Eval Split | 115,861 / 6,098 |
| Epochs | 3 |
| Effective Batch Size | 32 (16 x 2 gradient accumulation) |
| Learning Rate | 2e-4 (cosine schedule, 3% warmup) |
| Max Sequence Length | 2048 (with packing) |
| Precision | bf16 |
| Training Time | ~4 hrs 26 min |
| GPU | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| Best Eval Loss | 0.7324 (step 600) |
| Eval Token Accuracy | 82.7% |
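The trainable-parameter figure in the table can be sanity-checked with a little arithmetic. The sketch below assumes the adapter targets all seven linear projections in every decoder layer (the card does not list the target modules, so this is an assumption) and uses the layer dimensions from the published Qwen2.5-7B config. A LoRA pair on a layer with `in_f` inputs and `out_f` outputs adds `r * (in_f + out_f)` weights.

```python
# Qwen2.5-7B dimensions, taken from the base model's config.
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 512          # 4 key/value heads x head dimension 128
NUM_LAYERS = 28

# LoRA settings from the table above.
RANK, ALPHA = 128, 64

# (in_features, out_features) of each adapted projection.
# ASSUMPTION: all seven projections are adapted; the card does not say so.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_f, out_f, r):
    """Weights in one LoRA pair: A is (r x in_f), B is (out_f x r)."""
    return r * (in_f + out_f)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total_trainable = per_layer * NUM_LAYERS
lora_scaling = ALPHA / RANK          # factor applied to the LoRA update
effective_batch = 16 * 2             # per-step batch x gradient accumulation

print(f"trainable LoRA params: {total_trainable:,}")
print(f"LoRA scaling (alpha / rank): {lora_scaling}")
print(f"effective batch size: {effective_batch}")
```

Under these assumptions the count comes to about 323M, in line with the table, which suggests (but does not prove) that all attention and MLP projections were adapted.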
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B", dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "koushikkb12/Qwen2.5-7B-Code-LoRA")
model = model.merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained("koushikkb12/Qwen2.5-7B-Code-LoRA")
messages = [
    {"role": "system", "content": "You are a highly skilled Python programmer. Write clean, efficient, and correct code."},
    {"role": "user", "content": "Write a function to flatten a nested list."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # do_sample=True so that temperature and top_p actually take effect
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Prompt: Write a Python function that checks if a given string is a palindrome.
def is_palindrome(s):
    return s == s[::-1]
Prompt: Implement a binary search algorithm in Python.
def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
Prompt: Write a Python class for a stack with push, pop, peek, and is_empty.
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.is_empty():
            return self.items.pop()

    def peek(self):
        if not self.is_empty():
            return self.items[-1]

    def is_empty(self):
        return len(self.items) == 0
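A quick way to see what these sample outputs do and do not handle is to run them against a few inputs. The sketch below repeats the three generated definitions so that it is self-contained; the test inputs are illustrative choices, not part of the original evaluation.

```python
def is_palindrome(s):
    return s == s[::-1]


def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.is_empty():
            return self.items.pop()

    def peek(self):
        if not self.is_empty():
            return self.items[-1]

    def is_empty(self):
        return len(self.items) == 0


# The palindrome check is an exact, case-sensitive comparison.
print(is_palindrome("racecar"), is_palindrome("Racecar"))

# Binary search expects a sorted list and returns -1 when the target is absent.
print(binary_search([1, 3, 5, 7], 5), binary_search([1, 3, 5, 7], 4))

# pop() and peek() on an empty stack fall through and return None
# rather than raising an exception.
stack = Stack()
stack.push(1)
stack.push(2)
print(stack.peek(), stack.pop(), stack.pop(), stack.pop())
```

The generated code is correct for the straightforward reading of each prompt; callers who need case-insensitive palindromes or an exception on an empty stack would have to ask for that explicitly.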
| Step | Train Loss | Eval Loss | Eval Token Acc |
|---|---|---|---|
| 100 | 0.758 | 0.741 | 82.6% |
| 200 | 0.749 | 0.737 | 82.6% |
| 300 | 0.739 | 0.734 | 82.7% |
| 400 | 0.731 | 0.734 | 82.7% |
| 500 | 0.720 | 0.733 | 82.7% |
| 600 | 0.718 | 0.732 | 82.7% |
| 700 | 0.725 | 0.734 | 82.7% |
| 800 | 0.710 | 0.734 | 82.7% |
| 900 | 0.739 | 0.734 | 82.7% |
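The best checkpoint can be read off this log programmatically. The sketch below copies the rows of the table and, assuming the eval loss is a mean per-token cross-entropy in nats (the card does not state this explicitly), converts the lowest value to a token-level perplexity.

```python
import math

# (step, train_loss, eval_loss) rows copied from the table above.
LOG = [
    (100, 0.758, 0.741),
    (200, 0.749, 0.737),
    (300, 0.739, 0.734),
    (400, 0.731, 0.734),
    (500, 0.720, 0.733),
    (600, 0.718, 0.732),
    (700, 0.725, 0.734),
    (800, 0.710, 0.734),
    (900, 0.739, 0.734),
]

# Pick the row with the lowest eval loss.
best_step, best_train, best_eval = min(LOG, key=lambda row: row[2])

# ASSUMPTION: eval loss is mean cross-entropy in nats, so exp() gives perplexity.
perplexity = math.exp(best_eval)

print(f"best step: {best_step}, eval loss: {best_eval}")
print(f"approximate token-level perplexity: {perplexity:.2f}")
```

Eval loss moves by less than 0.01 across the whole log, so the choice of checkpoint after the first few hundred steps appears to matter little.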
This adapter inherits the Apache 2.0 license from Qwen2.5-7B.