Qwen2.5-PromptRefiner-Merged
A fully merged, fine-tuned version of Qwen2.5-3B-Instruct, trained to rewrite basic, vague user prompts as detailed, structured prompts that are intended to get better responses from AI systems. Unlike the adapter-only version, this is a complete standalone model: no base model or PEFT library is required.
Model Details
Model Description
- Developed by: Kamran (Kamran-56)
- Model type: Causal Language Model (LoRA fine-tuned, fully merged)
- Language(s): English
- License: MIT
- Finetuned from: Qwen/Qwen2.5-3B-Instruct
- Adapter version: Kamran-56/Qwen2.5-3B-PromptRefiner
- Dataset used: Kamran-56/prompt-refinement-dataset
Uses
Direct Use
This model takes a basic user-written prompt as input and returns an enhanced, detailed, and well-structured version of the same prompt. It adds role, context, task, format, and constraints to any vague input prompt.
Downstream Use
- Chrome extensions that auto-enhance prompts on AI platforms like ChatGPT, Claude, and Gemini
- API middleware that improves prompts before forwarding to LLMs
- HuggingFace Inference API — can be called directly, no extra setup needed
- Productivity tools that help non-technical users write better prompts
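The API middleware case can be sketched as two small helpers. This is a minimal sketch, assuming the model is served behind an OpenAI-compatible `/v1/chat/completions` endpoint (vLLM and SGLang both expose one). The function names `build_refine_request` and `extract_refined_prompt` are hypothetical, and the sampling values are copied from the getting-started code in this card.

```python
MODEL_ID = "Kamran-56/Qwen2.5-PromptRefiner-Merged"


def build_refine_request(raw_prompt, system_prompt, max_tokens=200):
    """Build the JSON body for an OpenAI-compatible chat completions call.

    Hypothetical helper: the middleware sends this body to the refiner,
    then forwards the refined prompt to the downstream LLM.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": raw_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def extract_refined_prompt(response_body):
    """Pull the refined prompt out of a chat completions response dict."""
    return response_body["choices"][0]["message"]["content"].strip()


# Offline demonstration with a hand-written response in the OpenAI schema.
payload = build_refine_request("fix my code", "You are a prompt engineer.")
fake_response = {"choices": [{"message": {"role": "assistant", "content": " Act as a senior engineer... "}}]}
refined = extract_refined_prompt(fake_response)
```

In a real deployment the system prompt would be the one used in the getting-started code, since that is the wording the model is prompted with there.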
Out-of-Scope Use
- Not designed to answer questions or generate general content
- Not suitable for tasks outside of prompt enhancement
- Not trained on non-English prompts
How to Get Started with the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Kamran-56/Qwen2.5-PromptRefiner-Merged"

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    dtype=torch.bfloat16,  # older transformers releases name this argument torch_dtype
    device_map="auto",
    trust_remote_code=True,
)
model.eval()


def enhance_prompt(bad_prompt):
    # Hand-written ChatML prompt: system instructions, the user's prompt,
    # then an open assistant turn for the model to complete.
    input_text = f"""<|im_start|>system
You are a world-class prompt engineer with deep expertise across all domains
including coding, writing, business, creativity, and research.
Your job is to transform any basic user prompt into a highly specific,
structured, and effective prompt that will get the best possible response from an AI.
Every enhanced prompt MUST include:
1. A clear ROLE → "Act as a [specific expert]..."
2. Clear CONTEXT → describe the situation in detail
3. Specific TASK → exactly what needs to be done
4. FORMAT → how the response should be structured
5. CONSTRAINTS → tone, length, style, or any boundaries
Rules:
- Keep the original intent and topic
- Be specific, never generic
- Return ONLY the enhanced prompt, nothing else
- No intro, no explanation, no meta-commentary<|im_end|>
<|im_start|>user
{bad_prompt}<|im_end|>
<|im_start|>assistant
"""
    # Send inputs to the device chosen by device_map instead of hard-coding "cuda".
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens. Splitting the full text on
    # "assistant" breaks whenever the enhanced prompt itself contains that word.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


print(enhance_prompt("fix my code"))
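The prompt string in `enhance_prompt` follows the ChatML layout: each turn is wrapped in `<|im_start|>` and `<|im_end|>`, and an open assistant turn closes the prompt. The sketch below rebuilds that layout from a message list. `render_chatml` is a hypothetical helper written for illustration. In practice, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` should produce equivalent text for this tokenizer, though that is an assumption worth checking against the actual output.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout.

    Hypothetical helper mirroring the hand-written prompt in this card;
    it is not the tokenizer's own template.
    """
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a world-class prompt engineer."},
    {"role": "user", "content": "fix my code"},
]
prompt_text = render_chatml(messages)
print(prompt_text)
```

Keeping the conversation as a message list instead of a single f-string makes it easier to swap the system prompt or add few-shot turns without touching the special tokens.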
Training Details
Training Data
Fine-tuned on Kamran-56/prompt-refinement-dataset, which contains 1,561 input→output pairs. Each pair maps a basic prompt to an enhanced prompt generated with Llama 3.1 via the Groq API.
Training Procedure
Training Hyperparameters
- Training regime: bf16 mixed precision
- Fine-tuning method: LoRA (PEFT) — merged into base model after training
- LoRA rank (r): 8
- LoRA alpha: 16
- Target modules: q_proj, k_proj, v_proj, o_proj
- Learning rate: 5e-5
- Epochs: 3
- Batch size: 4
- Gradient accumulation steps: 4
- Effective batch size: 16
- LR scheduler: Cosine
- Warmup steps: 50
- Max sequence length: 512
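The hyperparameters above imply a fairly short schedule. The following is a rough estimate only, assuming a single GPU, that all 1,561 pairs were used for training (the card does not mention a validation split), and that a partial final batch counts as one optimizer step. `training_schedule` is a hypothetical helper, not part of the author's training script.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs, warmup_steps):
    """Estimate optimizer steps from the listed hyperparameters."""
    effective_batch = per_device_batch * grad_accum
    # Assumption: the last, partially filled batch still triggers a step.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": warmup_steps / total_steps,
    }


schedule = training_schedule(
    num_examples=1561, per_device_batch=4, grad_accum=4, epochs=3, warmup_steps=50
)
print(schedule)
```

Under these assumptions there are 98 steps per epoch and 294 in total, so the 50 warmup steps cover roughly 17% of training before the cosine decay takes over.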
Speeds, Sizes, Times
- Hardware: Kaggle P100 GPU (16GB VRAM)
- Training time: ~45 minutes
- Final training loss: 1.24
- Merging method: merge_and_unload() via PEFT
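The merge step can be sketched as follows. This is a minimal sketch of a typical PEFT merge, not the author's actual script, which the card does not show. `merge_adapter` and its arguments are hypothetical names, and the `dtype` keyword follows the getting-started code (older transformers releases call it `torch_dtype`).

```python
def merge_adapter(base_id, adapter_id, output_dir):
    """Fold a LoRA adapter into its base model and save a standalone copy.

    Imports live inside the function so the heavy dependencies load only
    when a merge is actually run.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base weights, then attach the trained adapter on top.
    base = AutoModelForCausalLM.from_pretrained(base_id, dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)

    # merge_and_unload() adds the low-rank updates into the base weights
    # and returns a plain transformers model without PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)

    # Save the tokenizer alongside so the output directory is self-contained.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

With the repositories named in this card, a call would look like `merge_adapter("Qwen/Qwen2.5-3B-Instruct", "Kamran-56/Qwen2.5-3B-PromptRefiner", "merged-model")`, where the output directory name is arbitrary.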
Evaluation
Testing Data
Evaluated manually on 12 diverse prompts spanning coding, writing, professional, creative, and general categories.
Results
| Input Prompt | Output Quality |
|---|---|
| "fix my code" | Added role, step-by-step format, context, constraints ✅ |
| "write a poem" | Added poet role, structure, tone, rhyme scheme ✅ |
| "write an email" | Added professional tone, structure, length constraint ✅ |
Average quality rating: 8.5/10 based on manual evaluation.
Difference From Adapter Version
| Property | Kamran-56/Qwen2.5-3B-PromptRefiner | This Model |
|---|---|---|
| Type | LoRA Adapter only | Full merged model |
| Requires base model | ✅ Yes | ❌ No |
| Requires PEFT library | ✅ Yes | ❌ No |
| HF Inference API | ❌ No | ✅ Yes |
| File size | ~20MB | ~6GB |
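The two file sizes can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, none of which appear in this card: base-model dimensions of hidden size 2048, 36 layers, and 2 key/value heads of dimension 128 (verify against the base model's config.json), about 3.09B total parameters, adapter weights stored in fp32, and merged weights stored in bf16.

```python
# Assumed Qwen2.5-3B dimensions (not stated in this card; check config.json).
HIDDEN = 2048
LAYERS = 36
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128
RANK = 8          # LoRA rank from the hyperparameter list


def lora_params(in_features, out_features, rank):
    """A LoRA pair holds A (rank x in) plus B (out x rank) weights."""
    return rank * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # k_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # v_proj
    + lora_params(HIDDEN, HIDDEN, RANK)  # o_proj
)
adapter_params = per_layer * LAYERS

adapter_mb_fp32 = adapter_params * 4 / 1e6  # assumed 4 bytes per weight
merged_gb_bf16 = 3.09e9 * 2 / 1e9           # assumed ~3.09B weights, 2 bytes each

print(adapter_params, round(adapter_mb_fp32, 1), round(merged_gb_bf16, 2))
```

Under these assumptions the adapter holds about 3.7 million weights (roughly 15 MB in fp32) and the merged model about 6.2 GB, the same order of magnitude as the ~20MB and ~6GB figures in the table.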
Citation
@misc{qwen25_promptrefiner_merged,
  author = {Kamran},
  title = {Qwen2.5-PromptRefiner-Merged},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kamran-56/Qwen2.5-PromptRefiner-Merged}
}