Instructions for using Kamran-56/Qwen2.5-PromptRefiner-Merged with libraries and local apps.

Libraries

How to use Kamran-56/Qwen2.5-PromptRefiner-Merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Kamran-56/Qwen2.5-PromptRefiner-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Kamran-56/Qwen2.5-PromptRefiner-Merged")
model = AutoModelForCausalLM.from_pretrained("Kamran-56/Qwen2.5-PromptRefiner-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Kamran-56/Qwen2.5-PromptRefiner-Merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kamran-56/Qwen2.5-PromptRefiner-Merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kamran-56/Qwen2.5-PromptRefiner-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
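
The same OpenAI-compatible endpoint can also be called from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the default base URL assumes the server started above is listening on port 8000 (the SGLang examples in this document use port 30000 instead).

```python
import json
import urllib.request

MODEL = "Kamran-56/Qwen2.5-PromptRefiner-Merged"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(chat("fix my code"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network traffic happens.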

Use Docker

docker model run hf.co/Kamran-56/Qwen2.5-PromptRefiner-Merged

SGLang

How to use Kamran-56/Qwen2.5-PromptRefiner-Merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kamran-56/Qwen2.5-PromptRefiner-Merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kamran-56/Qwen2.5-PromptRefiner-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kamran-56/Qwen2.5-PromptRefiner-Merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kamran-56/Qwen2.5-PromptRefiner-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Kamran-56/Qwen2.5-PromptRefiner-Merged with Docker Model Runner:
```
docker model run hf.co/Kamran-56/Qwen2.5-PromptRefiner-Merged
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Qwen2.5-PromptRefiner-Merged

A fully merged, fine-tuned version of Qwen2.5-3B-Instruct, trained to transform basic, vague user prompts into structured, detailed prompts intended to elicit better responses from AI systems. Unlike the adapter-only version, this is a complete standalone model: no base model or PEFT library is required.

Model Details

Model Description

Developed by: Kamran (Kamran-56)
Model type: Causal Language Model (LoRA fine-tuned, fully merged)
Language(s): English
License: MIT
Finetuned from: Qwen/Qwen2.5-3B-Instruct
Adapter version: Kamran-56/Qwen2.5-3B-PromptRefiner
Dataset used: Kamran-56/prompt-refinement-dataset

Uses

Direct Use

This model takes a basic user-written prompt as input and returns an enhanced, detailed, and well-structured version of the same prompt. It adds role, context, task, format, and constraints to any vague input prompt.
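
To make the five components concrete, here is a small illustrative sketch. The `PromptParts` class and the example values are hypothetical and only show the kind of structure described above; the model itself returns free-form text and does not use this class.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container for the five components named above."""

    role: str
    context: str
    task: str
    format: str
    constraints: str

    def render(self) -> str:
        # Join the components in the order the model card lists them.
        return "\n".join(
            [
                f"Act as {self.role}.",
                f"Context: {self.context}",
                f"Task: {self.task}",
                f"Format: {self.format}",
                f"Constraints: {self.constraints}",
            ]
        )


# Invented example values for the vague input "fix my code".
parts = PromptParts(
    role="a senior Python developer",
    context="a script raises an error that the user has not yet described",
    task="identify the bug and propose a corrected version",
    format="numbered steps followed by the fixed code",
    constraints="concise, beginner-friendly tone",
)
print(parts.render())
```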

Downstream Use

Chrome extensions that auto-enhance prompts on AI platforms like ChatGPT, Claude, and Gemini
API middleware that improves prompts before forwarding to LLMs
Hugging Face Inference API: can be called directly, with no extra setup
Productivity tools that help non-technical users write better prompts
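
The API middleware use case above could be sketched roughly as follows. This is an assumption-laden outline, not an implementation from this repository: `make_refining_middleware`, `refine`, `forward`, and `max_chars` are hypothetical names, and the two callables stand in for a call to this model and a call to the downstream LLM.

```python
from typing import Callable


def make_refining_middleware(
    refine: Callable[[str], str],
    forward: Callable[[str], str],
    max_chars: int = 4000,
) -> Callable[[str], str]:
    """Return a handler that refines a prompt, then forwards it downstream."""

    def handle(user_prompt: str) -> str:
        prompt = user_prompt.strip()
        if not prompt:
            raise ValueError("prompt must not be empty")
        refined = refine(prompt).strip()
        # Fall back to the original prompt if refinement is empty or too long.
        if not refined or len(refined) > max_chars:
            refined = prompt
        return forward(refined)

    return handle


# Usage with stand-in callables (replace with real model and LLM calls).
handler = make_refining_middleware(
    refine=lambda p: f"Act as an expert. Task: {p}",
    forward=lambda p: f"[sent to LLM] {p}",
)
print(handler("fix my code"))
```

The fallback keeps the pipeline usable when refinement produces nothing or an unexpectedly long result, so a refiner failure degrades to the user's original prompt.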

Out-of-Scope Use

Not designed to answer questions or generate general content
Not suitable for tasks outside of prompt enhancement
Not trained on non-English prompts

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Kamran-56/Qwen2.5-PromptRefiner-Merged"

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model.eval()

def enhance_prompt(bad_prompt):
    input_text = f"""<|im_start|>system
You are a world-class prompt engineer with deep expertise across all domains
including coding, writing, business, creativity, and research.

Your job is to transform any basic user prompt into a highly specific,
structured, and effective prompt that will get the best possible response from an AI.

Every enhanced prompt MUST include:
1. A clear ROLE  → "Act as a [specific expert]..."
2. Clear CONTEXT → describe the situation in detail
3. Specific TASK → exactly what needs to be done
4. FORMAT        → how the response should be structured
5. CONSTRAINTS   → tone, length, style, or any boundaries

Rules:
- Keep the original intent and topic
- Be specific, never generic
- Return ONLY the enhanced prompt, nothing else
- No intro, no explanation, no meta-commentary<|im_end|>
<|im_start|>user
{bad_prompt}<|im_end|>
<|im_start|>assistant
"""
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id
        )
    # Decode only the newly generated tokens so the echoed prompt is dropped.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(enhance_prompt("fix my code"))
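
The hand-written prompt string above follows the `<|im_start|>` / `<|im_end|>` chat layout. As a sketch of that layout, the hypothetical helpers below (`build_messages` and `render_chatml`, not part of Transformers) rebuild the same string from a list of messages. With a real tokenizer, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` is the usual way to obtain it, assuming the model's chat template matches this layout.

```python
def build_messages(system_prompt, user_prompt):
    """Return the chat as a list of role/content dictionaries."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the <|im_start|>/<|im_end|> layout used above."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        text += "<|im_start|>assistant\n"
    return text


print(render_chatml(build_messages("You are a prompt engineer.", "fix my code")))
```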

Training Details

Training Data

Fine-tuned on Kamran-56/prompt-refinement-dataset, which contains 1,561 input→output pairs. Each pair maps a basic prompt to an enhanced prompt generated with Llama 3.1 via the Groq API.

Training Procedure

Training Hyperparameters

Training regime: bf16 mixed precision
Fine-tuning method: LoRA (PEFT) — merged into base model after training
LoRA rank (r): 8
LoRA alpha: 16
Target modules: q_proj, k_proj, v_proj, o_proj
Learning rate: 5e-5
Epochs: 3
Batch size: 4
Gradient accumulation steps: 4
Effective batch size: 16
LR scheduler: Cosine
Warmup steps: 50
Max sequence length: 512
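
The arithmetic behind these settings can be checked with a short sketch. It assumes a single GPU and that all 1,561 pairs were used for training with no held-out split; neither is stated explicitly here, and the exact step count also depends on how the trainer handles the last partial batch.

```python
import math

num_examples = 1561      # dataset size from the Training Data section (assumed all used)
per_device_batch = 4
grad_accum_steps = 4
epochs = 3
warmup_steps = 50
lora_r, lora_alpha = 8, 16

effective_batch = per_device_batch * grad_accum_steps        # 16, matches the card
steps_per_epoch = math.ceil(num_examples / effective_batch)  # 98 optimizer steps
total_steps = steps_per_epoch * epochs                       # 294
warmup_fraction = warmup_steps / total_steps                 # about 0.17
lora_scale = lora_alpha / lora_r                             # 2.0

print(effective_batch, steps_per_epoch, total_steps, round(warmup_fraction, 2), lora_scale)
```

Under these assumptions, the 50 warmup steps cover roughly the first sixth of training, and the LoRA update is scaled by a factor of 2.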

Speeds, Sizes, Times

Hardware: Kaggle P100 GPU (16GB VRAM)
Training time: ~45 minutes
Final training loss: 1.24
Merging method: merge_and_unload() via PEFT
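
Conceptually, merging a LoRA adapter folds the scaled low-rank update into each targeted weight matrix, W' = W + (alpha / r) · B · A, so the adapter is no longer needed at inference time. The toy sketch below uses plain Python lists and made-up numbers to show that the merged weight gives the same output as the base-plus-adapter path. It is not PEFT's implementation, and the tiny shapes are chosen only for readability.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


r, alpha = 1, 2                # toy values; the card uses r=8, alpha=16 (same ratio)
W = [[1.0, 0.0], [0.0, 1.0]]   # base weight, shape (out=2, in=2)
B = [[0.5], [1.0]]             # LoRA B, shape (out, r)
A = [[1.0, -1.0]]              # LoRA A, shape (r, in)
x = [[2.0], [3.0]]             # input column vector

s = alpha / r
# Adapter path: base output plus the scaled low-rank correction.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), s))
# Merged path: fold the correction into the weight once, then use it alone.
W_merged = add(W, scale(matmul(B, A), s))
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)
```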

Evaluation

Testing Data

Evaluated manually on 12 diverse prompts spanning coding, writing, professional, creative, and general categories.

Results

| Input Prompt | Output Quality |
|---|---|
| `"fix my code"` | Added role, step-by-step format, context, constraints ✅ |
| `"write a poem"` | Added poet role, structure, tone, rhyme scheme ✅ |
| `"write an email"` | Added professional tone, structure, length constraint ✅ |

Average quality rating: 8.5/10 based on manual evaluation.

Difference From Adapter Version

| Property | Kamran-56/Qwen2.5-3B-PromptRefiner | This Model |
|---|---|---|
| Type | LoRA adapter only | Full merged model |
| Requires base model | ✅ Yes | ❌ No |
| Requires PEFT library | ✅ Yes | ❌ No |
| HF Inference API | ❌ No | ✅ Yes |
| File size | ~20MB | ~6GB |
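
The ~6GB figure for the merged model is consistent with a back-of-the-envelope estimate from the parameter count and tensor type reported on this page. The sketch below assumes a nominal 3 billion parameters stored in BF16; the exact parameter count is not stated here, so the result is only approximate.

```python
params = 3_000_000_000   # nominal "3B"; the exact count is not stated in this card
bytes_per_param = 2      # BF16 stores each value in 16 bits
size_gb = params * bytes_per_param / 1e9

print(f"~{size_gb:.0f} GB")  # prints "~6 GB"
```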

Citation

@misc{qwen25_promptrefiner_merged,
  author    = {Kamran},
  title     = {Qwen2.5-PromptRefiner-Merged},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Kamran-56/Qwen2.5-PromptRefiner-Merged}
}

Safetensors
Model size: 3B params
Tensor type: BF16

Model tree for Kamran-56/Qwen2.5-PromptRefiner-Merged
Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-3B-Instruct