---
license: other
license_name: link-attribution
license_link: https://dejanmarketing.com/link-attribution/
library_name: transformers
base_model: google/gemma-3-270m
tags:
- reverse-prompting
- prompt-reconstruction
- gemma
- text-generation
pipeline_tag: text-generation
---
# Reverse Prompter
A fine-tuned [google/gemma-3-270m](https://huggingface.co/google/gemma-3-270m) model that reconstructs the most likely prompt from an AI assistant's response.
Given a piece of AI-generated text, the model generates candidate prompts that could have produced it.
## How It Works
The model was trained on prompt-response pairs formatted as:
```
{response}\n###\n{prompt}
```
At inference time, you provide the response followed by the `\n###\n` separator, and the model generates the reconstructed prompt.
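A concrete training example under this format (the strings are illustrative, not taken from the dataset):

```python
prompt = "what is the capital of france"
response = "The capital of France is Paris."

# Training format: response first, separator, then the prompt the model must predict.
training_example = f"{response}\n###\n{prompt}"
```

At inference time only the `{response}\n###\n` portion is supplied, and the model completes the prompt.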
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dejanseo/reverse-prompter", torch_dtype="bfloat16"
).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("dejanseo/reverse-prompter")

# Append the training-time separator so the model completes with the prompt.
response_text = "Your AI-generated text here"
prompt = response_text.strip() + "\n###\n"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, penalty_alpha=0.3, top_k=4)

# Drop the input tokens and decode only the newly generated prompt.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
reconstructed_prompt = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(reconstructed_prompt)
```
For best results, run generation across multiple contrastive search configurations and rank outputs by perplexity. See the companion Streamlit app for a full implementation.
## Training Data
The training dataset was generated synthetically using Gemini 2.5 Flash via Vertex AI in a three-stage pipeline:
### 1. Prompt Generation
100,000 diverse prompts were generated across five categories (20 per category per batch):
- Mid-tail, search query style (single or multi-faceted)
- Long-tail, search query style (multi-faceted)
- Simple, prompt-like (single-faceted)
- Typical, prompt-like (single or multi-faceted)
- Detailed, prompt-like (multi-faceted)
Generation was parallelized with 100 concurrent API calls in batches of 100 prompts, with results stored in SQLite.
### 2. Response Generation
Each prompt was sent back to Gemini 2.5 Flash (with thinking disabled) to produce a corresponding AI assistant response. This was also parallelized at 100 concurrent calls.
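The concurrency pattern described above can be sketched with a semaphore-bounded asyncio loop. This is a sketch only: `call_gemini` is a hypothetical stand-in for the actual Vertex AI client call, which is not shown in this card.

```python
import asyncio

CONCURRENCY = 100  # matches the 100 concurrent calls described above

async def call_gemini(prompt: str) -> str:
    """Hypothetical stand-in for the Vertex AI Gemini 2.5 Flash request."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"response to: {prompt}"

async def generate_responses(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(CONCURRENCY)

    async def bounded(p: str) -> str:
        async with sem:  # cap in-flight requests at CONCURRENCY
            return await call_gemini(p)

    return await asyncio.gather(*(bounded(p) for p in prompts))

responses = asyncio.run(generate_responses([f"prompt {i}" for i in range(5)]))
```

The semaphore keeps at most 100 requests in flight while `gather` preserves the original prompt order.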
### 3. Tokenization
Prompt-response pairs were formatted as `{response}\n###\n{prompt}<eos>` and tokenized using the Gemma 3 tokenizer. Labels were masked (`-100`) over the response and separator tokens so the model only learns to predict the prompt portion. Tokenization was done in batches of 5,000 and concatenated into the final dataset.
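The label masking can be sketched on already-tokenized id lists; the helper name and the token ids below are illustrative, not from the training code.

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_example(response_ids, separator_ids, prompt_ids, eos_id):
    """Concatenate response + separator + prompt + <eos>, masking labels
    over the response and separator so only the prompt is learned."""
    input_ids = response_ids + separator_ids + prompt_ids + [eos_id]
    labels = (
        [IGNORE_INDEX] * (len(response_ids) + len(separator_ids))
        + prompt_ids
        + [eos_id]
    )
    return {"input_ids": input_ids, "labels": labels}

example = build_example([10, 11, 12], [5], [20, 21], 2)
```

The loss is therefore computed only over the prompt tokens and the end-of-sequence token.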
## Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-3-270m |
| Method | Full fine-tune |
| Precision | bfloat16 |
| Epochs | 1 |
| Batch size | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 5e-5 |
| Warmup steps | 100 |
| Max sequence length | 2048 |
| Optimizer | AdamW (torch fused) |
| Gradient checkpointing | Enabled |
| Training time | 4h 14m |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| CPU | AMD Ryzen 9 7950X3D 16-Core |
| RAM | 128 GB |
### Training Loss
![Training Loss](assets/train-loss.png)
## Inference Strategy
The companion app uses contrastive search with a sweep over configurations:
- `top_k`: [2, 4, 6, 15]
- `penalty_alpha`: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
This produces up to 24 candidate prompts per input. Candidates are deduplicated and ranked by perplexity (lower is better). Token-level probabilities provide a confidence signal for each word in the reconstruction.
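The dedup-and-rank step can be sketched as follows, assuming each candidate comes with per-token negative log-likelihoods from the model so that perplexity is the exponential of their mean (the function name and example data are illustrative):

```python
import math

def rank_candidates(candidates):
    """candidates: list of (text, per_token_nlls) pairs.
    Returns unique texts sorted by perplexity (lower is better)."""
    best = {}
    for text, nlls in candidates:
        ppl = math.exp(sum(nlls) / len(nlls))  # perplexity from mean NLL
        if text not in best or ppl < best[text]:
            best[text] = ppl  # for duplicates, keep the lowest perplexity
    return sorted(best.items(), key=lambda kv: kv[1])

ranked = rank_candidates([
    ("what is the capital of france", [0.5, 0.4, 0.6]),
    ("capital of france", [1.2, 1.1]),
    ("what is the capital of france", [0.7, 0.8, 0.9]),
])
```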
## Limitations
- Prompt reconstruction is inherently probabilistic. The model returns plausible prompts, not necessarily the exact original.
- Performance is best on responses typical of AI assistants. Non-standard or very short inputs may produce lower-quality reconstructions.
- The model inherits the capabilities and limitations of the gemma-3-270m base model.
## Author
[Dejan AI](https://dejan.ai/)