---
license: other
license_name: link-attribution
license_link: https://dejanmarketing.com/link-attribution/
library_name: transformers
base_model: google/gemma-3-270m
tags:
  - reverse-prompting
  - prompt-reconstruction
  - gemma
  - text-generation
pipeline_tag: text-generation
---

# Reverse Prompter

A fine-tuned [google/gemma-3-270m](https://huggingface.co/google/gemma-3-270m) model that reconstructs the most likely prompt from an AI assistant's response.

Given an AI-generated text, the model generates candidate prompts that could have produced it.

## How It Works

The model was trained on prompt-response pairs formatted as:

```
{response}\n###\n{prompt}
```

At inference time, you provide the response followed by the `\n###\n` separator, and the model generates the reconstructed prompt.

## Usage

```python
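import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fall back to CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("dejanseo/reverse-prompter", torch_dtype="bfloat16").to(device).eval()
tokenizer = AutoTokenizer.from_pretrained("dejanseo/reverse-prompter")

# The model expects the response followed by the "\n###\n" separator.
response_text = "Your AI-generated text here"
prompt = response_text.strip() + "\n###\n"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
# Passing penalty_alpha together with top_k selects contrastive search.
outputs = model.generate(**inputs, max_new_tokens=256, penalty_alpha=0.3, top_k=4)

# Drop the input tokens so only the newly generated prompt is decoded.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
reconstructed_prompt = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(reconstructed_prompt)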
```

For best results, run generation across multiple contrastive search configurations and rank outputs by perplexity. See the companion Streamlit app for a full implementation.

## Training Data

The training dataset was generated synthetically using Gemini 2.5 Flash via Vertex AI in a three-stage pipeline:

### 1. Prompt Generation

100,000 diverse prompts were generated across five categories, with 20 prompts per category in each batch of 100:

- Mid-tail, search query style (single or multi-faceted)
- Long-tail, search query style (multi-faceted)
- Simple, prompt-like (single-faceted)
- Typical, prompt-like (single or multi-faceted)
- Detailed, prompt-like (multi-faceted)

Generation was parallelized with 100 concurrent API calls in batches of 100 prompts, with results stored in SQLite.
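
The concurrency pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the pipeline that was actually used: `request_batch` is a hypothetical stand-in for one API call, the table schema is invented for the example, and it assumes each call returns one batch of 100 prompts (20 per category).

```python
import asyncio
import sqlite3

CATEGORIES = [
    "mid-tail search query",
    "long-tail search query",
    "simple prompt",
    "typical prompt",
    "detailed prompt",
]
PER_CATEGORY = 20       # 5 categories x 20 = 100 prompts per batch
MAX_CONCURRENCY = 100   # at most 100 requests in flight at once


async def request_batch(batch_id: int) -> list[tuple[str, str]]:
    """Hypothetical stand-in for one LLM API call returning 100 prompts."""
    await asyncio.sleep(0)  # placeholder for network latency
    return [
        (category, f"placeholder prompt {batch_id}-{i}")
        for category in CATEGORIES
        for i in range(PER_CATEGORY)
    ]


async def generate(num_batches: int, conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS prompts (category TEXT, prompt TEXT)")
    limiter = asyncio.Semaphore(MAX_CONCURRENCY)

    async def worker(batch_id: int) -> None:
        # The semaphore caps the number of concurrent requests.
        async with limiter:
            rows = await request_batch(batch_id)
        conn.executemany("INSERT INTO prompts VALUES (?, ?)", rows)

    await asyncio.gather(*(worker(i) for i in range(num_batches)))
    conn.commit()


conn = sqlite3.connect(":memory:")
asyncio.run(generate(num_batches=3, conn=conn))
count = conn.execute("SELECT COUNT(*) FROM prompts").fetchone()[0]
print(count)  # 3 batches x 100 prompts = 300
```

Under these assumptions, 1,000 batches would yield the full 100,000 prompts.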

### 2. Response Generation

Each prompt was sent back to Gemini 2.5 Flash (with thinking disabled) to produce a corresponding AI assistant response. This was also parallelized at 100 concurrent calls.

### 3. Tokenization

Prompt-response pairs were formatted as `{response}\n###\n{prompt}<eos>` and tokenized using the Gemma 3 tokenizer. Labels were masked (`-100`) over the response and separator tokens so the model only learns to predict the prompt portion. Tokenization was done in batches of 5,000 and concatenated into the final dataset.
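
The label masking can be illustrated on plain token-id lists. This is a sketch rather than the original preprocessing script: `build_example` is a hypothetical helper, the token ids are toy values, and keeping the `<eos>` token in the loss is an assumption. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions contribute nothing to the gradient.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def build_example(
    response_ids: Sequence[int],
    separator_ids: Sequence[int],
    prompt_ids: Sequence[int],
    eos_id: int,
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) for one `{response}\\n###\\n{prompt}<eos>` pair."""
    context = list(response_ids) + list(separator_ids)  # masked part
    target = list(prompt_ids) + [eos_id]                # supervised part (assumes EOS is learned)
    input_ids = context + target
    labels = [IGNORE_INDEX] * len(context) + target
    return input_ids, labels


# Toy ids: 3 response tokens, 1 separator token, 2 prompt tokens, EOS = 1.
input_ids, labels = build_example([11, 12, 13], [7], [21, 22], eos_id=1)
print(input_ids)  # [11, 12, 13, 7, 21, 22, 1]
print(labels)     # [-100, -100, -100, -100, 21, 22, 1]
```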

## Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-3-270m |
| Method | Full fine-tune |
| Precision | bfloat16 |
| Epochs | 1 |
| Batch size | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 5e-5 |
| Warmup steps | 100 |
| Max sequence length | 2048 |
| Optimizer | AdamW (torch fused) |
| Gradient checkpointing | Enabled |
| Training time | 4h 14m |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| CPU | AMD Ryzen 9 7950X3D 16-Core |
| RAM | 128 GB |
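
The batch-size arithmetic in the table can be checked in a few lines. This is only an estimate: it assumes all 100,000 generated pairs were used for the single epoch with none dropped, which the card does not state.

```python
import math

# Values copied from the table above.
per_device_batch_size = 2
gradient_accumulation_steps = 8
warmup_steps = 100
epochs = 1

# Assumption: every one of the 100,000 generated pairs was used for training.
num_examples = 100_000

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(effective_batch_size, total_steps, f"{warmup_fraction:.1%}")  # 16 6250 1.6%
```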

### Training Loss

![Training Loss](assets/train-loss.png)

## Inference Strategy

The companion app uses contrastive search with a sweep over configurations:

- `top_k`: [2, 4, 6, 15]
- `penalty_alpha`: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

This produces up to 24 candidate prompts per input (4 `top_k` values × 6 `penalty_alpha` values). Candidates are deduplicated and ranked by perplexity (lower is better). Token-level probabilities provide a confidence signal for each word in the reconstruction.
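
The sweep, deduplication, and ranking can be sketched independently of the model. This is not the companion app's code: `generate_fn` is a hypothetical callable that runs one contrastive search configuration and returns the generated text together with the log-probabilities of its tokens, and perplexity is assumed to be the exponential of the mean negative token log-probability.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

TOP_K = [2, 4, 6, 15]
PENALTY_ALPHA = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the generated tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def sweep(
    generate_fn: Callable[[int, float], Tuple[str, List[float]]],
) -> List[Tuple[str, float]]:
    """Run all 24 configurations, deduplicate, and rank by perplexity."""
    best: Dict[str, float] = {}
    for top_k, alpha in itertools.product(TOP_K, PENALTY_ALPHA):
        text, logprobs = generate_fn(top_k, alpha)
        text = text.strip()
        if not text or not logprobs:
            continue  # skip empty generations
        ppl = perplexity(logprobs)
        # Deduplicate on the text, keeping the lowest perplexity seen.
        if text not in best or ppl < best[text]:
            best[text] = ppl
    # Lower perplexity first.
    return sorted(best.items(), key=lambda item: item[1])


def fake_generate(top_k: int, alpha: float) -> Tuple[str, List[float]]:
    """Stub used only to exercise the sweep without loading a model."""
    if top_k <= 4:
        return "what is contrastive search", [-0.1, -0.1]
    return "explain decoding", [-1.0]


ranked = sweep(fake_generate)
for text, ppl in ranked:
    print(f"{ppl:.3f}  {text}")
```

In practice `generate_fn` would wrap `model.generate(..., penalty_alpha=alpha, top_k=top_k)` and score the output, for example with a second forward pass over the generated tokens.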

## Limitations

- Prompt reconstruction is inherently probabilistic. The model returns plausible prompts, not necessarily the exact original.
- Performance is best on responses typical of AI assistants. Non-standard or very short inputs may produce lower-quality reconstructions.
- The model inherits the capabilities and limitations of the gemma-3-270m base model.

## Author

[Dejan AI](https://dejan.ai/)