---
license: other
license_name: link-attribution
license_link: https://dejanmarketing.com/link-attribution/
library_name: transformers
base_model: google/gemma-3-270m
tags:
- reverse-prompting
- prompt-reconstruction
- gemma
- text-generation
pipeline_tag: text-generation
---
# Reverse Prompter
A fine-tuned [google/gemma-3-270m](https://huggingface.co/google/gemma-3-270m) model that reconstructs the most likely prompt from an AI assistant's response.
Given an AI-generated text, the model generates candidate prompts that could have produced it.
## How It Works
The model was trained on prompt-response pairs formatted as:
```
{response}\n###\n{prompt}
```
At inference time, you provide the response followed by the `\n###\n` separator, and the model generates the reconstructed prompt.
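The format above can be sketched with two small string helpers. This is only an illustration: the helper names are hypothetical, and stripping whitespace from the response follows the usage snippet in this card rather than any documented preprocessing step.

```python
SEPARATOR = "\n###\n"  # separator between response and prompt, as described above


def build_inference_input(response: str) -> str:
    """Return the text fed to the model: the response followed by the separator."""
    # Stripping mirrors the usage snippet in this card (assumption for training data).
    return response.strip() + SEPARATOR


def format_pair(response: str, prompt: str) -> str:
    """Return a full example in the '{response}\\n###\\n{prompt}' layout."""
    return build_inference_input(response) + prompt.strip()


if __name__ == "__main__":
    print(repr(build_inference_input("  Paris is the capital of France.\n")))
    print(repr(format_pair("Paris is the capital of France.", "capital of france")))
```

Everything the model generates after the separator is treated as the reconstructed prompt.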
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the fine-tuned model in bfloat16 on the GPU and switch to inference mode.
model = AutoModelForCausalLM.from_pretrained("dejanseo/reverse-prompter", torch_dtype="bfloat16").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("dejanseo/reverse-prompter")
# The model input is the response followed by the "\n###\n" separator.
response_text = "Your AI-generated text here"
prompt = response_text.strip() + "\n###\n"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
# penalty_alpha together with top_k selects contrastive search.
outputs = model.generate(**inputs, max_new_tokens=256, penalty_alpha=0.3, top_k=4)
# Keep only the newly generated tokens, i.e. everything after the input.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
reconstructed_prompt = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(reconstructed_prompt)
```
For best results, run generation across multiple contrastive search configurations and rank outputs by perplexity. See the companion Streamlit app for a full implementation.
## Training Data
The training dataset was generated synthetically using Gemini 2.5 Flash via Vertex AI in a three-stage pipeline:
### 1. Prompt Generation
100,000 diverse prompts were generated across five categories, with 20 prompts per category in each batch of 100:
- Mid-tail, search query style (single or multi-faceted)
- Long-tail, search query style (multi-faceted)
- Simple, prompt-like (single-faceted)
- Typical, prompt-like (single or multi-faceted)
- Detailed, prompt-like (multi-faceted)
Generation was parallelized with 100 concurrent API calls in batches of 100 prompts, with results stored in SQLite.
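The batching and storage described above might look roughly like the sketch below. It is not the author's pipeline: `generate_batch` is a hypothetical stand-in that returns placeholder strings instead of calling Gemini, and the table layout, function names, and category labels are assumptions made for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed

# Short labels for the five categories listed above (wording is illustrative).
CATEGORIES = [
    "mid-tail search query",
    "long-tail search query",
    "simple prompt",
    "typical prompt",
    "detailed prompt",
]
PER_CATEGORY = 20  # 5 categories x 20 prompts = 100 prompts per batch


def generate_batch(batch_id: int) -> list[tuple[str, str]]:
    """Hypothetical stand-in for one API call; returns (category, prompt) pairs."""
    return [
        (category, f"placeholder prompt {batch_id}-{i}")
        for category in CATEGORIES
        for i in range(PER_CATEGORY)
    ]


def collect_prompts(num_batches: int, max_workers: int = 100,
                    db_path: str = ":memory:") -> sqlite3.Connection:
    """Run batches concurrently and store the results in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prompts (batch_id INTEGER, category TEXT, prompt TEXT)"
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate_batch, b): b for b in range(num_batches)}
        # Write from the main thread only, so the connection is never shared.
        for future in as_completed(futures):
            batch_id = futures[future]
            rows = [(batch_id, cat, text) for cat, text in future.result()]
            conn.executemany("INSERT INTO prompts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = collect_prompts(num_batches=3, max_workers=3)
    print(conn.execute("SELECT COUNT(*) FROM prompts").fetchone()[0])
```

At 100 prompts per batch, 100,000 prompts correspond to 1,000 batches.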
### 2. Response Generation
Each prompt was sent back to Gemini 2.5 Flash (with thinking disabled) to produce a corresponding AI assistant response. This was also parallelized at 100 concurrent calls.
### 3. Tokenization
Prompt-response pairs were formatted as `{response}\n###\n{prompt}<eos>` and tokenized using the Gemma 3 tokenizer. Labels were masked (`-100`) over the response and separator tokens so the model only learns to predict the prompt portion. Tokenization was done in batches of 5,000 and concatenated into the final dataset.
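The label masking described above can be illustrated on plain lists of token IDs. This is a minimal sketch, not the author's tokenization script: `build_example` is a hypothetical helper, the IDs in the demo are made up, and leaving the `<eos>` token unmasked is an assumption (the card only states that the response and separator are masked). It also assumes the three segments are available as separate ID lists, which may tokenize slightly differently from the concatenated string.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def build_example(response_ids, separator_ids, prompt_ids, eos_id):
    """Concatenate the segments and mask the labels over response and separator."""
    input_ids = response_ids + separator_ids + prompt_ids + [eos_id]
    masked_length = len(response_ids) + len(separator_ids)
    # Only the prompt tokens (and, by assumption, <eos>) contribute to the loss.
    labels = [IGNORE_INDEX] * masked_length + prompt_ids + [eos_id]
    return {"input_ids": input_ids, "labels": labels}


if __name__ == "__main__":
    example = build_example(
        response_ids=[5, 6, 7],   # made-up IDs for the response
        separator_ids=[9],        # made-up ID(s) for "\n###\n"
        prompt_ids=[11, 12],      # made-up IDs for the prompt
        eos_id=1,                 # made-up <eos> ID
    )
    print(example["input_ids"])
    print(example["labels"])
```

Because masked positions carry no loss, the response acts purely as conditioning context during training.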
## Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-3-270m |
| Method | Full fine-tune |
| Precision | bfloat16 |
| Epochs | 1 |
| Batch size | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 5e-5 |
| Warmup steps | 100 |
| Max sequence length | 2048 |
| Optimizer | AdamW (torch fused) |
| Gradient checkpointing | Enabled |
| Training time | 4h 14m |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| CPU | AMD Ryzen 9 7950X3D 16-Core |
| RAM | 128 GB |
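
For readers who want to reproduce a similar setup, the table can be written as keyword arguments. The training script is not part of this card, so mapping the values onto Hugging Face `TrainingArguments` field names is an assumption, as is training on a single device. The 2048-token maximum sequence length is not included because it would be applied during tokenization rather than through these arguments.

```python
# Values from the table above, keyed by Hugging Face TrainingArguments
# field names (the mapping is an assumption, not the author's script).
training_kwargs = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 5e-5,
    "warmup_steps": 100,
    "bf16": True,
    "optim": "adamw_torch_fused",
    "gradient_checkpointing": True,
}

NUM_DEVICES = 1  # assumption: the single RTX 4090 listed in the table

# Number of sequences contributing to each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)

if __name__ == "__main__":
    print(effective_batch_size)
```

The product of batch size and gradient accumulation steps matches the effective batch size of 16 stated in the table.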
### Training Loss

## Inference Strategy
The companion app uses contrastive search with a sweep over configurations:
- `top_k`: [2, 4, 6, 15]
- `penalty_alpha`: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
This produces up to 24 candidate prompts per input. Candidates are deduplicated and ranked by perplexity (lower is better). Token-level probabilities provide a confidence signal for each word in the reconstruction.
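The sweep and ranking can be sketched without loading the model. The grid below comes from the lists above; everything else is an assumption for illustration: the function names are hypothetical, candidates are passed in as `(text, token_logprobs)` pairs, and perplexity is computed as the exponential of the mean negative log-probability of the generated tokens, which may differ from how the companion app scores candidates.

```python
import math
from itertools import product

TOP_K = [2, 4, 6, 15]
PENALTY_ALPHA = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def sweep_configs():
    """Return every (top_k, penalty_alpha) combination as generate() keyword arguments."""
    return [{"top_k": k, "penalty_alpha": a} for k, a in product(TOP_K, PENALTY_ALPHA)]


def perplexity(token_logprobs):
    """Exponential of the mean negative log-probability (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_candidates(candidates):
    """Deduplicate (text, token_logprobs) pairs by text and sort by ascending perplexity."""
    best = {}
    for text, logprobs in candidates:
        if not logprobs:
            continue  # skip empty generations
        score = perplexity(logprobs)
        if text not in best or score < best[text]:
            best[text] = score
    return sorted(best.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(len(sweep_configs()))
    demo = [
        ("capital of france", [-0.1, -0.2, -0.1]),
        ("what is the capital of france", [-0.9, -1.1, -1.0]),
        ("capital of france", [-0.1, -0.2, -0.1]),  # duplicate from another config
    ]
    for text, score in rank_candidates(demo):
        print(f"{score:.3f}  {text}")
```

Each dictionary from `sweep_configs()` could be unpacked into `model.generate(**inputs, max_new_tokens=256, **config)`, with the per-token log-probabilities collected from the generation scores.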
## Limitations
- Prompt reconstruction is inherently probabilistic. The model returns plausible prompts, not necessarily the exact original.
- Performance is best on responses typical of AI assistants. Non-standard or very short inputs may produce lower-quality reconstructions.
- The model inherits the capabilities and limitations of the gemma-3-270m base model.
## Author
[Dejan AI](https://dejan.ai/)