---
license: other
license_name: link-attribution
license_link: https://dejanmarketing.com/link-attribution/
library_name: transformers
base_model: google/gemma-3-270m
tags:
- reverse-prompting
- prompt-reconstruction
- gemma
- text-generation
pipeline_tag: text-generation
---
# Reverse Prompter

A fine-tuned [google/gemma-3-270m](https://huggingface.co/google/gemma-3-270m) model that reconstructs the most likely prompt from an AI assistant's response. Given an AI-generated response, the model produces candidate prompts that could have generated it.

## How It Works

The model was trained on prompt-response pairs formatted as:

```
{response}\n###\n{prompt}
```

At inference time, you provide the response followed by the `\n###\n` separator, and the model generates the reconstructed prompt.
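For illustration, a hypothetical pair (not taken from the actual dataset) would be formatted as:

```
Paris is the capital of France and its largest city, known for landmarks such as the Eiffel Tower.
###
What is the capital of France?
```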
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dejanseo/reverse-prompter", torch_dtype=torch.bfloat16
).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("dejanseo/reverse-prompter")

# Append the separator the model was trained on.
response_text = "Your AI-generated text here"
prompt = response_text.strip() + "\n###\n"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Contrastive search: penalty_alpha > 0 with a small top_k.
outputs = model.generate(**inputs, max_new_tokens=256, penalty_alpha=0.3, top_k=4)

# Decode only the newly generated tokens (the reconstructed prompt).
generated = outputs[0][inputs["input_ids"].shape[-1]:]
reconstructed_prompt = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(reconstructed_prompt)
```

For best results, run generation across multiple contrastive search configurations and rank the outputs by perplexity (see the sketch under Inference Strategy below). See the companion Streamlit app for a full implementation.
## Training Data

The training dataset was generated synthetically using Gemini 2.5 Flash via Vertex AI in a three-stage pipeline:

### 1. Prompt Generation

100,000 diverse prompts were generated across five categories (20 per category in each batch):

- Mid-tail, search query style (single or multi-faceted)
- Long-tail, search query style (multi-faceted)
- Simple, prompt-like (single-faceted)
- Typical, prompt-like (single or multi-faceted)
- Detailed, prompt-like (multi-faceted)

Generation was parallelized with 100 concurrent API calls in batches of 100 prompts, with results stored in SQLite, roughly as sketched below.
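The generation scripts are not published; this is a minimal sketch of the parallelization pattern described above, where `call_gemini` is a hypothetical stand-in for the Vertex AI call:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def call_gemini(category: str, batch_id: int) -> list[str]:
    """Hypothetical stand-in for a Vertex AI call returning 20 prompts for one category."""
    raise NotImplementedError

categories = ["mid-tail", "long-tail", "simple", "typical", "detailed"]
jobs = [(cat, b) for b in range(1000) for cat in categories]  # 1,000 batches x 100 prompts

db = sqlite3.connect("prompts.db")
db.execute("CREATE TABLE IF NOT EXISTS prompts (category TEXT, prompt TEXT)")

# Up to 100 calls in flight at once; results land in SQLite as they complete.
with ThreadPoolExecutor(max_workers=100) as pool:
    for (cat, _), prompts in zip(jobs, pool.map(lambda j: call_gemini(*j), jobs)):
        db.executemany("INSERT INTO prompts VALUES (?, ?)", [(cat, p) for p in prompts])
db.commit()
```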
### 2. Response Generation

Each prompt was sent back to Gemini 2.5 Flash (with thinking disabled) to produce a corresponding AI assistant response. This stage was also parallelized at 100 concurrent calls.

### 3. Tokenization

Prompt-response pairs were formatted as `{response}\n###\n{prompt}<eos>` and tokenized using the Gemma 3 tokenizer. Labels were masked (`-100`) over the response and separator tokens so the model only learns to predict the prompt portion. Tokenization was done in batches of 5,000 and concatenated into the final dataset.
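The tokenization code is not published; the snippet below is a minimal sketch of the masking scheme described above (`build_example` is an illustrative helper name, not from the training code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")

def build_example(response: str, prompt: str) -> dict:
    prefix = response.strip() + "\n###\n"          # response + separator
    target = prompt.strip() + tokenizer.eos_token  # prompt + <eos>
    prefix_ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    # -100 masks the response and separator, so loss is computed only on the prompt.
    return {
        "input_ids": prefix_ids + target_ids,
        "labels": [-100] * len(prefix_ids) + target_ids,
    }
```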
## Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-3-270m |
| Method | Full fine-tune |
| Precision | bfloat16 |
| Epochs | 1 |
| Batch size | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 5e-5 |
| Warmup steps | 100 |
| Max sequence length | 2048 |
| Optimizer | AdamW (torch fused) |
| Gradient checkpointing | Enabled |
| Training time | 4h 14m |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| CPU | AMD Ryzen 9 7950X3D 16-Core |
| RAM | 128 GB |
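The training script itself is not published; the configuration below is a sketch using standard `transformers.TrainingArguments` fields chosen to mirror the reported values (the 2048-token maximum is applied during tokenization, not here):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="reverse-prompter",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 16
    learning_rate=5e-5,
    warmup_steps=100,
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
)
```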
### Training Loss

![Training Loss](loss_chart.png)

## Inference Strategy

The companion app uses contrastive search with a sweep over configurations:

- `top_k`: [2, 4, 6, 15]
- `penalty_alpha`: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

This produces up to 24 candidate prompts per input. Candidates are deduplicated and ranked by perplexity (lower is better). Token-level probabilities provide a confidence signal for each word in the reconstruction.
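Continuing from the Usage snippet above, a minimal sketch of that sweep. The companion app's exact scoring may differ; perplexity here is computed as the exponential of the model's loss over the candidate prompt tokens:

```python
import itertools
import torch

prefix_len = inputs["input_ids"].shape[-1]
candidates = {}

# 4 top_k values x 6 penalty_alpha values = up to 24 candidates.
for top_k, alpha in itertools.product([2, 4, 6, 15], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]):
    out = model.generate(**inputs, max_new_tokens=256, penalty_alpha=alpha, top_k=top_k)
    text = tokenizer.decode(out[0][prefix_len:], skip_special_tokens=True).strip()
    if text in candidates:
        continue  # deduplicate identical reconstructions
    labels = out.clone()
    labels[:, :prefix_len] = -100  # score only the candidate prompt tokens
    with torch.no_grad():
        loss = model(input_ids=out, labels=labels).loss
    candidates[text] = torch.exp(loss).item()

# Rank by perplexity, lower is better.
for text, ppl in sorted(candidates.items(), key=lambda kv: kv[1]):
    print(f"{ppl:8.2f}  {text}")
```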
## Limitations

- Prompt reconstruction is inherently probabilistic. The model returns plausible prompts, not necessarily the exact original.
- Performance is best on responses typical of AI assistants. Non-standard or very short inputs may produce lower-quality reconstructions.
- The model inherits the capabilities and limitations of the gemma-3-270m base model.

## Author

[Dejan AI](https://dejan.ai/)