Quick Start for New Sessions
What is Prompt Squirrel?
A RAG system that converts natural language prompts → e621-style tags for furry art generation.
Three-Stage Pipeline
- Stage 1 (Rewrite): Natural language → tag-shaped phrases (LLM)
- Stage 2 (Retrieval): Phrases → candidate tags (FastText + TF-IDF/SVD, closed vocab)
- Stage 3 (Selection): Candidates → final selected tags (LLM)
- Stage 3s (Structural): Selected tags → structural inferences (optional, e.g., clothing → topless)
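The three-stage flow can be sketched as plain function composition. This is a minimal illustration only: the function names and the tiny inline vocabulary are hypothetical stand-ins for the real psq_rag modules.

```python
from typing import List

def rewrite(prompt: str) -> List[str]:
    # Stage 1: the LLM rewrites natural language into tag-shaped phrases.
    # Here, a trivial comma split stands in for the model call.
    return [p.strip() for p in prompt.lower().split(",") if p.strip()]

def retrieve(phrases: List[str]) -> List[str]:
    # Stage 2: closed-vocabulary retrieval (FastText + TF-IDF/SVD in the real
    # system). A tiny lookup table stands in for the retriever.
    vocab = {"anthro wolf": "wolf", "wearing red scarf": "scarf"}
    return [vocab[p] for p in phrases if p in vocab]

def select(candidates: List[str]) -> List[str]:
    # Stage 3: the LLM picks the final tags from the candidates.
    return sorted(set(candidates))

def run_pipeline(prompt: str) -> List[str]:
    return select(retrieve(rewrite(prompt)))

print(run_pipeline("anthro wolf, wearing red scarf"))  # ['scarf', 'wolf']
```

The key property this mirrors from the contracts below: each stage only consumes the previous stage's output, so retrieval and selection never mix.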
Latest Features (Feb 13-14, 2026)
- Tag Categorization: Organized suggestions by e621 checklist categories (species, clothing, posture, etc.)
- Category Parser: Parses checklist with tiers (CRITICAL/IMPORTANT/NICE_TO_HAVE/META) and constraints
- Evaluation Metrics: Per-category P/R/F1, ranking metrics (MRR, P@K, nDCG)
- Multi-select Constraints: Fixed body_type, species, gender to allow multiple tags
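The ranking metrics listed above (MRR, P@K, nDCG) follow their standard definitions. A generic sketch, not the project's actual eval code:

```python
import math
from typing import List, Set

def mrr(ranked: List[str], relevant: Set[str]) -> float:
    # Reciprocal rank of the first relevant item (0.0 if none is found).
    for i, tag in enumerate(ranked, start=1):
        if tag in relevant:
            return 1.0 / i
    return 0.0

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of the top-k predictions that are in the ground truth.
    return sum(t in relevant for t in ranked[:k]) / k

def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    # Binary-relevance nDCG: discounted gain normalized by the ideal ordering.
    dcg = sum(1.0 / math.log2(i + 1)
              for i, t in enumerate(ranked[:k], start=1) if t in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

ranked = ["wolf", "canine", "scarf", "forest"]
gt = {"wolf", "scarf"}
print(mrr(ranked, gt))                # 1.0
print(precision_at_k(ranked, gt, 2))  # 0.5
```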
Key Files
- app.py - Gradio web interface
- psq_rag/tagging/categorized_suggestions.py - Category-based tag suggestions
- psq_rag/tagging/category_parser.py - Parse e621 checklist
- scripts/eval_pipeline.py - Main evaluation harness
- scripts/eval_categorized.py - Per-category metrics
- scripts/analyze_threshold_grid.py - Threshold grid analysis (score/global rank/phrase rank)
- scripts/analyze_caption_evident_audit.py - Caption-evident audit vs retrieval
- docs/retrieval_contract.md - Stage 2 spec
- docs/stage3_contract.md - Stage 3 spec
- tagging_checklist.txt - E621 tagging guidelines
Running Code
# Always from repo root
.venv/Scripts/python.exe -m pip install -r requirements.txt # Windows
.venv/Scripts/python.exe app.py
Recent Git History (Last 5 commits)
0f73a4b - Fix eval_categorized.py to work with eval_pipeline.py output
ff407fc - Remove binary PNG files (use Hugging Face XET storage instead)
8ba971a - Add eval results for debugging
51b7109 - Add ranking metrics infrastructure to eval pipeline
edba146 - Add per-category evaluation metrics script
Key Contracts to Remember
- Stage boundaries are strict: Don't mix retrieval (Stage 2) with selection (Stage 3)
- Keep diffs small: One focused change per commit
- Code matches contracts: Update code to match docs, not vice versa
- No feature flags: Delete old code paths, no legacy behavior switches
Quick Orientation Commands
# View project structure
ls -la
# View recent commits
git log --oneline -10
# Check current branch
git branch
# List Python modules
ls -la psq_rag/
# View evaluation results
ls -la data/eval_results/
Common Tasks
- Add category: Edit tagging_checklist.txt, update parser
- Eval changes: Run scripts/eval_pipeline.py, then scripts/eval_categorized.py
- Threshold sweeps: Run scripts/analyze_threshold_grid.py (see --mode score|rank|phrase_rank)
- Caption-evident audit: Run scripts/analyze_caption_evident_audit.py
- Test retrieval: Use scripts/smoke_test.py
- Debug Stage 3: Use scripts/stage3_debug.py (--phrases optional; if omitted, runs Stage 1 rewrite first, then Stage 2 retrieval from the rewritten phrases)
Data Artifacts (Lazy-loaded)
- FastText embeddings (semantic similarity)
- TF-IDF + SVD matrices (context similarity)
- Alias → canonical tag mappings
- Tag counts, implications, groups, wiki definitions
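"Lazy-loaded" here means each artifact is read on first use and then cached. A minimal sketch of the pattern using `functools.lru_cache`; the loader name and inline table are hypothetical (the real loaders live in psq_rag and read from disk):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_alias_map() -> dict:
    # In the real system this reads the alias -> canonical mapping artifact;
    # a tiny inline table stands in for it here. The body runs only once.
    print("loading aliases...")
    return {"wolfess": "wolf", "neckwear": "scarf"}

def canonicalize(tag: str) -> str:
    # First call triggers the load; later calls hit the cache.
    return load_alias_map().get(tag, tag)

print(canonicalize("wolfess"))  # 'wolf'
print(canonicalize("dragon"))   # 'dragon' (unknown tags pass through)
```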
Eval Datasets
- data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl - Base eval set (implication-expanded GT)
- data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl - Caption-evident GT subset (10 samples); used to estimate the retrieval ceiling from text
New Eval Features (Feb 2026)
- eval_pipeline.py now logs Stage 3 selection scores and ranks:
  - stage3_selected_scores (retrieval score)
  - stage3_selected_ranks (global rank)
  - stage3_selected_phrase_ranks (per-phrase rank)
- New CLI flag: --per-phrase-final-k to control the per-phrase retrieval cap
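Assuming the three fields above are parallel lists in each logged record (the overall JSONL record shape is an assumption, only the field names come from the notes), a record can be consumed like this:

```python
import json

# A fabricated example record; only the three field names are from the eval
# output spec above, the values and record shape are illustrative.
record_line = json.dumps({
    "stage3_selected_scores": [0.91, 0.42],
    "stage3_selected_ranks": [1, 17],
    "stage3_selected_phrase_ranks": [1, 3],
})

rec = json.loads(record_line)
# Pair each selected tag's retrieval score with its global and per-phrase rank.
for score, g_rank, p_rank in zip(rec["stage3_selected_scores"],
                                 rec["stage3_selected_ranks"],
                                 rec["stage3_selected_phrase_ranks"]):
    print(f"score={score:.2f} global_rank={g_rank} phrase_rank={p_rank}")
```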
NSFW Handling
- Filtered via word_rating_probabilities.csv (threshold 0.95)
- Stage 2 removes NSFW tags when allow_nsfw_tags=False
- Stage 3 doesn't need policy flags (defense-in-depth only)
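The Stage 2 filter amounts to dropping candidates whose NSFW probability meets the threshold. A sketch assuming the probabilities from word_rating_probabilities.csv are already loaded into a `{tag: probability}` dict; the function name and sample values are hypothetical:

```python
from typing import Dict, List

NSFW_THRESHOLD = 0.95  # threshold noted above for word_rating_probabilities.csv

def filter_nsfw(candidates: List[str],
                nsfw_prob: Dict[str, float],
                allow_nsfw_tags: bool = False) -> List[str]:
    # Keep everything when NSFW is allowed; otherwise drop tags at or above
    # the threshold. Unknown tags default to 0.0 (treated as safe).
    if allow_nsfw_tags:
        return list(candidates)
    return [t for t in candidates if nsfw_prob.get(t, 0.0) < NSFW_THRESHOLD]

probs = {"wolf": 0.01, "explicit_tag": 0.99}
print(filter_nsfw(["wolf", "explicit_tag"], probs))  # ['wolf']
print(filter_nsfw(["wolf", "explicit_tag"], probs, allow_nsfw_tags=True))
```

Because Stage 3 only ever sees Stage 2's output, it needs no policy flag of its own; any check there is defense-in-depth.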