# Quick Start for New Sessions

## What is Prompt Squirrel?

A RAG system that converts natural language prompts → e621-style tags for furry art generation.

## Three-Stage Pipeline

1. Stage 1 (Rewrite): Natural language → tag-shaped phrases (LLM)
2. Stage 2 (Retrieval): Phrases → candidate tags (FastText + TF-IDF/SVD, closed vocab)
3. Stage 3 (Selection): Candidates → final selected tags (LLM)
4. Stage 3s (Structural): Selected tags → structural inferences (optional, e.g., clothing → topless)
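
The four steps above can be sketched as a simple orchestrator. This is a minimal illustration only; the function names and signatures are placeholders, not the actual `psq_rag` API:

```python
# Hypothetical sketch of the pipeline flow; stage callables are injected
# so each stage boundary stays strict and independently testable.
def run_pipeline(prompt, rewrite, retrieve, select, infer_structural=None):
    phrases = rewrite(prompt)          # Stage 1: NL -> tag-shaped phrases (LLM)
    candidates = retrieve(phrases)     # Stage 2: phrases -> candidate tags (closed vocab)
    tags = select(prompt, candidates)  # Stage 3: candidates -> final tags (LLM)
    if infer_structural is not None:   # Stage 3s (optional): structural inferences
        tags = tags + [t for t in infer_structural(tags) if t not in tags]
    return tags
```

Passing the stages in as callables mirrors the "stage boundaries are strict" contract: retrieval code never sees selection logic.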

## Latest Features (Feb 13-14, 2026)

- Tag Categorization: organized suggestions by e621 checklist categories (species, clothing, posture, etc.)
- Category Parser: parses the checklist with tiers (CRITICAL/IMPORTANT/NICE_TO_HAVE/META) and constraints
- Evaluation Metrics: per-category P/R/F1, ranking metrics (MRR, P@K, nDCG)
- Multi-select Constraints: fixed body_type, species, gender to allow multiple tags
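
The ranking metrics named above follow their standard definitions; here is a minimal reference sketch (variable names are illustrative and not tied to the eval scripts):

```python
import math

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant tag (0.0 if none appears)."""
    for i, tag in enumerate(ranked, start=1):
        if tag in relevant:
            return 1.0 / i
    return 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked tags that are relevant."""
    return sum(1 for t in ranked[:k] if t in relevant) / k

def ndcg_at_k(ranked, relevant, k):
    """DCG of the top-k list, normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, t in enumerate(ranked[:k], start=1) if t in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0
```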

## Key Files

- `app.py` - Gradio web interface
- `psq_rag/tagging/categorized_suggestions.py` - Category-based tag suggestions
- `psq_rag/tagging/category_parser.py` - Parse e621 checklist
- `scripts/eval_pipeline.py` - Main evaluation harness
- `scripts/eval_categorized.py` - Per-category metrics
- `scripts/analyze_threshold_grid.py` - Threshold grid analysis (score/global rank/phrase rank)
- `scripts/analyze_caption_evident_audit.py` - Caption-evident audit vs. retrieval
- `docs/retrieval_contract.md` - Stage 2 spec
- `docs/stage3_contract.md` - Stage 3 spec
- `tagging_checklist.txt` - e621 tagging guidelines

## Running Code

```bash
# Always from repo root
.venv/Scripts/python.exe -m pip install -r requirements.txt  # Windows
.venv/Scripts/python.exe app.py
```

## Recent Git History (Last 5 commits)

```
0f73a4b - Fix eval_categorized.py to work with eval_pipeline.py output
ff407fc - Remove binary PNG files (use Hugging Face XET storage instead)
8ba971a - Add eval results for debugging
51b7109 - Add ranking metrics infrastructure to eval pipeline
edba146 - Add per-category evaluation metrics script
```

## Key Contracts to Remember

1. Stage boundaries are strict: don't mix retrieval (Stage 2) with selection (Stage 3)
2. Keep diffs small: one focused change per commit
3. Code matches contracts: update code to match docs, not vice versa
4. No feature flags: delete old code paths; no legacy behavior switches

## Quick Orientation Commands

```bash
# View project structure
ls -la

# View recent commits
git log --oneline -10

# Check current branch
git branch

# List Python modules
ls -la psq_rag/

# View evaluation results
ls -la data/eval_results/
```

## Common Tasks

- Add a category: edit `tagging_checklist.txt`, then update the parser
- Eval changes: run `scripts/eval_pipeline.py`, then `scripts/eval_categorized.py`
- Threshold sweeps: run `scripts/analyze_threshold_grid.py` (see `--mode score|rank|phrase_rank`)
- Caption-evident audit: run `scripts/analyze_caption_evident_audit.py`
- Test retrieval: use `scripts/smoke_test.py`
- Debug Stage 3: use `scripts/stage3_debug.py` (`--phrases` is optional; if omitted, it runs Stage 1 rewrite first, then Stage 2 retrieval on the rewritten phrases)

## Data Artifacts (Lazy-loaded)

- FastText embeddings (semantic similarity)
- TF-IDF + SVD matrices (context similarity)
- Alias → canonical tag mappings
- Tag counts, implications, groups, wiki definitions
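
A minimal sketch of the lazy-loading pattern, assuming a memoized loader in the style of `functools.lru_cache` (the real loader and the alias data below are placeholders, not the actual artifacts):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_alias_map():
    # Stand-in for reading the real alias -> canonical tag mapping from disk;
    # lru_cache means the file is only loaded on first access, then reused.
    return {"wolfs": "wolf", "foxes": "fox"}

def canonicalize(tag):
    """Map an alias to its canonical tag, or return the tag unchanged."""
    return load_alias_map().get(tag, tag)
```

The benefit is that a session that never touches, say, wiki definitions never pays the cost of loading them.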

## Eval Datasets

- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl` - base eval set (implication-expanded ground truth)
- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl` - caption-evident ground-truth subset (10 samples), used to estimate the retrieval ceiling from text alone
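
Both files are JSONL (one JSON object per line). A hedged reader sketch; the field names in the example (`"caption"`, `"tags"`) are assumptions about the sample schema, not confirmed by the files themselves:

```python
import json

def read_eval_samples(path):
    """Load one JSON object per line from a JSONL file; blank lines are skipped."""
    with open(path, encoding="utf-8") as f:
        return parse_eval_lines(f)

def parse_eval_lines(lines):
    # Split out for testability: works on any iterable of strings.
    return [json.loads(line) for line in lines if line.strip()]
```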

## New Eval Features (Feb 2026)

- `eval_pipeline.py` now logs Stage 3 selection scores and ranks:
  - `stage3_selected_scores` (retrieval score)
  - `stage3_selected_ranks` (global rank)
  - `stage3_selected_phrase_ranks` (per-phrase rank)
- New CLI flag: `--per-phrase-final-k` to control the per-phrase retrieval cap
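
One way the three logged fields could be derived from Stage 2 candidates. The data shapes here are assumptions for illustration, not the actual `eval_pipeline.py` internals:

```python
def selection_log_fields(candidates, selected):
    """candidates: list of (tag, phrase, score); selected: list of chosen tags.
    Returns (scores, global_ranks, phrase_ranks) aligned with `selected`."""
    score_of = {tag: score for tag, _, score in candidates}
    # Global rank: position in the full candidate pool sorted by score (desc).
    ranked = sorted(candidates, key=lambda c: -c[2])
    global_rank = {tag: i + 1 for i, (tag, _, _) in enumerate(ranked)}
    # Per-phrase rank: position among candidates retrieved for the same phrase.
    per_phrase = {}
    for tag, phrase, score in candidates:
        per_phrase.setdefault(phrase, []).append((tag, score))
    phrase_rank = {}
    for items in per_phrase.values():
        for i, (tag, _) in enumerate(sorted(items, key=lambda t: -t[1])):
            phrase_rank[tag] = i + 1
    return (
        [score_of[t] for t in selected],
        [global_rank[t] for t in selected],
        [phrase_rank[t] for t in selected],
    )
```

A tag can rank first for its own phrase yet poorly overall, which is exactly the gap the global vs. per-phrase split makes visible.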

## NSFW Handling

- Filtered via `word_rating_probabilities.csv` (threshold 0.95)
- Stage 2 removes NSFW tags when `allow_nsfw_tags=False`
- Stage 3 doesn't need policy flags (defense-in-depth only)
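
A sketch of the rating-threshold filter described above. The CSV column names (`tag`, `explicit_prob`) are assumptions about `word_rating_probabilities.csv`, not its confirmed schema:

```python
import csv
import io

NSFW_THRESHOLD = 0.95  # threshold stated in the section above

def load_nsfw_tags(csv_text, threshold=NSFW_THRESHOLD):
    """Collect tags whose explicit-rating probability meets the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["tag"] for row in reader if float(row["explicit_prob"]) >= threshold}

def filter_candidates(tags, nsfw_tags, allow_nsfw_tags=False):
    """Stage 2-style filter: drop NSFW tags unless explicitly allowed."""
    if allow_nsfw_tags:
        return list(tags)
    return [t for t in tags if t not in nsfw_tags]
```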