# Prompt Squirrel RAG: System Overview

This document explains what Prompt Squirrel does, why it is structured this way, and how data moves through the system.

## Purpose

Prompt Squirrel converts a rough natural-language prompt into a structured, editable tag list drawn from a fixed image-tag vocabulary, then lets the user refine that list interactively.

Design goals:

- Keep generation grounded in a closed tag vocabulary.
- Balance recall (find good candidates) with precision (avoid bad tags).
- Keep the UI editable so users remain in control.
- Run reliably in a Hugging Face Space with constrained resources.
## What Each Step Does

- `Rewrite`:
  Turns the user prompt into short, tag-like pseudo-phrases that are easier to match in vector retrieval. These phrases are optimized as search queries for candidate lookup.
- `Structural Inference`:
  Runs an LLM call over a fixed set of high-level structure tags (for example character count, body type, gender, clothing state, gaze/text). It outputs only the structural tags it believes are supported.
- `Probe Inference`:
  Runs a separate LLM call over a small, curated set of informative tags. This is a targeted check for tags that are often useful for reranking and final selection.
- `Retrieval Candidates`:
  Uses the rewrite phrases (plus structural/probe context) to fetch candidate tags from the fixed vocabulary, prioritizing recall.
- `Closed-Set Selection`:
  Runs an LLM call that can only choose from the retrieved candidate list. It cannot invent new tags.
- `Implication Expansion`:
  Adds parent/related tags implied by selected tags according to the implication graph.
- `Ranked Rows`:
  Groups and orders suggested tags into row categories for editing.
- `Toggle UI and Suggested Prompt`:
  Lets the user turn tags on/off and see the resulting prompt text update immediately.
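The steps above can be sketched as a simple orchestration. This is an illustrative toy, not the actual implementation: the real rewrite and selection steps are LLM calls, and retrieval uses vector search rather than substring matching.

```python
# Illustrative sketch of the pipeline flow described above.
# Every function body here is a stand-in for the real component.

def rewrite(prompt: str) -> list[str]:
    # Real system: an LLM call producing tag-like pseudo-phrases.
    return [p.strip() for p in prompt.lower().split(",") if p.strip()]

def retrieve_candidates(phrases, vocabulary):
    # Real system: embedding + ANN lookup; here, naive substring match.
    return sorted({t for t in vocabulary for p in phrases if p in t})

def closed_set_select(candidates):
    # Real system: an LLM restricted to the candidate list; here, pass-through.
    return list(candidates)

def expand_implications(tags, implications):
    # Add parent tags implied by selected tags (transitively).
    out, stack = set(tags), list(tags)
    while stack:
        for parent in implications.get(stack.pop(), []):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return sorted(out)

vocab = ["red scarf", "scarf", "snow", "snowman"]
implications = {"red scarf": ["scarf"]}

phrases = rewrite("red scarf, snow")
selected = closed_set_select(retrieve_candidates(phrases, vocab))
final = expand_implications(selected, implications)
print(final)  # ['red scarf', 'scarf', 'snow', 'snowman']
```

Note how `scarf` enters the final list only through implication expansion: it was never retrieved directly, but `red scarf` implies it.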
## Design Rationale

- Rewrite and retrieval are separate so search phrase generation stays flexible while candidate generation stays deterministic.
- Retrieval and closed-set selection are separate to keep high recall first, then apply higher-precision filtering.
- Structural and probe inference run in parallel with rewrite so they can add context without adding much latency.
- Users control the final prompt by toggling suggested tags on/off; the prompt text is generated from those toggle states.
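The parallelism point above can be sketched with `asyncio.gather`. The "LLM calls" here are simulated; the real system makes OpenRouter requests, but the concurrency pattern is the same: the three front-half calls share no inputs beyond the prompt, so total latency is roughly the slowest call rather than the sum.

```python
import asyncio

# Sketch of running rewrite, structural inference, and probe inference
# concurrently. The awaits below are stand-ins for real LLM requests.

async def rewrite(prompt):
    await asyncio.sleep(0)          # stand-in for an LLM call
    return ["phrase_a", "phrase_b"]

async def structural_inference(prompt):
    await asyncio.sleep(0)
    return ["solo"]

async def probe_inference(prompt):
    await asyncio.sleep(0)
    return ["looking_at_viewer"]

async def front_half(prompt):
    # All three calls start immediately; gather preserves argument order.
    return await asyncio.gather(
        rewrite(prompt),
        structural_inference(prompt),
        probe_inference(prompt),
    )

phrases, structural, probe = asyncio.run(front_half("a lone figure"))
print(phrases, structural, probe)
```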
## Data Inputs (Broad)

- Tag vocabulary and alias mappings
- Tag counts (frequency)
- Tag implications graph
- Group/category mappings for row display
- Optional wiki definitions (used for hover help)
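As an illustration of how two of these inputs can be loaded and used, here is a minimal sketch. The CSV column names (`antecedent`, `consequent`) and the alias-chain behavior are assumptions for the example, not the project's actual schema.

```python
import csv
import io

# Hypothetical loaders for the implication graph and alias mapping.
# Column layout is assumed, not taken from the real data files.

def load_implications(fh):
    # Build antecedent -> [consequents] from a two-column CSV.
    graph = {}
    for row in csv.DictReader(fh):
        graph.setdefault(row["antecedent"], []).append(row["consequent"])
    return graph

def resolve_alias(tag, aliases):
    # Follow alias chains until a canonical tag is reached.
    while tag in aliases:
        tag = aliases[tag]
    return tag

implications = load_implications(io.StringIO(
    "antecedent,consequent\nred scarf,scarf\n"))
aliases = {"neckerchief": "scarf"}

print(implications["red scarf"])                  # ['scarf']
print(resolve_alias("neckerchief", aliases))      # scarf
```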
## Technologies Used

- FastText embeddings for semantic tag retrieval.
- HNSW approximate nearest-neighbor indexes for efficient retrieval at runtime.
- Reduced TF-IDF vectors for context-aware ranking and row scoring.
- OpenRouter-served instruction LLMs for rewrite, structural inference, probe inference, and closed-set selection.
  Default model: `mistralai/mistral-small-24b-instruct-2501`, chosen empirically from internal caption-evident test-set comparisons (with model choice remaining configurable).
- Gradio for the interactive web UI (tag toggles, ranked rows, and suggested prompt text).
- Python pipeline orchestration with CSV/JSON data sources and implication-graph expansion.
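The embedding-retrieval idea behind the first two bullets can be shown with a toy example: embed a query phrase, then return the top-k nearest tags by cosine similarity. The real system uses FastText vectors and an HNSW index; brute-force search and hand-picked 2-D vectors are used here only to keep the sketch self-contained.

```python
import math

# Toy stand-in for FastText + HNSW retrieval: rank tags by cosine
# similarity to a query vector and keep the top k.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

tag_vectors = {             # pretend embeddings, not real FastText output
    "scarf":   [0.9, 0.1],
    "snow":    [0.1, 0.9],
    "snowman": [0.2, 0.8],
}

def top_k(query_vec, k=2):
    # Brute force here; an HNSW index answers the same query in
    # approximate sub-linear time over a large vocabulary.
    ranked = sorted(tag_vectors,
                    key=lambda t: cosine(query_vec, tag_vectors[t]),
                    reverse=True)
    return ranked[:k]

print(top_k([0.15, 0.85]))  # ['snow', 'snowman']
```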
## Evaluation (Broad)

Current evaluation style compares selected tags against ground-truth tags on caption-evident samples.

Primary metrics:

- Precision: `TP / (TP + FP)`
- Recall: `TP / (TP + FN)`
- F1: harmonic mean of precision and recall
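For a set-based tag evaluation, these metrics reduce to set intersections. A worked example on one sample:

```python
# Precision/recall/F1 for one sample, treating predictions and
# ground truth as tag sets.

def prf1(predicted, truth):
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 2 of 3 predictions are correct (precision 2/3);
# 2 of 4 ground-truth tags were found (recall 1/2).
p, r, f = prf1({"scarf", "snow", "hat"},
               {"scarf", "snow", "snowman", "tree"})
print(p, r, f)
```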
The evaluation focus is practical:

- Is the returned tag set useful and mostly correct?
- Does it miss important prompt-evident tags?
- Does UI ranking surface likely-correct tags early?
## Evaluation Dataset Snapshot

- File: `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident_n30.jsonl`
- Construction: manually curated caption-evident subset, where ground-truth tags are intended to be directly supported by the caption text.
- Size: 30 images
- Total ground-truth tag assignments: 440
- Unique tags represented: 205
- Average tags per image: 14.67
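Snapshot numbers like these can be recomputed directly from the JSONL file. The field name `tags` in this sketch is an assumption about the file's schema, not confirmed from the actual data.

```python
import io
import json

# Recompute sample count, total tag assignments, unique tags, and
# average tags per image from JSONL lines. The "tags" key is assumed.

def tag_stats(lines):
    samples = [json.loads(line) for line in lines if line.strip()]
    total = sum(len(s["tags"]) for s in samples)
    unique = {t for s in samples for t in s["tags"]}
    avg = round(total / len(samples), 2)
    return len(samples), total, len(unique), avg

demo = io.StringIO('{"tags": ["a", "b"]}\n{"tags": ["b", "c", "d"]}\n')
print(tag_stats(demo))  # (2, 5, 4, 2.5)
```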