Claude committed
Commit 50a9851 · 1 Parent(s): f4f71fe

Add tag categorization pipeline for e621 checklist

Implements structured tag suggestion system that organizes recommendations
by category from the e621 tagging checklist.

Key features:
- Parses e621 checklist into structured categories with constraints
- Defines processing tiers: FOUNDATIONAL → CHARACTER → APPEARANCE → SCENE → META
- Implements dependency logic (e.g., skip character tags for zero_pictured)
- Integrates with existing TF-IDF ranking for similarity-based suggestions
- Categories have constraint types: exactly_one, at_most_one, multi (MULTI_SELECT), optional
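
The constraint types listed above could be checked against a selection roughly as follows. This is a minimal sketch, not code from this commit: the enum values mirror ConstraintType in category_parser.py, while violates_constraint is a hypothetical helper. The diff itself only appears to display each category's constraint; it does not enforce it. The sketch also reads at_most_one strictly, whereas the parser's own comment allows several species or gender tags in multi-character scenes.

```python
from enum import Enum
from typing import Iterable, Set


class ConstraintType(Enum):
    """Values mirror the enum in category_parser.py."""
    EXACTLY_ONE = "exactly_one"
    AT_MOST_ONE = "at_most_one"
    MULTI_SELECT = "multi"
    OPTIONAL = "optional"


def violates_constraint(
    constraint: ConstraintType,
    category_tags: Iterable[str],
    selected: Set[str],
) -> bool:
    """Hypothetical helper: True if the selection breaks the constraint."""
    # Count how many of this category's tags were selected
    chosen = len(set(category_tags) & selected)
    if constraint is ConstraintType.EXACTLY_ONE:
        return chosen != 1
    if constraint is ConstraintType.AT_MOST_ONE:
        return chosen > 1
    # MULTI_SELECT and OPTIONAL accept any number of tags
    return False


rating_tags = ["safe", "questionable", "explicit"]
print(violates_constraint(ConstraintType.EXACTLY_ONE, rating_tags, {"solo"}))
print(violates_constraint(ConstraintType.EXACTLY_ONE, rating_tags, {"safe", "solo"}))
```

A stricter or looser reading of at_most_one (for example, one tag per pictured character) would need the character count as an extra input.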

Modules:
- psq_rag/tagging/category_parser.py: Parses checklist into TagCategory objects
- psq_rag/tagging/categorized_suggestions.py: Generates categorized suggestions
- scripts/test_parser_only.py: Unit tests for parser (all passing)

Workflow:
1. LLM predicts initial tags from prompt
2. System uses those tags to rank the remaining tags by TF-IDF similarity
3. Results organized by category for user review/selection
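
The three workflow steps can be illustrated with a small self-contained sketch. Everything here is a stand-in: RANKED plays the role of the TF-IDF output, TAG_TO_CATEGORY plays the role of the reverse index built from the checklist, and organize is a hypothetical name. The real pipeline gets its ranking from get_tfidf_reduced_similar_tags, which is not reproduced here.

```python
from typing import Dict, List, Tuple

# Assumed stand-in for the TF-IDF ranking (tag, score), best first
RANKED: List[Tuple[str, float]] = [
    ("outside", 0.91),
    ("tree", 0.80),
    ("feline", 0.42),
    ("grass", 0.40),
]

# Assumed stand-in for the checklist-derived index: tag -> category name
TAG_TO_CATEGORY: Dict[str, str] = {
    "outside": "location",
    "forest": "location",
    "feline": "species",
    "canine": "species",
}


def organize(
    selected: List[str],
    ranked: List[Tuple[str, float]],
    tag_to_category: Dict[str, str],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group ranked tags by category, dropping tags that are already selected."""
    chosen = set(selected)
    grouped: Dict[str, List[Tuple[str, float]]] = {}
    for tag, score in ranked:
        if tag in chosen:
            continue
        # Tags without a checklist category fall into the "other" bucket
        grouped.setdefault(tag_to_category.get(tag, "other"), []).append((tag, score))
    return grouped


# Step 1 (LLM-predicted tags) is represented by a fixed list
result = organize(["forest", "canine"], RANKED, TAG_TO_CATEGORY)
for category, tags in result.items():
    print(category, tags)
```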

https://claude.ai/code/session_015ZwE7a5E6YVTrMpuB2pXX7

psq_rag/tagging/__init__.py ADDED
File without changes
psq_rag/tagging/categorized_suggestions.py ADDED
@@ -0,0 +1,289 @@
+"""
+Generate categorized tag suggestions using TF-IDF similarity rankings.
+
+This module takes LLM-selected tags and generates organized suggestions
+for each category from the e621 checklist, ranked by TF-IDF similarity.
+"""
+from __future__ import annotations
+
+from collections import OrderedDict
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict, List, Set, Tuple, Optional
+
+from .category_parser import (
+    CategoryTier,
+    ConstraintType,
+    TagCategory,
+    parse_checklist,
+    should_skip_category,
+)
+from ..retrieval.psq_retrieval import get_tfidf_reduced_similar_tags
+
+
+@dataclass
+class CategorySuggestions:
+    """Suggestions for a single category."""
+    category: TagCategory
+    suggestions: List[Tuple[str, float]]  # [(tag, score), ...]
+    already_selected: Set[str]  # Tags from this category already in selected_tags
+
+
+@dataclass
+class CategorizedTagSuggestions:
+    """Organized tag suggestions by category."""
+    by_category: Dict[str, CategorySuggestions]  # category_name -> suggestions
+    other_suggestions: List[Tuple[str, float]]  # Tags not in any category
+    categories: Dict[str, TagCategory]  # All category definitions
+
+
+def load_categories(checklist_path: Optional[Path] = None) -> Dict[str, TagCategory]:
+    """
+    Load and parse category definitions from checklist.
+
+    Args:
+        checklist_path: Path to checklist file. If None, uses default location.
+
+    Returns:
+        Dict mapping category_name -> TagCategory
+    """
+    if checklist_path is None:
+        repo_root = Path(__file__).parent.parent.parent
+        default_path = repo_root / 'tagging_checklist.txt'
+        if default_path.exists():
+            # Prefer the checklist at the repository root
+            checklist_path = default_path
+        else:
+            # Fall back to reading it from the other branch via git
+            import subprocess
+            import tempfile
+            try:
+                result = subprocess.run(
+                    ['git', 'show', 'origin/claude/prompt-squirrel-rag-3PZn7:tagging_checklist.txt'],
+                    capture_output=True,
+                    text=True,
+                    cwd=repo_root,
+                )
+                if result.returncode == 0:
+                    # Write to a temp file (it is not deleted automatically)
+                    with tempfile.NamedTemporaryFile(
+                        mode='w', delete=False, suffix='.txt', encoding='utf-8'
+                    ) as f:
+                        f.write(result.stdout)
+                    checklist_path = Path(f.name)
+            except (OSError, subprocess.SubprocessError):
+                # git is unavailable or the temp file could not be written
+                pass
+
+    if checklist_path is None or not checklist_path.exists():
+        raise FileNotFoundError(
+            "Could not find tagging_checklist.txt. "
+            "Please provide checklist_path or ensure it's in the repository."
+        )
+
+    return parse_checklist(checklist_path)
+
+
+def build_category_tag_index(categories: Dict[str, TagCategory]) -> Dict[str, str]:
+    """
+    Build reverse index: tag -> category_name.
+
+    Args:
+        categories: Category definitions
+
+    Returns:
+        Dict mapping tag -> category_name
+    """
+    tag_to_category = {}
+    for cat_name, category in categories.items():
+        for tag in category.tags:
+            # Normalize tag (the checklist has underscores, TF-IDF might have spaces)
+            normalized = tag.replace('_', ' ')
+            tag_to_category[normalized] = cat_name
+            tag_to_category[tag] = cat_name
+
+    return tag_to_category
+
+
+def generate_categorized_suggestions(
+    selected_tags: List[str],
+    *,
+    allow_nsfw_tags: bool = False,
+    top_n_per_category: int = 10,
+    top_n_other: int = 50,
+    checklist_path: Optional[Path] = None,
+) -> CategorizedTagSuggestions:
+    """
+    Generate tag suggestions organized by category.
+
+    Args:
+        selected_tags: Tags already selected/predicted by LLM
+        allow_nsfw_tags: Whether to include NSFW suggestions
+        top_n_per_category: Maximum suggestions per category
+        top_n_other: Maximum suggestions in "Other" category
+        checklist_path: Optional path to checklist file
+
+    Returns:
+        CategorizedTagSuggestions with organized suggestions
+    """
+    # Load category definitions
+    categories = load_categories(checklist_path)
+    tag_to_category = build_category_tag_index(categories)
+
+    # Get TF-IDF similarity scores for all tags based on selected tags
+    from collections import Counter
+    # Normalize selected tags (spaces -> underscores for TF-IDF)
+    normalized_selected = [tag.replace(' ', '_') for tag in selected_tags]
+    pseudo_doc = Counter(normalized_selected)
+
+    # Get all similar tags with scores
+    all_suggestions_ordered = get_tfidf_reduced_similar_tags(
+        pseudo_doc,
+        allow_nsfw_tags=allow_nsfw_tags
+    )
+
+    # Convert to list of (tag, score)
+    all_suggestions = list(all_suggestions_ordered.items())
+
+    # Track which tags are already selected
+    selected_set = set(selected_tags)
+    # Also check normalized versions
+    selected_set.update(tag.replace('_', ' ') for tag in selected_tags)
+    selected_set.update(tag.replace(' ', '_') for tag in selected_tags)
+
+    # Organize suggestions by category
+    by_category: Dict[str, List[Tuple[str, float]]] = {
+        cat_name: [] for cat_name in categories.keys()
+    }
+    other_suggestions: List[Tuple[str, float]] = []
+
+    for tag, score in all_suggestions:
+        # Skip if already selected
+        if tag in selected_set or tag.replace('_', ' ') in selected_set:
+            continue
+
+        # Find category
+        cat_name = tag_to_category.get(tag)
+        if cat_name:
+            by_category[cat_name].append((tag, score))
+        else:
+            other_suggestions.append((tag, score))
+
+    # Limit each category to its top N suggestions
+    sorted_by_category: Dict[str, CategorySuggestions] = {}
+
+    # Process in tier order
+    tier_order = sorted(categories.values(), key=lambda c: c.tier.value)
+
+    for category in tier_order:
+        cat_name = category.name
+
+        # Check if category should be skipped based on dependencies
+        if should_skip_category(category, selected_set, categories):
+            continue
+
+        suggestions = by_category.get(cat_name, [])
+
+        # No re-sort here: this relies on the TF-IDF results arriving
+        # ordered by score, so slicing keeps the top N
+        suggestions = suggestions[:top_n_per_category]
+
+        # Find tags from this category already selected
+        already_selected = set()
+        for tag in category.tags:
+            if tag in selected_set or tag.replace('_', ' ') in selected_set:
+                already_selected.add(tag)
+
+        sorted_by_category[cat_name] = CategorySuggestions(
+            category=category,
+            suggestions=suggestions,
+            already_selected=already_selected,
+        )
+
+    # Limit "Other" suggestions
+    other_suggestions = other_suggestions[:top_n_other]
+
+    return CategorizedTagSuggestions(
+        by_category=sorted_by_category,
+        other_suggestions=other_suggestions,
+        categories=categories,
+    )
+
+
+def format_suggestions_for_display(
+    categorized: CategorizedTagSuggestions,
+    show_scores: bool = True,
+) -> str:
+    """
+    Format categorized suggestions for display to user.
+
+    Args:
+        categorized: The categorized suggestions
+        show_scores: Whether to show similarity scores
+
+    Returns:
+        Formatted string for display
+    """
+    lines = []
+
+    # Process categories in tier order
+    tier_groups = {}
+    for cat_name, cat_sugg in categorized.by_category.items():
+        tier = cat_sugg.category.tier
+        if tier not in tier_groups:
+            tier_groups[tier] = []
+        tier_groups[tier].append((cat_name, cat_sugg))
+
+    for tier in sorted(tier_groups.keys(), key=lambda t: t.value):
+        tier_name = tier.name.title()
+        lines.append(f"\n{'='*60}")
+        lines.append(f"{tier_name} Categories")
+        lines.append('='*60)
+
+        for cat_name, cat_sugg in tier_groups[tier]:
+            category = cat_sugg.category
+
+            # Header
+            lines.append(f"\n{category.display_name}")
+            lines.append(f" Constraint: {category.constraint.value}")
+
+            # Already selected tags
+            if cat_sugg.already_selected:
+                selected_str = ', '.join(sorted(cat_sugg.already_selected))
+                lines.append(f" ✓ Selected: {selected_str}")
+
+            # Suggestions
+            if cat_sugg.suggestions:
+                lines.append(" Suggestions:")
+                for tag, score in cat_sugg.suggestions:
+                    if show_scores:
+                        lines.append(f" • {tag} ({score:.3f})")
+                    else:
+                        lines.append(f" • {tag}")
+            else:
+                lines.append(" (no suggestions)")
+
+    # Other suggestions
+    if categorized.other_suggestions:
+        lines.append(f"\n{'='*60}")
+        lines.append("Other Tags")
+        lines.append('='*60)
+        for tag, score in categorized.other_suggestions[:20]:  # Show top 20
+            if show_scores:
+                lines.append(f" • {tag} ({score:.3f})")
+            else:
+                lines.append(f" • {tag}")
+
+    return '\n'.join(lines)
+
+
+def get_category_suggestions_dict(
+    categorized: CategorizedTagSuggestions
+) -> Dict[str, List[str]]:
+    """
+    Get simple dict of category -> suggested tags (without scores).
+
+    Args:
+        categorized: The categorized suggestions
+
+    Returns:
+        Dict mapping category_name -> [tag1, tag2, ...]
+    """
+    result = {}
+
+    for cat_name, cat_sugg in categorized.by_category.items():
+        result[cat_name] = [tag for tag, _ in cat_sugg.suggestions]
+
+    result['other'] = [tag for tag, _ in categorized.other_suggestions]
+
+    return result
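
The reverse index used by this module stores each checklist tag under both its underscore form and its space form, so lookups succeed whichever spelling the ranking returns. The sketch below shows the idea on plain dicts. build_index and the sample categories are illustrative assumptions, not the module's own objects.

```python
from typing import Dict, Iterable


def build_index(categories: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every tag, in underscore and space form, to its category name."""
    index: Dict[str, str] = {}
    for name, tags in categories.items():
        for tag in tags:
            index[tag] = name
            index[tag.replace("_", " ")] = name
    return index


index = build_index({
    "gender": ["ambiguous_gender", "male"],
    "clothing": ["fully_clothed"],
})
print(index["ambiguous gender"])  # looked up via the space form
print(index.get("female"))        # not in any sample category
```

One consequence of this design is that a tag listed under two categories ends up mapped to whichever category is processed last, since later assignments overwrite earlier ones.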
psq_rag/tagging/category_parser.py ADDED
@@ -0,0 +1,319 @@
+"""
+Parse e621 tagging checklist into structured category definitions.
+"""
+from __future__ import annotations
+
+import re
+from dataclasses import dataclass, field
+from enum import Enum
+from pathlib import Path
+from typing import Dict, List, Set, Optional
+
+
+class ConstraintType(Enum):
+    """How many tags from this category can be selected."""
+    EXACTLY_ONE = "exactly_one"  # e.g., rating: safe/questionable/explicit
+    AT_MOST_ONE = "at_most_one"  # e.g., primary species (can have multiple for multi-character scenes)
+    MULTI_SELECT = "multi"  # e.g., perspectives, expressions
+    OPTIONAL = "optional"  # may or may not apply
+
+
+class CategoryTier(Enum):
+    """Processing order for categories based on dependencies."""
+    FOUNDATIONAL = 1  # count, rating
+    CHARACTER = 2  # body_type, species, gender
+    APPEARANCE = 3  # clothing, pose, expression
+    SCENE = 4  # location, perspective
+    META = 5  # quality, style, resolution
+
+
+@dataclass
+class TagCategory:
+    """Structured representation of a tag category from the checklist."""
+    name: str
+    display_name: str
+    description: str
+    tags: List[str]
+    constraint: ConstraintType
+    tier: CategoryTier
+    depends_on: List[str] = field(default_factory=list)  # Category names this depends on
+    skip_if: Dict[str, Set[str]] = field(default_factory=dict)  # {category: {values}} to skip
+
+
+def parse_checklist(checklist_path: Path) -> Dict[str, TagCategory]:
+    """
+    Parse the e621 tagging checklist text file into TagCategory objects.
+
+    Returns:
+        Dict mapping category_name -> TagCategory
+    """
+    with open(checklist_path, 'r', encoding='utf-8') as f:
+        lines = f.readlines()
+
+    categories = {}
+
+    # Track current section
+    in_basics = False
+    in_explicit = False
+    in_pose = False
+    in_info = False
+
+    for i, line in enumerate(lines):
+        stripped = line.strip()
+
+        # Skip header/navigation: the first nine lines, so that the
+        # "Basics" header on line 10 is still seen
+        if i < 9:
+            continue
+
+        # Section headers
+        if stripped == "Basics":
+            in_basics = True
+            in_explicit = in_pose = in_info = False
+            continue
+        elif stripped == "Sexually explicit":
+            in_explicit = True
+            in_basics = in_pose = in_info = False
+            continue
+        elif stripped == "Pose / Activity / Appearance":
+            in_pose = True
+            in_basics = in_explicit = in_info = False
+            continue
+        elif stripped == "Information and Requests":
+            in_info = True
+            in_basics = in_explicit = in_pose = False
+            continue
+        elif stripped in ("Heavily vetted tags.", "Do NOT tag"):
+            break  # Stop parsing
+
+        # Skip empty lines
+        if not stripped:
+            continue
+
+        # Parse category lines (those with question marks)
+        # They must be indented in the original (start with spaces)
+        if "?" in stripped and line.startswith(" "):
+            # Extract category question and tags
+            parts = stripped.split("?", 1)
+            if len(parts) != 2:
+                continue
+
+            question = parts[0].strip()
+            tags_text = parts[1].strip()
+
+            # Skip excluded categories
+            if question in ("Artist(s)", "Copyright", "Character", "Year of creation"):
+                continue
+
+            # Skip sexually explicit categories entirely
+            if in_explicit:
+                continue
+
+            # Special handling for Rating (tags are on following lines, not in description)
+            if question == "Rating":
+                tags = ["safe", "questionable", "explicit"]
+            else:
+                # Parse tags from the description
+                # Tags are either comma-separated or in parentheses
+                tags = parse_tags_from_description(tags_text)
+
+            if not tags:
+                continue
+
+            # Determine category metadata
+            category_name = normalize_category_name(question)
+            tier = determine_tier(question, in_basics, in_pose, in_info)
+            constraint = determine_constraint(question, tags)
+
+            # Debug: print categorization
+            # print(f"DEBUG: '{question}' -> tier={tier.name}, constraint={constraint.value}")
+
+            category = TagCategory(
+                name=category_name,
+                display_name=question,
+                description=tags_text,
+                tags=tags,
+                constraint=constraint,
+                tier=tier,
+            )
+
+            categories[category_name] = category
+
+    # Add dependency information
+    add_dependencies(categories)
+
+    return categories
+
+
+def parse_tags_from_description(text: str) -> List[str]:
+    """Extract individual tags from a description string."""
+    # Remove free-standing parenthetical explanations, but keep
+    # parentheses that are part of a tag, e.g. pencil_(artwork)
+    text = re.sub(r'(?:^|(?<=\s))\([^)]*\)', '', text)
+
+    # Remove reference text like "See also:", "Common ones:", etc.
+    text = re.sub(r'(See also|Common ones|For more|click for)[^.]*\.', '', text, flags=re.IGNORECASE)
+
+    # Split by comma
+    parts = [p.strip() for p in text.split(',')]
+
+    tags = []
+    for part in parts:
+        # Clean up
+        part = part.strip()
+        if not part:
+            continue
+
+        # Skip "tag group:" references
+        if part.startswith('tag group:'):
+            continue
+
+        # Remove explanatory text after tags
+        if ' for ' in part:
+            part = part.split(' for ')[0]
+        if ' if ' in part:
+            part = part.split(' if ')[0]
+        if ' such as ' in part:
+            part = part.split(' such as ')[0]
+
+        part = part.strip()
+
+        # Extract tags - can be multi-word with underscores or hyphens
+        # Tags can have parentheses like pencil_(artwork); the lazy
+        # quantifier lets the optional "_(...)" suffix match before the
+        # trailing underscore is consumed
+        match = re.match(r'^([a-z0-9_-]+?(?:_\([a-z0-9_]+\))?)(?=[^a-z0-9_-]|$)', part)
+        if match:
+            tags.append(match.group(1))
+
+    return tags
+
+
+def normalize_category_name(question: str) -> str:
+    """Convert question to category name."""
+    # Remove question mark and punctuation, use underscores as separators
+    name = question.lower().replace('?', '').strip()
+    name = name.replace('/', '_').replace(' ', '_')
+    name = name.replace('(', '').replace(')', '')
+
+    # Simplify common names
+    simplifications = {
+        'sex_gender': 'gender',
+        'how_many': 'count',
+        'quality_medium': 'quality',
+        'picture_organization': 'organization',
+        'text_and_languages': 'text',
+        'image_size': 'resolution',
+        # add_dependencies() refers to this category as 'activity'
+        'general_activity_if_any': 'activity',
+    }
+
+    return simplifications.get(name, name)
+
+
+def determine_tier(question: str, in_basics: bool, in_pose: bool, in_info: bool) -> CategoryTier:
+    """Determine processing tier based on question and section."""
+    q_lower = question.lower().rstrip('?')  # Remove trailing question mark if present
+
+    # Foundational
+    if q_lower in ('how many', 'rating'):
+        return CategoryTier.FOUNDATIONAL
+
+    # Character properties
+    if q_lower in ('body type', 'species', 'sex/gender'):
+        return CategoryTier.CHARACTER
+
+    # Appearance
+    if in_pose or q_lower in ('clothing', 'posture', 'general activity (if any)',
+                              'body decor', 'fur style', 'hair', 'breasts',
+                              'limbs', 'gaze', 'expression'):
+        return CategoryTier.APPEARANCE
+
+    # Scene
+    if q_lower in ('location', 'perspective'):
+        return CategoryTier.SCENE
+
+    # Meta/info
+    if in_info or q_lower in ('quality/medium', 'picture organization', 'style',
+                              'text and languages', 'information', 'requests',
+                              'image size'):
+        return CategoryTier.META
+
+    # Default
+    return CategoryTier.META
+
+
+def determine_constraint(question: str, tags: List[str]) -> ConstraintType:
+    """Determine selection constraint based on category semantics."""
+    q_lower = question.lower().rstrip('?')  # Remove trailing question mark if present
+
+    # Exactly one
+    if q_lower in ('rating', 'how many', 'body type'):
+        return ConstraintType.EXACTLY_ONE
+
+    # At most one per character (but can have multiple in multi-character scenes)
+    if q_lower in ('species', 'sex/gender'):
+        return ConstraintType.AT_MOST_ONE
+
+    # Multi-select
+    if q_lower in ('clothing', 'perspective', 'location', 'limbs',
+                   'expression', 'general activity (if any)', 'gaze',
+                   'posture', 'body decor', 'fur style', 'hair',
+                   'breasts', 'text and languages', 'quality/medium', 'style',
+                   'picture organization', 'information', 'requests',
+                   'image size'):
+        return ConstraintType.MULTI_SELECT
+
+    # Default optional
+    return ConstraintType.OPTIONAL
+
+
+def add_dependencies(categories: Dict[str, TagCategory]) -> None:
+    """Add dependency and skip rules to categories."""
+
+    # If zero_pictured, skip character/appearance categories
+    character_appearance_categories = [
+        'body_type', 'species', 'gender', 'clothing', 'posture',
+        'activity', 'body_decor', 'fur_style', 'hair', 'breasts',
+        'limbs', 'gaze', 'expression'
+    ]
+
+    for cat_name in character_appearance_categories:
+        if cat_name in categories:
+            categories[cat_name].skip_if['count'] = {'zero_pictured'}
+
+    # Character properties depend on having count first
+    for cat_name in ['body_type', 'species', 'gender']:
+        if cat_name in categories:
+            categories[cat_name].depends_on.append('count')
+
+    # Appearance depends on character properties
+    appearance_cats = ['clothing', 'posture', 'activity', 'body_decor',
+                       'fur_style', 'hair', 'breasts', 'limbs', 'gaze', 'expression']
+    for cat_name in appearance_cats:
+        if cat_name in categories:
+            categories[cat_name].depends_on.extend(['count', 'body_type'])
+
+
+def should_skip_category(category: TagCategory, selected_tags: Set[str],
+                         categories: Dict[str, TagCategory]) -> bool:
+    """
+    Check if a category should be skipped based on already-selected tags.
+
+    Args:
+        category: The category to check
+        selected_tags: Tags already selected
+        categories: All categories for dependency resolution
+
+    Returns:
+        True if category should be skipped
+    """
+    for dep_category_name, skip_values in category.skip_if.items():
+        dep_category = categories.get(dep_category_name)
+        if not dep_category:
+            continue
+
+        # Check if any of the selected tags from the dependency category
+        # are in the skip_values set
+        dep_tags = set(dep_category.tags)
+        selected_dep_tags = selected_tags & dep_tags
+
+        if selected_dep_tags & skip_values:
+            return True
+
+    return False
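
Tag extraction from a checklist description is the most delicate step in this parser, because e621 tags may carry a parenthesized suffix such as pencil_(artwork) while the descriptions also contain free-standing parenthetical remarks. The following is a minimal sketch of one way to tell the two apart. extract_tags, TAG_RE, and the sample description are hypothetical and simplified relative to parse_tags_from_description: the "See also" stripping and the " for " / " if " trimming are left out.

```python
import re
from typing import List

# A tag is lowercase letters, digits, "_" or "-", optionally followed by a
# "_(qualifier)" suffix. The lazy "+?" lets the suffix group claim the
# underscore before "(" instead of the main run swallowing it.
TAG_RE = re.compile(r"^([a-z0-9_-]+?(?:_\([a-z0-9_]+\))?)(?=[^a-z0-9_-]|$)")


def extract_tags(description: str) -> List[str]:
    """Return the leading tag of each comma-separated part, if it has one."""
    # Drop parentheticals that stand alone (preceded by whitespace or at the
    # start), but keep ones glued to a tag, like pencil_(artwork)
    text = re.sub(r"(?:^|(?<=\s))\([^)]*\)", "", description)
    tags = []
    for part in text.split(","):
        match = TAG_RE.match(part.strip())
        if match:
            tags.append(match.group(1))
    return tags


sample = "solo, duo (two characters), pencil_(artwork), Use caution"
print(extract_tags(sample))
```

Parts that start with an uppercase letter, such as ordinary sentences, produce no tag because the pattern only accepts lowercase tag characters at the start.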
scripts/test_categorized_suggestions.py ADDED
@@ -0,0 +1,157 @@
+#!/usr/bin/env python3
+"""
+Test script for categorized tag suggestions.
+"""
+from pathlib import Path
+import sys
+
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from psq_rag.tagging.category_parser import parse_checklist
+from psq_rag.tagging.categorized_suggestions import (
+    generate_categorized_suggestions,
+    format_suggestions_for_display,
+    load_categories,
+)
+
+
+def test_parse_checklist():
+    """Test parsing the checklist file."""
+    print("=" * 80)
+    print("Testing checklist parsing...")
+    print("=" * 80)
+
+    checklist_path = Path(__file__).parent.parent / "tagging_checklist.txt"
+
+    if not checklist_path.exists():
+        print(f"ERROR: Checklist not found at {checklist_path}")
+        return False
+
+    categories = parse_checklist(checklist_path)
+
+    print(f"\nParsed {len(categories)} categories:")
+    for cat_name, category in categories.items():
+        print(f"\n {cat_name}:")
+        print(f" Display: {category.display_name}")
+        print(f" Tier: {category.tier.name}")
+        print(f" Constraint: {category.constraint.value}")
+        print(f" Tags: {len(category.tags)} tags")
+        print(f" Sample tags: {category.tags[:5]}")
+        if category.depends_on:
+            print(f" Depends on: {category.depends_on}")
+        if category.skip_if:
+            print(f" Skip if: {category.skip_if}")
+
+    return True
+
+
+def test_categorized_suggestions():
+    """Test generating categorized suggestions."""
+    print("\n" + "=" * 80)
+    print("Testing categorized suggestions...")
+    print("=" * 80)
+
+    checklist_path = Path(__file__).parent.parent / "tagging_checklist.txt"
+
+    # Example: User prompt resulted in these LLM-selected tags
+    selected_tags = [
+        "anthro",
+        "canine",
+        "male",
+        "solo",
+        "forest",
+        "standing",
+    ]
+
+    print(f"\nSelected tags: {', '.join(selected_tags)}")
+    print("\nGenerating categorized suggestions...")
+
+    try:
+        categorized = generate_categorized_suggestions(
+            selected_tags,
+            allow_nsfw_tags=False,
+            top_n_per_category=5,
+            top_n_other=10,
+            checklist_path=checklist_path,
+        )
+
+        print("\nFormatted output:")
+        print(format_suggestions_for_display(categorized, show_scores=True))
+
+        return True
+
+    except Exception as e:
+        print(f"ERROR: {e}")
+        import traceback
+        traceback.print_exc()
+        return False
+
+
+def test_zero_pictured():
+    """Test that character categories are skipped for zero_pictured."""
+    print("\n" + "=" * 80)
+    print("Testing zero_pictured dependency logic...")
+    print("=" * 80)
+
+    checklist_path = Path(__file__).parent.parent / "tagging_checklist.txt"
+
+    selected_tags = [
+        "zero_pictured",
+        "forest",
+        "outside",
+    ]
+
+    print(f"\nSelected tags: {', '.join(selected_tags)}")
+    print("(Should skip character/appearance categories)")
+
+    try:
+        categorized = generate_categorized_suggestions(
+            selected_tags,
+            allow_nsfw_tags=False,
+            top_n_per_category=5,
+            top_n_other=10,
+            checklist_path=checklist_path,
+        )
+
+        print("\nCategories with suggestions:")
+        for cat_name, cat_sugg in categorized.by_category.items():
+            if cat_sugg.suggestions or cat_sugg.already_selected:
+                print(f" {cat_name}: {len(cat_sugg.suggestions)} suggestions")
+
+        # Check that character categories are empty
+        character_cats = ['body_type', 'species', 'gender', 'clothing']
+        all_skipped = True
+        for cat in character_cats:
+            if cat in categorized.by_category:
+                if categorized.by_category[cat].suggestions:
+                    print(f" WARNING: {cat} should have been skipped!")
+                    all_skipped = False
+
+        if all_skipped:
+            print("\n✓ All character categories correctly skipped!")
+
+        return all_skipped
+
+    except Exception as e:
+        print(f"ERROR: {e}")
+        import traceback
+        traceback.print_exc()
+        return False
+
+
+if __name__ == "__main__":
+    success = True
+
+    success &= test_parse_checklist()
+    success &= test_categorized_suggestions()
+    success &= test_zero_pictured()
+
+    print("\n" + "=" * 80)
+    if success:
+        print("✓ All tests passed!")
+    else:
+        print("✗ Some tests failed")
+    print("=" * 80)
+
+    sys.exit(0 if success else 1)
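
The formatted output printed by this script is grouped by tier, which only reads correctly if tiers are ordered by their numeric value rather than by name. A minimal sketch of that ordering follows. The enum values are copied from CategoryTier in the diff, while order_by_tier and the example mapping are illustrative assumptions.

```python
from enum import Enum
from typing import Dict, List


class CategoryTier(Enum):
    """Values copied from the diff: lower values are processed first."""
    FOUNDATIONAL = 1
    CHARACTER = 2
    APPEARANCE = 3
    SCENE = 4
    META = 5


def order_by_tier(tiers: Dict[str, CategoryTier]) -> List[str]:
    """Hypothetical helper: category names sorted by tier value, then by name."""
    return sorted(tiers, key=lambda name: (tiers[name].value, name))


example = {
    "quality": CategoryTier.META,
    "species": CategoryTier.CHARACTER,
    "rating": CategoryTier.FOUNDATIONAL,
    "location": CategoryTier.SCENE,
    "count": CategoryTier.FOUNDATIONAL,
}
ordered = order_by_tier(example)
print(ordered)

# Sorting tier *names* alphabetically gives a different order
alphabetical = sorted(tier.name for tier in CategoryTier)
print(alphabetical)
```

Alphabetical sorting puts APPEARANCE before FOUNDATIONAL, so any code that groups by tier.name and sorts the keys would present categories out of dependency order.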
scripts/test_parser_only.py ADDED
@@ -0,0 +1,166 @@
+#!/usr/bin/env python3
+"""
+Test script for category parser only (no TF-IDF dependencies).
+"""
+from pathlib import Path
+import sys
+
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from psq_rag.tagging.category_parser import parse_checklist, should_skip_category
+
+
+def test_parse_checklist():
+    """Test parsing the checklist file."""
+    print("=" * 80)
+    print("Testing checklist parsing...")
+    print("=" * 80)
+
+    checklist_path = Path(__file__).parent.parent / "tagging_checklist.txt"
+
+    if not checklist_path.exists():
+        print(f"ERROR: Checklist not found at {checklist_path}")
+        return False
+
+    categories = parse_checklist(checklist_path)
+
+    print(f"\nParsed {len(categories)} categories:\n")
+
+    # Group by tier (keyed by the enum so tiers can be ordered by value)
+    by_tier = {}
+    for cat_name, category in categories.items():
+        tier = category.tier
+        if tier not in by_tier:
+            by_tier[tier] = []
+        by_tier[tier].append((cat_name, category))
+
+    # Sort by tier value, not alphabetically by tier name
+    for tier in sorted(by_tier.keys(), key=lambda t: t.value):
+        print(f"\n{'='*60}")
+        print(f"{tier.name} Tier")
+        print('='*60)
+
+        for cat_name, category in by_tier[tier]:
+            print(f"\n Category: {cat_name}")
+            print(f" Display: {category.display_name}")
+            print(f" Constraint: {category.constraint.value}")
+            print(f" Tags ({len(category.tags)}): {', '.join(category.tags[:8])}")
+            if len(category.tags) > 8:
+                print(f" ... and {len(category.tags) - 8} more")
+            if category.depends_on:
+                print(f" Depends on: {category.depends_on}")
+            if category.skip_if:
+                print(f" Skip if: {category.skip_if}")
+
+    return True
+
+
+def test_skip_logic():
+    """Test the skip logic for zero_pictured."""
+    print("\n" + "=" * 80)
+    print("Testing skip logic...")
+    print("=" * 80)
+
+    checklist_path = Path(__file__).parent.parent / "tagging_checklist.txt"
+    categories = parse_checklist(checklist_path)
+
+    # Test 1: zero_pictured should skip character categories
+    selected_tags = {'zero_pictured', 'forest', 'outside'}
+
+    character_cats = ['body_type', 'species', 'gender', 'clothing']
+    print(f"\nTest 1: Selected tags = {selected_tags}")
+    print("Should skip character/appearance categories:")
+
+    all_correct = True
+    for cat_name in character_cats:
+        if cat_name in categories:
+            should_skip = should_skip_category(
+                categories[cat_name],
+                selected_tags,
+                categories
+            )
+            status = "✓ SKIP" if should_skip else "✗ KEEP"
+            print(f" {cat_name}: {status}")
+            if not should_skip:
+                all_correct = False
+
+    # Test 2: solo should NOT skip character categories
+    selected_tags = {'solo', 'anthro', 'male'}
+    print(f"\nTest 2: Selected tags = {selected_tags}")
+    print("Should NOT skip character categories:")
+
+    for cat_name in character_cats:
+        if cat_name in categories:
+            should_skip = should_skip_category(
+                categories[cat_name],
+                selected_tags,
+                categories
+            )
+            status = "✓ KEEP" if not should_skip else "✗ SKIP"
+            print(f" {cat_name}: {status}")
+            if should_skip:
+                all_correct = False
+
+    return all_correct
+
+
+ def test_tag_extraction():
+     """Test that tags are extracted correctly from descriptions."""
+     print("\n" + "=" * 80)
+     print("Testing tag extraction...")
+     print("=" * 80)
+
+     checklist_path = Path(__file__).parent.parent / "tagging_checklist.txt"
+     categories = parse_checklist(checklist_path)
+
+     # Check specific categories we care about
+     test_cases = {
+         'count': ['solo', 'duo', 'trio', 'group', 'zero_pictured'],
+         'rating': ['safe', 'questionable', 'explicit'],  # These might not parse correctly
+         'body_type': ['anthro', 'feral', 'humanoid', 'taur'],
+         'species': ['human', 'canine', 'feline', 'equine'],
+         'gender': ['male', 'female', 'intersex', 'ambiguous_gender'],
+         'clothing': ['fully_clothed', 'partially_clothed', 'nude'],
+         'location': ['inside', 'outside', 'bedroom', 'kitchen', 'forest'],
+     }
+
+     all_correct = True
+     for cat_name, expected_tags in test_cases.items():
+         if cat_name not in categories:
+             print(f"\n✗ Category '{cat_name}' not found!")
+             all_correct = False
+             continue
+
+         category = categories[cat_name]
+         found_tags = set(category.tags)
+         expected_set = set(expected_tags)
+
+         missing = expected_set - found_tags
+         extra = found_tags - expected_set
+
+         if missing:
+             print(f"\n{cat_name}:")
+             print(f" ✗ Missing expected tags: {missing}")
+             all_correct = False
+         else:
+             print(f"\n✓ {cat_name}: All expected tags found")
+             print(f" Tags: {', '.join(sorted(category.tags))}")
+
+     return all_correct
+
+
+ if __name__ == "__main__":
+     success = True
+
+     success &= test_parse_checklist()
+     success &= test_tag_extraction()
+     success &= test_skip_logic()
+
+     print("\n" + "=" * 80)
+     if success:
+         print("✓ All tests passed!")
+     else:
+         print("✗ Some tests failed")
+     print("=" * 80)
+
+     sys.exit(0 if success else 1)
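The skip-logic test above checks one dependency rule: selecting `zero_pictured` should suppress the character categories, while `solo` should not. The rule can be illustrated with a minimal sketch. This is not the commit's `should_skip_category` (which, per the test, also receives the full `categories` mapping); `SketchCategory`, its `skip_if_any` field, and `sketch_should_skip` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchCategory:
    """Hypothetical stand-in for the commit's TagCategory."""
    name: str
    tags: list = field(default_factory=list)
    # Assumed field: tags that make this category irrelevant when selected.
    skip_if_any: frozenset = frozenset()


def sketch_should_skip(category, selected_tags):
    """Return True when any selected tag rules the category out."""
    return bool(category.skip_if_any & set(selected_tags))


species = SketchCategory(
    name="species",
    tags=["human", "canine", "feline", "equine"],
    skip_if_any=frozenset({"zero_pictured"}),
)

# No characters pictured: the species category is skipped.
print(sketch_should_skip(species, {"zero_pictured", "outside"}))
# A solo character is present: the species category is kept.
print(sketch_should_skip(species, {"solo", "anthro", "male"}))
```

Storing the rule on the category keeps the check to a single set intersection; the real implementation may well derive it differently, for example from the processing tiers.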
tagging_checklist.txt ADDED
@@ -0,0 +1,76 @@
+ e621:tagging checklist (locked)
+
+ [Back: e621:index]
+
+ This is an informal and unofficial supplement to the tagging rules and guidelines, meant to encourage better and more complete tagging.
+
+ Make sure you're also familiar with our Tag What You See policy before editing tags: tag_what_you_see for the policy itself, and e621:Tag What You See (Explained) for a more in-depth explanation why we use TWYS.
+
+ Each entry below poses a general question about a post, with some example tags that answer it. A good post will probably have most of these answered (but not necessarily all).
+ Basics
+
+ Tags that all posts should have, to maintain minimal searchability.
+
+ Artist(s)? Use their best known alias. If a picture has more than one artist, tag them all, along with collaboration. If you're not sure who the artist is, tag unknown_artist. If the artist wishes to remain anonymous, use anonymous_artist instead.
+ Rating?
+ Explicit for fully or partially exposed genitalia (penis, pussy, cloaca, sheath, balls, or anus), various sex acts even if no genitalia are visible, high amounts of violence/gore, sexual fluids such as cum or pussy_juice, and extreme sexual fetishes such as scat, watersports, or BDSM.
+ Safe for anything that can be viewed in public without much uproar: no genitals, no sexual overtones or poses, no realistic violence, or any questionable activity.
+ Questionable for everything in between, such as topless females and suggestive poses.
+ For more help on ratings please see e621: Ratings
+ Copyright? The original series or company a character or game is owned by.
+ Character? Tag the character's best known name. If not that, their full name. For more, see howto:tag_characters.
+ Body type? anthro, feral, humanoid, taur, anthrofied (pokemorph, digimorph), ponified, feralized
+ Species? human, canine, feline, bovine, cervine, equine, lagomorph, rodent, avian, insect, marine (cetacean, shark), scalie (click for detailed lists)
+ Sex/gender? male, female, intersex (herm, maleherm, gynomorph, andromorph), ambiguous_gender
+ See How To: Tag Genders for a detailed guide
+ How many? solo, duo, trio, group, zero_pictured
+ Clothing? fully_clothed, partially_clothed, skimpy, nude, bottomless, topless, underwear, open_shirt
+ Location? inside, outside, bedroom, kitchen, forest
+ Perspective? front_view, rear_view, side_view, three-quarter_view, low-angle_view, high-angle_view, worm's-eye_view, bird's-eye_view, first_person_view
+
+ Sexually explicit
+
+ Male bits? penis, balls, sheath, knot, erection, half-erect, flaccid, humanoid_penis, equine_penis, tapering_penis, veiny_penis, uncut, circumcised
+ Female bits? pussy, clitoris, plump_labia, equine_pussy, canine_pussy
+ Other? butt, anus, puffy_anus, gaping_anus, urethra, genital_slit
+ Sex act? sex (male/female, female/female, male/male, bisexual), masturbation, handjob, footjob, fellatio, cunnilingus, vaginal_penetration, anal_penetration, threesome, foursome, orgy, gangbang, frottage, tribadism, orgasm, cum_inside
+ Position? Common ones: missionary_position, cowgirl_position, reverse_cowgirl_position, from_behind, 69_position, stand_and_carry_position.
+ See also: tag group:sex positions
+ Sexual themes? bondage, domination, rape, rough_sex, happy_sex, presenting, internal, impregnation, bestiality, interspecies, public, exhibitionism
+ Fluids? cum, cumshot, precum, pussy_juice, pussy_ejaculation, saliva
+ Toys? dildo, vibrator, buttplug, egg_vibrator, strapon, feeldoe
+
+ Pose / Activity / Appearance
+
+ General activity (if any)? walking, running, fighting, sleeping, dancing, eating, kissing, licking
+ Posture? standing, bent_over, sitting, crouching, kneeling, all_fours, on_front, on_side, on_back, ass_up (see tag group:pose for full list)
+ Body decor? glasses, ring, necklace, bracelet, anklet, tattoo, piercing, collar, hat
+ Fur style? mane, chest_tuft, pubes
+ Hair? hair, long hair, short hair
+ Breasts? breasts (small_breasts, big_breasts, huge_breasts), nipples, under_boob, side_boob, teats
+ Limbs? crossed_arms, raised_arms, arms_behind_head, spread_legs, crossed_legs, raised_leg, legs_up, raised_tail, tailwag
+ Gaze? looking_at_viewer, looking_back, eye_contact, eyes_closed
+ Expression? blush, wink, smile, grin, tongue_out, naughty_face, embarrassed, happy, sad
+
+ Information and Requests
+
+ Quality/medium? sketch, line_art, monochrome, shaded, pencil_(artwork), watercolor, 3D, digital_media_(artwork)
+ Picture organization? comic, multiple_scenes, sequence, close-up, portrait, pinup, solo_focus, wallpaper
+ Style? toony, detailed, realistic
+ Text and languages? english_text, japanese_text, spanish_text, runes, dialogue, speech_bubble, symbol
+ Information? translated, partially_translated, unknown_artist_signature, not_furry, bigger version at the source
+ Requests? translation_request, source_request, tagme
+ Image size? low_res, hi_res, absurd_res, superabsurd_res
+ Year of creation? 2016, 2015, and so on
+
+ Heavily vetted tags.
+
+ Tags that can be found on our global blacklist, and heavily vetted tags MUST be added upon upload.
+
+ young, gore, scat, watersports, diaper, my little pony, vore, not furry, rape, hyper, feral, nazi, politics, zoophile iconography.
+ Everything pedophilia
+
+ Do NOT tag
+
+ Subjective tags that express opinions. Common examples include beautiful, sexy, hot, good, crappy and most other adjectives. Subjective themes can be collected into a set instead. (See https://e621.net/help/sets )
+ Generic tags such as legs, eyes, big, image and organism.
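Most checklist entries follow a `Question? tag, tag (sub, sub), tag` shape, which is what makes the file parseable into categories. The sketch below shows one way to split such a line. It is not the commit's `parse_checklist`: `parse_checklist_line` is a hypothetical helper, and its slug rule is an assumption (it yields `sex_gender` and `how_many`, whereas the tests expect category names such as `gender` and `count`, so the real parser must map names differently).

```python
import re


def parse_checklist_line(line):
    """Split 'Question? tag, tag (sub, sub)' into (slug, tags).

    Returns None for lines without a question mark. Pieces containing
    whitespace are treated as prose and dropped.
    """
    question, sep, rest = line.strip().partition("?")
    if not sep:
        return None
    # Assumed slug rule: lowercase, non-alphanumerics collapsed to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", question.lower()).strip("_")
    tags = []
    # Split on commas and on " (" so parenthesised sub-lists are flattened;
    # "_(" is left alone because it belongs to tags like pencil_(artwork).
    for piece in re.split(r",|\s\(", rest):
        piece = piece.strip()
        if "(" not in piece:
            piece = piece.rstrip(")")  # closing paren of a sub-list
        if piece and not re.search(r"\s", piece):
            tags.append(piece)
    return slug, tags


print(parse_checklist_line(
    "Sex/gender? male, female, intersex (herm, maleherm, gynomorph, andromorph), ambiguous_gender"
))
```

Entries whose answers are prose or multi-line, such as `Artist(s)?` and `Rating?`, produce an empty tag list under this rule, which is consistent with the test's note that the rating tags might not parse correctly.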