Commit History

Fix UI tag-button desync and add regression smoke coverage
827e786

Food Desert committed on

Consolidate probe configs and eval artifacts on main
6e50f4d

Food Desert committed on

Refine tag toggle UI ordering/colors and add category assignment analysis artifacts
33fc1b0

Food Desert committed on

Switch Stage3 to explicit-only no-why selection, drop bear probe, and set k=1 defaults
06a3c46

Food Desert committed on

Consolidate pending pipeline, structural, and analysis updates
30bedf0

Food Desert committed on

Remove dead retrieval/display helpers and simplify debug paths
5188881

Food Desert committed on

Simplify Stage3 chunking to interleave-only and add eval diagnostics
3c18372

Food Desert committed on

Add eval audit tools, caption-evident set, and logging
73f56cf

Food Desert committed on

Record non-fatal pipeline issues in eval JSONL outputs
41dd600

FoodDesert committed on

Fix eval_categorized.py to work with eval_pipeline.py output
435bff3

Claude committed on

Add ranking metrics infrastructure to eval pipeline
0ed7e94

Claude committed on

Add per-category evaluation metrics script
7261188

Claude committed on

Add tag categorization pipeline for e621 checklist
50a9851

Claude committed on

Fix Windows encoding issues in diagnostic script
a69f12b

Claude committed on

Add diagnostic script for structural clothing inference
998779f

Claude committed on

Redesign structural inference as group-based system with wiki data
684cf99

Claude committed on

Add per-tag evidence tracking and wiki extraction script
019823a

Claude committed on

Add compact eval analysis script for new output format
16c5aa4

Claude committed on

Add structural tag inference (Stage 3s) and compact eval output
a16e111

Claude committed on

Default min_why to strong_implied; add retrieval gap analysis script
4968635

Claude committed on

Normalize GT annotations: expand implications, exclude non-evaluable tags
14e5c38

Claude committed on

Add tag implication expansion (fox→canine→canid→mammal)
eeada1d

Claude committed on

Fix min_why not passed to workers in parallel eval mode
054dd0f

Claude committed on

Add --min-why threshold to filter Stage 3 selections by confidence level
09a248d

Claude committed on

Add diagnostic eval metrics, why-distribution tracking, and generic character filter
349b999

Claude committed on

Add parallel processing to eval pipeline with ThreadPoolExecutor
12dfa28

Claude committed on

Add independent character tag metrics to eval pipeline
f1b4da2

Claude committed on

Improve eval harness: shuffle samples, always write results
133d74c

Claude committed on

Add end-to-end evaluation harness for pipeline metrics
6909d06

Claude committed on

Expand alias filter tests with real CSV data and pipeline tests
ea9e11c

Claude committed on

Add alias-based character tag filtering for Stage 3
c6be992

Food Desert committed on