---
language: en
license: apache-2.0
tags:
- text2text-generation
- flan-t5
- danbooru
- tag-completion
- anime
datasets:
- danbooru-tag-implications
base_model: google/flan-t5-base
---

# Danbooru Tag Implications Model

A FLAN-T5 Base model fine-tuned to predict Danbooru tag implications. Given a tag, the model outputs all tags that it implies according to Danbooru's tag implication system.

## Model Description

This model learns the structured relationships between Danbooru tags, specifically the "implication" relationships where one tag automatically implies another. For example:

- `bikini` implies `swimsuit`
- `cat_ears` implies `animal_ears`
- `striped_panties` implies both `panties` and `striped_clothes`

**Base Model:** `google/flan-t5-base` (248M parameters)

**Training Data:** 32,331 tag implication pairs from Danbooru

**Task Format:** input `implications: <tag>` → output `<implied_tag1>, <implied_tag2>, ...`

## Use Cases

1. **Tag completion in image generation workflows** - Automatically add implied tags to prompts
2. **Tag validation** - Ensure tag sets include all necessary implied tags
3. **Tag understanding** - Learn the hierarchical relationships in Danbooru's tagging system

## Training Details

### Dataset

- **Source:** Danbooru tag implications database (public data)
- **Size:** 32,331 training examples
- **Filtering:** Removed series-specific tags (e.g., tags with parentheses) from generic tag implications
- **Split:** 99% train, 1% eval

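Each training example can be read as one JSON object per line. The sketch below shows a hypothetical record layout: the `implications: ` prompt prefix and the `input` field name match what the guard snippet later in this card expects, while the `output` field name and the comma-separated target format are assumptions.

```python
import json

# Hypothetical JSONL record; the "implications: " prefix matches the prompt
# format used by this model, but the "output" field name is an assumption.
line = '{"input": "implications: striped_panties", "output": "panties, striped_clothes"}'
record = json.loads(line)

tag = record["input"].replace("implications: ", "")
implied = [t.strip() for t in record["output"].split(",")]
print(tag, implied)  # striped_panties ['panties', 'striped_clothes']
```
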
### Training Configuration

```python
Seq2SeqTrainingArguments(
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=True,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
)
```

### Training Results

- **Final eval loss:** ~0.027
- **Training time:** ~36 minutes on a single GPU
- **Inference speed:** ~200 ms per tag (GPU)

## Usage

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Elldreth/danbooru-tag-implications-flan-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def get_implications(tag):
    input_text = f"implications: {tag}"
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Examples
print(get_implications("bikini"))           # Output: swimsuit
print(get_implications("cat_ears"))         # Output: animal_ears
print(get_implications("striped_panties"))  # Output: panties, striped_clothes
```

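At ~200 ms per tag, querying one tag at a time gets slow for large tag sets; batching prompts into a single `generate()` call amortizes the overhead. The helper below is a sketch, not part of this repository: it assumes the `tokenizer` and `model` objects from the snippet above are passed in explicitly.

```python
def get_implications_batch(tags, tokenizer, model, batch_size=32):
    """Query implications for many tags at once.

    Batching amortizes generation overhead compared with calling
    generate() once per tag. Returns a dict of tag -> raw model output.
    """
    results = {}
    for i in range(0, len(tags), batch_size):
        chunk = tags[i:i + batch_size]
        inputs = tokenizer(
            [f"implications: {t}" for t in chunk],
            return_tensors="pt",
            padding=True,  # pad so prompts of different lengths batch together
        )
        outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        results.update(zip(chunk, decoded))
    return results
```
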
### Expanding a Full Tag Set

```python
def expand_tags(tags_string):
    """Expand all tags in a comma-separated string."""
    tags = [t.strip() for t in tags_string.split(',')]
    expanded = set(tags)

    for tag in tags:
        implications = get_implications(tag)
        if implications:
            expanded.update(t.strip() for t in implications.split(','))

    return ', '.join(sorted(expanded))

# Example
input_tags = "1girl, bikini, cat_ears"
expanded_tags = expand_tags(input_tags)
print(expanded_tags)
# Output: 1girl, animal_ears, bikini, cat_ears, swimsuit
```

### Important: Guard Against Unknown Tags

The model was trained on specific Danbooru tags. For production use, you should only query tags that exist in the training data to avoid hallucinations:

```python
import json

# Load the training dataset to get valid tags
tags_with_implications = set()
with open('tag_implications_dataset.jsonl', 'r') as f:
    for line in f:
        data = json.loads(line)
        tag = data['input'].replace('implications: ', '')
        tags_with_implications.add(tag)

def get_implications_safe(tag):
    if tag not in tags_with_implications:
        return ""  # Tag has no known implications
    return get_implications(tag)
```

## Examples

### Clothing Tags

| Input | Output |
|-------|--------|
| `bikini` | `swimsuit` |
| `school_swimsuit` | `swimsuit` |
| `sleeveless_dress` | `dress, sleeveless` |
| `striped_panties` | `panties, striped_clothes` |

### Animal Features

| Input | Output |
|-------|--------|
| `cat_ears` | `animal_ears` |
| `dog_ears` | `animal_ears` |
| `fox_tail` | `tail` |

### Complex Implications

| Input | Output |
|-------|--------|
| `striped_bikini` | `bikini, striped_clothes, swimsuit` |
| `black_dress` | `dress` |

## Limitations

1. **Only works with Danbooru tags** - The model is trained on specific Danbooru tag names (underscore-separated)
2. **No natural language** - Input must be exact tag names, not descriptions
3. **May hallucinate on unknown tags** - Always use the guard mechanism for production
4. **Generic tags only** - Series-specific tags (with parentheses) were filtered from generic tag implications
5. **English-centric** - Primarily English tag names

## Training Data Filtering

To prevent generic tags from suggesting series-specific tags, we applied this rule:

- If an input tag has **no parentheses**, output tags with parentheses are filtered out
- Example: `bikini` won't suggest `swimsuit_(series_name)`
- Series-specific tags can still imply other series-specific tags

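The rule above can be sketched as a simple predicate. This is a hypothetical reconstruction for illustration; the actual preprocessing script is not included in this card.

```python
def filter_implied_tags(input_tag, implied_tags):
    """Drop series-specific (parenthesized) outputs when the input
    tag is generic, i.e. contains no parentheses."""
    if "(" not in input_tag:
        return [t for t in implied_tags if "(" not in t]
    # Series-specific inputs may keep series-specific implications
    return list(implied_tags)

# A generic tag keeps only generic implications:
print(filter_implied_tags("bikini", ["swimsuit", "swimsuit_(series_name)"]))
# ['swimsuit']
```
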
## Hardware Requirements

- **Inference:** ~1.5 GB VRAM (GPU) or 2 GB RAM (CPU)
- **Model size:** 945 MB on disk
- **Recommended:** GPU with CUDA for best performance

## Citation

If you use this model, please cite the Danbooru tag implications data:

```
Danbooru Tag Implications Database
https://danbooru.donmai.us/
```

## License

Apache 2.0 - Same as the base FLAN-T5 model

## Model Card Authors

Created as part of the Danbooru Tag Expander project for ComfyUI.
|