---
language: en
license: apache-2.0
tags:
- text2text-generation
- flan-t5
- danbooru
- tag-completion
- anime
datasets:
- danbooru-tag-implications
base_model: google/flan-t5-base
---
# Danbooru Tag Implications Model
A FLAN-T5 Base model fine-tuned to predict Danbooru tag implications. Given a tag, the model outputs all tags that it implies according to Danbooru's tag implication system.
## Model Description
This model learns the structured relationships between Danbooru tags, specifically the "implication" relationships where one tag automatically implies another. For example:
- `bikini` implies `swimsuit`
- `cat_ears` implies `animal_ears`
- `striped_panties` implies both `panties` and `striped_clothes`
**Base Model:** `google/flan-t5-base` (248M parameters)
**Training Data:** 32,331 tag implication pairs from Danbooru
**Task Format:** `implications: <tag>` → `<implied_tag1>, <implied_tag2>, ...`
## Use Cases
1. **Tag completion in image generation workflows** - Automatically add implied tags to prompts
2. **Tag validation** - Ensure tag sets include all necessary implied tags
3. **Tag understanding** - Learn the hierarchical relationships in Danbooru's tagging system
## Training Details
### Dataset
- **Source:** Danbooru tag implications database (public data)
- **Size:** 32,331 training examples
- **Filtering:** Removed series-specific implied tags (those containing parentheses) from the implications of generic tags
- **Split:** 99% train, 1% eval
### Training Configuration
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="danbooru-tag-implications",  # placeholder; not specified in the original run
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=True,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
)
```
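For context, here is a minimal sketch of how a training run could be assembled around these arguments. It assumes the JSONL file referenced in the guard example below, with `input`/`output` fields per record; the field names, max lengths, and preprocessing are assumptions, not the project's exact script.
```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Assumed record format: {"input": "implications: bikini", "output": "swimsuit"}
raw = load_dataset("json", data_files="tag_implications_dataset.jsonl", split="train")
splits = raw.train_test_split(test_size=0.01)  # the 99%/1% split noted above

def preprocess(batch):
    enc = tokenizer(batch["input"], truncation=True, max_length=64)
    labels = tokenizer(text_target=batch["output"], truncation=True, max_length=128)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = splits.map(preprocess, batched=True, remove_columns=["input", "output"])

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,  # the Seq2SeqTrainingArguments shown above
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```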
### Training Results
- **Final eval loss:** ~0.027
- **Training time:** ~36 minutes on a single GPU
- **Inference speed:** ~200ms per tag (GPU)
## Usage
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_name = "Elldreth/danbooru-tag-implications-flan-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
def get_implications(tag):
    input_text = f"implications: {tag}"
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Examples
print(get_implications("bikini")) # Output: swimsuit
print(get_implications("cat_ears")) # Output: animal_ears
print(get_implications("striped_panties")) # Output: panties, striped_clothes
```
### Expanding a Full Tag Set
```python
def expand_tags(tags_string):
    """Expand all tags in a comma-separated string."""
    tags = [t.strip() for t in tags_string.split(',')]
    expanded = set(tags)
    for tag in tags:
        implications = get_implications(tag)
        if implications:
            expanded.update([t.strip() for t in implications.split(',')])
    return ', '.join(sorted(expanded))
# Example
input_tags = "1girl, bikini, cat_ears"
expanded_tags = expand_tags(input_tags)
print(expanded_tags)
# Output: 1girl, animal_ears, bikini, cat_ears, swimsuit
```
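At roughly 200 ms per tag, looping over a long prompt one tag at a time adds up. A batched variant (a sketch, not part of the original card) pads a chunk of prompts and decodes them with one `generate` call per chunk:
```python
def get_implications_batch(tags, batch_size=32):
    """Run several tags through one generate() call per chunk."""
    results = {}
    for i in range(0, len(tags), batch_size):
        chunk = tags[i:i + batch_size]
        inputs = tokenizer(
            [f"implications: {t}" for t in chunk],
            return_tensors="pt",
            padding=True,  # pad to the longest prompt in the chunk
        )
        outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        for tag, text in zip(chunk, decoded):
            results[tag] = text
    return results
```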
### Important: Guard Against Unknown Tags
The model was trained on a fixed set of Danbooru tags. For production use, query only tags that appear in the training data, so the model is never asked to invent implications for unseen inputs:
```python
import json
# Load the training dataset to get valid tags
tags_with_implications = set()
with open('tag_implications_dataset.jsonl', 'r') as f:
    for line in f:
        data = json.loads(line)
        tag = data['input'].replace('implications: ', '')
        tags_with_implications.add(tag)

def get_implications_safe(tag):
    if tag not in tags_with_implications:
        return ""  # Tag has no known implications
    return get_implications(tag)
```
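Because the tag vocabulary is small and static, it can also help to memoize lookups so repeated tags across prompts hit the model only once; a minimal sketch layered on the guard above (the cache size is arbitrary):
```python
from functools import lru_cache

@lru_cache(maxsize=65536)
def get_implications_cached(tag):
    """Memoized wrapper around the guarded lookup."""
    return get_implications_safe(tag)
```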
## Examples
### Clothing Tags
| Input | Output |
|-------|--------|
| `bikini` | `swimsuit` |
| `school_swimsuit` | `swimsuit` |
| `sleeveless_dress` | `dress, sleeveless` |
| `striped_panties` | `panties, striped_clothes` |
### Animal Features
| Input | Output |
|-------|--------|
| `cat_ears` | `animal_ears` |
| `dog_ears` | `animal_ears` |
| `fox_tail` | `tail` |
### Complex Implications
| Input | Output |
|-------|--------|
| `striped_bikini` | `bikini, striped_clothes, swimsuit` |
| `black_dress` | `dress` |
## Limitations
1. **Only works with Danbooru tags** - The model is trained on specific Danbooru tag names (underscore-separated)
2. **No natural language** - Input must be exact tag names, not descriptions
3. **May hallucinate on unknown tags** - Always use the guard mechanism for production
4. **Generic tags only** - Series-specific tags (with parentheses) were filtered from generic tag implications
5. **English-centric** - Primarily English tag names
## Training Data Filtering
To keep generic tags from suggesting series-specific tags, we applied the following rule (a code sketch follows this list):
- If an input tag has **no parentheses**, any output tags with parentheses are filtered out
- Example: `bikini` won't suggest `swimsuit_(series_name)`
- Series-specific tags can still imply other series-specific tags
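In code, the rule reduces to a parentheses check on the input tag. This sketch illustrates the rule as described; it is not the project's actual preprocessing script:
```python
def filter_implied_tags(input_tag, implied_tags):
    """Drop series-specific outputs when the input tag is generic."""
    if '(' in input_tag:
        # Series-specific inputs may still imply series-specific tags
        return implied_tags
    return [t for t in implied_tags if '(' not in t]

filter_implied_tags("bikini", ["swimsuit", "swimsuit_(series_name)"])
# -> ["swimsuit"]
```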
## Hardware Requirements
- **Inference:** ~1.5GB VRAM (GPU) or 2GB RAM (CPU)
- **Model size:** 945 MB on disk
- **Recommended:** GPU with CUDA for best performance
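For GPU inference, the model and the tokenized inputs must sit on the same device. A minimal sketch extending the Basic Usage example (the function name is illustrative):
```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def get_implications_on_device(tag):
    inputs = tokenizer(f"implications: {tag}", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```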
## Citation
If you use this model, please cite the Danbooru tag implications data:
```
Danbooru Tag Implications Database
https://danbooru.donmai.us/
```
## License
Apache 2.0 - Same as the base FLAN-T5 model
## Model Card Authors
Created as part of the Danbooru Tag Expander project for ComfyUI.