---
language: en
license: apache-2.0
tags:
- text2text-generation
- flan-t5
- danbooru
- tag-completion
- anime
datasets:
- danbooru-tag-implications
base_model: google/flan-t5-base
---

# Danbooru Tag Implications Model

A FLAN-T5 Base model fine-tuned to predict Danbooru tag implications. Given a tag, the model outputs all tags that it implies according to Danbooru's tag implication system.

## Model Description

This model learns the structured relationships between Danbooru tags, specifically the "implication" relationships where one tag automatically implies another. For example:

- `bikini` implies `swimsuit`
- `cat_ears` implies `animal_ears`
- `striped_panties` implies both `panties` and `striped_clothes`

**Base Model:** `google/flan-t5-base` (248M parameters)

**Training Data:** 32,331 tag implication pairs from Danbooru

**Task Format:** input `implications: <tag>` → output `<implied_tag1>, <implied_tag2>, ...`

## Use Cases

1. **Tag completion in image generation workflows** - Automatically add implied tags to prompts
2. **Tag validation** - Ensure tag sets include all necessary implied tags
3. **Tag understanding** - Learn the hierarchical relationships in Danbooru's tagging system

## Training Details

### Dataset

- **Source:** Danbooru tag implications database (public data)
- **Size:** 32,331 training examples
- **Filtering:** Removed series-specific tags (e.g., tags with parentheses) from generic tag implications
- **Split:** 99% train, 1% eval

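Each training example can be read as one JSON object per line. The sketch below shows a hypothetical record layout: the `implications: ` prompt prefix and the `input` field name match what the guard snippet later in this card expects, while the `output` field name and the comma-separated target format are assumptions.

```python
import json

# Hypothetical JSONL record; the "implications: " prefix matches the prompt
# format used by this model, but the "output" field name is an assumption.
line = '{"input": "implications: striped_panties", "output": "panties, striped_clothes"}'
record = json.loads(line)

tag = record["input"].replace("implications: ", "")
implied = [t.strip() for t in record["output"].split(",")]
print(tag, implied)  # striped_panties ['panties', 'striped_clothes']
```
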
### Training Configuration

```python
Seq2SeqTrainingArguments(
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=True,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
)
```

### Training Results

- **Final eval loss:** ~0.027
- **Training time:** ~36 minutes on a single GPU
- **Inference speed:** ~200 ms per tag (GPU)

## Usage

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Elldreth/danbooru-tag-implications-flan-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def get_implications(tag):
    input_text = f"implications: {tag}"
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Examples
print(get_implications("bikini"))           # Output: swimsuit
print(get_implications("cat_ears"))         # Output: animal_ears
print(get_implications("striped_panties"))  # Output: panties, striped_clothes
```

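At ~200 ms per tag, querying one tag at a time gets slow for large tag sets; batching prompts into a single `generate()` call amortizes the overhead. The helper below is a sketch, not part of this repository: it assumes the `tokenizer` and `model` objects from the snippet above are passed in explicitly.

```python
def get_implications_batch(tags, tokenizer, model, batch_size=32):
    """Query implications for many tags at once.

    Batching amortizes generation overhead compared with calling
    generate() once per tag. Returns a dict of tag -> raw model output.
    """
    results = {}
    for i in range(0, len(tags), batch_size):
        chunk = tags[i:i + batch_size]
        inputs = tokenizer(
            [f"implications: {t}" for t in chunk],
            return_tensors="pt",
            padding=True,  # pad so prompts of different lengths batch together
        )
        outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        results.update(zip(chunk, decoded))
    return results
```
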
### Expanding a Full Tag Set

```python
def expand_tags(tags_string):
    """Expand all tags in a comma-separated string."""
    tags = [t.strip() for t in tags_string.split(',')]
    expanded = set(tags)

    for tag in tags:
        implications = get_implications(tag)
        if implications:
            expanded.update(t.strip() for t in implications.split(','))

    return ', '.join(sorted(expanded))

# Example
input_tags = "1girl, bikini, cat_ears"
expanded_tags = expand_tags(input_tags)
print(expanded_tags)
# Output: 1girl, animal_ears, bikini, cat_ears, swimsuit
```

### Important: Guard Against Unknown Tags

The model was trained on specific Danbooru tags. For production use, you should only query tags that exist in the training data to avoid hallucinations:

```python
import json

# Load the training dataset to get valid tags
tags_with_implications = set()
with open('tag_implications_dataset.jsonl', 'r') as f:
    for line in f:
        data = json.loads(line)
        tag = data['input'].replace('implications: ', '')
        tags_with_implications.add(tag)

def get_implications_safe(tag):
    if tag not in tags_with_implications:
        return ""  # Tag has no known implications
    return get_implications(tag)
```

## Examples

### Clothing Tags

| Input | Output |
|-------|--------|
| `bikini` | `swimsuit` |
| `school_swimsuit` | `swimsuit` |
| `sleeveless_dress` | `dress, sleeveless` |
| `striped_panties` | `panties, striped_clothes` |

### Animal Features

| Input | Output |
|-------|--------|
| `cat_ears` | `animal_ears` |
| `dog_ears` | `animal_ears` |
| `fox_tail` | `tail` |

### Complex Implications

| Input | Output |
|-------|--------|
| `striped_bikini` | `bikini, striped_clothes, swimsuit` |
| `black_dress` | `dress` |

## Limitations

1. **Only works with Danbooru tags** - The model is trained on specific Danbooru tag names (underscore-separated)
2. **No natural language** - Input must be exact tag names, not descriptions
3. **May hallucinate on unknown tags** - Always use the guard mechanism for production
4. **Generic tags only** - Series-specific tags (with parentheses) were filtered from generic tag implications
5. **English-centric** - Primarily English tag names

## Training Data Filtering

To prevent generic tags from suggesting series-specific tags, we applied this rule:

- If an input tag has **no parentheses**, output tags with parentheses are filtered out
- Example: `bikini` won't suggest `swimsuit_(series_name)`
- Series-specific tags can still imply other series-specific tags

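The rule above can be sketched as a simple predicate. This is a hypothetical reconstruction for illustration; the actual preprocessing script is not included in this card.

```python
def filter_implied_tags(input_tag, implied_tags):
    """Drop series-specific (parenthesized) outputs when the input
    tag is generic, i.e. contains no parentheses."""
    if "(" not in input_tag:
        return [t for t in implied_tags if "(" not in t]
    # Series-specific inputs may keep series-specific implications
    return list(implied_tags)

# A generic tag keeps only generic implications:
print(filter_implied_tags("bikini", ["swimsuit", "swimsuit_(series_name)"]))
# ['swimsuit']
```
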
## Hardware Requirements

- **Inference:** ~1.5 GB VRAM (GPU) or 2 GB RAM (CPU)
- **Model size:** 945 MB on disk
- **Recommended:** GPU with CUDA for best performance

## Citation

If you use this model, please cite the Danbooru tag implications data:

```
Danbooru Tag Implications Database
https://danbooru.donmai.us/
```

## License

Apache 2.0 - Same as the base FLAN-T5 model

## Model Card Authors

Created as part of the Danbooru Tag Expander project for ComfyUI.
|