Alogotron/Milady-Avatar-Adapter

# Milady Avatar Adapter (SDXL)

A neural adapter that maps Qwen3-4B language model activations into the SDXL prompt embedding space,
enabling real-time emotional avatar generation in the Milady art style.

## Architecture

### SDXL Adapter (NEW - Higher Quality)
- Input: Qwen3-4B hidden states from layers [9, 18, 27], 3 × 2560 = 7680 dims in total
- Layer Weighting: Learned weighted combination of the three layers → 2560 dims
- Cross-Attention Decoder: 3-layer transformer decoder with 8 heads
- Output: SDXL prompt embeddings [77, 2048] + pooled embeddings [1280]
- Parameters: 5.28M
- Training: 500 epochs on 200 emotion-labeled samples, MSE loss
- Best Val Loss: 6.762
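
The shapes above can be illustrated with a small PyTorch sketch. This is not the code in `sdxl/sdxl_adapter.py`: the class name, the softmax layer weights, the `d_model=256` bottleneck, the 77 learned queries and the mean-pooled head are all assumptions chosen to reproduce the listed input and output shapes, and the parameter count will not match 5.28M.

```python
import torch
import torch.nn as nn


class SDXLAdapterSketch(nn.Module):
    """Hypothetical adapter: layer weighting, cross-attention decoder, two output heads."""

    def __init__(self, hidden=2560, n_layers=3, d_model=256, seq_len=77,
                 embed_dim=2048, pooled_dim=1280, heads=8, depth=3):
        super().__init__()
        # One learnable scalar per tapped Qwen layer (assumed: softmax-normalized).
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))
        # Assumed bottleneck projection so the decoder stays small.
        self.in_proj = nn.Linear(hidden, d_model)
        # Assumed: one learned query per SDXL token position.
        self.queries = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)
        layer = nn.TransformerDecoderLayer(
            d_model, heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        self.to_embeds = nn.Linear(d_model, embed_dim)
        self.to_pooled = nn.Linear(d_model, pooled_dim)

    def forward(self, states):
        # states: [batch, n_layers, tokens, hidden]
        w = torch.softmax(self.layer_logits, dim=0).view(1, -1, 1, 1)
        mixed = (w * states).sum(dim=1)                 # [batch, tokens, hidden]
        memory = self.in_proj(mixed)                    # [batch, tokens, d_model]
        q = self.queries.unsqueeze(0).expand(states.size(0), -1, -1)
        decoded = self.decoder(tgt=q, memory=memory)    # [batch, seq_len, d_model]
        prompt_embeds = self.to_embeds(decoded)         # [batch, seq_len, embed_dim]
        pooled = self.to_pooled(decoded.mean(dim=1))    # [batch, pooled_dim]
        return prompt_embeds, pooled


adapter = SDXLAdapterSketch()
states = torch.randn(2, 3, 16, 2560)       # random stand-in for Qwen3-4B activations
prompt_embeds, pooled = adapter(states)

# MSE against target embeddings; random tensors stand in for the real targets here.
target_embeds = torch.randn(2, 77, 2048)
target_pooled = torch.randn(2, 1280)
loss = nn.functional.mse_loss(prompt_embeds, target_embeds) \
    + nn.functional.mse_loss(pooled, target_pooled)
```

Summing the two MSE terms with equal weight is also an assumption; the card only states that MSE loss was used.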

### Pipeline
```
Emotional Text → Qwen3-4B (hooks on layers 9,18,27) → Adapter → SDXL + Milady LoRA → Avatar Image
```
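
The "hooks" step of the pipeline can be sketched with PyTorch forward hooks. The helper name `capture_hidden_states` is hypothetical, and a stack of tiny linear layers stands in for the Qwen3-4B decoder so the example runs without downloading the model.

```python
import torch
import torch.nn as nn


def capture_hidden_states(layers, indices):
    """Register forward hooks on layers[i] for each i in indices.

    Returns (captured, handles); captured maps layer index -> latest output tensor.
    """
    captured, handles = {}, []
    for i in indices:
        def hook(module, inputs, output, i=i):
            # Some decoder layers return a tuple; the hidden state is the first item.
            hidden = output[0] if isinstance(output, tuple) else output
            captured[i] = hidden.detach()
        handles.append(layers[i].register_forward_hook(hook))
    return captured, handles


# Toy stand-in for a decoder stack (28 layers is enough to reach index 27).
toy_layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(28)])
captured, handles = capture_hidden_states(toy_layers, [9, 18, 27])

x = torch.randn(1, 4, 8)                    # [batch, tokens, hidden]
with torch.no_grad():
    for layer in toy_layers:
        x = layer(x)

# Stack the tapped layers into [batch, 3, tokens, hidden] for the adapter.
stacked = torch.stack([captured[i] for i in (9, 18, 27)], dim=1)

for h in handles:
    h.remove()                               # always detach hooks when done
```

With the real model, the decoder layers would probably be reached through something like `model.model.layers` on the object returned by `AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")`. That attribute path, and whether 9, 18 and 27 are module indices or `hidden_states` indices, are assumptions to check against `sdxl/test_sdxl_pipeline.py`.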

## Files

### SDXL Version (Recommended)
- `sdxl/best_sdxl_adapter.pt` - Trained adapter weights
- `sdxl/sdxl_adapter.py` - Adapter architecture
- `sdxl/test_sdxl_pipeline.py` - End-to-end inference script
- `sdxl/train_sdxl_adapter.py` - Training script

### Klein Version (Legacy)
- `adapters/` - Original FLUX.2-Klein adapter weights

## Requirements
- SDXL base model: `stabilityai/stable-diffusion-xl-base-1.0`
- Milady LoRA: Milady SDXL LoRA from CivitAI
- Qwen3-4B: `Qwen/Qwen3-4B`
- Python packages: `torch`, `transformers`, `diffusers`, `safetensors`
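
As a rough guide to how these pieces fit together, the sketch below passes adapter outputs to `diffusers` as precomputed embeddings. The helpers `check_sdxl_embeds` and `generate_avatar` are hypothetical, `lora_path` is a placeholder for a locally downloaded LoRA file, and zero negative embeddings are an assumption; the repository's own `sdxl/test_sdxl_pipeline.py` may do this differently.

```python
import torch


def check_sdxl_embeds(prompt_embeds, pooled_embeds):
    """Validate the [77, 2048] and [1280] shapes before calling the pipeline."""
    if prompt_embeds.ndim != 3 or tuple(prompt_embeds.shape[1:]) != (77, 2048):
        raise ValueError(f"expected [batch, 77, 2048], got {tuple(prompt_embeds.shape)}")
    if pooled_embeds.ndim != 2 or pooled_embeds.shape[1] != 1280:
        raise ValueError(f"expected [batch, 1280], got {tuple(pooled_embeds.shape)}")


def generate_avatar(prompt_embeds, pooled_embeds, lora_path, steps=30):
    """Hypothetical helper: render one image from adapter outputs (needs a CUDA GPU)."""
    check_sdxl_embeds(prompt_embeds, pooled_embeds)
    # Imported lazily because diffusers is a heavy dependency.
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights(lora_path)        # placeholder path to the Milady LoRA file

    prompt_embeds = prompt_embeds.to("cuda", torch.float16)
    pooled_embeds = pooled_embeds.to("cuda", torch.float16)
    result = pipe(
        prompt_embeds=prompt_embeds,
        pooled_prompt_embeds=pooled_embeds,
        # Assumed: all-zero embeddings as the unconditional branch.
        negative_prompt_embeds=torch.zeros_like(prompt_embeds),
        negative_pooled_prompt_embeds=torch.zeros_like(pooled_embeds),
        num_inference_steps=steps,
    )
    return result.images[0]
```

Checking shapes first gives a clear error before any model weights are loaded.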

## Emotions Supported
20 emotions: happy, sad, angry, surprised, scared, disgusted, neutral, excited, calm, anxious,
confident, shy, proud, loving, jealous, curious, bored, amused, thoughtful, determined
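
For scripts that need to validate an emotion label, the list above can be kept as a constant. The integer IDs and the `emotion_id` helper are illustrative assumptions; the card does not say how labels are encoded in the training data.

```python
# The 20 emotion labels, in the order they are listed above.
EMOTIONS = [
    "happy", "sad", "angry", "surprised", "scared",
    "disgusted", "neutral", "excited", "calm", "anxious",
    "confident", "shy", "proud", "loving", "jealous",
    "curious", "bored", "amused", "thoughtful", "determined",
]

# Hypothetical label -> integer mapping, following list order.
EMOTION_TO_ID = {name: i for i, name in enumerate(EMOTIONS)}


def emotion_id(label):
    """Return the index of a supported emotion, ignoring case and outer whitespace."""
    key = label.strip().lower()
    if key not in EMOTION_TO_ID:
        raise KeyError(f"unsupported emotion: {label!r}")
    return EMOTION_TO_ID[key]
```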