---
license: other
license_name: fair-1.0.0
license_link: LICENSE
library_name: pytorch
tags:
- image-generation
- pixel-art
- sprites
- flow-matching
- diffusion
- text-to-image
- game-assets
pipeline_tag: text-to-image
---

# Alucard

A small (32M parameter) text-to-sprite generative model using flow matching. Generates 128x128 RGBA sprites from text prompts, with optional reference frame input for animation generation.

**GitHub**: [evilsocket/alucard](https://github.com/evilsocket/alucard)

## Installation

```bash
pip install git+https://github.com/evilsocket/alucard.git
```

## Usage

### Generate a sprite from text

```python
from alucard import Alucard

# Load model (downloads weights automatically from HuggingFace)
model = Alucard.from_pretrained("evilsocket/alucard")

# Generate a sprite
sprite = model("a pixel art knight sprite, idle pose")
sprite.save("knight.png")

# Generate multiple variations
sprites = model("a pixel art dragon enemy sprite", num_samples=4, seed=42)
for i, s in enumerate(sprites):
    s.save(f"dragon_{i}.png")
```

### Generate an animation sequence

Use the `ref` parameter to condition generation on a previous frame:

```python
from alucard import Alucard

model = Alucard.from_pretrained("evilsocket/alucard")

# Generate the first frame
frame_1 = model("a pixel art knight sprite, walking right, frame 1")
frame_1.save("walk_01.png")

# Generate subsequent frames by passing the previous frame as reference
frame_2 = model("a pixel art knight sprite, walking right, frame 2", ref=frame_1)
frame_2.save("walk_02.png")

frame_3 = model("a pixel art knight sprite, walking right, frame 3", ref=frame_2)
frame_3.save("walk_03.png")

frame_4 = model("a pixel art knight sprite, walking right, frame 4", ref=frame_3)
frame_4.save("walk_04.png")
```

You can also pass a file path as `ref`:

```python
sprite = model("a pixel art knight sprite, attack pose", ref="walk_01.png")
```

### Generation parameters

```python
sprite = model(
    "a pixel art wizard sprite",
    num_samples=1,  # number of images to generate
    num_steps=20,   # Euler ODE steps (more = better quality, slower)
    cfg_text=5.0,   # text guidance scale (higher = stronger prompt adherence)
    cfg_ref=2.0,    # reference guidance scale (higher = more similar to ref)
    seed=42,        # reproducibility
)
```

### Load from local weights

```python
# From a .safetensors file
model = Alucard.from_pretrained("path/to/alucard_model.safetensors")

# From a training checkpoint
model = Alucard.from_pretrained("path/to/best.pt")

# From a local directory containing alucard_model.safetensors
model = Alucard.from_pretrained("path/to/model_dir/")
```

## Architecture

| Property | Value |
|----------|-------|
| Parameters | 31,956,228 (32M) |
| Input | 128x128 RGBA (4ch noisy + 4ch reference) |
| Output | 128x128 RGBA |
| Text encoder | CLIP ViT-B/32 (frozen, 512-dim) |
| Conditioning | AdaLN-Zero |
| Training | Flow matching (rectified flow) |
| Base channels | 64, multipliers [1, 2, 4, 4] |
| Attention | Self-attention at 32x32 and 16x16 |

## Training

Trained on 33K sprites from publicly available datasets (Kaggle Pixel Art, Kenney CC0, GameTileNet, Pixel Art Nouns, TinyHero).

## License

Released under the [FAIR License (Free for Attribution and Individual Rights) v1.0.0](LICENSE).

- **Non-commercial use** (personal, educational, research, non-profit) is freely permitted under the terms of the license.
- **Commercial use** (SaaS, paid apps, any monetization) requires visible attribution to the project and its author. See the [license](LICENSE) for details.
- **Business use** (any use by or on behalf of a business entity) requires a signed commercial agreement with the author. Contact `evilsocket@gmail.com` for inquiries.
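### Assembling frames into a GIF

The animation workflow above produces individual PNG frames; a common follow-up is stitching them into an animated GIF for preview. Below is a minimal sketch using Pillow — this is not part of the alucard API, and the solid-color placeholder frames stand in for actual model outputs (in practice you would use the `frame_1`..`frame_4` images generated with `ref` chaining).

```python
from PIL import Image

# Placeholder 128x128 RGBA frames standing in for Alucard outputs;
# replace these with the sprites returned by model(...).
frames = [Image.new("RGBA", (128, 128), (i * 40, 80, 160, 255)) for i in range(4)]

# Composite each RGBA frame over an opaque background, since GIF
# only supports 1-bit transparency.
background = Image.new("RGBA", (128, 128), (255, 255, 255, 255))
flattened = [Image.alpha_composite(background, f).convert("RGB") for f in frames]

# Save the sequence as a looping animated GIF.
flattened[0].save(
    "walk_cycle.gif",
    save_all=True,
    append_images=flattened[1:],
    duration=120,  # milliseconds per frame
    loop=0,        # 0 = loop forever
)
```

For higher-fidelity output (full alpha, no palette quantization), saving the frames as an APNG or a sprite sheet instead of a GIF avoids the compositing step.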