# ImageGen

HF-compatible package for the trained local text-to-image adapter.

## Files

- `adapter_model.pt`: trained adapter weights. This file is preserved and is not regenerated by setup scripts.
- `config.json`: adapter and image-generator architecture.
- `training_config.json`: training provenance and trainable tensor list.
- `model_index.json`: Hugging Face/Diffusers-style entry metadata.
- `pipeline.py`: `ImageGenPipeline.from_pretrained(...)` and generation wrapper.
- `quality_adapter.pt`: sidecar latent quality adapter, trained without replacing the base adapter checkpoint.
- `visual_contract_adapter.pt`: sidecar Visual Contract Adapter (VCA) for prompt/layout control. It is initialized as a safe no-op until VCA-specific training.
- `model/`: adapter, Qwen-aligned text refiner, LoRA helpers, and latent diffusion scheduler.
- `tokenizer/`: local tokenizer used for prompt tokenization.
- `models/Phillnet-2-SDXL-UNet-VAE/`: packaged local UNet, VAE, and scheduler config for the diffusion route.
- `models/Phillnet-2-SDXL-TextEncoders/`: packaged SDXL Turbo tokenizers/text encoders used for prompt-faithful diffusion conditioning.

## Load

```python
from ImageGen import ImageGenPipeline

pipe = ImageGenPipeline.from_pretrained("ImageGen", device="cpu")
out = pipe("a geometric neon logo", height=128, width=128, num_inference_steps=1)
image = out.images[0]
```

For coherent Qwen conditioning, pass the already-loaded local Qwen/GptOss text model and its tokenizer:

```python
# qwen_model / qwen_tokenizer: your already-loaded local Qwen/GptOss text model and tokenizer
pipe = ImageGenPipeline.from_pretrained("ImageGen", text_model=qwen_model, tokenizer=qwen_tokenizer, device="cuda")
```

By default the pipeline uses the trained text-prior route with the non-destructive quality adapter enabled:

```python
out = pipe("studio photo of a glass sculpture", height=128, width=128)
```

Use `quality_strength=0.0` to recover the unmodified base adapter behavior.

Setting `generation_strategy="diffusion"` selects the route that uses the packaged SDXL Turbo text encoders plus the local UNet/VAE.
Use it for prompt-faithful public examples:

```python
out = pipe(
    "wireless headphones product photo on a clean desk, soft studio light",
    height=128,
    width=128,
    num_inference_steps=1,
    generation_strategy="diffusion",
)
```

The Visual Contract Adapter is exposed separately through `contract_strength`:

```python
out = pipe(
    'woman in a blue jacket holding a yellow umbrella, poster text says "RAIN DAY"',
    height=128,
    width=128,
    contract_strength=1.0,
)
```

The VCA checkpoint is saved outside `adapter_model.pt`, starts as a zero-output residual, and uses the existing Qwen/text conditioning tokens plus optional `contract_maps` tensors. This keeps the base weights reversible and lets future training target prompt obedience, layout, edit masks, and exact-text constraints.

## PhillMagine320 Fine-Tune

The local adapter was fine-tuned on the full [`ayjays132/PhillMagine320`](https://huggingface.co/datasets/ayjays132/PhillMagine320) dataset: 288 train rows plus 32 test rows. Conditioning includes every dataset feature: `prompt`, `label`, `has_text_elements`, `source`, and `split`.

```powershell
python ImageGen\finetune_phillmagine320.py --epochs 1 --batch-size 2 --grad-accum 2 --image-size 128 --max-length 128 --lr 2e-5
```

The trainer runs on CUDA with bf16 autocast, freezes the SDXL VAE, builds its conditioning from the local Phill-AXIOM `embed_tokens` table in `model.safetensors`, preserves the adapter checkpoint key set, and refreshes `models/qwen_aligned_refiner/deep_16.pt` from the saved adapter. The pre-finetune checkpoint is kept as `adapter_model.pre_phillmagine320.pt`.

## Compatibility Contract

The pipeline keeps the existing adapter checkpoint intact. Architecture changes should be made in `config.json` only when retraining or intentionally migrating weights.
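Before overwriting a checkpoint, the "same key set" part of this contract can be checked mechanically. The following is a minimal sketch, assuming each checkpoint has already been loaded as a flat state dict that maps tensor names to tensors (for example with `torch.load(path, map_location="cpu")`). The helper names `compare_key_sets` and `key_set_preserved` and the toy key names are hypothetical and not part of this package.

```python
from typing import Any, Mapping


def compare_key_sets(
    reference: Mapping[str, Any], candidate: Mapping[str, Any]
) -> dict[str, list[str]]:
    """Report tensor names dropped from or added to a checkpoint.

    Hypothetical helper: both arguments are assumed to be flat state dicts
    (tensor name -> tensor), e.g. torch.load(path, map_location="cpu").
    """
    ref_keys, cand_keys = set(reference), set(candidate)
    return {
        "missing": sorted(ref_keys - cand_keys),     # only in the original checkpoint
        "unexpected": sorted(cand_keys - ref_keys),  # only in the new checkpoint
    }


def key_set_preserved(reference: Mapping[str, Any], candidate: Mapping[str, Any]) -> bool:
    """True when the candidate checkpoint has exactly the reference key set."""
    diff = compare_key_sets(reference, candidate)
    return not diff["missing"] and not diff["unexpected"]


# Toy stand-ins for state dicts: real values would be tensors, and these key names are made up.
base = {"proj.weight": 0, "proj.bias": 0}
finetuned = {"proj.weight": 1, "proj.bias": 1}
migrated = {"proj.weight": 1, "out.weight": 1}

print(key_set_preserved(base, finetuned))  # True
print(compare_key_sets(base, migrated))    # {'missing': ['proj.bias'], 'unexpected': ['out.weight']}
```

A fine-tune that passes this check changes only tensor values, so it can keep `config.json` as-is. A non-empty `missing` or `unexpected` list indicates an architecture migration. Under the contract, that kind of change should go with an intentional `config.json` update.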
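Two sidecar properties follow naturally from a strength-scaled residual design: `quality_strength=0.0` recovers the base adapter, and the VCA starts as a zero-output residual. The sketch below illustrates why, assuming that a sidecar's output is added to the base adapter output as `base + strength * residual`. That blending rule and the function names `linear` and `apply_sidecar` are illustrative assumptions, not taken from `pipeline.py`, and plain lists stand in for tensors.

```python
from typing import Sequence


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> list[float]:
    """Bias-free dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def apply_sidecar(
    base_out: Sequence[float],
    sidecar_weights: Sequence[Sequence[float]],
    cond: Sequence[float],
    strength: float,
) -> list[float]:
    """Assumed blending rule: base output plus a strength-scaled residual.

    The base output is never modified in place, so strength=0.0 or an
    all-zero sidecar reproduces the base output exactly.
    """
    residual = linear(sidecar_weights, cond)
    return [b + strength * r for b, r in zip(base_out, residual)]


base_out = [0.5, -1.0]
cond = [1.0, 2.0, 3.0]

zero_init = [[0.0] * 3 for _ in range(2)]     # untrained, zero-output residual (VCA-style)
trained = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.2]]  # made-up "trained" sidecar weights

print(apply_sidecar(base_out, zero_init, cond, strength=1.0))  # [0.5, -1.0]
print(apply_sidecar(base_out, trained, cond, strength=0.0))    # [0.5, -1.0]
print(apply_sidecar(base_out, trained, cond, strength=1.0))    # differs from base_out
```

In practice, a zero-output residual usually zero-initializes only its final projection, as LoRA does with its `B` matrix. The layer still starts as a no-op, but it receives non-zero gradients once training begins.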