TokForge SD1.5-LCM — Photoreal (Realistic Vision V5.1)
A Realistic Vision V5.1 + LCM SD1.5 MNN bundle — the TokForge SD15_LCM_MNN fast few-step
image model, but with the photoreal Realistic Vision V5.1 checkpoint as the base instead of
DreamShaper-7. Renders a coherent, photorealistic 512px image in 4 steps on the in-process MNN
diffusion engine — full MNN speed, the photoreal look baked into the UNet weights.
This is the Photoreal tier of the TokForge styled image catalog: natural skin, lifelike texture and lighting, true-to-life portraits and scenes. Verified on-device (OnePlus D9500 CPU, 4 steps, ~36 s) rendering a clean photorealistic candid portrait where the DreamShaper base renders a more stylized/cinematic look.
9-file MNN bundle: unet.mnn(.weight) (INT8, Realistic Vision V5.1 + LCM fused), reference SD1.5
f16 text_encoder.mnn(.weight) + vae_decoder.mnn(.weight), vocab.json, merges.txt,
alphas.txt.
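A quick way to confirm that a download is complete is to check for all nine files. The sketch below assumes that the shorthand unet.mnn(.weight) expands to unet.mnn plus unet.mnn.weight (and likewise for the text encoder and VAE decoder), and that the files sit flat in a single directory. The helper name missing_bundle_files is hypothetical and not part of TokForge.

```python
from pathlib import Path

# File names taken from the bundle listing above; the ".weight" companions
# are an assumed expansion of the "name.mnn(.weight)" shorthand.
EXPECTED_FILES = (
    "unet.mnn", "unet.mnn.weight",
    "text_encoder.mnn", "text_encoder.mnn.weight",
    "vae_decoder.mnn", "vae_decoder.mnn.weight",
    "vocab.json", "merges.txt", "alphas.txt",
)


def missing_bundle_files(bundle_dir):
    """Return the expected file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every expected file is present. A non-empty list names the files to fetch again.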
Provenance & licenses
- Base model: SG161222/Realistic_Vision_V5.1_noVAE — CreativeML-OpenRAIL-M (commercial-OK; redistribution of derivatives is licensed under OpenRAIL-M §III.4). RealisticVision V5.1 by SG_161222.
- LCM adapter: latent-consistency/lcm-lora-sdv1-5 — openrail++ (UNet-only consistency adapter, fused to keep the 4-step floor).
- CLIP / VAE / tokenizer: the standard SD1.5 reference assets (CLIP ViT-L/14 text encoder + SD1.5 VAE) reused verbatim from the TokForge base LCM bundle. Realistic Vision V5.1 ships "noVAE", so the standard SD1.5 VAE is used (the same VAE referenced by the DreamShaper bundle).
Note: The Lykon "Add More Details" detail-enhancer LoRA (Civitai 82098) was evaluated for additional skin/texture detail but is distributed in kohya format with text-encoder (lora_te_*) keys that the diffusers fuse converter cannot apply (raises IndexError); it is not included. Realistic Vision V5.1 already biases strongly photoreal on its own.
Modifications (OpenRAIL-M §III "mark modified")
This bundle is a modified derivative of Realistic Vision V5.1:
- The LCM consistency adapter (lcm-lora-sdv1-5) is fused into the UNet (fp32 ΔW via diffusers.fuse_lora()) so the model is coherent at 4-8 steps with CFG≈1.0.
- The fused UNet is exported to ONNX and converted to an INT8-quantized MNN model (MNNConvert, asymmetric 8-bit weight quant) for on-device CPU inference.
- CLIP and VAE are the SD1.5 reference MNN assets, not Realistic Vision's own.
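To make the fuse step concrete: fusing a LoRA adds a low-rank update to each adapted weight matrix, W' = W + scale * (up x down), after which the adapter is no longer needed at inference time. The following is a toy pure-Python illustration of that arithmetic on small matrices, not the diffusers implementation. The function names are hypothetical, and the scale parameter stands in for whatever LoRA scaling factor the real adapter uses.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_low_rank_delta(weight, lora_up, lora_down, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down) without mutating inputs.

    weight:    out x in matrix (the original layer weight)
    lora_up:   out x rank matrix
    lora_down: rank x in matrix
    """
    delta = matmul(lora_up, lora_down)  # the low-rank update, shape out x in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

Because the update is folded into the weights once, the fused model has the same layer shapes as the base UNet, so it can be exported and quantized like any ordinary SD1.5 UNet.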
No retraining or fine-tuning of the original weights was performed beyond the LCM fuse + quant.
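The "asymmetric 8-bit weight quant" mentioned above can be illustrated with a generic min/scale scheme: each weight is mapped to an integer code in 0..255 relative to the smallest value in its group, and recovered as code * scale + min. This is a sketch of the general technique only. How MNNConvert groups weights and stores the codes is not described in this card and may differ, and the function names here are hypothetical.

```python
def quantize_asymmetric(values, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] using a min/scale pair."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, where the range would be zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from codes produced by quantize_asymmetric."""
    return [c * scale + lo for c in codes]
```

With this scheme the round-trip error per weight is at most half a quantization step (scale / 2), which is why the range does not need to be centered on zero.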
Use restrictions (OpenRAIL-M Attachment A)
Use of this model is subject to the CreativeML Open RAIL-M license, including the Attachment A use-based restrictions: you agree not to use the model, or any derivative, to violate any law; to exploit/harm minors; to generate or disseminate verifiably false information to harm others; to generate or disseminate personal identifiable information to harm someone; to defame, disparage or harass others; for fully automated decision-making that adversely affects legal rights or creates binding obligations; for discrimination or harm to individuals or groups based on protected characteristics; to exploit vulnerabilities of a specific group; to generate non-consensual or false content about individuals; or to provide medical advice/interpretation of medical results as a substitute for professional advice. See the full license text: https://huggingface.co/spaces/CompVis/stable-diffusion-license
These use-based restrictions propagate to all who use or redistribute this bundle.
Attribution
- Realistic Vision V5.1 — © SG_161222, CreativeML-OpenRAIL-M.
- Latent Consistency Model LoRA (SD1.5) — Latent Consistency team, openrail++.
- Packaged for on-device MNN inference by TokForge (dev.tokforge).