---
license: cc-by-sa-4.0
datasets:
- sankalpsinha77/MARVEL-40M
language:
- en
base_model:
- stabilityai/stable-diffusion-3.5-large
tags:
- text-to-image
- image
---

<style>
.nurbgen-text {
  font-family: 'Arial', sans-serif;
  font-weight: 800;
  font-size: 5rem;
  background: linear-gradient(
    90deg,
    #f59e0b, #ef4444, #ec4899, #f59e0b
  );
  background-size: 300% 100%;
  -webkit-background-clip: text;
  -webkit-text-fill-color: transparent;
  background-clip: text;
}
</style>

<div align="center">

<span class="nurbgen-text">MARVEL-FX3D</span>

Sankalp Sinha👨‍💻 · Mohammad Sadil Khan👨‍💻 · Muhammad Usama · Shino Sam · Didier Stricker · Sk Aziz Ali · Muhammad Zeshan Afzal

👨‍💻 Equally contributing first authors

[Paper](https://openaccess.thecvf.com/content/CVPR2025/papers/Sinha_MARVEL-40M_Multi-Level_Visual_Elaboration_for_High-Fidelity_Text-to-3D_Content_Creation_CVPR_2025_paper.pdf)
[Project Page](https://sankalpsinha-cmos.github.io/MARVEL/)
[Dataset](https://huggingface.co/datasets/sankalpsinha77/MARVEL-40M)
[Explorer](https://sadilkhan.github.io/Marvel-Explorer/)
[Code](https://github.com/SadilKhan/MARVEL-FX3D)

<img src="https://readme-typing-svg.herokuapp.com?font=JetBrains+Mono&size=36&pause=1000&center=true&vCenter=true&width=1000&height=75&color=0C7C59&lines=CVPR+2025" />
</div>

---

This repo contains LoRA weights for Stable Diffusion 3.5 Large fine-tuned on the [MARVEL-40M+](https://sadilkhan.github.io/Marvel-Explorer/) dataset. Given a text prompt, the model generates an image suitable as input to a pretrained image-to-3D model such as Sam3D, Trellis, or Stable Fast 3D.

# Inference

```python
# Generate an image from a text prompt

import torch
from diffusers import StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-large"
lora_path = "SadilKhan/MARVEL_FX3D"  # or local path

# Load the base pipeline in half precision
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
)

# Load LoRA weights
pipe.load_lora_weights(lora_path)

# Move the whole pipeline to the GPU
pipe.to("cuda")

prompt = "An old, moss-covered wishing well. Rough stones, aged wood, rusty chains, mushrooms, fallen leaves, and twigs create an enchanting, ancient, and rustic atmosphere."

image = pipe(
    prompt=prompt,
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]

image.save("output.png")
```
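
To render several prompts in one run, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of the released code: `slugify` and `generate_batch` are hypothetical helper names, the `outputs` directory is an assumption, and `pipe` is assumed to be the pipeline object built in the inference example.

```python
import re
from pathlib import Path


def slugify(prompt: str, max_len: int = 48) -> str:
    """Turn a prompt into a short, filesystem-safe file stem (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "image"


def generate_batch(pipe, prompts, out_dir="outputs", **pipe_kwargs):
    """Run `pipe` once per prompt and save each image as <index>_<slug>.png.

    `pipe` is assumed to behave like the pipeline in the inference example:
    calling it with `prompt=...` returns an object whose `.images[0]`
    exposes a `.save(path)` method.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Defaults mirror the single-prompt example; keyword arguments override them.
    settings = {"num_inference_steps": 28, "guidance_scale": 7.0, **pipe_kwargs}
    saved = []
    for i, prompt in enumerate(prompts):
        image = pipe(prompt=prompt, **settings).images[0]
        path = out / f"{i:03d}_{slugify(prompt)}.png"
        image.save(path)
        saved.append(path)
    return saved
```

With the pipeline loaded, `generate_batch(pipe, ["A wooden rocking chair", "A ceramic teapot"])` would write one PNG per prompt into `outputs/` and return the saved paths.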

# Citation

If you find MARVEL-FX3D useful, please cite:

```
@inproceedings{sinha2025marvel,
  title     = {MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation},
  author    = {Sinha, Sankalp and Khan, Mohammad Sadil and Usama, Muhammad and Sam, Shino and Stricker, Didier and Ali, Sk Aziz and Afzal, Muhammad Zeshan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {8105--8116},
  year      = {2025}
}
```