---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
base_model: black-forest-labs/FLUX.1-Kontext-dev
instance_prompt: >-
[photo content], recreate the scene from a top-down perspective. Maintain all
visual proportions, lighting consistency, and realistic spatial relationships.
Ensure the background, textures, and environmental shadows remain naturally
aligned from this elevated angle.
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
language:
- en
pipeline_tag: image-to-image
library_name: diffusers
---

# **Kontext-Top-Down-View**
Kontext-Top-Down-View is an experimental adapter for black-forest-labs' FLUX.1-Kontext-dev that re-renders a scene from a top-down perspective while preserving visual proportions, lighting consistency, and realistic spatial relationships. It aims to keep backgrounds, textures, and environmental details natural and contextually coherent from the elevated angle. The adapter was trained on 800 image pairs (400 start images and 400 end images).
> [!note]
> [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.
>
> Modifying the prompt may alter its properties and subjective elements. Note: this is an experimental adapter, and its outputs may contain artifacts.
---
## **Sample Inferences: Demo**
<table style="width:100%; border-collapse:collapse;">
<tr>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9hti3lQODGSiZLGPm811.jpeg"
alt="Kontext-Unblur-Upscale" style="width:100%; height:auto;"/>
</td>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/iH52aQZ7BA6Gdnmj2rkgX.webp"
alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
</td>
</tr>
</table>
<table style="width:100%; border-collapse:collapse;">
<tr>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/N_nMU9x0hnb4HAdchJtQC.jpeg"
alt="Kontext-Unblur-Upscale" style="width:100%; height:auto;"/>
</td>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/r_hw2cwckPCfapUZyHe9c.webp"
alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
</td>
</tr>
</table>
---
## Parameter Settings
| Setting | Value |
| ------------------------ | ------------------------ |
| Module Type | Adapter |
| Base Model | FLUX.1 Kontext Dev - fp8 |
| Trigger Words | [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle. |
| Image Processing Repeats | 50 |
| Epochs | 25 |
| Save Every N Epochs | 1 |

Labeling: DeepCaption-VLA-7B (natural language, English)

Total images used for training: 800 image pairs (400 start, 400 end)

## Training Parameters
| Setting | Value |
| --------------------------- | --------- |
| Seed | - |
| Clip Skip | - |
| Text Encoder LR | 0.00001 |
| UNet LR | 0.00005 |
| LR Scheduler | constant |
| Optimizer | AdamW8bit |
| Network Dimension | 64 |
| Network Alpha | 32 |
| Gradient Accumulation Steps | - |
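
In many LoRA trainers the low-rank update is multiplied by the network alpha divided by the network dimension (rank). Assuming that convention applies to the trainer used here (the card does not name it), the values in the table above would give an effective scale of 0.5. The `lora_scale` helper below is hypothetical and only illustrates the arithmetic.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Return the alpha / rank multiplier commonly applied to a LoRA update."""
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


# Values taken from the Training Parameters table.
scale = lora_scale(network_alpha=32, network_dim=64)
print(scale)  # 0.5
```
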
## Label Parameters
| Setting | Value |
| --------------- | ----- |
| Shuffle Caption | - |
| Keep N Tokens | - |
## Advanced Parameters
| Setting | Value |
| ------------------------- | ----- |
| Noise Offset | 0.03 |
| Multires Noise Discount | 0.1 |
| Multires Noise Iterations | 10 |
| Conv Dimension | - |
| Conv Alpha | - |
| Batch Size | - |
| Steps | 3800 & 400 (warm-up) |
| Sampler | euler |
---
## Trigger words
Use the full instance prompt below to trigger the adapter; `[photo content]` marks where the description of the input photo goes.

> [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.
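
As a sketch of how the trigger text might be assembled, the hypothetical `build_prompt` helper below (not part of this repository) substitutes a short scene description for `[photo content]` in the instance prompt. Treating `[photo content]` as a fill-in slot is an assumption; the card does not document it explicitly.

```python
PROMPT_TEMPLATE = (
    "[photo content], recreate the scene from a top-down perspective. "
    "Maintain all visual proportions, lighting consistency, and realistic "
    "spatial relationships. Ensure the background, textures, and "
    "environmental shadows remain naturally aligned from this elevated angle."
)


def build_prompt(photo_content: str) -> str:
    """Fill the [photo content] slot with a description of the input image."""
    # Drop surrounding whitespace and trailing punctuation so the
    # description joins cleanly with the ", recreate the scene..." clause.
    content = photo_content.strip().rstrip(".,")
    if not content:
        raise ValueError("photo_content must not be empty")
    return PROMPT_TEMPLATE.replace("[photo content]", content, 1)


print(build_prompt("A wooden desk with a laptop and a coffee mug"))
```
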
## Download model
[Download](/prithivMLmods/Kontext-Top-Down-View/tree/main) the weights from the Files & versions tab.
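
The sketch below shows one way to load the adapter with diffusers. It assumes a recent diffusers release that includes `FluxKontextPipeline`, a CUDA GPU, and access to the base model. The LoRA weight filename is not stated in this card, so the hypothetical `pick_weight_file` helper simply picks the alphabetically first `.safetensors` file in the repository. `guidance_scale=2.5` is an illustrative value, not a setting recommended by the author.

```python
from typing import Iterable, Optional

BASE_MODEL = "black-forest-labs/FLUX.1-Kontext-dev"
ADAPTER_REPO = "prithivMLmods/Kontext-Top-Down-View"


def pick_weight_file(filenames: Iterable[str]) -> Optional[str]:
    """Return the alphabetically first .safetensors file name, or None."""
    for name in sorted(filenames):
        if name.endswith(".safetensors"):
            return name
    return None


def top_down_view(image_path: str, prompt: str, output_path: str = "top_down.png") -> str:
    """Run the base pipeline with the adapter and save the edited image."""
    # Heavy dependencies are imported lazily so pick_weight_file stays
    # usable on machines without torch or diffusers installed.
    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image
    from huggingface_hub import list_repo_files

    # Assumption: the adapter is stored as a .safetensors file in the repo.
    weight_name = pick_weight_file(list_repo_files(ADAPTER_REPO))
    if weight_name is None:
        raise FileNotFoundError(f"No .safetensors file found in {ADAPTER_REPO}")

    pipe = FluxKontextPipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    pipe.load_lora_weights(ADAPTER_REPO, weight_name=weight_name)
    pipe.to("cuda")

    result = pipe(
        image=load_image(image_path),
        prompt=prompt,
        guidance_scale=2.5,  # illustrative value, not from this card
    ).images[0]
    result.save(output_path)
    return output_path
```

Calling `top_down_view("input.jpg", prompt)` with the full trigger prompt would write the result to `top_down.png`; the input filename here is only a placeholder.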