---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
base_model: black-forest-labs/FLUX.1-Kontext-dev
instance_prompt: >-
  [photo content], recreate the scene from a top-down perspective. Maintain all
  visual proportions, lighting consistency, and realistic spatial relationships.
  Ensure the background, textures, and environmental shadows remain naturally
  aligned from this elevated angle.
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
language:
- en
pipeline_tag: image-to-image
library_name: diffusers
---

# **Kontext-Top-Down-View**

Kontext-Top-Down-View is an experimental adapter for black-forest-labs' FLUX.1-Kontext-dev that re-renders a scene from a top-down perspective. It aims to preserve visual proportions, lighting consistency, and realistic spatial relationships, and to keep backgrounds, textures, and environmental shadows coherent from the elevated angle. It was trained on 800 images arranged as start/end pairs (400 start images and 400 end images).

> [!note]
> [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.

> You can modify the prompt to alter its properties and subjective elements. Note: this is an experimental adapter, and its outputs may contain artifacts.

---

## **Sample Inferences: Demo**

<table style="width:100%; border-collapse:collapse;">
<tr>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9hti3lQODGSiZLGPm811.jpeg"
alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
</td>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/iH52aQZ7BA6Gdnmj2rkgX.webp"
alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
</td>
</tr>
</table>

<table style="width:100%; border-collapse:collapse;">
<tr>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/N_nMU9x0hnb4HAdchJtQC.jpeg"
alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
</td>
<td style="width:50%; text-align:center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/r_hw2cwckPCfapUZyHe9c.webp"
alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
</td>
</tr>
</table>

---

## Parameter Settings

| Setting                  | Value                    |
| ------------------------ | ------------------------ |
| Module Type              | Adapter                  |
| Base Model               | FLUX.1 Kontext Dev - fp8 |
| Trigger Words            | [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle. |
| Image Processing Repeats | 50                       |
| Epochs                   | 25                       |
| Save Every N Epochs      | 1                        |

Labeling: DeepCaption-VLA-7B (natural language, English)

Total images used for training: 800 (400 start images paired with 400 end images)

## Training Parameters

| Setting                     | Value     |
| --------------------------- | --------- |
| Seed                        | -         |
| Clip Skip                   | -         |
| Text Encoder LR             | 0.00001   |
| UNet LR                     | 0.00005   |
| LR Scheduler                | constant  |
| Optimizer                   | AdamW8bit |
| Network Dimension           | 64        |
| Network Alpha               | 32        |
| Gradient Accumulation Steps | -         |

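As a rough illustration of how the Network Dimension and Network Alpha values relate, many LoRA trainers scale the low-rank update by `alpha / dim`. The sketch below assumes the trainer used here follows that common convention, which this card does not confirm; `lora_scale` is a hypothetical helper name.

```python
def lora_scale(network_dim: int, network_alpha: float) -> float:
    """Return alpha / dim, the factor commonly applied to the LoRA update.

    Assumption: the trainer follows the usual alpha / rank convention.
    """
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


# Values taken from the Training Parameters table above.
scale = lora_scale(network_dim=64, network_alpha=32)
print(scale)  # 0.5
```

Under that assumption, the adapter's learned update is applied at half strength relative to a setup where alpha equals the network dimension.
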
## Label Parameters

| Setting         | Value |
| --------------- | ----- |
| Shuffle Caption | -     |
| Keep N Tokens   | -     |

## Advanced Parameters

| Setting                   | Value                |
| ------------------------- | -------------------- |
| Noise Offset              | 0.03                 |
| Multires Noise Discount   | 0.1                  |
| Multires Noise Iterations | 10                   |
| Conv Dimension            | -                    |
| Conv Alpha                | -                    |
| Batch Size                | -                    |
| Steps                     | 3800 & 400 (warm-up) |
| Sampler                   | euler                |

---

## Trigger words

Use the full instance prompt to trigger the image generation, with `[photo content]` at the start:

`[photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.`

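The following is a minimal sketch of how the trigger prompt might be used with diffusers, not an official recipe. It assumes that `[photo content]` is a placeholder for a short description of the input image, that the adapter file in the `prithivMLmods/Kontext-Top-Down-View` repository can be resolved automatically by `load_lora_weights`, and that a CUDA GPU is available. The helper names `build_prompt` and `run_top_down` and the `guidance_scale` value are illustrative choices that do not come from this card.

```python
PROMPT_TEMPLATE = (
    "{photo_content}, recreate the scene from a top-down perspective. "
    "Maintain all visual proportions, lighting consistency, and realistic "
    "spatial relationships. Ensure the background, textures, and "
    "environmental shadows remain naturally aligned from this elevated angle."
)


def build_prompt(photo_content: str) -> str:
    """Fill the assumed [photo content] placeholder with a scene description."""
    # Drop stray trailing punctuation so the template's own comma reads cleanly.
    return PROMPT_TEMPLATE.format(photo_content=photo_content.strip().rstrip(",."))


def run_top_down(image_path: str, photo_content: str, out_path: str = "top_down.png") -> None:
    """Load the base model plus the adapter and edit a single image."""
    # Heavy imports live here so build_prompt() works without a GPU stack.
    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image

    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    )
    # Assumption: the repo holds a single LoRA file; otherwise pass weight_name=...
    pipe.load_lora_weights("prithivMLmods/Kontext-Top-Down-View")
    pipe.to("cuda")

    result = pipe(
        image=load_image(image_path),
        prompt=build_prompt(photo_content),
        guidance_scale=2.5,  # assumed value, not taken from this card
    ).images[0]
    result.save(out_path)


if __name__ == "__main__":
    print(build_prompt("A wooden desk with a laptop and a coffee mug"))
```

Calling `run_top_down("input.jpg", "A wooden desk with a laptop and a coffee mug")` would then write the edited image to `top_down.png`; the input file name is only an example.
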
## Download model

[Download](/prithivMLmods/Kontext-Top-Down-View/tree/main) the model files from the Files & versions tab.