---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
base_model: black-forest-labs/FLUX.1-Kontext-dev
instance_prompt: >-
  [photo content], recreate the scene from a top-down perspective. Maintain all
  visual proportions, lighting consistency, and realistic spatial relationships.
  Ensure the background, textures, and environmental shadows remain naturally
  aligned from this elevated angle.
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
language:
- en
pipeline_tag: image-to-image
library_name: diffusers
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/gkn6DvNaQn14GgbhHgq5v.png)

# **Kontext-Top-Down-View**

Kontext-Top-Down-View is an experimental adapter for Black Forest Labs' FLUX.1-Kontext-dev that recreates a scene from a top-down perspective while aiming to preserve visual proportions, consistent lighting, and realistic spatial relationships. It is designed to keep backgrounds, textures, and environmental details natural and contextually coherent from the new viewpoint. It was trained on 800 images (400 start images paired with 400 end images) for geometry-consistent top-down scene generation.

> [!note]
> [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.

> You can modify the prompt by changing its properties and subjective elements. Note: this is an experimental adapter, and outputs may contain artifacts.

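The prompt above can also be assembled and run from Python. This is a minimal sketch, not the author's reference pipeline. It assumes a recent diffusers release that provides `FluxKontextPipeline`, a CUDA GPU with enough memory for FLUX.1-Kontext-dev in bfloat16 (the card lists an fp8 base for training), and that `load_lora_weights` can auto-detect the single weight file in `prithivMLmods/Kontext-Top-Down-View` (the repo from the download link). `guidance_scale=2.5` is a commonly used Kontext value, not one stated on this card. The card does not say whether `[photo content]` is a literal token or a slot for a description, so `build_prompt` keeps it verbatim unless a description is passed. `render_top_down` is a hypothetical helper name.

```python
from typing import Optional

# Instruction text copied verbatim from this card, with "[photo content]" as a slot.
PROMPT_TEMPLATE = (
    "{content}, recreate the scene from a top-down perspective. Maintain all "
    "visual proportions, lighting consistency, and realistic spatial relationships. "
    "Ensure the background, textures, and environmental shadows remain naturally "
    "aligned from this elevated angle."
)


def build_prompt(content: Optional[str] = None) -> str:
    """Return the instruction prompt.

    With no description, the literal "[photo content]" token from the card is kept;
    otherwise it is replaced by the (stripped) description of the input photo.
    """
    if content is None:
        return PROMPT_TEMPLATE.format(content="[photo content]")
    content = content.strip()
    if not content:
        raise ValueError("content description must not be empty")
    return PROMPT_TEMPLATE.format(content=content)


def render_top_down(image_path: str, content: Optional[str] = None, seed: int = 0):
    """Hypothetical helper: apply the adapter to one image and return a PIL image."""
    # Heavy imports live here so build_prompt() works without torch/diffusers installed.
    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image

    # Assumption: bf16 weights on a CUDA GPU.
    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    # Assumption: the repo holds a single .safetensors file that diffusers can auto-detect.
    pipe.load_lora_weights("prithivMLmods/Kontext-Top-Down-View")

    result = pipe(
        image=load_image(image_path),
        prompt=build_prompt(content),
        guidance_scale=2.5,  # assumption: common Kontext value, not tuned for this adapter
        num_inference_steps=28,
        generator=torch.Generator("cuda").manual_seed(seed),
    )
    return result.images[0]
```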
---

## **Sample Inferences: Demo**

<table style="width:100%; border-collapse:collapse;">
  <tr>
    <td style="width:50%; text-align:center;">
      <img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9hti3lQODGSiZLGPm811.jpeg" 
           alt="Input image" style="width:100%; height:auto;"/>
    </td>
    <td style="width:50%; text-align:center;">
      <img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/iH52aQZ7BA6Gdnmj2rkgX.webp" 
           alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
    </td>
  </tr>
</table>

<table style="width:100%; border-collapse:collapse;">
  <tr>
    <td style="width:50%; text-align:center;">
      <img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/N_nMU9x0hnb4HAdchJtQC.jpeg" 
           alt="Input image" style="width:100%; height:auto;"/>
    </td>
    <td style="width:50%; text-align:center;">
      <img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/r_hw2cwckPCfapUZyHe9c.webp" 
           alt="Kontext-Top-Down-View" style="width:100%; height:auto;"/>
    </td>
  </tr>
</table>

---


## Parameter Settings

| Setting                  | Value                    |
| ------------------------ | ------------------------ |
| Module Type              | Adapter                     |
| Base Model               | FLUX.1 Kontext Dev - fp8 |
| Trigger Words            | [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle. |
| Image Processing Repeats | 50                       |
| Epochs                   | 25                       |
| Save Every N Epochs      | 1                        |

- **Labeling:** DeepCaption-VLA-7B (natural language, English)
- **Total images used for training:** 800 images (400 start images paired with 400 end images)

## Training Parameters

| Setting                     | Value     |
| --------------------------- | --------- |
| Seed                        | -         |
| Clip Skip                   | -         |
| Text Encoder LR             | 0.00001   |
| Transformer (UNet) LR       | 0.00005   |
| LR Scheduler                | constant  |
| Optimizer                   | AdamW8bit |
| Network Dimension           | 64        |
| Network Alpha               | 32        |
| Gradient Accumulation Steps | -         |

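In the standard LoRA formulation (used, for example, by kohya-ss sd-scripts and PEFT), Network Dimension is the rank of the low-rank update and Network Alpha rescales that update by alpha / rank, so 32 / 64 gives a scale of 0.5. The card does not name its trainer, so this convention is an assumption here. The NumPy sketch is a toy single layer, not FLUX's actual adapter code; `LoRALinear` and `lora_scale` are hypothetical names.

```python
import numpy as np


def lora_scale(network_dim: int, network_alpha: float) -> float:
    """Common LoRA convention: the low-rank update is multiplied by alpha / rank."""
    return network_alpha / network_dim


class LoRALinear:
    """Toy frozen linear layer W plus a trainable low-rank update up @ down."""

    def __init__(self, weight: np.ndarray, network_dim: int = 64,
                 network_alpha: float = 32, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        # "down" projects inputs into the rank-r space (toy random init).
        self.down = rng.standard_normal((network_dim, d_in)) / np.sqrt(d_in)
        # "up" maps back to d_out; zero init means the adapter starts as a no-op.
        self.up = np.zeros((d_out, network_dim))
        self.scale = lora_scale(network_dim, network_alpha)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.down.T) @ self.up.T
        return base + self.scale * update

    def merged_weight(self) -> np.ndarray:
        """Fold the scaled update into W, as done when baking a LoRA into a model."""
        return self.weight + self.scale * (self.up @ self.down)


# With this card's settings (dim 64, alpha 32) the update is scaled by 0.5.
print(lora_scale(64, 32))  # 0.5
```

A lower alpha relative to the rank damps each update, which is one reason alpha is often set to half the rank as it is here.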
## Label Parameters

| Setting         | Value |
| --------------- | ----- |
| Shuffle Caption | -     |
| Keep N Tokens   | -     |

## Advanced Parameters

| Setting                   | Value |
| ------------------------- | ----- |
| Noise Offset              | 0.03  |
| Multires Noise Discount   | 0.1   |
| Multires Noise Iterations | 10    |
| Conv Dimension            | -     |
| Conv Alpha                | -     |
| Batch Size                | -     |
| Steps                     | 3800 + 400 (warm-up) |
| Sampler                   | euler |

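Noise Offset and the two Multires Noise settings change the noise used during training. This sketch follows the formulation popularized by kohya-ss sd-scripts (a per-channel constant offset, then "pyramid" noise summed over progressively coarser resolutions with weight `discount**i`), with two stated substitutions: NumPy instead of PyTorch, and nearest-neighbour instead of bilinear upsampling. The card does not say which trainer applied these settings to FLUX or in what order, so treat it as illustrative only; all function names are hypothetical.

```python
import numpy as np
from typing import Optional


def resize_nearest(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of the last two axes (stand-in for bilinear upsampling)."""
    h, w = x.shape[-2:]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[..., rows[:, None], cols[None, :]]


def apply_noise_offset(noise: np.ndarray, offset: float = 0.03,
                       rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Shift each (sample, channel) plane by one random constant scaled by `offset`."""
    if rng is None:
        rng = np.random.default_rng()
    b, c = noise.shape[:2]
    return noise + offset * rng.standard_normal((b, c, 1, 1))


def pyramid_noise(noise: np.ndarray, iterations: int = 10, discount: float = 0.1,
                  rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Add noise at progressively coarser resolutions (level 0 is full size),
    weighting level i by discount**i, then renormalize to unit std."""
    if rng is None:
        rng = np.random.default_rng()
    b, c, h, w = noise.shape
    out = noise.copy()
    for i in range(iterations):
        r = rng.uniform(2.0, 4.0)  # random downscale factor per level, as in sd-scripts
        hn, wn = max(1, int(h / r**i)), max(1, int(w / r**i))
        coarse = rng.standard_normal((b, c, hn, wn))
        out += resize_nearest(coarse, h, w) * discount**i
        if hn == 1 or wn == 1:  # stop once a level is one pixel tall or wide
            break
    return out / out.std()


# Example on a small latent-shaped array (batch 2, 4 channels, 32x32).
rng = np.random.default_rng(0)
noise = rng.standard_normal((2, 4, 32, 32))
noise = pyramid_noise(apply_noise_offset(noise, 0.03, rng), 10, 0.1, rng)
print(noise.shape, round(float(noise.std()), 6))
```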
---

## Trigger words

Use the full instruction prompt as the trigger, starting with `[photo content]`:

`[photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.`


## Download model

The weights are available for [download](/prithivMLmods/Kontext-Top-Down-View/tree/main) in the Files & versions tab.
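
For tools that load the adapter file directly, the weights can also be fetched with `huggingface_hub`. This is a minimal sketch, assuming the repo stores its weights as `.safetensors` files (typical, but not stated on this card); `download_adapter` is a hypothetical helper name.

```python
def download_adapter(local_dir: str = "Kontext-Top-Down-View") -> str:
    """Download the adapter's .safetensors file(s) and return the local folder path."""
    # Imported lazily so this snippet can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="prithivMLmods/Kontext-Top-Down-View",  # repo from the download link above
        allow_patterns=["*.safetensors"],  # assumption: weights are stored as safetensors
        local_dir=local_dir,
    )
```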