---
base_model: unsloth/llama-3.2-11b-vision-instruct-bnb-4bit
library_name: peft
---

# Model Card: LlamaFloorPlanVisionAIAdaptor

## Model Details

**Model Name:** FloorPlanVisionAIAdaptor  
**Task:** Floor plan analysis for architectural and interior design insights  
**Framework:** PyTorch with `unsloth`

---

## Model Description

The `FloorPlanVisionAIAdaptor` model is a Vision-Language Model (VLM) adapter for analyzing floor plan images. It is a PEFT adapter built on the `unsloth/llama-3.2-11b-vision-instruct-bnb-4bit` base model and combines detailed visual understanding with textual reasoning. Given an image of a floor plan, it can infer the layout, room count, key features, and other architectural details.

### Key Features:
- **Multi-modal Input:** Accepts both image and text input for contextual understanding.
- **Expertise Emulation:** Simulates the expertise of an architect or interior designer.
- **Gradient Checkpointing:** Reduces memory usage during fine-tuning, which helps when training on high-resolution images.
- **Flexible Precision:** Supports 4-bit inference, depending on memory constraints.
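
To illustrate why 4-bit loading matters, weight memory can be approximated as parameters × bits ÷ 8. The sketch below assumes roughly 11 billion parameters (read off the `11b` in the base model name) and ignores activations, the KV cache, and quantization metadata, so the figures are rough lower bounds, not measurements. The function name is hypothetical.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


# Assumption: ~11 billion parameters, inferred from the base model name.
ASSUMED_PARAMS = 11e9

for bits in (16, 4):
    size_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size_gb:.1f} GB")
```

Under these assumptions, 4-bit weights take about a quarter of the memory of 16-bit weights.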

### Applications:
- Automated floor plan analysis for real estate listings.
- Assisting architects in creating and verifying designs.
- Generating insights for interior design and space planning.
- Educational purposes in architecture and design training.

---

## Intended Use

### Primary Use Cases:
- To analyze and interpret floor plan images, providing detailed descriptions of:
  - Room layout and connections.
  - Room dimensions and count.
  - Unique architectural features.
- To assist architects, designers, and real estate professionals in understanding and documenting floor plans.

### Users:
- Architects
- Interior Designers
- Real Estate Professionals
- Educators and Students in Architecture and Design

---

## How to Use the Model

### Installation
Install the required libraries: `torch`, `unsloth`, and `transformers`, plus `pillow` for loading images:
```bash
pip install torch unsloth transformers pillow
```
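
Before loading the model, it can help to confirm that the packages are importable and that a CUDA device is visible, since the loading script moves its inputs to `"cuda"`. This is a minimal sketch; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the imports used in the script that follows.

```python
import importlib.util

# Top-level modules imported by the loading script (PIL is provided by Pillow).
REQUIRED_MODULES = ("torch", "unsloth", "transformers", "PIL")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    import torch

    # The loading script sends its inputs to "cuda", so a GPU must be visible.
    print("CUDA available:", torch.cuda.is_available())
```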

### Loading the Model
The following Python script demonstrates how to load and use the model:
```python
from unsloth import FastVisionModel  # Vision-language model wrapper from unsloth
from PIL import Image
from transformers import TextStreamer

# Load the adapter, its 4-bit base model, and the tokenizer
model, tokenizer = FastVisionModel.from_pretrained(
    "sabaridsnfuji/FloorPlanVisionAIAdaptor",
    load_in_4bit=True,  # Use 4-bit precision to save memory if needed
    use_gradient_checkpointing="unsloth",  # Memory-saving option; mainly relevant when fine-tuning
)

FastVisionModel.for_inference(model)  # Enable inference mode


def load_image(image_path):
    """Open an image file with PIL and return it in RGB mode.

    Errors (missing file, unreadable image) are raised instead of being
    swallowed, so a failed load cannot pass None on to the model.
    """
    return Image.open(image_path).convert("RGB")


# Define the instruction and input
instruction = """You are an expert in architecture and interior design. Analyze the floor plan image and describe accurately the key features, room count, layout, and any other important details you observe."""

# Replace with the path to your own floor plan image
image = load_image("/content/sample_images/5_2.jpg")

# Format the input message: one image placeholder followed by the text instruction
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]

input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Prepare inputs
inputs = tokenizer(
    image,  # PIL image loaded above
    input_text,
    add_special_tokens=False,  # The chat template already added the special tokens
    return_tensors="pt",
).to("cuda")

# Perform inference, streaming the generated text to stdout as it is produced
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

output = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=2048,
    use_cache=True,
    temperature=1.5,
    min_p=0.1
)
```
## Input Image:


![Floor Plan Image](https://huggingface.co/sabaridsnfuji/FloorPlanVisionAIAdaptor/resolve/main/5_1.jpg)


## Sample Output:

```
**Room Count:**
1 bedroom, 1 study/office, 1 bathroom, kitchen, living room, dining room, verandah.

**Room Types and Labels:**
Bedroom, kitchen, living room, dining room, study/office, bathroom, verandah.

**Room Sizes:**
- Bedroom: 9'8" x 9'10"
- Kitchen: 22'8" x 13'0"
- Dining Room: 10'0" x 13'0"
- Living Room: 13'8" x 15'6"
- Study/Office: 9'8" x 9'10"

**Primary Features:**
Stairs, verandah, windows along perimeter, kitchen island.

**Functional Areas:**
Bathroom adjacent to kitchen; no pantry or mudroom. Kitchen island provides functional space.

**Layout Overview:**
Central stairs with rooms radiating off. Kitchen near bathroom; living and dining areas open-plan.

**Flooring and Attributes:**
Tile in bathroom, verandah, and main living spaces. Likely standard ceiling height.

**Summary:**
Compact, single-floor layout with essential living spaces and utility rooms. Open-plan living areas provide fluid movement; stairs likely provide additional storage.<|eot_id|>
```
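
Output in this shape can be post-processed into structured data. The sketch below is one possible approach, not part of the model's tooling: it assumes the output keeps the `**Header:**` layout and the feet-and-inches size format shown above, which the model does not guarantee. All function names are hypothetical, and the input is an abridged excerpt of the sample output.

```python
import re

# Abridged excerpt of the sample output, including the trailing end-of-turn token.
SAMPLE_OUTPUT = """**Room Count:**
1 bedroom, 1 study/office, 1 bathroom, kitchen, living room, dining room, verandah.

**Room Sizes:**
- Bedroom: 9'8" x 9'10"
- Kitchen: 22'8" x 13'0"
- Dining Room: 10'0" x 13'0"

**Primary Features:**
Stairs, verandah, windows along perimeter, kitchen island.<|eot_id|>
"""

SIZE_PATTERN = re.compile(r'''-\s*(.+?):\s*(\d+)'(\d+)"\s*x\s*(\d+)'(\d+)"''')


def strip_special_tokens(text):
    """Remove chat-template markers such as <|eot_id|> from generated text."""
    return re.sub(r"<\|[^|>]+\|>", "", text).strip()


def split_sections(text):
    """Map each '**Header:**' line to the non-empty lines that follow it."""
    sections, current = {}, None
    for raw_line in strip_special_tokens(text).splitlines():
        line = raw_line.strip()
        header = re.fullmatch(r"\*\*(.+?):\*\*", line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def room_areas_sq_ft(sizes_text):
    """Turn feet-and-inches size lines into {room: area in square feet}."""
    areas = {}
    for room, ft1, in1, ft2, in2 in SIZE_PATTERN.findall(sizes_text):
        side_a = int(ft1) + int(in1) / 12
        side_b = int(ft2) + int(in2) / 12
        areas[room] = round(side_a * side_b, 1)
    return areas


sections = split_sections(SAMPLE_OUTPUT)
print(sections["Room Count"])
print(room_areas_sq_ft(sections["Room Sizes"]))
```

Because the dimensions are read by the model from the image, any derived areas inherit its reading errors and should be checked against the plan.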


---

## Limitations
- **Domain-Specific Knowledge:** Although the model is trained to emulate architectural expertise, it is not a substitute for human professionals on complex design tasks.
- **Image Quality:** Performance may degrade for low-resolution or incomplete floor plans.
- **Generalization:** The model may struggle with floor plans featuring unconventional layouts or non-standard symbols.
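
Given the image-quality limitation, a simple pre-check can flag images that are likely too small before they reach the model. This is a minimal sketch: the 512-pixel threshold is an arbitrary assumption for illustration, not a value documented for this model, and the function and file names are hypothetical.

```python
def is_resolution_sufficient(size, min_side=512):
    """Return True if the shorter side of a (width, height) pair meets min_side.

    min_side=512 is an illustrative assumption, not a documented requirement.
    """
    width, height = size
    return min(width, height) >= min_side


# With Pillow, `Image.open(path).size` yields the (width, height) tuple.
example_sizes = {"scanned_plan.jpg": (2048, 1536), "thumbnail.jpg": (320, 240)}

for name, size in example_sizes.items():
    status = "ok" if is_resolution_sufficient(size) else "likely too small"
    print(f"{name}: {size[0]}x{size[1]} -> {status}")
```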

---

## Training Data
The model was trained on a curated dataset of architectural floor plans, annotated with detailed descriptions of rooms, layouts, and features.

---

## Acknowledgements
The development of this model was inspired by advancements in multi-modal AI and the need for intelligent systems in the architectural domain. Special thanks to the contributors to the `unsloth` and `transformers` libraries.

---