Isaac-0.2-1B by Perceptron

Introducing the 1B-parameter variant of Isaac-0.2, the hybrid-reasoning vision-language model.

This release brings major upgrades — optional reasoning via thinking traces, perceptive tool calling (including our new Focus system), stronger grounding, better OCR, better desktop use, and improved structured output — while remaining fast, compact, and deployable.

Extending the efficient frontier of perception

Isaac 0.2 extends what we started with Isaac 0.1: small models that outperform systems 10× larger on visual reasoning and perception tasks, all running on commodity GPUs or edge devices. From robotics to media search to industrial inspection, Isaac 0.2 delivers high-accuracy perception without the heavy compute footprint.

What's New in Isaac 0.2

Reasoning via Thinking Traces: Short, structured reasoning traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks.
Perceptive Tool Calling + Focus (Zoom & Crop): Isaac 0.2 can trigger tool calls to focus (i.e., zoom and crop) on a smaller region and re-query the model on that region, which improves perception of fine-grained detail.
Structured Outputs: More reliable structured output generation for consistent JSON and predictable downstream integration.
Complex OCR: Improved text recognition across cluttered, low-resolution, or distorted regions — enabling accurate extraction from documents, diagrams, labels, screens, and dense real-world scenes.
Desktop Use: Better performance on everyday desktop and mobile workflows such as UI understanding and navigation, making Isaac faster and more capable for agentic use cases.
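
To make the Focus idea concrete, here is a minimal sketch of the zoom-and-crop arithmetic only. It is not Perceptron's implementation: the function name `focus_region`, the normalized `(x1, y1, x2, y2)` box convention, and the `margin` parameter are all assumptions made for illustration, and the actual tool-call schema is not described in this card.

```python
import math

def focus_region(box, image_size, margin=0.1):
    """Turn a normalized box into a padded, clamped pixel crop rectangle.

    box: (x1, y1, x2, y2), each value in [0, 1] (assumed convention).
    image_size: (width, height) in pixels.
    margin: extra context on each side, as a fraction of the box size.
    """
    width, height = image_size
    x1, y1, x2, y2 = box
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")

    # Pad the box so the crop keeps some surrounding context.
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin

    # Round outward, then clamp to the image bounds.
    left = max(0, math.floor((x1 - pad_x) * width))
    top = max(0, math.floor((y1 - pad_y) * height))
    right = min(width, math.ceil((x2 + pad_x) * width))
    bottom = min(height, math.ceil((y2 + pad_y) * height))
    return left, top, right, bottom
```

The returned tuple follows the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so the cropped region could be passed back to the model as a second, higher-detail image.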

Performance Benchmarks

Chatting with Isaac in 🤗 Transformers

Learn more in our Hugging Face example repo, which demonstrates extracting and rendering points.

pip install perceptron

Usage

import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.image_utils import load_image
from transformers.utils.import_utils import is_torch_cuda_available

def document_to_messages(document: list[dict]):
    """Split a mixed text/image document into chat messages and loaded images."""
    messages, images = [], []
    for item in document:
        if not (content := item.get("content")):
            continue
        role = item.get("role", "user")
        if item.get("type") == "image":
            images.append(load_image(content))
            messages.append({"role": role, "content": "<image>"})
        elif item.get("type") == "text":
            messages.append({"role": role, "content": content})
    return messages, images

hf_path = "PerceptronAI/Isaac-0.2-1B"
device, dtype = ("cuda", torch.bfloat16) if is_torch_cuda_available() else ("cpu", torch.float32)

# Load model/processor from the checkpoint
processor = AutoProcessor.from_pretrained(hf_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    hf_path, trust_remote_code=True, vision_attn_implementation="flash_attention_2"
)
model = model.to(device=device, dtype=dtype)
model.eval()

# Prepare input for generation
document = [
    {
        "type": "text",
        "content": "<hint>BOX</hint>",
        "role": "user",
    },
    {
        "type": "image",
        "content": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/refs/heads/main/huggingface/assets/example.webp",
        "role": "user",
    },
    {
        "type": "text",
        "content": "Determine whether it is safe to cross the street. Look for signage and moving traffic.",
        "role": "user",
    },
]
messages, images = document_to_messages(document)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=text, images=images, return_tensors="pt")

# Generate text using the model
generated_ids = model.generate(
    tensor_stream=inputs["tensor_stream"].to(next(model.parameters()).device),
    max_new_tokens=256,
    do_sample=False,
)
generated_text = processor.tokenizer.decode(
    generated_ids[0], skip_special_tokens=False
)
print(f"\nFull generated output:\n{generated_text}")
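
For downstream integration, the decoded text usually has to be parsed before it can be used. The following is a minimal sketch, assuming the model was prompted to answer in JSON and that the decoded string may carry extra text around the JSON value. The helper `extract_first_json` is hypothetical and not part of the Perceptron or Transformers APIs, and the sample string (including its `<think>` and `<|im_end|>` markers) is made up for illustration.

```python
import json

def extract_first_json(text: str):
    """Return the first JSON object or array embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        return value
    return None

# Made-up sample output; the real token markers may differ.
sample = (
    '<think>two cars approaching</think>'
    '{"safe_to_cross": false, "reasons": ["moving traffic"]}<|im_end|>'
)
result = extract_first_json(sample)
```

Returning `None` instead of raising lets the caller decide whether to retry generation or fall back to the raw text.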

Checkpoint format: Safetensors. Model size: 1B params. Tensor type: F32.
