---
title: Falcon-Perception-0.6B WebGPU
emoji: 🦅
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
short_description: Open-vocab detection + segmentation, all in the browser
models:
  - tiiuae/Falcon-Perception
  - onnx-community/falcon-perception-onnx-webgpu
---

# 🦅 Falcon-Perception-0.6B WebGPU

A browser demo for **[tiiuae/Falcon-Perception](https://huggingface.co/tiiuae/Falcon-Perception)**, a 0.6B open-vocabulary VLM that turns natural-language queries into bounding boxes and pixel-accurate segmentation masks, running fully client-side via WebGPU + ONNX Runtime Web.

[![Model](https://img.shields.io/badge/🤗%20Model-tiiuae%2FFalcon--Perception-yellow)](https://huggingface.co/tiiuae/Falcon-Perception)
[![Weights](https://img.shields.io/badge/🤗%20ONNX%20Weights-onnx--community%2Ffalcon--perception--onnx--webgpu-blue)](https://huggingface.co/onnx-community/falcon-perception-onnx-webgpu)

## What's inside

- **Detection**: draw bounding boxes for any natural-language query ("athletes", "the runner in front", "mangoes").
- **Segmentation**: pixel-accurate masks via the AnyUp upsampler, all in-browser.
- **Tracker (preview)**: HUD-style reticles on video. Limited by VLM latency between detections; see the in-space disclaimer.
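
As a rough illustration of the detection step, the sketch below maps a box in normalized coordinates onto image pixels so it can be drawn. The box layout (`xMin`, `yMin`, `xMax`, `yMax` in the range 0 to 1) and every name here are assumptions made for the example; the demo's actual output format may differ.

```typescript
// Hypothetical types: the demo's real output format is not documented here.
interface NormalizedBox {
  xMin: number;
  yMin: number;
  xMax: number;
  yMax: number;
}

interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Keep a normalized coordinate inside the image.
const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

// Convert a normalized box to a pixel rectangle for a given image size.
function toPixelRect(
  box: NormalizedBox,
  imageWidth: number,
  imageHeight: number
): PixelRect {
  // Tolerate swapped corners by ordering each axis first.
  const left = Math.round(clamp01(Math.min(box.xMin, box.xMax)) * imageWidth);
  const right = Math.round(clamp01(Math.max(box.xMin, box.xMax)) * imageWidth);
  const top = Math.round(clamp01(Math.min(box.yMin, box.yMax)) * imageHeight);
  const bottom = Math.round(clamp01(Math.max(box.yMin, box.yMax)) * imageHeight);
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

The resulting rectangle has the argument order that `CanvasRenderingContext2D.strokeRect(x, y, width, height)` expects, which is one way an overlay could be drawn.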

## How it runs

2.4 GB of ONNX weights are fetched once on first visit, then cached by your browser: no backend, no API keys, no network round-trip after load. Multi-threaded WASM is enabled via `coi-serviceworker`, which makes the page cross-origin isolated so that `SharedArrayBuffer` is available.
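
To make the threading constraint concrete, here is a minimal sketch of how a page might decide on a backend and a WASM thread count. `planBackend`, its types, the fallback to `"wasm"` and the default cap of four threads are all hypothetical and not taken from the demo's source. The one firm fact it encodes is that threaded WASM needs `SharedArrayBuffer`, which browsers expose only on cross-origin isolated pages.

```typescript
// Hypothetical helper; none of these names come from the demo itself.
interface RuntimeEnv {
  hasWebGPU: boolean; // whether the browser exposes WebGPU
  crossOriginIsolated: boolean; // whether the page is cross-origin isolated
  hardwareConcurrency: number; // logical CPU cores reported by the browser
}

interface BackendPlan {
  executionProvider: "webgpu" | "wasm";
  wasmThreads: number;
}

function planBackend(env: RuntimeEnv, maxThreads = 4): BackendPlan {
  // Without cross-origin isolation there is no SharedArrayBuffer,
  // so WASM has to stay single-threaded.
  const cores = env.hardwareConcurrency || 1;
  const wasmThreads = env.crossOriginIsolated
    ? Math.max(1, Math.min(maxThreads, cores))
    : 1;
  return {
    // Assumption: fall back to plain WASM when WebGPU is missing.
    executionProvider: env.hasWebGPU ? "webgpu" : "wasm",
    wasmThreads,
  };
}
```

In a real page the inputs could be read from `"gpu" in navigator`, `globalThis.crossOriginIsolated` and `navigator.hardwareConcurrency`. The thread count could then be assigned to ONNX Runtime Web's `ort.env.wasm.numThreads` before a session is created.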