| license: gpl-3.0 | |
| language: | |
| - en | |
| tags: | |
| - object-detection | |
| - instance-segmentation | |
| - yolov8 | |
| - coco | |
| - real-time | |
| - pytorch | |
| - capsule-network | |
| - interpretable-ai | |
| - symbolic-ai | |
| library_name: ultralytics | |
| pipeline_tag: image-segmentation | |
| datasets: | |
| - coco | |
| model-index: | |
| - name: SCN | |
|   results: | |
|   - task: | |
|       type: object-detection | |
|       name: Object Detection | |
|     dataset: | |
|       name: COCO 2017 | |
|       type: coco | |
|       split: val2017 | |
|     metrics: | |
|     - type: mAP50 | |
|       value: 0.57100 | |
|       name: mAP50 | |
|     - type: mAP50-95 | |
|       value: 0.41600 | |
|       name: mAP50:95 | |
|   - task: | |
|       type: instance-segmentation | |
|       name: Instance Segmentation | |
|     dataset: | |
|       name: COCO 2017 | |
|       type: coco | |
|       split: val2017 | |
|     metrics: | |
|     - type: mAP50 | |
|       value: 0.53316 | |
|       name: Mask mAP50 | |
|     - type: mAP50-95 | |
|       value: 0.34080 | |
|       name: Mask mAP50:95 | |
| # Symbolic Capsule Network (SCN) | |
| > *What if a detector could tell you not just **what** it found, but **why** it is confident?* | |
| **SCN** is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a **capsule-based neck and head**. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures *part-whole relationships* — the structural agreements between object parts and the wholes they compose. Every detection is backed by a **symbolic routing path**: a traceable chain of capsule agreements that exposes *which parts* voted for *which object*, turning each prediction into an auditable reasoning trace. | |
| ## Live Demo | |
| [Try the interactive demo ↗](https://huggingface.co/spaces/zpyuan/SymbolicCapsuleNetwork-demo) | |
| ## Example Results | |
| | | | | |
| |---|---| | |
| |  |  | | |
| |  |  | | |
| --- | |
| ## Key Ideas | |
| Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions: | |
| **1. Part-Whole Relation Modelling** | |
| `CapsRoute` layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is only reached when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections. | |
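To make routing-by-agreement concrete, here is a minimal pure-Python sketch of the generic procedure from the capsule-network literature (softmax over routing logits, squashing, agreement update). It is not SCN's `CapsRoute` implementation, which this card does not spell out; the function names, the vote layout, and the iteration count are assumptions chosen for illustration.

```python
import math


def squash(v):
    """Shrink a vector to length < 1 while keeping its direction."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(votes, iterations=3):
    """Generic routing-by-agreement (illustrative, not SCN's code).

    votes[i][j] is part i's predicted pose vector for object j.
    Returns (objects, coupling), where coupling[i][j] is the share of
    part i's evidence assigned to object j.
    """
    n_parts, n_objs, dim = len(votes), len(votes[0]), len(votes[0][0])
    logits = [[0.0] * n_objs for _ in range(n_parts)]
    for _ in range(iterations):
        # Each part distributes its evidence over the candidate objects.
        coupling = [softmax(row) for row in logits]
        objects = []
        for j in range(n_objs):
            weighted = [
                sum(coupling[i][j] * votes[i][j][d] for i in range(n_parts))
                for d in range(dim)
            ]
            objects.append(squash(weighted))
        # Raise the logit wherever a part's vote agrees with the object pose.
        for i in range(n_parts):
            for j in range(n_objs):
                logits[i][j] += sum(a * b for a, b in zip(votes[i][j], objects[j]))
    return objects, coupling


# Two parts agree on object 0 and contradict each other on object 1.
votes = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
objects, coupling = route(votes)
print([round(c, 3) for c in coupling[0]])
```

In this toy case both parts shift their coupling toward object 0, because their votes for object 1 cancel out and produce no agreement.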
| **2. Symbolic Routing Paths** | |
| The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation. | |
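As an illustration of reading routing coefficients as an evidence graph, the sketch below converts a coefficient matrix into a sorted list of weighted part-to-object edges. The matrix layout, the capsule names, and the threshold are hypothetical; this card does not document how SCN exposes its routing weights.

```python
def evidence_edges(coupling, part_names, object_names, threshold=0.3):
    """Return (part, object, weight) edges whose routing weight >= threshold.

    coupling[i][j] is assumed to be the routing weight from part i to object j.
    """
    edges = []
    for i, row in enumerate(coupling):
        for j, weight in enumerate(row):
            if weight >= threshold:
                edges.append((part_names[i], object_names[j], weight))
    # Strongest evidence first.
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Hypothetical routing weights: rows are parts, columns are objects.
coupling = [
    [0.80, 0.20],  # wheel
    [0.75, 0.25],  # windshield
    [0.10, 0.90],  # handlebar
]
edges = evidence_edges(
    coupling,
    part_names=["wheel", "windshield", "handlebar"],
    object_names=["car", "bicycle"],
)
for part, obj, weight in edges:
    print(f"{part} -> {obj} ({weight:.2f})")
```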
| **3. Concept-Based Detection Auditing** | |
| Routing paths enable structured inspection that scalar networks cannot support: | |
| - **Verify** that a predicted "car" is grounded in consistent wheel, body, and windshield part activations. | |
| - **Diagnose** which part capsule collapsed when the model misses an object under occlusion or viewpoint change. | |
| - **Detect bias** by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on. | |
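The three audits above can be sketched over per-detection routing records. The record format (`label` plus a `routing` dict of part weights), the helper names, and the threshold are all hypothetical, since the card does not specify how routing statistics are exported.

```python
from collections import defaultdict


def missing_parts(detection, expected, min_weight=0.1):
    """Verify/diagnose: list expected parts with little or no routing support."""
    routing = detection["routing"]
    return [part for part in expected if routing.get(part, 0.0) < min_weight]


def part_reliance(detections, target):
    """Detect bias: average each part's routing weight over `target` detections."""
    totals, count = defaultdict(float), 0
    for det in detections:
        if det["label"] != target:
            continue
        count += 1
        for part, weight in det["routing"].items():
            totals[part] += weight
    if count == 0:
        return {}
    return {part: total / count for part, total in totals.items()}


# Hypothetical routing records for three detections.
detections = [
    {"label": "car", "routing": {"wheel": 0.7, "body": 0.2, "windshield": 0.1}},
    {"label": "car", "routing": {"wheel": 0.9, "body": 0.1}},
    {"label": "person", "routing": {"head": 1.0}},
]

print(missing_parts(detections[1], ["wheel", "body", "windshield"]))
reliance = part_reliance(detections, "car")
print(max(reliance, key=reliance.get))
```

Here the second car has no windshield support, and averaged over both cars the wheel capsule carries most of the routing weight, which is the kind of over-reliance a dataset-level audit would surface.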
| ## Architecture | |
|  | |
| The pipeline flows through four capsule-specific modules: | |
| | Module | Role | | |
| |---|---| | |
| | `CapsProj` | Projects multi-scale CNN feature maps into capsule space | | |
| | `CapsAlign` | Aligns capsule resolutions across FPN levels | | |
| | `CapsRoute` / `CapsRouteV2-4` | Dynamic routing-by-agreement across part-to-whole levels | | |
| | `CapsDecode` | Decodes final capsule activations into boxes and masks | | |
| --- | |
| ## Performance | |
| ### Detection — COCO 2017 val | |
| SCN-n reports the highest mAP among the nano-scale detectors compared below, ahead of every listed YOLO variant at comparable FLOPs, at the cost of the highest latency in the table (29.6 ms). | |
| | Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) | | |
| |---|---:|---:|---:|---:|---:|---:|---:| | |
| | YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 | | |
| | YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 | | |
| | YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 | | |
| | YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 | | |
| | YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 | | |
| | YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 | | |
| | YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 | | |
| | YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 | | |
| | **SCN-n (Ours)** | **57.1%** | **41.6%** | **56.1%** | **40.4%** | 29.6 | 3.3 | 6.5 | | |
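| SCN-n improves on the previous best (YOLO26n) by **+0.3 mAP50 and +0.8 mAP50:95** (percentage points) at a comparable FLOPs budget (6.5B vs. 6.1B). We attribute the gain to structural reasoning rather than raw capacity, although SCN-n does use more parameters (3.3M vs. 2.6M) and has higher latency (29.6 ms vs. 14.4 ms). | |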
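The comparison with YOLO26n can be checked directly from the detection table. The snippet below is only arithmetic on the table's values; the variable names are illustrative.

```python
# (mAP50 %, mAP50:95 %, FLOPs in billions) copied from the detection table.
table = {
    "YOLO26n": (56.8, 40.8, 6.1),
    "SCN-n": (57.1, 41.6, 6.5),
}

base, ours = table["YOLO26n"], table["SCN-n"]
gain_map50 = round(ours[0] - base[0], 1)    # percentage points
gain_map5095 = round(ours[1] - base[1], 1)  # percentage points
flops_ratio = ours[2] / base[2]

print(f"mAP50 gain:    +{gain_map50} points")
print(f"mAP50:95 gain: +{gain_map5095} points")
print(f"FLOPs ratio:   {flops_ratio:.2f}x")
```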
| ### Accuracy–Efficiency Frontier | |
|  | |
| *SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.* | |
| ### Instance Segmentation — COCO 2017 val | |
| | Model | Input | Mask mAP50 | Mask mAP50:95 | | |
| |---|---:|---:|---:| | |
| | SCN Segmentation | 640 | 53.3% | 34.1% | | |
| --- | |
| ## Quick Start | |
| ```bash | |
| pip install ultralytics huggingface_hub | |
| ``` | |
| ```python | |
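| from huggingface_hub import hf_hub_download | |
| from ultralytics import YOLO | |
| # `models` refers to this repository's local package (see Repository Structure), | |
| # so run this from a checkout of the repo. | |
| from models import register_ultralytics_modules | |
| # Download the pretrained segmentation checkpoint from the Hub. | |
| weights = hf_hub_download( | |
|     repo_id="zpyuan/SymbolicCapsuleNetwork", | |
|     filename="weights/symbolic_capsule_network_segmentation.pt", | |
| ) | |
| # Register the capsule layers with Ultralytics before loading the model. | |
| register_ultralytics_modules() | |
| model = YOLO(weights) | |
| results = model.predict("image.jpg", imgsz=640, conf=0.25) | |
| results[0].show()  # display the annotated image | |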
| ``` | |
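Once `predict` returns, the detections can be summarised per class. The sketch below relies only on the `boxes.cls`, `boxes.conf`, and `names` attributes of an Ultralytics `Results` object; `summarize` is a hypothetical helper, and the stand-in object at the bottom exists only so the snippet runs without downloading weights.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(result, min_conf=0.25):
    """Count detections per class name at or above a confidence threshold."""
    counts = Counter()
    for cls_id, conf in zip(result.boxes.cls, result.boxes.conf):
        if float(conf) >= min_conf:
            counts[result.names[int(cls_id)]] += 1
    return dict(counts)


# Stand-in with the same attribute layout as an Ultralytics Results object.
fake_result = SimpleNamespace(
    boxes=SimpleNamespace(cls=[0, 2, 2], conf=[0.9, 0.6, 0.1]),
    names={0: "person", 2: "car"},
)
print(summarize(fake_result))  # {'person': 1, 'car': 1}
```

With a real prediction, the equivalent call would be `summarize(results[0])`.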
| Command-line: | |
| ```bash | |
| python predict.py path/to/image.jpg | |
| python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280 | |
| ``` | |
| --- | |
| ## Repository Structure | |
| | Path | Description | | |
| |---|---| | |
| | `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint | | |
| | `modules/` | Capsule modules: `CapsProj`, `CapsAlign`, `CapsRoute`, `CapsRouteV2-4`, `CapsDecode` | | |
| | `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load | | |
| | `configs/seg_model/` | YAML defining the capsule neck and head architecture | | |
| | `predict.py` | Minimal inference entry point | | |
| --- | |