APEIRIA: Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

APEIRIA (from ἄπειρον, Greek for "unlimited") is a neuro-symbolic 3D multi-modal LLM framework designed to bridge the gap between the interpretability of neuro-symbolic reasoning and the flexibility of end-to-end 3D MLLMs. It distills symbolic reasoning patterns into an MLLM by expressing them as natural-language chain-of-thought (CoT).

The model was presented in the paper Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs, which was accepted to ICML 2026. This repository contains the released APEIRIA model checkpoint as a LoRA adapter based on the Qwen3-VL-8B-Instruct backbone.

Paper: Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs
Project Page: https://matthewdm0816.github.io/Apeiria_Open
Repository: https://github.com/oceanflowlab/APEIRIA

Model Description

APEIRIA introduces a three-stage curriculum to progressively build reasoning capabilities:

1. 3D Perception Alignment: Grounds object visual-geometric features to the LLM.
2. CoT-SFT: Teaches query decomposition and stepwise verification from symbolic program traces.
3. CoT-RL: Extends reasoning patterns to open-set concepts and deeply nested instructions.
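
To make the idea of learning from symbolic program traces more concrete, the following is a minimal toy sketch of how an executed program trace might be rendered as natural-language CoT text. Everything in it is hypothetical: the `Step` type, the operator names (`filter`, `relate`, `count`), the sentence templates, and the object IDs are illustrative assumptions, not the trace format or prompts used by APEIRIA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One executed step of a hypothetical symbolic program."""
    op: str         # operator name, e.g. "filter", "relate", "count"
    args: tuple     # operator arguments (concepts, relations)
    result: object  # what the step returned (object IDs or a value)


# Hypothetical sentence templates, one per operator.
TEMPLATES = {
    "filter": "I look for objects matching '{0}' and find {result}.",
    "relate": "Among these, I keep the ones '{0}' the '{1}', leaving {result}.",
    "count": "I count the remaining objects and get {result}.",
}


def trace_to_cot(trace):
    """Render a list of executed steps as numbered reasoning sentences."""
    lines = []
    for index, step in enumerate(trace, start=1):
        sentence = TEMPLATES[step.op].format(*step.args, result=step.result)
        lines.append(f"Step {index}: {sentence}")
    return "\n".join(lines)


# Toy trace for the query "How many chairs are next to the table?"
example_trace = [
    Step("filter", ("chair",), [3, 7, 12]),
    Step("relate", ("next to", "table"), [7]),
    Step("count", (), 1),
]
print(trace_to_cot(example_trace))
```

Text of this kind exposes both the decomposition of the query and the intermediate result of each step, which is what makes stepwise verification possible in principle.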

By transferring reasoning patterns rather than concept-specific knowledge, APEIRIA preserves transparent reasoning and modular interchangeability of planning and perception components.

Usage

Please refer to the official GitHub repository for environment setup, data preparation, and detailed instructions on running inference or training.
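
As a starting point before following the official instructions, the sketch below shows one way to fetch the adapter files and inspect their configuration. It uses `snapshot_download` from `huggingface_hub` and the repository ID `kmichiru/OpenApeiria` shown on this page. It assumes the checkpoint follows the usual PEFT layout with an `adapter_config.json` file, which this card does not confirm, and the helper names are illustrative. It does not run inference; the 3D perception inputs require the pipeline from the GitHub repository.

```python
import json
from pathlib import Path

BASE_MODEL_ID = "Qwen/Qwen3-VL-8B-Instruct"  # backbone named in this model card
ADAPTER_REPO_ID = "kmichiru/OpenApeiria"     # repository ID shown on this page


def download_adapter() -> Path:
    """Download the adapter repository into the local Hugging Face cache."""
    # Imported lazily so read_adapter_config works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=ADAPTER_REPO_ID))


def read_adapter_config(adapter_dir):
    """Return the parsed adapter_config.json, or None if the file is absent.

    Assumption: the checkpoint uses the standard PEFT file layout.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        return None
    return json.loads(config_path.read_text(encoding="utf-8"))


# Example usage (requires network access and huggingface_hub):
#   adapter_dir = download_adapter()
#   config = read_adapter_config(adapter_dir)
#   if config is not None:
#       print(config.get("base_model_name_or_path"))
```

If the configuration is present, its `base_model_name_or_path` field can be compared against `BASE_MODEL_ID` as a quick sanity check before setting up the full environment.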

Citation

If you find this work useful, please consider citing:

@inproceedings{mo2026,
  title={Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs},
  author={Mo, Wentao and Liu, Yang},
  booktitle={International Conference on Machine Learning},
  year={2026}
}

Acknowledgements

This code builds upon previous 3D MLLMs and foundation models, including Chat-Scene, SegDINO3D, and Mask3D. It utilizes the SGLang library for fast multi-modal generation.

Model tree for kmichiru/OpenApeiria

Base model: Qwen/Qwen3-VL-8B-Instruct
This model: adapter

Paper for kmichiru/OpenApeiria

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs (Paper 2606.01215)