File size: 8,730 Bytes
097b5d5 6c428f2 3fb71ab 6c428f2 097b5d5 efcae34 097b5d5 efcae34 097b5d5 efcae34 097b5d5 6c428f2 efcae34 097b5d5 efcae34 097b5d5 efcae34 097b5d5 efcae34 097b5d5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 | ---
license: apache-2.0
tags:
- 3d
- mesh-generation
- vq-vae
- codebook
- topology
- graph-neural-network
- research
datasets:
- allenai/objaverse
library_name: pytorch
pipeline_tag: other
---
# MeshLex Research
<div align="center">
**MeshLex: Learning a Topology-aware Patch Vocabulary for Compositional Mesh Generation**
<a href="https://github.com/Pthahnix/MeshLex-Research"><img alt="GitHub"
src="https://img.shields.io/badge/GitHub-MeshLex--Research-181717?logo=github&logoColor=white"/></a>
<a href="https://huggingface.co/Pthahnix/MeshLex-Research"><img alt="Hugging Face"
src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-MeshLex--Research-ffc107?color=ffc107&logoColor=white"/></a>
<a href="https://github.com/Pthahnix/MeshLex-Research/blob/main/LICENSE"><img alt="License"
src="https://img.shields.io/badge/License-Apache_2.0-f5de53?&color=f5de53"/></a>
</div>
<hr>
## Table of Contents
1. [Overview](#overview)
2. [Current Status](#current-status)
3. [Repo Contents](#repo-contents)
4. [Core Hypothesis](#core-hypothesis)
5. [Model Architecture](#model-architecture)
6. [Experimental Results](#experimental-results)
7. [Data](#data)
8. [Quick Start](#quick-start)
9. [Timeline](#timeline)
10. [License](#license)
## Overview
A research project exploring whether 3D triangle meshes possess a finite, reusable "vocabulary" of local topological patterns β analogous to how BPE tokens form a vocabulary for natural language.
Instead of generating meshes face-by-face, MeshLex learns a **codebook of ~4096 topology-aware patches** (each covering 20-50 faces) and generates meshes by selecting, deforming, and assembling patches from this codebook. A 4000-face mesh becomes ~130 tokens β an order of magnitude more compact than the state-of-the-art (FACE, ICML 2026: ~400 tokens).
| | MeshMosaic | FreeMesh | FACE | **MeshLex** |
|---|---|---|---|---|
| Approach | Divide-and-conquer | BPE on coordinates | One-face-one-token | **Topology patch codebook** |
| Still per-face generation? | Yes | Yes | Yes | **No** |
| Has codebook? | No | Yes (coordinate-level) | No | **Yes (topology-level)** |
| Compression (4K faces) | N/A | ~300 tokens | ~400 tokens | **~130 tokens** |
## Current Status
**Feasibility validation COMPLETE β 4/4 experiments STRONG GO. Ready for formal experiment design.**
| # | Experiment | Status | Result |
|---|-----------|--------|--------|
| 1 | A-stage Γ 5-Category | **Done** | STRONG GO (ratio 1.145x, util 46%) |
| 2 | A-stage Γ LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 95.3%)** |
| 3 | B-stage Γ 5-Category | **Done** | STRONG GO (ratio 1.185x, util 47%) |
| 4 | B-stage Γ LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 94.9%)** |
Key findings:
- **More categories = dramatically better generalization**: LVIS-Wide (1156 cat) ratio 1.019x vs 5-cat 1.145x, util 95% vs 46%
- **Best result (Exp4)**: Same-cat CD 211.6, Cross-cat CD 215.8 β near-zero generalization gap
- SimVQ collapse fix successful: utilization 0.46% β 99%+ (217x improvement)
- B-stage multi-token KV decoder effective: reconstruction CD reduced 6.2%
## Repo Contents
This HuggingFace repo stores **checkpoints** and **processed datasets** for reproducibility.
### Checkpoints
| Experiment | Path | Description |
|------------|------|-------------|
| Exp1 A-stage Γ 5cat | `checkpoints/exp1_A_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp2 A-stage Γ LVIS-Wide | `checkpoints/exp2_A_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
| Exp3 B-stage Γ 5cat | `checkpoints/exp3_B_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp4 B-stage Γ LVIS-Wide | `checkpoints/exp4_B_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
### Data
| File / Directory | Size | Contents |
|------------------|------|----------|
| `data/meshlex_data.tar.gz` | ~1.2 GB | All processed data in one archive (recommended) |
| `data/patches/` | ~1.1 GB | NPZ patch files (5cat + LVIS-Wide splits) |
| `data/meshes/` | ~931 MB | Preprocessed decimated OBJ files (5,497 meshes) |
| `data/objaverse/` | ~2 MB | Download manifests |
The `tar.gz` archive contains patches, meshes, and manifests β download it and extract to skip all preprocessing.
## Core Hypothesis
> Mesh local topology is low-entropy and universal across object categories. A finite codebook of ~4096 topology prototypes, combined with continuous deformation parameters, can reconstruct arbitrary meshes with high fidelity.
## Model Architecture
The full model is a **VQ-VAE** with three modules:
```
Objaverse-LVIS GLB β Decimation (pyfqmr) β Normalize [-1,1]
β METIS Patch Segmentation (~35 faces/patch)
β PCA-aligned local coordinates
β Face features (15-dim: vertices + normal + angles)
β SAGEConv GNN Encoder β 128-dim embedding
β SimVQ Codebook (K=4096, learnable reparameterization)
β Cross-attention MLP Decoder β Reconstructed vertices
```
- **PatchEncoder**: 4-layer SAGEConv GNN + global mean pooling β 128-dim **z**
- **SimVQ Codebook**: Frozen base **C** + learnable linear **W**, effective codebook **CW = W(C)**. All 4096 entries share W's gradient β no code is ever forgotten
- **PatchDecoder**: Cross-attention with learnable vertex queries β per-vertex xyz coordinates
- **A-stage**: Single KV token decoder (baseline)
- **B-stage**: 4 KV tokens decoder (improved reconstruction, resumed from A-stage)
## Experimental Results
| Experiment | Scale | Stage | CD Ratio | Util (same) | Util (cross) | Decision |
|------------|-------|-------|----------|-------------|--------------|----------|
| Exp1 | 5 categories | A (1 KV token) | 1.145x | 46.0% | 47.0% | β
STRONG GO |
| Exp3 | 5 categories | B (4 KV tokens) | 1.185x | 47.1% | 47.3% | β
STRONG GO |
| Exp2 | 1156 categories | A (1 KV token) | **1.019x** | **95.3%** | **83.6%** | β
**STRONG GO** |
| **Exp4** | **1156 categories** | **B (4 KV tokens)** | **1.019x** | **94.9%** | **82.8%** | β
**STRONG GO** |
**CD Ratio** = Cross-category CD / Same-category CD. Closer to 1.0 = better generalization. Target: < 1.2x.
Scaling from 5 to 1156 categories causes CD ratio to **drop from 1.145x to 1.019x** (near-perfect generalization) and utilization to **surge from 46% to 95%** (nearly full codebook activation).
## Data
Training data sourced from [Objaverse-LVIS](https://huggingface.co/datasets/allenai/objaverse) (Allen AI).
- **5-Category**: chair, table, airplane, car, lamp β used for initial validation
- **LVIS-Wide**: 1156 categories from Objaverse-LVIS, 10 objects per category
- `seen_train`: 188,696 patches (1046 categories)
- `seen_test`: 45,441 patches (same 1046 categories, held-out objects)
- `unseen`: 12,655 patches (110 held-out categories, never seen during training)
## Quick Start
```bash
# Clone the code repo
git clone https://github.com/Pthahnix/MeshLex-Research.git
cd MeshLex-Research
# Install dependencies
pip install -r requirements.txt
pip install torch-geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
-f https://data.pyg.org/whl/torch-2.4.0+cu124.html
# Download processed data from this HF repo
pip install huggingface_hub
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download('Pthahnix/MeshLex-Research', 'data/meshlex_data.tar.gz', repo_type='model', local_dir='.')
"
tar xzf data/meshlex_data.tar.gz -C data/
# Download checkpoints
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Pthahnix/MeshLex-Research', allow_patterns='checkpoints/*', repo_type='model', local_dir='.')
"
mv checkpoints data/checkpoints
# Run evaluation on Exp4 (best model)
PYTHONPATH=. python scripts/evaluate.py \
--checkpoint data/checkpoints/exp4_B_lvis_wide/checkpoint_final.pt \
--same_cat_dirs data/patches/lvis_wide/seen_test \
--cross_cat_dirs data/patches/lvis_wide/unseen \
--output results/eval_results.json
# Run unit tests
python -m pytest tests/ -v
```
## Timeline
- **Day 1 (2026-03-06)**: Project inception, gap analysis, idea generation, experiment design
- **Day 2 (2026-03-07)**: Full codebase implementation (14 tasks), unit tests, initial experiment
- **Day 3 (2026-03-08)**: Diagnosed codebook collapse, fixed SimVQ, Exp1 β **STRONG GO**
- **Day 4 (2026-03-09)**: Exp2 + Exp3 completed β **STRONG GO**. Key finding: more categories = better generalization
- **Day 5 (2026-03-13)**: Pod reset recovery, expanded LVIS-Wide (1156 cat), retrained Exp2, trained Exp4 β all **STRONG GO**
- **Day 6 (2026-03-14)**: Final comparison report + visualizations. Full dataset + checkpoints backed up to HuggingFace
## License
Apache-2.0
|