---
license: apache-2.0
tags:
  - 3d
  - mesh-generation
  - vq-vae
  - codebook
  - topology
  - graph-neural-network
  - research
datasets:
  - allenai/objaverse
library_name: pytorch
pipeline_tag: other
---

# MeshLex Research

<div align="center">

**MeshLex: Learning a Topology-aware Patch Vocabulary for Compositional Mesh Generation**

<a href="https://github.com/Pthahnix/MeshLex-Research"><img alt="GitHub"
  src="https://img.shields.io/badge/GitHub-MeshLex--Research-181717?logo=github&logoColor=white"/></a>
<a href="https://huggingface.co/Pthahnix/MeshLex-Research"><img alt="Hugging Face"
  src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-MeshLex--Research-ffc107?color=ffc107&logoColor=white"/></a>
<a href="https://github.com/Pthahnix/MeshLex-Research/blob/main/LICENSE"><img alt="License"
  src="https://img.shields.io/badge/License-Apache_2.0-f5de53?&color=f5de53"/></a>

</div>

<hr>

## Table of Contents

1. [Overview](#overview)
2. [Current Status](#current-status)
3. [Repo Contents](#repo-contents)
4. [Core Hypothesis](#core-hypothesis)
5. [Model Architecture](#model-architecture)
6. [Experimental Results](#experimental-results)
7. [Data](#data)
8. [Quick Start](#quick-start)
9. [Timeline](#timeline)
10. [License](#license)

## Overview

A research project exploring whether 3D triangle meshes possess a finite, reusable "vocabulary" of local topological patterns, analogous to how BPE tokens form a vocabulary for natural language.

Instead of generating meshes face-by-face, MeshLex learns a **codebook of ~4096 topology-aware patches** (each covering 20-50 faces) and generates meshes by selecting, deforming, and assembling patches from this codebook. A 4000-face mesh becomes ~130 tokens: roughly 3x more compact than the state-of-the-art (FACE, ICML 2026: ~400 tokens), and far shorter than one-token-per-face sequences.

| | MeshMosaic | FreeMesh | FACE | **MeshLex** |
|---|---|---|---|---|
| Approach | Divide-and-conquer | BPE on coordinates | One-face-one-token | **Topology patch codebook** |
| Still per-face generation? | Yes | Yes | Yes | **No** |
| Has codebook? | No | Yes (coordinate-level) | No | **Yes (topology-level)** |
| Compression (4K faces) | N/A | ~300 tokens | ~400 tokens | **~130 tokens** |
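The token counts in the table follow from the patch size. A back-of-the-envelope sketch, assuming one codebook index per patch plus a small per-patch overhead for deformation/assembly tokens (the `overhead_per_patch` factor is an illustrative assumption, not the model's actual token layout):

```python
import math

def estimate_tokens(num_faces: int, faces_per_patch: float = 35.0,
                    overhead_per_patch: float = 0.15) -> int:
    """Rough token-count estimate: one codebook index per patch, plus an
    assumed ~15% per-patch overhead for deformation/assembly tokens."""
    patches = math.ceil(num_faces / faces_per_patch)
    return math.ceil(patches * (1.0 + overhead_per_patch))

print(estimate_tokens(4000))  # ~115 patches -> on the order of ~130 tokens
```

With ~35 faces per patch, a 4K-face mesh segments into ~115 patches, which is where the ~130-token sequence length comes from.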

## Current Status

**Feasibility validation COMPLETE: 4/4 experiments STRONG GO. Ready for formal experiment design.**

| # | Experiment | Status | Result |
|---|-----------|--------|--------|
| 1 | A-stage × 5-Category | **Done** | STRONG GO (ratio 1.145x, util 46%) |
| 2 | A-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 95.3%)** |
| 3 | B-stage × 5-Category | **Done** | STRONG GO (ratio 1.185x, util 47%) |
| 4 | B-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 94.9%)** |

Key findings:
- **More categories = dramatically better generalization**: LVIS-Wide (1156 cat) ratio 1.019x vs 5-cat 1.145x, util 95% vs 46%
- **Best result (Exp4)**: Same-cat CD 211.6, cross-cat CD 215.8: a near-zero generalization gap
- SimVQ collapse fix successful: utilization 0.46% → 99%+ (217x improvement)
- B-stage multi-token KV decoder effective: reconstruction CD reduced 6.2%

## Repo Contents

This HuggingFace repo stores **checkpoints** and **processed datasets** for reproducibility.

### Checkpoints

| Experiment | Path | Description |
|------------|------|-------------|
| Exp1 A-stage × 5cat | `checkpoints/exp1_A_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp2 A-stage × LVIS-Wide | `checkpoints/exp2_A_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
| Exp3 B-stage × 5cat | `checkpoints/exp3_B_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp4 B-stage × LVIS-Wide | `checkpoints/exp4_B_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |

### Data

| File / Directory | Size | Contents |
|------------------|------|----------|
| `data/meshlex_data.tar.gz` | ~1.2 GB | All processed data in one archive (recommended) |
| `data/patches/` | ~1.1 GB | NPZ patch files (5cat + LVIS-Wide splits) |
| `data/meshes/` | ~931 MB | Preprocessed decimated OBJ files (5,497 meshes) |
| `data/objaverse/` | ~2 MB | Download manifests |

The `tar.gz` archive contains patches, meshes, and manifests; download it and extract to skip all preprocessing.

## Core Hypothesis

> Mesh local topology is low-entropy and universal across object categories. A finite codebook of ~4096 topology prototypes, combined with continuous deformation parameters, can reconstruct arbitrary meshes with high fidelity.

## Model Architecture

The full model is a **VQ-VAE** with three modules:

```
Objaverse-LVIS GLB → Decimation (pyfqmr) → Normalize [-1,1]
    → METIS Patch Segmentation (~35 faces/patch)
    → PCA-aligned local coordinates
    → Face features (15-dim: vertices + normal + angles)
    → SAGEConv GNN Encoder → 128-dim embedding
    → SimVQ Codebook (K=4096, learnable reparameterization)
    → Cross-attention MLP Decoder → Reconstructed vertices
```
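The 15-dim face feature in the pipeline above decomposes as 3 vertex positions (9 values) + unit normal (3) + interior angles (3). A minimal NumPy sketch; the exact ordering and normalization are assumptions consistent with "vertices + normal + angles":

```python
import numpy as np

def face_features(v0, v1, v2):
    """15-dim per-face feature: 3 vertex positions (9) + unit normal (3)
    + 3 interior angles. Layout here is an illustrative assumption."""
    v0, v1, v2 = (np.asarray(v, dtype=np.float64) for v in (v0, v1, v2))
    n = np.cross(v1 - v0, v2 - v0)
    n = n / (np.linalg.norm(n) + 1e-12)  # unit face normal

    def angle(a, b, c):
        # interior angle at vertex a of triangle (a, b, c)
        u, w = b - a, c - a
        cosang = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w) + 1e-12)
        return np.arccos(np.clip(cosang, -1.0, 1.0))

    angles = [angle(v0, v1, v2), angle(v1, v2, v0), angle(v2, v0, v1)]
    return np.concatenate([v0, v1, v2, n, angles])

feat = face_features([0, 0, 0], [1, 0, 0], [0, 1, 0])
print(feat.shape)  # (15,)
```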

- **PatchEncoder**: 4-layer SAGEConv GNN + global mean pooling → 128-dim **z**
- **SimVQ Codebook**: frozen base **C** + learnable linear **W**; the effective codebook is **C·W**. All 4096 entries share W's gradient, so no code is ever forgotten
- **PatchDecoder**: cross-attention with learnable vertex queries → per-vertex xyz coordinates
- **A-stage**: single-KV-token decoder (baseline)
- **B-stage**: 4-KV-token decoder (improved reconstruction, resumed from A-stage)
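The SimVQ lookup can be sketched in a few lines of NumPy. The real model trains W with a straight-through estimator in PyTorch; shapes, seeding, and identity initialization here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4096, 128

C = rng.standard_normal((K, D))  # base codebook, frozen after init
W = np.eye(D)                    # learnable linear reparameterization

def quantize(z):
    """Nearest-neighbour lookup in the effective codebook C @ W.
    Since every entry is C[k] @ W, a gradient step on W moves all
    4096 entries at once, so unused codes are never left behind."""
    codebook = C @ W                                           # (K, D)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (B, K)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

z = rng.standard_normal((8, D))   # a batch of patch embeddings
z_q, idx = quantize(z)
print(z_q.shape, idx.shape)  # (8, 128) (8,)
```

This shared-gradient property is what fixed the codebook collapse noted in the findings: in a vanilla VQ-VAE, only the selected entries receive updates, so dead codes stay dead.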

## Experimental Results

| Experiment | Scale | Stage | CD Ratio | Util (same) | Util (cross) | Decision |
|------------|-------|-------|----------|-------------|--------------|----------|
| Exp1 | 5 categories | A (1 KV token) | 1.145x | 46.0% | 47.0% | ✅ STRONG GO |
| Exp3 | 5 categories | B (4 KV tokens) | 1.185x | 47.1% | 47.3% | ✅ STRONG GO |
| Exp2 | 1156 categories | A (1 KV token) | **1.019x** | **95.3%** | **83.6%** | ✅ **STRONG GO** |
| **Exp4** | **1156 categories** | **B (4 KV tokens)** | **1.019x** | **94.9%** | **82.8%** | ✅ **STRONG GO** |

**CD Ratio** = Cross-category CD / Same-category CD. Closer to 1.0 = better generalization. Target: < 1.2x.

Scaling from 5 to 1156 categories causes CD ratio to **drop from 1.145x to 1.019x** (near-perfect generalization) and utilization to **surge from 46% to 95%** (nearly full codebook activation).
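The metric behind the ratio can be computed directly. A minimal sketch using the symmetric Chamfer distance between point sets; scaling conventions vary across papers, so this is one common definition, not necessarily the exact one used in the experiments:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3):
    mean squared nearest-neighbour distance, summed over both directions."""
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

rng = np.random.default_rng(0)
same_cat = chamfer_distance(rng.random((256, 3)), rng.random((256, 3)))
cross_cat = chamfer_distance(rng.random((256, 3)), rng.random((256, 3)))
ratio = cross_cat / same_cat  # near 1.0 means no generalization gap
print(round(ratio, 3))
```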

## Data

Training data sourced from [Objaverse-LVIS](https://huggingface.co/datasets/allenai/objaverse) (Allen AI).

- **5-Category**: chair, table, airplane, car, lamp; used for initial validation
- **LVIS-Wide**: 1156 categories from Objaverse-LVIS, 10 objects per category
  - `seen_train`: 188,696 patches (1046 categories)
  - `seen_test`: 45,441 patches (same 1046 categories, held-out objects)
  - `unseen`: 12,655 patches (110 held-out categories, never seen during training)

## Quick Start

```bash
# Clone the code repo
git clone https://github.com/Pthahnix/MeshLex-Research.git
cd MeshLex-Research

# Install dependencies
pip install -r requirements.txt
pip install torch-geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
    -f https://data.pyg.org/whl/torch-2.4.0+cu124.html

# Download processed data from this HF repo
pip install huggingface_hub
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download('Pthahnix/MeshLex-Research', 'data/meshlex_data.tar.gz', repo_type='model', local_dir='.')
"
tar xzf data/meshlex_data.tar.gz -C data/

# Download checkpoints
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Pthahnix/MeshLex-Research', allow_patterns='checkpoints/*', repo_type='model', local_dir='.')
"
mv checkpoints data/checkpoints

# Run evaluation on Exp4 (best model)
PYTHONPATH=. python scripts/evaluate.py \
  --checkpoint data/checkpoints/exp4_B_lvis_wide/checkpoint_final.pt \
  --same_cat_dirs data/patches/lvis_wide/seen_test \
  --cross_cat_dirs data/patches/lvis_wide/unseen \
  --output results/eval_results.json

# Run unit tests
python -m pytest tests/ -v
```

## Timeline

- **Day 1 (2026-03-06)**: Project inception, gap analysis, idea generation, experiment design
- **Day 2 (2026-03-07)**: Full codebase implementation (14 tasks), unit tests, initial experiment
- **Day 3 (2026-03-08)**: Diagnosed codebook collapse, fixed SimVQ, Exp1: **STRONG GO**
- **Day 4 (2026-03-09)**: Exp2 + Exp3 completed: **STRONG GO**. Key finding: more categories = better generalization
- **Day 5 (2026-03-13)**: Pod reset recovery, expanded LVIS-Wide (1156 cat), retrained Exp2, trained Exp4: all **STRONG GO**
- **Day 6 (2026-03-14)**: Final comparison report + visualizations. Full dataset + checkpoints backed up to HuggingFace

## License

Apache-2.0