docs: add full model card with badges, TOC, results, and quick start
README.md
---
license: apache-2.0
tags:
- 3d
- mesh-generation
- vq-vae
- codebook
- topology
- graph-neural-network
- research
datasets:
- allenai/objaverse
library_name: pytorch
pipeline_tag: other
---

# MeshLex Research

<div align="center">

**MeshLex: Learning a Topology-aware Patch Vocabulary for Compositional Mesh Generation**

<a href="https://github.com/Pthahnix/Meshlex-Research"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-Meshlex--Research-181717?logo=github&logoColor=white"/></a>
<a href="https://github.com/Pthahnix/Meshlex-Research/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-f5de53?&color=f5de53"/></a>

</div>

<hr>
## Table of Contents

1. [Overview](#overview)
2. [Current Status](#current-status)
3. [Repo Contents](#repo-contents)
4. [Core Hypothesis](#core-hypothesis)
5. [Model Architecture](#model-architecture)
6. [Experimental Results](#experimental-results)
7. [Data](#data)
8. [Quick Start](#quick-start)
9. [Timeline](#timeline)
10. [License](#license)
## Overview

A research project exploring whether 3D triangle meshes possess a finite, reusable "vocabulary" of local topological patterns, analogous to how BPE tokens form a vocabulary for natural language.

Instead of generating meshes face-by-face, MeshLex learns a **codebook of ~4096 topology-aware patches** (each covering 20-50 faces) and generates meshes by selecting, deforming, and assembling patches from this codebook. A 4,000-face mesh becomes ~130 tokens, roughly 3x more compact than the state of the art (FACE, ICML 2026: ~400 tokens).

| | MeshMosaic | FreeMesh | FACE | **MeshLex** |
|---|---|---|---|---|
| Approach | Divide-and-conquer | BPE on coordinates | One-face-one-token | **Topology patch codebook** |
| Still per-face generation? | Yes | Yes | Yes | **No** |
| Has codebook? | No | Yes (coordinate-level) | No | **Yes (topology-level)** |
| Compression (4K faces) | N/A | ~300 tokens | ~400 tokens | **~130 tokens** |
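The compression figures follow from simple arithmetic. A back-of-the-envelope sketch (the ~35-faces-per-patch figure comes from the METIS segmentation described under Model Architecture; the guess about what fills the gap up to ~130 tokens is mine, not from the source):

```python
# Rough token budget for a 4,000-face mesh under MeshLex.
faces = 4000
faces_per_patch = 35                           # METIS segmentation target
patch_tokens = round(faces / faces_per_patch)  # ~114 codebook selections
print(patch_tokens)

# The reported ~130 tokens is slightly higher, presumably because each
# patch also carries a few assembly/deformation tokens (assumption).
# Versus FACE's one-face-one-token scheme at ~400 tokens:
face_tokens = 400
print(face_tokens / 130)  # roughly 3x compression
```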
## Current Status

**Feasibility validation COMPLETE: 4/4 experiments STRONG GO. Ready for formal experiment design.**

| # | Experiment | Status | Result |
|---|-----------|--------|--------|
| 1 | A-stage × 5-Category | **Done** | STRONG GO (ratio 1.145x, util 46%) |
| 2 | A-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 95.3%)** |
| 3 | B-stage × 5-Category | **Done** | STRONG GO (ratio 1.185x, util 47%) |
| 4 | B-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 94.9%)** |

Key findings:

- **More categories = dramatically better generalization**: LVIS-Wide (1156 categories) reaches a 1.019x ratio and 95% utilization, versus 1.145x and 46% for the 5-category setup
- **Best result (Exp4)**: same-category CD 211.6 vs. cross-category CD 215.8, a near-zero generalization gap
- SimVQ collapse fix successful: utilization rose from 0.46% to 99%+ (a 217x improvement)
- B-stage multi-token KV decoder effective: reconstruction CD reduced by 6.2%
## Repo Contents

This Hugging Face repo stores **checkpoints** and **processed datasets** for reproducibility.

### Checkpoints

| Experiment | Path | Description |
|------------|------|-------------|
| Exp1 A-stage × 5cat | `checkpoints/exp1_A_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp2 A-stage × LVIS-Wide | `checkpoints/exp2_A_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
| Exp3 B-stage × 5cat | `checkpoints/exp3_B_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp4 B-stage × LVIS-Wide | `checkpoints/exp4_B_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |

### Data Files

| Directory | Size | Contents |
|-----------|------|----------|
| `data/patches/5cat/` | ~82 MB | 5-category NPZ patch files (train/test splits) |
| `data/patches/lvis_wide/` | ~1 GB | LVIS-Wide NPZ patches (188K train / 45K test / 12K unseen) |
| `data/meshes/` | ~931 MB | Preprocessed, decimated OBJ files (5,497 meshes) |
| `data/objaverse/` | ~2 MB | Download manifests (for recreating the download pipeline) |

The processed data can be downloaded directly; there is no need to re-download from Objaverse or re-run preprocessing.
## Core Hypothesis

> Mesh local topology is low-entropy and universal across object categories. A finite codebook of ~4096 topology prototypes, combined with continuous deformation parameters, can reconstruct arbitrary meshes with high fidelity.
## Model Architecture

The full model is a **VQ-VAE** with three modules:

```
Objaverse-LVIS GLB → Decimation (pyfqmr) → Normalize [-1,1]
  → METIS Patch Segmentation (~35 faces/patch)
  → PCA-aligned local coordinates
  → Face features (15-dim: vertices + normal + angles)
  → SAGEConv GNN Encoder → 128-dim embedding
  → SimVQ Codebook (K=4096, learnable reparameterization)
  → Cross-attention MLP Decoder → Reconstructed vertices
```

- **PatchEncoder**: 4-layer SAGEConv GNN + global mean pooling → 128-dim **z**
- **SimVQ Codebook**: frozen base **C** + learnable linear map **W**; the effective codebook is **CW**. All 4096 entries share W's gradient, so no code is ever forgotten
- **PatchDecoder**: cross-attention with learnable vertex queries → per-vertex xyz coordinates
- **A-stage**: single-KV-token decoder (baseline)
- **B-stage**: 4-KV-token decoder (improved reconstruction, resumed from A-stage)
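The SimVQ reparameterization in the second bullet can be sketched in a few lines of NumPy. This is an illustrative toy, not the project's implementation; the `quantize` helper and the identity initialization of W are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4096, 128

# SimVQ: the base codebook C is frozen after random init; only the
# linear map W is trained. Because every effective code in C @ W
# depends on W, a gradient through any one selected code moves all
# 4096 entries at once, so unused codes are never stranded -- the
# collapse fix referenced in Current Status.
C = rng.standard_normal((K, D))  # frozen base codebook
W = np.eye(D)                    # learnable reparameterization (identity here)

def quantize(z):
    """Nearest-neighbour lookup of z (shape (D,)) in the effective codebook."""
    E = C @ W                                        # effective codebook (K, D)
    idx = int(np.argmin(((E - z) ** 2).sum(axis=1)))
    return idx, E[idx]

z = rng.standard_normal(D)       # stand-in for a 128-dim patch embedding
idx, code = quantize(z)
```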
## Experimental Results

| Experiment | Scale | Stage | CD Ratio | Util (same) | Util (cross) | Decision |
|------------|-------|-------|----------|-------------|--------------|----------|
| Exp1 | 5 categories | A (1 KV token) | 1.145x | 46.0% | 47.0% | ✅ STRONG GO |
| Exp3 | 5 categories | B (4 KV tokens) | 1.185x | 47.1% | 47.3% | ✅ STRONG GO |
| Exp2 | 1156 categories | A (1 KV token) | **1.019x** | **95.3%** | **83.6%** | ✅ **STRONG GO** |
| **Exp4** | **1156 categories** | **B (4 KV tokens)** | **1.019x** | **94.9%** | **82.8%** | ✅ **STRONG GO** |

**CD Ratio** = cross-category CD / same-category CD; closer to 1.0 means better generalization (target: < 1.2x).

Scaling from 5 to 1156 categories causes the CD ratio to **drop from 1.145x to 1.019x** (near-perfect generalization) and utilization to **surge from 46% to 95%** (nearly full codebook activation).
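As a quick check, the headline Exp4 ratio can be recomputed from the per-split Chamfer distances reported under Current Status:

```python
# CD Ratio = cross-category Chamfer Distance / same-category Chamfer Distance.
same_cat_cd = 211.6    # Exp4, seen_test split
cross_cat_cd = 215.8   # Exp4, unseen categories
ratio = cross_cat_cd / same_cat_cd
print(f"{ratio:.3f}")  # 1.020 -- matches the reported 1.019x up to rounding
assert ratio < 1.2     # the STRONG GO target stated above
```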
## Data

Training data is sourced from [Objaverse-LVIS](https://huggingface.co/datasets/allenai/objaverse) (Allen AI).

- **5-Category**: chair, table, airplane, car, lamp; used for initial validation
- **LVIS-Wide**: 1156 categories from Objaverse-LVIS, 10 objects per category
  - `seen_train`: 188,696 patches (1046 categories)
  - `seen_test`: 45,441 patches (same 1046 categories, held-out objects)
  - `unseen`: 12,655 patches (110 held-out categories, never seen during training)
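The split bookkeeping is internally consistent, which is easy to verify:

```python
# LVIS-Wide split sizes from the list above.
splits = {"seen_train": 188_696, "seen_test": 45_441, "unseen": 12_655}
total_patches = sum(splits.values())
print(total_patches)  # 246,792 patches overall

seen_categories, unseen_categories = 1046, 110
assert seen_categories + unseen_categories == 1156  # the LVIS-Wide total
```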
## Quick Start

```bash
# Clone the code repo
git clone https://github.com/Pthahnix/Meshlex-Research.git
cd Meshlex-Research

# Install dependencies
pip install -r requirements.txt
pip install torch-geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
    -f https://data.pyg.org/whl/torch-2.4.0+cu124.html

# Download processed data from this HF repo
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Pthahnix/Meshlex-Research', local_dir='hf_download', repo_type='model')
"

# Move data and checkpoints into place
cp -r hf_download/data/ data/
cp -r hf_download/checkpoints/ data/checkpoints/

# Run evaluation on Exp4 (best model)
PYTHONPATH=. python scripts/evaluate.py \
    --checkpoint data/checkpoints/exp4_B_lvis_wide/checkpoint_final.pt \
    --same_cat_dirs data/patches/lvis_wide/seen_test \
    --cross_cat_dirs data/patches/lvis_wide/unseen \
    --output results/eval_results.json

# Run unit tests
python -m pytest tests/ -v
```
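After downloading, the NPZ patch files can be inspected with plain NumPy. The array names below are invented for illustration (the real keys depend on the preprocessing scripts), so this snippet builds a stand-in file in memory rather than assuming a real path:

```python
import io
import numpy as np

# Stand-in for one patch record; real files live under
# data/patches/lvis_wide/ and their keys may differ.
buf = io.BytesIO()
np.savez(buf,
         vertices=np.zeros((30, 3), dtype=np.float32),  # hypothetical key
         faces=np.zeros((35, 3), dtype=np.int64))       # hypothetical key
buf.seek(0)

with np.load(buf) as npz:
    keys = sorted(npz.files)                 # list whatever arrays are stored
    shapes = {k: npz[k].shape for k in keys}

print(keys)    # ['faces', 'vertices']
print(shapes)
```

The same `np.load(...).files` pattern works on the actual downloaded files to discover their schema.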
## Timeline

- **Day 1 (2026-03-06)**: Project inception, gap analysis, idea generation, experiment design
- **Day 2 (2026-03-07)**: Full codebase implementation (14 tasks), unit tests, initial experiment
- **Day 3 (2026-03-08)**: Diagnosed codebook collapse, fixed SimVQ, Exp1 → **STRONG GO**
- **Day 4 (2026-03-09)**: Exp2 + Exp3 completed → **STRONG GO**. Key finding: more categories = better generalization
- **Day 5 (2026-03-13)**: Pod reset recovery, expanded LVIS-Wide (1156 categories), retrained Exp2, trained Exp4 → all **STRONG GO**
- **Day 6 (2026-03-14)**: Final comparison report + visualizations; full dataset + checkpoints backed up to Hugging Face
## License

Apache-2.0