---
license: apache-2.0
tags:
- 3d
- mesh-generation
- vq-vae
- codebook
- topology
- graph-neural-network
- research
datasets:
- allenai/objaverse
library_name: pytorch
pipeline_tag: other
---

# MeshLex Research
|
<div align="center">
|
**MeshLex: Learning a Topology-aware Patch Vocabulary for Compositional Mesh Generation**
|
<a href="https://github.com/Pthahnix/MeshLex-Research"><img alt="GitHub"
src="https://img.shields.io/badge/GitHub-MeshLex--Research-181717?logo=github&logoColor=white"/></a>
<a href="https://huggingface.co/Pthahnix/MeshLex-Research"><img alt="Hugging Face"
src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-MeshLex--Research-ffc107?color=ffc107&logoColor=white"/></a>
<a href="https://github.com/Pthahnix/MeshLex-Research/blob/main/LICENSE"><img alt="License"
src="https://img.shields.io/badge/License-Apache_2.0-f5de53?&color=f5de53"/></a>
|
</div>
|
<hr>
|
## Table of Contents
|
1. [Overview](#overview)
2. [Current Status](#current-status)
3. [Repo Contents](#repo-contents)
4. [Core Hypothesis](#core-hypothesis)
5. [Model Architecture](#model-architecture)
6. [Experimental Results](#experimental-results)
7. [Data](#data)
8. [Quick Start](#quick-start)
9. [Timeline](#timeline)
10. [License](#license)
|
## Overview
|
A research project exploring whether 3D triangle meshes possess a finite, reusable "vocabulary" of local topological patterns, analogous to how BPE tokens form a vocabulary for natural language.
|
Instead of generating meshes face by face, MeshLex learns a **codebook of ~4096 topology-aware patches** (each covering 20-50 faces) and generates meshes by selecting, deforming, and assembling patches from this codebook. A 4000-face mesh becomes ~130 tokens, roughly 3x fewer than the state-of-the-art FACE (ICML 2026: ~400 tokens).
|
| | MeshMosaic | FreeMesh | FACE | **MeshLex** |
|---|---|---|---|---|
| Approach | Divide-and-conquer | BPE on coordinates | One-face-one-token | **Topology patch codebook** |
| Still per-face generation? | Yes | Yes | Yes | **No** |
| Has codebook? | No | Yes (coordinate-level) | No | **Yes (topology-level)** |
| Sequence length (4K-face mesh) | N/A | ~300 tokens | ~400 tokens | **~130 tokens** |
|
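The ~130-token figure is essentially face count divided by patch size. A quick sketch of that arithmetic, assuming one codebook token per patch (an assumption: how the continuous deformation parameters are serialized is not specified here, and `patch_token_count` is a hypothetical helper):

```python
import math


def patch_token_count(num_faces: int, faces_per_patch: float) -> int:
    """Sequence length if every patch is emitted as one codebook token.

    Assumption: one token per patch; deformation parameters are not
    counted as extra tokens.
    """
    return math.ceil(num_faces / faces_per_patch)


# Patch sizes spanning the 20-50 face range quoted above.
for faces_per_patch in (20, 31, 35, 50):
    print(faces_per_patch, patch_token_count(4000, faces_per_patch))
```

At ~31 faces per patch this gives 130 tokens; against FACE's ~400 tokens that is roughly a 3x reduction.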
## Current Status
|
**Feasibility validation COMPLETE: 4/4 experiments STRONG GO. Ready for formal experiment design.**
|
| # | Experiment | Status | Result |
|---|-----------|--------|--------|
| 1 | A-stage × 5-Category | **Done** | STRONG GO (ratio 1.145x, util 46%) |
| 2 | A-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 95.3%)** |
| 3 | B-stage × 5-Category | **Done** | STRONG GO (ratio 1.185x, util 47%) |
| 4 | B-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 94.9%)** |
|
Key findings:
- **More categories = dramatically better generalization**: LVIS-Wide (1156 categories) reaches a CD ratio of 1.019x vs 1.145x for 5 categories, and codebook utilization of 95% vs 46%
- **Best result (Exp4)**: same-category CD 211.6, cross-category CD 215.8, a generalization gap of only ~2%
- SimVQ collapse fix successful: codebook utilization rose from 0.46% to 99%+ (a 217x improvement)
- B-stage multi-token KV decoder is effective: reconstruction CD reduced by 6.2%
|
## Repo Contents
|
This HuggingFace repo stores **checkpoints** and **processed datasets** for reproducibility.
|
### Checkpoints
|
| Experiment | Path | Description |
|------------|------|-------------|
| Exp1 A-stage × 5cat | `checkpoints/exp1_A_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp2 A-stage × LVIS-Wide | `checkpoints/exp2_A_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
| Exp3 B-stage × 5cat | `checkpoints/exp3_B_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp4 B-stage × LVIS-Wide | `checkpoints/exp4_B_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
|
### Data
|
| File / Directory | Size | Contents |
|------------------|------|----------|
| `data/meshlex_data.tar.gz` | ~1.2 GB | All processed data in one archive (recommended) |
| `data/patches/` | ~1.1 GB | NPZ patch files (5cat + LVIS-Wide splits) |
| `data/meshes/` | ~931 MB | Preprocessed decimated OBJ files (5,497 meshes) |
| `data/objaverse/` | ~2 MB | Download manifests |
|
The `tar.gz` archive contains the patches, meshes, and manifests; download and extract it to skip all preprocessing.
|
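To check what the extracted patch files contain before writing a loader, a small NumPy helper can list every array in an `.npz` file. The array names used inside MeshLex patch files are not documented in this card, so `summarize_npz` (a hypothetical helper) makes no assumptions about them; the `data/patches` location assumes the archive was extracted as in Quick Start.

```python
from pathlib import Path

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    # allow_pickle=False refuses object arrays instead of unpickling arbitrary data
    with np.load(path, allow_pickle=False) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


if __name__ == "__main__":
    root = Path("data/patches")  # assumed extraction location
    if root.is_dir():
        first = next(root.rglob("*.npz"), None)
        if first is not None:
            print(first, summarize_npz(first))
```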
## Core Hypothesis
|
> Mesh local topology is low-entropy and universal across object categories. A finite codebook of ~4096 topology prototypes, combined with continuous deformation parameters, can reconstruct arbitrary meshes with high fidelity.
|
## Model Architecture
|
The full model is a **VQ-VAE** with three modules:
|
```
Objaverse-LVIS GLB → Decimation (pyfqmr) → Normalize [-1,1]
    → METIS Patch Segmentation (~35 faces/patch)
    → PCA-aligned local coordinates
    → Face features (15-dim: vertices + normal + angles)
    → SAGEConv GNN Encoder → 128-dim embedding
    → SimVQ Codebook (K=4096, learnable reparameterization)
    → Cross-attention MLP Decoder → Reconstructed vertices
```
|
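The normalization, local-frame, and feature steps of the pipeline above can be sketched in NumPy. This is an illustrative reconstruction, not the repository's code: the bounding-box normalization, the eigen-decomposition route to PCA, and the 9 + 3 + 3 split of the 15 face features (three corner positions, unit normal, three interior angles) are assumptions consistent with the diagram. Decimation and METIS segmentation are omitted, and all function names are hypothetical.

```python
import numpy as np


def normalize_to_unit_cube(vertices):
    """Center on the bounding-box midpoint and scale so all coordinates lie in [-1, 1]."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    scale = (hi - lo).max() / 2.0
    return (vertices - (lo + hi) / 2.0) / max(scale, 1e-12)


def pca_align(vertices):
    """Express patch vertices in their principal-axis frame (largest variance first).

    Eigenvector signs are arbitrary; a real pipeline needs a sign convention.
    """
    centered = vertices - vertices.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)  # columns sorted by ascending eigenvalue
    return centered @ eigvecs[:, ::-1]


def face_features(vertices, faces):
    """Assumed 15-dim layout: 3 corners x 3 coords (9) + unit normal (3) + interior angles (3)."""
    tri = np.asarray(vertices, dtype=float)[faces]  # (F, 3, 3) corner positions
    normal = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normal /= np.linalg.norm(normal, axis=1, keepdims=True) + 1e-12
    angles = []
    for a, b, c in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):
        u, v = tri[:, b] - tri[:, a], tri[:, c] - tri[:, a]
        cos = (u * v).sum(axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1) + 1e-12)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))  # interior angle at corner a
    return np.concatenate([tri.reshape(len(faces), 9), normal, np.stack(angles, axis=1)], axis=1)
```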
- **PatchEncoder**: 4-layer SAGEConv GNN + global mean pooling → 128-dim **z**
- **SimVQ Codebook**: frozen base **C** + learnable linear map **W**, giving the effective codebook **CW = W(C)**. All 4096 entries share W's gradient, so every update to W moves every code, including codes not selected in a batch; no code goes dead
- **PatchDecoder**: cross-attention with learnable vertex queries → per-vertex xyz coordinates
  - **A-stage**: decoder with a single KV token (baseline)
  - **B-stage**: decoder with 4 KV tokens (improved reconstruction; resumed from the A-stage checkpoint)
|
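A minimal forward-pass sketch of the SimVQ idea described above: the base codebook **C** is frozen and every code is read through the shared linear map **W**, so any update to W moves all codes at once. Training (straight-through gradients, commitment loss, the optimizer step on W) is omitted; W is written as a bias-free matrix product, and the identity initialization, `SimVQSketch`, and `codebook_utilization` are illustrative assumptions rather than the repository's implementation.

```python
import numpy as np


class SimVQSketch:
    """Nearest-code lookup through a SimVQ-style reparameterized codebook (no training)."""

    def __init__(self, num_codes=4096, dim=128, seed=0):
        rng = np.random.default_rng(seed)
        self.C = rng.normal(size=(num_codes, dim))  # frozen base codebook
        self.W = np.eye(dim)                         # shared learnable map (identity init assumed)

    def effective_codebook(self):
        # Every code is C[k] @ W, so changing W moves all num_codes codes together.
        return self.C @ self.W

    def quantize(self, z):
        """Map each row of z (N, dim) to its nearest effective code by squared L2 distance."""
        cw = self.effective_codebook()
        dist = (z ** 2).sum(axis=1, keepdims=True) - 2 * z @ cw.T + (cw ** 2).sum(axis=1)
        idx = dist.argmin(axis=1)
        return idx, cw[idx]


def codebook_utilization(indices, num_codes):
    """Fraction of codebook entries selected at least once."""
    return np.unique(indices).size / num_codes


if __name__ == "__main__":
    vq = SimVQSketch()
    z = np.random.default_rng(1).normal(size=(1000, 128))  # stand-in for encoder outputs
    idx, z_q = vq.quantize(z)
    print(z_q.shape, codebook_utilization(idx, 4096))
```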
## Experimental Results
|
| Experiment | Scale | Stage | CD Ratio | Util (same-cat) | Util (cross-cat) | Decision |
|------------|-------|-------|----------|-----------------|------------------|----------|
| Exp1 | 5 categories | A (1 KV token) | 1.145x | 46.0% | 47.0% | ✅ STRONG GO |
| Exp3 | 5 categories | B (4 KV tokens) | 1.185x | 47.1% | 47.3% | ✅ STRONG GO |
| Exp2 | 1156 categories | A (1 KV token) | **1.019x** | **95.3%** | **83.6%** | ✅ **STRONG GO** |
| **Exp4** | **1156 categories** | **B (4 KV tokens)** | **1.019x** | **94.9%** | **82.8%** | ✅ **STRONG GO** |
|
**CD Ratio** = cross-category CD / same-category CD. Closer to 1.0 means better generalization. Target: < 1.2x.
|
Scaling from 5 to 1156 categories **lowers the CD ratio from 1.145x to 1.019x** (near-perfect generalization) and **raises codebook utilization from 46% to 95%** (nearly the full codebook active).
|
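One way to compute the quantities in the table, as a NumPy sketch. The Chamfer variant (mean squared nearest-neighbour distance, summed over both directions) and averaging CDs over each split before taking the ratio are assumptions; the card does not state the exact definition or the scale of the reported CD values, and the helper names are hypothetical.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean squared nearest-neighbour distance in each direction,
    summed. Brute force with O(N*M) memory, fine for patch-sized point sets.
    """
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared distances
    return sq.min(axis=1).mean() + sq.min(axis=0).mean()


def reconstruction_cds(pairs):
    """CD for each (reconstructed_vertices, target_vertices) pair in one split."""
    return [chamfer_distance(rec, tgt) for rec, tgt in pairs]


def cd_ratio(cross_cat_cds, same_cat_cds):
    """Mean cross-category CD divided by mean same-category CD (1.0 = no gap)."""
    return float(np.mean(cross_cat_cds) / np.mean(same_cat_cds))
```

With Exp4's reported values, `cd_ratio([215.8], [211.6])` is about 1.0198, consistent with the 1.019x in the table.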
## Data
|
Training data is sourced from [Objaverse-LVIS](https://huggingface.co/datasets/allenai/objaverse) (Allen AI).
|
- **5-Category**: chair, table, airplane, car, lamp; used for initial validation
- **LVIS-Wide**: 1156 categories from Objaverse-LVIS, 10 objects per category
  - `seen_train`: 188,696 patches (1046 categories)
  - `seen_test`: 45,441 patches (same 1046 categories, held-out objects)
  - `unseen`: 12,655 patches (110 held-out categories, never seen during training)
|
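The LVIS-Wide split holds out whole categories for `unseen` and held-out objects within the remaining categories for `seen_test`. A hypothetical object-level splitter illustrating that structure; the function name, the random seed, and the 20% test fraction are assumptions (the `seen_train`/`seen_test` patch counts above work out to roughly 81/19), not the repository's procedure.

```python
import random


def split_lvis_wide(objects_by_category, num_unseen=110, test_frac=0.2, seed=0):
    """Split {category: [object_id, ...]} into (seen_train, seen_test, unseen) object ids."""
    rng = random.Random(seed)
    categories = sorted(objects_by_category)
    rng.shuffle(categories)
    unseen_cats = set(categories[:num_unseen])  # whole categories never used in training
    seen_train, seen_test, unseen = [], [], []
    for cat in sorted(objects_by_category):
        objs = list(objects_by_category[cat])
        if cat in unseen_cats:
            unseen.extend(objs)
            continue
        rng.shuffle(objs)
        n_test = max(1, round(len(objs) * test_frac))
        seen_test.extend(objs[:n_test])   # held-out objects from a seen category
        seen_train.extend(objs[n_test:])
    return seen_train, seen_test, unseen
```

Splitting at the object level, before patch extraction, keeps patches from one mesh from landing in both `seen_train` and `seen_test`.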
## Quick Start
|
```bash
# Clone the code repo
git clone https://github.com/Pthahnix/MeshLex-Research.git
cd MeshLex-Research

# Install dependencies
pip install -r requirements.txt
pip install torch-geometric
# The PyG wheel index must match your installed torch + CUDA versions (torch 2.4.0 + CUDA 12.4 shown)
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
    -f https://data.pyg.org/whl/torch-2.4.0+cu124.html

# Download processed data from this HF repo
pip install huggingface_hub
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download('Pthahnix/MeshLex-Research', 'data/meshlex_data.tar.gz', repo_type='model', local_dir='.')
"
tar xzf data/meshlex_data.tar.gz -C data/

# Download checkpoints
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Pthahnix/MeshLex-Research', allow_patterns='checkpoints/*', repo_type='model', local_dir='.')
"
mv checkpoints data/checkpoints

# Run evaluation on Exp4 (best model)
PYTHONPATH=. python scripts/evaluate.py \
    --checkpoint data/checkpoints/exp4_B_lvis_wide/checkpoint_final.pt \
    --same_cat_dirs data/patches/lvis_wide/seen_test \
    --cross_cat_dirs data/patches/lvis_wide/unseen \
    --output results/eval_results.json

# Run unit tests
python -m pytest tests/ -v
```
|
## Timeline
|
- **Day 1 (2026-03-06)**: Project inception, gap analysis, idea generation, experiment design
- **Day 2 (2026-03-07)**: Full codebase implementation (14 tasks), unit tests, initial experiment
- **Day 3 (2026-03-08)**: Diagnosed codebook collapse, fixed SimVQ, Exp1 → **STRONG GO**
- **Day 4 (2026-03-09)**: Exp2 + Exp3 completed → **STRONG GO**. Key finding: more categories = better generalization
- **Day 5 (2026-03-13)**: Pod reset recovery, expanded LVIS-Wide (1156 categories), retrained Exp2, trained Exp4; all **STRONG GO**
- **Day 6 (2026-03-14)**: Final comparison report + visualizations. Full dataset + checkpoints backed up to HuggingFace
|
## License
|
Apache-2.0
|