Commit 097b5d5 (verified) by Pthahnix · 1 parent: fcbdc6e

docs: add full model card with badges, TOC, results, and quick start

Files changed (1): README.md (+191 -3)
---
license: apache-2.0
tags:
- 3d
- mesh-generation
- vq-vae
- codebook
- topology
- graph-neural-network
- research
datasets:
- allenai/objaverse
library_name: pytorch
pipeline_tag: other
---

# MeshLex Research

<div align="center">

**MeshLex: Learning a Topology-aware Patch Vocabulary for Compositional Mesh Generation**

<a href="https://github.com/Pthahnix/Meshlex-Research"><img alt="GitHub"
src="https://img.shields.io/badge/GitHub-Meshlex--Research-181717?logo=github&logoColor=white"/></a>
<a href="https://github.com/Pthahnix/Meshlex-Research/blob/main/LICENSE"><img alt="License"
src="https://img.shields.io/badge/License-Apache_2.0-f5de53?&color=f5de53"/></a>

</div>

<hr>

## Table of Contents

1. [Overview](#overview)
2. [Current Status](#current-status)
3. [Repo Contents](#repo-contents)
4. [Core Hypothesis](#core-hypothesis)
5. [Model Architecture](#model-architecture)
6. [Experimental Results](#experimental-results)
7. [Data](#data)
8. [Quick Start](#quick-start)
9. [Timeline](#timeline)
10. [License](#license)

## Overview

A research project exploring whether 3D triangle meshes possess a finite, reusable "vocabulary" of local topological patterns, analogous to how BPE tokens form a vocabulary for natural language.

Instead of generating meshes face-by-face, MeshLex learns a **codebook of ~4096 topology-aware patches** (each covering 20-50 faces) and generates meshes by selecting, deforming, and assembling patches from this codebook. A 4000-face mesh becomes ~130 tokens, roughly 3x more compact than the state of the art (FACE, ICML 2026: ~400 tokens).
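
The ~130-token figure follows from simple patch arithmetic. A minimal sketch (the ~35 faces/patch value comes from the METIS segmentation step; the fixed per-mesh overhead of 16 tokens is a hypothetical placeholder, since the card does not break the budget down further):

```python
import math

# Rough token-budget arithmetic behind "a 4000-face mesh becomes ~130 tokens".
# ASSUMPTION: one codebook token per patch plus a small fixed per-mesh
# overhead; the exact overhead is illustrative, not taken from the card.
def patch_token_count(n_faces: int, faces_per_patch: float = 35.0,
                      overhead_tokens: int = 16) -> int:
    n_patches = math.ceil(n_faces / faces_per_patch)  # ~115 patches at 4K faces
    return n_patches + overhead_tokens

print(patch_token_count(4000))  # 131 tokens, vs ~400 for one-face-one-token
```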

| | MeshMosaic | FreeMesh | FACE | **MeshLex** |
|---|---|---|---|---|
| Approach | Divide-and-conquer | BPE on coordinates | One-face-one-token | **Topology patch codebook** |
| Still per-face generation? | Yes | Yes | Yes | **No** |
| Has codebook? | No | Yes (coordinate-level) | No | **Yes (topology-level)** |
| Compression (4K faces) | N/A | ~300 tokens | ~400 tokens | **~130 tokens** |

## Current Status

**Feasibility validation COMPLETE: 4/4 experiments STRONG GO. Ready for formal experiment design.**

| # | Experiment | Status | Result |
|---|-----------|--------|--------|
| 1 | A-stage × 5-Category | **Done** | STRONG GO (ratio 1.145x, util 46%) |
| 2 | A-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 95.3%)** |
| 3 | B-stage × 5-Category | **Done** | STRONG GO (ratio 1.185x, util 47%) |
| 4 | B-stage × LVIS-Wide | **Done** | **STRONG GO (ratio 1.019x, util 94.9%)** |

Key findings:
- **More categories bring dramatically better generalization**: LVIS-Wide (1156 categories) reaches a 1.019x ratio vs 1.145x for 5 categories, and 95% utilization vs 46%
- **Best result (Exp4)**: same-category CD 211.6 vs cross-category CD 215.8, a near-zero generalization gap
- The SimVQ collapse fix was successful: utilization rose from 0.46% to 99%+ (a 217x improvement)
- The B-stage multi-token KV decoder is effective: reconstruction CD reduced by 6.2%

## Repo Contents

This Hugging Face repo stores **checkpoints** and **processed datasets** for reproducibility.

### Checkpoints

| Experiment | Path | Description |
|------------|------|-------------|
| Exp1 A-stage × 5cat | `checkpoints/exp1_A_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp2 A-stage × LVIS-Wide | `checkpoints/exp2_A_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
| Exp3 B-stage × 5cat | `checkpoints/exp3_B_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp4 B-stage × LVIS-Wide | `checkpoints/exp4_B_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |

### Data

| Directory | Size | Contents |
|-----------|------|----------|
| `data/patches/5cat/` | ~82 MB | 5-category NPZ patch files (train/test splits) |
| `data/patches/lvis_wide/` | ~1 GB | LVIS-Wide NPZ patches (188K train / 45K test / 12K unseen) |
| `data/meshes/` | ~931 MB | Preprocessed decimated OBJ files (5,497 meshes) |
| `data/objaverse/` | ~2 MB | Download manifests (for recreating the download pipeline) |

The processed data can be downloaded directly; there is no need to re-download from Objaverse and re-run preprocessing.

## Core Hypothesis

> Mesh local topology is low-entropy and universal across object categories. A finite codebook of ~4096 topology prototypes, combined with continuous deformation parameters, can reconstruct arbitrary meshes with high fidelity.

## Model Architecture

The full model is a **VQ-VAE** with three modules:

```
Objaverse-LVIS GLB → Decimation (pyfqmr) → Normalize [-1,1]
→ METIS Patch Segmentation (~35 faces/patch)
→ PCA-aligned local coordinates
→ Face features (15-dim: vertices + normal + angles)
→ SAGEConv GNN Encoder → 128-dim embedding
→ SimVQ Codebook (K=4096, learnable reparameterization)
→ Cross-attention MLP Decoder → Reconstructed vertices
```
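
The PCA-alignment step in the pipeline above can be sketched in NumPy (an illustrative reimplementation of the idea, not the repo's code: center each patch at its centroid, then rotate it into the eigenbasis of its covariance so every patch lands in a canonical local frame):

```python
import numpy as np

def pca_align(vertices: np.ndarray) -> np.ndarray:
    """Map a patch's vertices (N, 3) into PCA-aligned local coordinates."""
    centered = vertices - vertices.mean(axis=0)   # translate centroid to origin
    cov = centered.T @ centered / len(vertices)   # 3x3 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    basis = eigvecs[:, ::-1]                      # largest-variance axis first
    return centered @ basis

# A synthetic elongated "patch": after alignment, per-axis variance is sorted.
patch = np.random.default_rng(0).normal(size=(40, 3)) * [3.0, 1.0, 0.2]
local = pca_align(patch)
```

Aligning patches this way removes translation and rotation before encoding, which is presumably what lets one topology prototype cover many poses of the same local shape.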

- **PatchEncoder**: 4-layer SAGEConv GNN + global mean pooling → 128-dim latent **z**
- **SimVQ Codebook**: frozen base **C** + learnable linear map **W**; the effective codebook is the product **CW**. All 4096 entries share **W**'s gradient, so no code is ever forgotten
- **PatchDecoder**: cross-attention with learnable vertex queries → per-vertex xyz coordinates
- **A-stage**: single-KV-token decoder (baseline)
- **B-stage**: 4-KV-token decoder (improved reconstruction; resumed from the A-stage checkpoint)
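
The SimVQ lookup described above can be sketched in NumPy (forward pass only; K and d are the card's values, while in the real model **W** is a learnable layer trained end-to-end, which this static sketch does not show):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4096, 128
C = rng.normal(size=(K, d))   # frozen base codebook, never updated directly
W = np.eye(d)                 # learnable linear map (identity at initialization)

def quantize(z: np.ndarray):
    """Nearest-neighbour lookup in the effective codebook C @ W.

    Every effective code is C[k] @ W, so one gradient step on W moves
    all K codes at once; no entry can go permanently dead.
    """
    codebook = C @ W                                            # (K, d)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (B, K) distances
    idx = d2.argmin(axis=1)                                     # chosen code per patch
    return codebook[idx], idx

z = rng.normal(size=(8, d))   # a batch of 8 patch embeddings
z_q, idx = quantize(z)        # z_q: (8, 128) quantized vectors, idx: (8,) code ids
```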

## Experimental Results

| Experiment | Scale | Stage | CD Ratio | Util (same) | Util (cross) | Decision |
|------------|-------|-------|----------|-------------|--------------|----------|
| Exp1 | 5 categories | A (1 KV token) | 1.145x | 46.0% | 47.0% | ✅ STRONG GO |
| Exp3 | 5 categories | B (4 KV tokens) | 1.185x | 47.1% | 47.3% | ✅ STRONG GO |
| Exp2 | 1156 categories | A (1 KV token) | **1.019x** | **95.3%** | **83.6%** | ✅ **STRONG GO** |
| **Exp4** | **1156 categories** | **B (4 KV tokens)** | **1.019x** | **94.9%** | **82.8%** | ✅ **STRONG GO** |

**CD Ratio** = cross-category CD / same-category CD; the closer to 1.0, the better the generalization. Target: < 1.2x.

Scaling from 5 to 1156 categories **drops the CD ratio from 1.145x to 1.019x** (near-perfect generalization) and **raises utilization from 46% to 95%** (nearly full codebook activation).
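
For reference, the metric behind these numbers can be sketched as a symmetric nearest-neighbour Chamfer distance (an illustrative NumPy version on synthetic point sets; the repo's sample counts and the scale factor behind values such as 211.6 are not specified on this card):

```python
import numpy as np

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
gt = rng.normal(size=(256, 3))                      # ground-truth surface samples
same_rec = gt + 0.010 * rng.normal(size=gt.shape)   # same-category reconstruction
cross_rec = gt + 0.012 * rng.normal(size=gt.shape)  # slightly worse cross-category one

cd_ratio = chamfer(cross_rec, gt) / chamfer(same_rec, gt)  # STRONG GO target: < 1.2
```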

## Data

Training data is sourced from [Objaverse-LVIS](https://huggingface.co/datasets/allenai/objaverse) (Allen AI).

- **5-Category**: chair, table, airplane, car, lamp; used for initial validation
- **LVIS-Wide**: 1156 categories from Objaverse-LVIS, 10 objects per category
  - `seen_train`: 188,696 patches (1046 categories)
  - `seen_test`: 45,441 patches (same 1046 categories, held-out objects)
  - `unseen`: 12,655 patches (110 held-out categories, never seen during training)
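
The split bookkeeping above is internally consistent and easy to check (numbers copied from this card; the per-category average is derived, not stated):

```python
# LVIS-Wide split sizes as listed on this model card.
seen_train, seen_test, unseen = 188_696, 45_441, 12_655
seen_cats, unseen_cats = 1046, 110

assert seen_cats + unseen_cats == 1156        # total LVIS-Wide categories
train_per_cat = seen_train / seen_cats        # ~180 training patches per category
heldout_frac = seen_test / (seen_train + seen_test)  # ~19% held-out seen objects
```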

## Quick Start

```bash
# Clone the code repo
git clone https://github.com/Pthahnix/Meshlex-Research.git
cd Meshlex-Research

# Install dependencies
pip install -r requirements.txt
pip install torch-geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
    -f https://data.pyg.org/whl/torch-2.4.0+cu124.html

# Download processed data from this HF repo
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Pthahnix/Meshlex-Research', local_dir='hf_download', repo_type='model')
"

# Move data and checkpoints into place
cp -r hf_download/data/ data/
cp -r hf_download/checkpoints/ data/checkpoints/

# Run evaluation on Exp4 (best model)
PYTHONPATH=. python scripts/evaluate.py \
    --checkpoint data/checkpoints/exp4_B_lvis_wide/checkpoint_final.pt \
    --same_cat_dirs data/patches/lvis_wide/seen_test \
    --cross_cat_dirs data/patches/lvis_wide/unseen \
    --output results/eval_results.json

# Run unit tests
python -m pytest tests/ -v
```

## Timeline

- **Day 1 (2026-03-06)**: Project inception, gap analysis, idea generation, experiment design
- **Day 2 (2026-03-07)**: Full codebase implementation (14 tasks), unit tests, initial experiment
- **Day 3 (2026-03-08)**: Diagnosed codebook collapse, fixed SimVQ, completed Exp1: **STRONG GO**
- **Day 4 (2026-03-09)**: Completed Exp2 + Exp3: **STRONG GO**. Key finding: more categories yield better generalization
- **Day 5 (2026-03-13)**: Pod reset recovery, expanded LVIS-Wide (1156 categories), retrained Exp2, trained Exp4: all **STRONG GO**
- **Day 6 (2026-03-14)**: Final comparison report + visualizations. Full dataset + checkpoints backed up to Hugging Face

## License

Apache-2.0