	---
	license: apache-2.0
	tags:
	- 3d
	- mesh-generation
	- vq-vae
	- codebook
	- topology
	- graph-neural-network
	- research
	datasets:
	- allenai/objaverse
	library_name: pytorch
	pipeline_tag: other
	---

	# MeshLex Research

	<div align="center">

	MeshLex: Learning a Topology-aware Patch Vocabulary for Compositional Mesh Generation

	<a href="https://github.com/Pthahnix/MeshLex-Research"><img alt="GitHub"
	src="https://img.shields.io/badge/GitHub-MeshLex--Research-181717?logo=github&logoColor=white"/></a>
	<a href="https://huggingface.co/Pthahnix/MeshLex-Research"><img alt="Hugging Face"
	src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-MeshLex--Research-ffc107?color=ffc107&logoColor=white"/></a>
	<a href="https://github.com/Pthahnix/MeshLex-Research/blob/main/LICENSE"><img alt="License"
	src="https://img.shields.io/badge/License-Apache_2.0-f5de53?&color=f5de53"/></a>

	</div>

	<hr>

	## Table of Contents

	1. [Overview](#overview)
	2. [Current Status](#current-status)
	3. [Repo Contents](#repo-contents)
	4. [Core Hypothesis](#core-hypothesis)
	5. [Model Architecture](#model-architecture)
	6. [Experimental Results](#experimental-results)
	7. [Data](#data)
	8. [Quick Start](#quick-start)
	9. [Timeline](#timeline)
	10. [License](#license)

	## Overview

	A research project exploring whether 3D triangle meshes possess a finite, reusable "vocabulary" of local topological patterns — analogous to how BPE tokens form a vocabulary for natural language.

Instead of generating meshes face by face, MeshLex learns a codebook of ~4096 topology-aware patches (each covering 20-50 faces) and generates meshes by selecting, deforming, and assembling patches from this codebook. A 4000-face mesh becomes ~130 tokens, roughly 3x more compact than the state of the art (FACE, ICML 2026: ~400 tokens).

| | MeshMosaic | FreeMesh | FACE | MeshLex |
|---|---|---|---|---|
| Approach | Divide-and-conquer | BPE on coordinates | One-face-one-token | Topology patch codebook |
| Still per-face generation? | Yes | Yes | Yes | No |
| Has codebook? | No | Yes (coordinate-level) | No | Yes (topology-level) |
| Compression (4K faces) | N/A | ~300 tokens | ~400 tokens | ~130 tokens |
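The ~130-token figure follows from simple division. A minimal sketch, assuming each patch maps to exactly one token and that patches average about 31 faces (4000 / 130); `patch_tokens` is a hypothetical helper, not part of the MeshLex codebase:

```python
import math

def patch_tokens(num_faces: int, faces_per_patch: float) -> int:
    """Token count when every patch becomes one token (assumption)."""
    return math.ceil(num_faces / faces_per_patch)

NUM_FACES = 4000
FACE_TOKENS = 400  # FACE figure quoted in the table above

# Sweep the stated 20-50 faces/patch range plus the ~31 implied by ~130 tokens.
for fpp in (20, 31, 35, 50):
    n = patch_tokens(NUM_FACES, fpp)
    print(f"{fpp:>2} faces/patch -> {n:>3} tokens, {FACE_TOKENS / n:.1f}x fewer than FACE")
```

Across the stated 20-50 face range, the same mesh lands between 80 and 200 tokens, so the compression advantage over FACE depends directly on how large the learned patches are.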

	## Current Status

	Feasibility validation COMPLETE — 4/4 experiments STRONG GO. Ready for formal experiment design.

| # | Experiment | Status | Result |
|---|---|---|---|
| 1 | A-stage × 5-Category | Done | STRONG GO (ratio 1.145x, util 46%) |
| 2 | A-stage × LVIS-Wide | Done | STRONG GO (ratio 1.019x, util 95.3%) |
| 3 | B-stage × 5-Category | Done | STRONG GO (ratio 1.185x, util 47%) |
| 4 | B-stage × LVIS-Wide | Done | STRONG GO (ratio 1.019x, util 94.9%) |

Key findings:
- More categories yield markedly better generalization: LVIS-Wide (1156 categories) reaches a CD ratio of 1.019x vs 1.145x for 5 categories, and codebook utilization of 95% vs 46%.
- Best result (Exp4): same-category CD 211.6 vs cross-category CD 215.8, a near-zero generalization gap.
- The SimVQ fix resolved codebook collapse: utilization rose from 0.46% to 99%+ (217x improvement).
- The B-stage multi-token KV decoder is effective: reconstruction CD drops by 6.2%.
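The README does not define "util". A common definition, assumed here, is the fraction of the K = 4096 codes selected at least once on an evaluation set; under that reading, a collapsed codebook that uses only 19 codes sits at 19/4096 ≈ 0.46%. `codebook_utilization` and `codebook_perplexity` are hypothetical helpers, and perplexity (exp of the code-usage entropy) is an extra diagnostic not reported in this repo:

```python
import numpy as np

def codebook_utilization(indices, codebook_size: int = 4096) -> float:
    """Fraction of codebook entries selected at least once (assumed definition)."""
    used = np.unique(np.asarray(indices).ravel())
    return used.size / codebook_size

def codebook_perplexity(indices, codebook_size: int = 4096) -> float:
    """exp(entropy) of the code-usage histogram; equals K under perfectly uniform usage."""
    counts = np.bincount(np.asarray(indices).ravel(), minlength=codebook_size)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
collapsed = rng.integers(0, 19, size=10_000)    # only ~19 codes ever chosen
healthy = rng.integers(0, 4096, size=200_000)   # assignments spread over the full codebook
print(codebook_utilization(collapsed), codebook_utilization(healthy))
print(codebook_perplexity(collapsed), codebook_perplexity(healthy))
```

Utilization only counts whether a code is ever used; perplexity also reflects how evenly the codes are used, which helps distinguish a fully active codebook from one dominated by a few entries.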

	## Repo Contents

	This HuggingFace repo stores checkpoints and processed datasets for reproducibility.

	### Checkpoints

| Experiment | Path | Description |
|---|---|---|
| Exp1 A-stage × 5cat | `checkpoints/exp1_A_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp2 A-stage × LVIS-Wide | `checkpoints/exp2_A_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |
| Exp3 B-stage × 5cat | `checkpoints/exp3_B_5cat/` | `checkpoint_final.pt` + `training_history.json` |
| Exp4 B-stage × LVIS-Wide | `checkpoints/exp4_B_lvis_wide/` | `checkpoint_final.pt` + `training_history.json` |

	### Data

| File / Directory | Size | Contents |
|---|---|---|
| `data/meshlex_data.tar.gz` | ~1.2 GB | All processed data in one archive (recommended) |
| `data/patches/` | ~1.1 GB | NPZ patch files (5cat + LVIS-Wide splits) |
| `data/meshes/` | ~931 MB | Preprocessed decimated OBJ files (5,497 meshes) |
| `data/objaverse/` | ~2 MB | Download manifests |

The `tar.gz` archive bundles the patches, meshes, and manifests; download and extract it to skip all preprocessing.
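For environments without the `tar` CLI, the download-and-extract step can also be done from Python. A sketch, assuming the archive path used in Quick Start and that extracting into `data/` yields the layout the scripts expect (the archive's internal layout is not documented here). `safe_extract` and `download_and_extract` are hypothetical helpers; `hf_hub_download` is the real `huggingface_hub` function, and the path check rejects archive members that would escape the target directory:

```python
import os
import tarfile

def safe_extract(archive_path: str, dest: str) -> list[str]:
    """Extract a .tar.gz into dest, refusing members that resolve outside dest."""
    dest_root = os.path.realpath(dest)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for m in members:
            target = os.path.realpath(os.path.join(dest_root, m.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {m.name}")
        if hasattr(tarfile, "data_filter"):  # extraction filters: Python 3.12+ and some backports
            tar.extractall(dest_root, filter="data")
        else:
            tar.extractall(dest_root)
    return [m.name for m in members]

def download_and_extract(dest: str = "data") -> list[str]:
    """Fetch the processed-data archive from the Hub, then extract it (hypothetical helper)."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    path = hf_hub_download(
        repo_id="Pthahnix/MeshLex-Research",
        filename="data/meshlex_data.tar.gz",
        repo_type="model",
        local_dir=".",
    )
    return safe_extract(path, dest)
```

Calling `download_and_extract()` mirrors the `hf_hub_download` + `tar xzf ... -C data/` pair in Quick Start.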

	## Core Hypothesis

	> Mesh local topology is low-entropy and universal across object categories. A finite codebook of ~4096 topology prototypes, combined with continuous deformation parameters, can reconstruct arbitrary meshes with high fidelity.

	## Model Architecture

	The full model is a VQ-VAE with three modules:

	```
	Objaverse-LVIS GLB → Decimation (pyfqmr) → Normalize [-1,1]
	→ METIS Patch Segmentation (~35 faces/patch)
	→ PCA-aligned local coordinates
	→ Face features (15-dim: vertices + normal + angles)
	→ SAGEConv GNN Encoder → 128-dim embedding
	→ SimVQ Codebook (K=4096, learnable reparameterization)
	→ Cross-attention MLP Decoder → Reconstructed vertices
	```
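The PCA alignment and the 15-dim face features in the pipeline can be made concrete. A minimal NumPy sketch, assuming the 15 dims are 9 coordinates for the three corners, a unit face normal, and the three interior angles in radians; `pca_align` and `face_features` are hypothetical helpers, not the MeshLex implementation. PCA axes are only defined up to sign, so a real pipeline would also need a sign convention:

```python
import numpy as np

def pca_align(vertices: np.ndarray) -> np.ndarray:
    """Center a patch and rotate it into its principal axes (sign of each axis is arbitrary)."""
    centered = vertices - vertices.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

def face_features(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Per-face features: 3 corners (9) + unit normal (3) + interior angles (3) = 15 dims."""
    tri = vertices[faces]                          # (F, 3, 3)
    a, b, c = tri[:, 0], tri[:, 1], tri[:, 2]
    n = np.cross(b - a, c - a)
    n = n / np.maximum(np.linalg.norm(n, axis=1, keepdims=True), 1e-12)

    def angle(p, q, r):
        # Interior angle at corner p, between edges p->q and p->r.
        u, v = q - p, r - p
        denom = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
        cos = (u * v).sum(axis=1) / np.maximum(denom, 1e-12)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    angles = np.stack([angle(a, b, c), angle(b, c, a), angle(c, a, b)], axis=1)
    return np.concatenate([tri.reshape(len(faces), 9), n, angles], axis=1)

# Usage: a unit square split into two triangles, aligned first, then featurized.
V = pca_align(np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float))
F = np.array([[0, 1, 2], [0, 2, 3]])
print(face_features(V, F).shape)
```

Expressing corners in patch-local PCA coordinates makes the features invariant to where the patch sits in the mesh and (up to axis signs) to its orientation, which is what lets one codebook entry match the same local topology anywhere.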

- PatchEncoder: 4-layer SAGEConv GNN + global mean pooling → 128-dim z
- SimVQ Codebook: frozen base codebook C + learnable linear map W; the effective codebook is CW = W(C). All 4096 entries share W's gradient, so even rarely selected codes keep being updated and none is "forgotten"
- PatchDecoder: cross-attention with learnable vertex queries → per-vertex xyz coordinates
- A-stage: decoder with a single KV token (baseline)
- B-stage: decoder with 4 KV tokens (improved reconstruction; training resumed from the A-stage checkpoint)
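A forward-only NumPy sketch of the SimVQ lookup and the cross-attention decoder described above. It assumes a near-identity initialization for W, a single attention head, NV = 60 vertices per patch, and a hypothetical linear split of the quantized latent into T KV tokens; none of these details come from the MeshLex code. In training, W would also receive gradients through the selected codes (VQ-VAEs typically pass encoder gradients with a straight-through estimator), and because every row of CW = W(C) is a function of W, all entries move together:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, NV = 4096, 128, 60  # K and D from the README; NV (vertices per patch) is an assumption

# SimVQ: frozen base codebook C and a learnable linear map W (near-identity init assumed).
C = rng.standard_normal((K, D))
W = np.eye(D) + 0.01 * rng.standard_normal((D, D))

def quantize(z, C, W):
    """Nearest-neighbour lookup in the effective codebook CW = W(C), i.e. C @ W.T."""
    CW = C @ W.T  # every one of the K entries depends on W
    d2 = (z ** 2).sum(1, keepdims=True) - 2 * z @ CW.T + (CW ** 2).sum(1)
    idx = d2.argmin(1)
    return idx, CW[idx]

def make_kv_tokens(z_q, W_split, num_tokens):
    """Expand one quantized latent into num_tokens KV tokens (hypothetical linear split)."""
    return (z_q @ W_split).reshape(num_tokens, -1)

def cross_attention_decode(kv, queries, Wq, Wk, Wv, Wout):
    """Single-head cross-attention from learnable vertex queries to KV tokens, projected to xyz."""
    q, k, v = queries @ Wq, kv @ Wk, kv @ Wv
    scores = q @ k.T / np.sqrt(q.shape[1])            # (NV, T)
    attn = np.exp(scores - scores.max(1, keepdims=True))
    attn /= attn.sum(1, keepdims=True)                # softmax over the T tokens
    return (queries + attn @ v) @ Wout, attn          # residual on the query, then project to xyz

z = rng.standard_normal((1, D))                       # one patch embedding from the encoder
idx, z_q = quantize(z, C, W)
queries = rng.standard_normal((NV, D))                # learnable vertex queries in training
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
Wout = rng.standard_normal((D, 3)) / np.sqrt(D)

for T in (1, 4):                                      # A-stage vs B-stage
    W_split = rng.standard_normal((D, T * D)) / np.sqrt(D)
    kv = make_kv_tokens(z_q, W_split, T)
    xyz, attn = cross_attention_decode(kv, queries, Wq, Wk, Wv, Wout)
    print(T, xyz.shape, float(attn.std(0).mean()))    # with T=1 every weight is exactly 1
```

With T = 1 the softmax runs over a single token, so every vertex query receives the identical attention output and per-vertex variation can only come from the query residual; with T = 4 each query can weight the tokens differently. This is one plausible reading of why the B-stage decoder reconstructs better, not a result from the experiments.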

	## Experimental Results

| Experiment | Scale | Stage | CD Ratio | Util (same) | Util (cross) | Decision |
|---|---|---|---|---|---|---|
| Exp1 | 5 categories | A (1 KV token) | 1.145x | 46.0% | 47.0% | ✅ STRONG GO |
| Exp3 | 5 categories | B (4 KV tokens) | 1.185x | 47.1% | 47.3% | ✅ STRONG GO |
| Exp2 | 1156 categories | A (1 KV token) | 1.019x | 95.3% | 83.6% | ✅ STRONG GO |
| Exp4 | 1156 categories | B (4 KV tokens) | 1.019x | 94.9% | 82.8% | ✅ STRONG GO |

CD = Chamfer Distance. CD Ratio = cross-category CD / same-category CD; values closer to 1.0 indicate better generalization. Target: < 1.2x.
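A minimal sketch of the metric, assuming a symmetric Chamfer distance with squared L2 distances averaged over each point set; the README does not specify the variant or the scaling behind values like 211.6, and the helper names are hypothetical. Only the < 1.2x target comes from the text. With the Exp4 numbers from Key findings, 215.8 / 211.6 ≈ 1.020, consistent with the reported 1.019x up to rounding of the published CDs:

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (|P|, |Q|) pairwise squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def cd_ratio(cross_cd: float, same_cd: float) -> float:
    """Cross-category CD divided by same-category CD; 1.0 means no generalization gap."""
    return cross_cd / same_cd

def meets_target(ratio: float, target: float = 1.2) -> bool:
    """The stated feasibility target: CD ratio below 1.2x."""
    return ratio < target

ratio = cd_ratio(215.8, 211.6)  # Exp4 same/cross CD from Key findings
print(round(ratio, 3), meets_target(ratio))
```

The pairwise matrix is O(|P|·|Q|) in memory, which is fine for patch-sized point sets; larger clouds would typically use a KD-tree nearest-neighbour query instead.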

Scaling from 5 to 1156 categories lowers the CD ratio from 1.145x to 1.019x (near-perfect generalization) and raises utilization from 46% to 95% (nearly full codebook activation).

	## Data

	Training data sourced from [Objaverse-LVIS](https://huggingface.co/datasets/allenai/objaverse) (Allen AI).

- 5-Category: chair, table, airplane, car, lamp (used for initial validation)
- LVIS-Wide: 1156 categories from Objaverse-LVIS, 10 objects per category, split into:
  - `seen_train`: 188,696 patches (1046 categories)
  - `seen_test`: 45,441 patches (same 1046 categories, held-out objects)
  - `unseen`: 12,655 patches (110 held-out categories, never seen during training)
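The LVIS-Wide splits combine a category-level holdout (`unseen`) with an object-level holdout inside the seen categories (`seen_test`). A minimal sketch of such a protocol, assuming splits are made per object before patching and a 20% object-level test fraction (a guess: the published patch counts put about 19.4% of seen-category patches in `seen_test`); `split_lvis_wide` is hypothetical and not the MeshLex preprocessing script:

```python
import random

def split_lvis_wide(objects_by_category, num_unseen=110, test_frac=0.2, seed=0):
    """Hold out whole categories as `unseen`, then hold out objects within the rest."""
    rng = random.Random(seed)
    cats = sorted(objects_by_category)
    rng.shuffle(cats)
    unseen_cats, seen_cats = cats[:num_unseen], cats[num_unseen:]

    split = {"seen_train": [], "seen_test": [], "unseen": []}
    for c in unseen_cats:
        split["unseen"] += [(c, o) for o in objects_by_category[c]]
    for c in seen_cats:
        objs = list(objects_by_category[c])
        rng.shuffle(objs)
        n_test = max(1, round(test_frac * len(objs)))
        split["seen_test"] += [(c, o) for o in objs[:n_test]]
        split["seen_train"] += [(c, o) for o in objs[n_test:]]
    return split

# Usage with the LVIS-Wide shape: 1156 categories x 10 objects (placeholder object IDs).
catalog = {f"cat{i}": [f"obj{i}_{j}" for j in range(10)] for i in range(1156)}
splits = split_lvis_wide(catalog)
print({name: len(items) for name, items in splits.items()})
```

Splitting by object rather than by patch keeps patches cut from the same mesh from leaking across train and test, so `seen_test` measures generalization to new shapes and `unseen` to new categories.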

	## Quick Start

	```bash
	# Clone the code repo
	git clone https://github.com/Pthahnix/MeshLex-Research.git
	cd MeshLex-Research

	# Install dependencies
	pip install -r requirements.txt
	pip install torch-geometric
	pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
	-f https://data.pyg.org/whl/torch-2.4.0+cu124.html

	# Download processed data from this HF repo
	pip install huggingface_hub
	python -c "
	from huggingface_hub import hf_hub_download
	hf_hub_download('Pthahnix/MeshLex-Research', 'data/meshlex_data.tar.gz', repo_type='model', local_dir='.')
	"
	tar xzf data/meshlex_data.tar.gz -C data/

	# Download checkpoints
	python -c "
	from huggingface_hub import snapshot_download
	snapshot_download('Pthahnix/MeshLex-Research', allow_patterns='checkpoints/*', repo_type='model', local_dir='.')
	"
	mv checkpoints data/checkpoints

	# Run evaluation on Exp4 (best model)
	PYTHONPATH=. python scripts/evaluate.py \
	--checkpoint data/checkpoints/exp4_B_lvis_wide/checkpoint_final.pt \
	--same_cat_dirs data/patches/lvis_wide/seen_test \
	--cross_cat_dirs data/patches/lvis_wide/unseen \
	--output results/eval_results.json

	# Run unit tests
	python -m pytest tests/ -v
	```

	## Timeline

	- Day 1 (2026-03-06): Project inception, gap analysis, idea generation, experiment design
	- Day 2 (2026-03-07): Full codebase implementation (14 tasks), unit tests, initial experiment
	- Day 3 (2026-03-08): Diagnosed codebook collapse, fixed SimVQ, Exp1 — STRONG GO
	- Day 4 (2026-03-09): Exp2 + Exp3 completed — STRONG GO. Key finding: more categories = better generalization
	- Day 5 (2026-03-13): Pod reset recovery, expanded LVIS-Wide (1156 cat), retrained Exp2, trained Exp4 — all STRONG GO
	- Day 6 (2026-03-14): Final comparison report + visualizations. Full dataset + checkpoints backed up to HuggingFace

	## License

	Apache-2.0