metadata
title: Material Universe Explorer
emoji: 🔮
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
Crystal-Chroma
Physical RAG for Inverse Material Design
Crystal-Chroma is a Retrieval-Augmented Generation system that retrieves crystal structures based on physical behavioral similarity rather than text semantics. It enables "Inverse Design" queries like "Find a material that behaves like this expensive catalyst but uses cheaper elements."
How It Works
- Ingest ~44,000 stable materials from the Materials Project
- Featurize each crystal using MatterVial embeddings (fusing force fields, electronic structure, and composition)
- Store 3000-dimensional physics vectors in ChromaDB
- Query by physical similarity to find materials with similar internal forces and electronics
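The query step above can be sketched in plain Python. This is only an illustration of nearest-neighbour ranking over stored vectors: the project itself stores 3000-dimensional vectors in ChromaDB, whereas the toy 4-dimensional vectors, the material IDs, the helper names, and the choice of cosine similarity as the metric are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, store, limit=10):
    """Rank stored materials by similarity to the query vector, best first."""
    scored = [(mid, cosine_similarity(query, vec)) for mid, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical stand-ins for the physics vectors (values are made up).
store = {
    "material-A": [0.9, 0.1, 0.0, 0.2],
    "material-B": [0.1, 0.8, 0.3, 0.0],
    "material-C": [0.8, 0.2, 0.1, 0.3],
}
query = [1.0, 0.1, 0.0, 0.2]

results = most_similar(query, store, limit=2)
for material_id, score in results:
    print(f"{material_id}: {score:.3f}")
```

A vector database performs the same ranking with an index rather than a full scan, which is what makes it practical at the scale of tens of thousands of materials.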
Quick Start
# Install dependencies
uv sync
# Set your Materials Project API key
echo 'MP_API_KEY="your_key_here"' > .env
# Build the crystal database (requires VPN - see docs/api_connection_issue.md)
uv run python build_crystal_db.py
# Search for similar materials
uv run python tools/physical_search.py samples/test.cif --limit 10
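# The search command above takes a CIF path and a --limit option. A minimal
# sketch of how such a command line might be parsed is shown here; the parser
# layout, help strings, and default limit are assumptions, not the actual
# contents of tools/physical_search.py.

```python
import argparse


def build_parser():
    """Build a parser for: physical_search.py <cif_path> [--limit N]."""
    parser = argparse.ArgumentParser(
        description="Find physically similar materials for a CIF structure."
    )
    parser.add_argument("cif_path", help="Path to the query structure (CIF file)")
    parser.add_argument(
        "--limit",
        type=int,
        default=10,  # assumed default; the real tool may differ
        help="Maximum number of similar materials to return",
    )
    return parser


# Parse the same arguments used in the Quick Start command.
args = build_parser().parse_args(["samples/test.cif", "--limit", "10"])
print(args.cif_path, args.limit)
```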
Project Structure
crystal-chroma/
├── build_crystal_db.py      # Ingestion pipeline (MP API → MatterVial → ChromaDB)
├── tools/
│   └── physical_search.py   # Similarity search tool
├── mattervial/              # Physics encoder (local package)
├── crystal_chroma_db/       # Persistent vector store
├── debug/                   # API connection debug scripts
├── samples/                 # Sample CIF files
└── docs/
    ├── api_connection_issue.md           # VPN/Cloudflare workaround
    └── proposal_implementation_of_...md  # System design document
Requirements
- Python 3.12+
- Materials Project API key (available from your account dashboard on the Materials Project website)
- VPN with US exit node (see docs/api_connection_issue.md)
Key Features
| Feature | Description |
|---|---|
| Resume capability | Ingestion skips already-downloaded materials |
| Rate limiting | 1-second delay between API batches |
| Error handling | Continues on failures, doesn't lose progress |
| Physics-based search | Find materials by behavior, not just composition |
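The resume, rate-limiting, and error-handling behaviour in the table can be sketched as a single ingestion loop. This is a hedged illustration, not the code in build_crystal_db.py: the `ingest` function, its parameters, the batch size, and the `fake_fetch` stand-in for the API and encoder are all hypothetical.

```python
import time


def ingest(material_ids, fetch, stored, batch_size=100, delay_s=1.0, sleep=time.sleep):
    """Ingest materials in batches and return the IDs that failed.

    `stored` maps material ID to vector and plays the role of the vector store.
    """
    # Resume capability: skip anything already present in the store.
    pending = [mid for mid in material_ids if mid not in stored]
    failed = []
    for start in range(0, len(pending), batch_size):
        for mid in pending[start:start + batch_size]:
            try:
                stored[mid] = fetch(mid)
            except Exception:
                # Error handling: record the failure and keep going, so work
                # already stored is not lost. Real code would catch narrower
                # exception types.
                failed.append(mid)
        # Rate limiting: pause between batches, but not after the last one.
        if start + batch_size < len(pending):
            sleep(delay_s)
    return failed


def fake_fetch(mid):
    """Hypothetical stand-in for 'download and featurize one material'."""
    if mid == "bad-id":
        raise RuntimeError("simulated API failure")
    return [0.0, 1.0]


stored = {"id-1": [0.5, 0.5]}  # pretend id-1 was ingested in an earlier run
failed = ingest(
    ["id-1", "id-2", "bad-id", "id-3"], fake_fetch, stored, batch_size=2, delay_s=0.0
)
print(sorted(stored), failed)
```

Re-running `ingest` with the same ID list retries only the failed entries, since everything that succeeded is already in `stored`.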
Dependencies
- chromadb - Vector database
- mp-api - Materials Project API client
- mattervial - Physics encoder (local)
- numpy, tqdm, python-dotenv
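Documentation
- docs/api_connection_issue.md - VPN/Cloudflare workaround
- docs/proposal_implementation_of_...md - System design document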
License
MIT