metadata
title: Material Universe Explorer
emoji: 🔮
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false

Crystal-Chroma

Physical RAG for Inverse Material Design

Crystal-Chroma is a Retrieval-Augmented Generation (RAG) system that retrieves crystal structures by similarity in physical behavior rather than by text semantics. This enables "inverse design" queries such as "Find a material that behaves like this expensive catalyst but uses cheaper elements."
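The "cheaper elements" half of such a query can be thought of as a post-filter on the retrieved neighbours. The sketch below is illustrative only and is not Crystal-Chroma's query code: `Hit` and `inverse_design_filter` are hypothetical names, each hit is assumed to already carry a similarity score to the reference material, and the material IDs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical search result: one retrieved material."""
    material_id: str
    elements: frozenset
    similarity: float  # higher means more physically similar to the query


def inverse_design_filter(hits, avoid_elements, limit=10):
    """Keep the most similar hits that contain none of `avoid_elements`.

    `avoid_elements` would typically be the costly elements of the
    reference material, e.g. {"Pt", "Pd"} for a precious-metal catalyst.
    """
    avoid = set(avoid_elements)
    # Drop any candidate that shares an element with the avoid set.
    allowed = [h for h in hits if not (h.elements & avoid)]
    # Best matches first, truncated to the requested size.
    allowed.sort(key=lambda h: h.similarity, reverse=True)
    return allowed[:limit]


if __name__ == "__main__":
    hits = [
        Hit("mp-A", frozenset({"Pt", "O"}), 0.97),
        Hit("mp-B", frozenset({"Ni", "O"}), 0.91),
        Hit("mp-C", frozenset({"Fe", "O"}), 0.88),
        Hit("mp-D", frozenset({"Pd", "Ni"}), 0.93),
    ]
    for h in inverse_design_filter(hits, {"Pt", "Pd"}, limit=2):
        print(h.material_id, h.similarity)
```

Filtering after retrieval keeps the physics search itself unchanged; the trade-off is that a strict filter can discard most of the top hits, so retrieving more candidates than the final `limit` is usually sensible.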

How It Works

  1. Ingest ~44,000 stable materials from the Materials Project
  2. Featurize each crystal using MatterVial embeddings (fusing force fields, electronic structure, and composition)
  3. Store 3000-dimensional physics vectors in ChromaDB
  4. Query by physical similarity to find materials with similar internal forces and electronics
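Step 4 is a nearest-neighbour search over the stored vectors. A vector database such as ChromaDB answers it with an index; the brute-force sketch below shows the same idea in plain Python. Which distance metric Crystal-Chroma configures is not stated here, so cosine similarity is an assumption, and `cosine_similarity` and `top_k` are illustrative names rather than functions from this repository. Tiny made-up 3-dimensional vectors stand in for the 3000-dimensional MatterVial features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def top_k(query, database, k=10):
    """Return the k (material_id, similarity) pairs closest to `query`.

    `database` maps material IDs to feature vectors.
    """
    scored = [(mid, cosine_similarity(query, vec)) for mid, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-D vectors standing in for 3000-D physics embeddings.
    db = {
        "mp-1": [1.0, 0.0, 0.0],
        "mp-2": [0.9, 0.1, 0.0],
        "mp-3": [0.0, 1.0, 0.0],
    }
    for mid, sim in top_k([1.0, 0.05, 0.0], db, k=2):
        print(mid, round(sim, 3))
```

Brute force costs one comparison per stored material per query; at ~44,000 entries that is still feasible, but an approximate index is what keeps queries fast as the collection grows.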

Quick Start

```bash
# Install dependencies
uv sync

# Set your Materials Project API key
echo 'MP_API_KEY="your_key_here"' > .env

# Build the crystal database (requires VPN - see docs/api_connection_issue.md)
uv run python build_crystal_db.py

# Search for similar materials
uv run python tools/physical_search.py samples/test.cif --limit 10
```

Project Structure

```text
crystal-chroma/
├── build_crystal_db.py      # Ingestion pipeline (MP API → MatterVial → ChromaDB)
├── tools/
│   └── physical_search.py   # Similarity search tool
├── mattervial/              # Physics encoder (local package)
├── crystal_chroma_db/       # Persistent vector store
├── debug/                   # API connection debug scripts
├── samples/                 # Sample CIF files
└── docs/
    ├── api_connection_issue.md              # VPN/Cloudflare workaround
    └── proposal_implementation_of_...md     # System design document
```

Requirements

Key Features

| Feature | Description |
|---|---|
| Resume capability | Ingestion skips already-downloaded materials |
| Rate limiting | 1-second delay between API batches |
| Error handling | Continues on failures, doesn't lose progress |
| Physics-based search | Find materials by behavior, not just composition |
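The resume, rate-limiting and error-handling features describe a common batched-ingestion pattern. The sketch below is an assumption about how such a loop can be structured, not the code in `build_crystal_db.py`: `ingest` is a hypothetical name, `fetch_batch` and `store` are hypothetical callables standing in for the Materials Project download and the ChromaDB write, and `sleep` is injectable so the 1-second delay can be skipped in tests.

```python
import time


def ingest(material_ids, existing_ids, fetch_batch, store,
           batch_size=100, delay_s=1.0, sleep=time.sleep):
    """Ingest materials in batches with resume, rate limiting and error tolerance.

    material_ids: all IDs to ingest.
    existing_ids: IDs already in the store (resume: these are skipped).
    fetch_batch:  hypothetical callable, list of IDs -> list of records.
    store:        hypothetical callable that persists a list of records.
    Returns (stored_count, failed_batches).
    """
    done = set(existing_ids)
    # Resume capability: only work on IDs that are not stored yet.
    pending = [mid for mid in material_ids if mid not in done]
    stored, failed = 0, []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        if start > 0:
            sleep(delay_s)  # rate limiting: pause between API batches
        try:
            records = fetch_batch(batch)
            store(records)  # persist each batch right away
            stored += len(records)
        except Exception as exc:  # error handling: record the failure, keep going
            print(f"batch starting at {batch[0]} failed: {exc}")
            failed.append(batch)
    return stored, failed
```

Persisting each batch as soon as it is fetched is what makes the resume check useful: an interrupted run loses at most the batch in flight, and the returned `failed_batches` can be retried later.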

Dependencies

  • chromadb - Vector database
  • mp-api - Materials Project API client
  • mattervial - Physics encoder (local)
  • numpy, tqdm, python-dotenv

Documentation

License

MIT