Hafnium13 committed
Commit 668ae63 · 1 Parent(s): de0ab6c

Initial deployment: Tri-Fusion Crystal Embedder

Files changed (4)
  1. Dockerfile +50 -0
  2. README.md +94 -4
  3. main.py +172 -0
  4. requirements.txt +19 -0
Dockerfile ADDED
@@ -0,0 +1,50 @@
+ FROM python:3.10-slim
+
+ # CRITICAL: Set legacy Keras before any TensorFlow imports
+ # Required for MEGNet compatibility with TensorFlow 2.16+
+ ENV TF_USE_LEGACY_KERAS=1
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     libgomp1 \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # CRITICAL: Install CPU-only versions to save space and avoid GPU conflicts
+ # Install PyTorch CPU first (from special index)
+ RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
+
+ # Install orb-models (required for ORBFeaturizer)
+ RUN pip install --no-cache-dir orb-models
+
+ # Install MatterVial and its dependencies
+ # Then pin TensorFlow to 2.15.x (last version compatible with MEGNet)
+ RUN pip install --no-cache-dir \
+     "mattervial @ git+https://github.com/rogeriog/MatterVial.git" \
+     pymatgen \
+     scikit-learn \
+     pandas \
+     fastapi \
+     uvicorn \
+     python-multipart
+
+ # CRITICAL: Force TensorFlow 2.15 and compatible Keras AFTER all other installs
+ # TensorFlow 2.16+ uses Keras 3.x which breaks MEGNet's Trainer.compile()
+ # tf_keras is required when TF_USE_LEGACY_KERAS=1
+ RUN pip install --no-cache-dir --force-reinstall \
+     "tensorflow-cpu>=2.15,<2.16" \
+     "keras>=2.15,<2.16" \
+     "tf_keras>=2.15,<2.16" \
+     "tensorboard>=2.15,<2.16" \
+     "ml-dtypes>=0.3.1,<0.4"
+
+ COPY . .
+
+ # HF Spaces requires port 7860
+ EXPOSE 7860
+
+ # Launch with generous timeout for model loading
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "120"]
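The final `RUN` step depends on the TensorFlow/Keras pins surviving every earlier install. A minimal sketch of a sanity check that could run at build time or on startup, assuming the ranges pinned above; `PINS`, `major_minor`, `in_range`, and `check_pins` are hypothetical names and are not part of this commit:

```python
import re
from importlib import metadata

# Hypothetical pin table mirroring the Dockerfile:
# (inclusive lower bound, exclusive upper bound) as (major, minor).
PINS = {
    "tensorflow-cpu": ((2, 15), (2, 16)),
    "keras": ((2, 15), (2, 16)),
    "tf_keras": ((2, 15), (2, 16)),
}


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.15.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_range(version: str, lower: tuple[int, int], upper: tuple[int, int]) -> bool:
    """True when lower <= (major, minor) < upper."""
    return lower <= major_minor(version) < upper


def check_pins(pins=PINS) -> dict[str, str]:
    """Map each package to 'ok', 'missing', or 'out of range (<version>)'."""
    report = {}
    for name, (lower, upper) in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if in_range(found, lower, upper) else f"out of range ({found})"
    return report
```

Calling `check_pins()` inside the image and failing on any value other than `ok` would surface a drifted install early; how to wire that into the build is left open here.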
README.md CHANGED
@@ -1,10 +1,100 @@
  ---
  title: Crystal Embedder
- emoji: 🌖
- colorFrom: gray
- colorTo: red
+ emoji: 🔮
+ colorFrom: purple
+ colorTo: blue
  sdk: docker
  pinned: false
+ license: mit
  ---
  
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Crystal-Embedder
+
+ Compute **Tri-Fusion physics embeddings** for crystal structures.
+
+ ## What it does
+
+ This service takes a crystal structure in CIF format and returns a 2738-dimensional embedding vector built from three components:
+
+ | Component | Dimensions | Description |
+ |-----------|------------|-------------|
+ | **Orb-v3** | 1792 | Force field features (PyTorch) |
+ | **l-MM** | 758 | Electronic structure features (TensorFlow/MEGNet) |
+ | **l-OFM** | 188 | Orbital field matrix features (TensorFlow/MEGNet) |
+ | **Total** | **2738** | Each component L2-normalized, then concatenated |
+
+ ## API
+
+ ### `POST /embed`
+
+ Compute embedding for a CIF structure.
+
+ **Request:**
+ ```json
+ {
+   "cif": "data_Si\n_cell_length_a 5.43..."
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "vector": [0.123, -0.456, ...],
+   "dims": 2738
+ }
+ ```
+
+ ### `GET /health`
+
+ Health check endpoint.
+
+ **Response:**
+ ```json
+ {
+   "status": "healthy",
+   "models_loaded": true,
+   "vector_dims": 2738
+ }
+ ```
+
+ ## Example Usage
+
+ ```python
+ import httpx
+
+ cif_content = open("structure.cif").read()
+
+ response = httpx.post(
+     "https://hafnium49-crystal-embedder.hf.space/embed",
+     json={"cif": cif_content},
+     timeout=60
+ )
+
+ embedding = response.json()["vector"]
+ print(f"Embedding shape: {len(embedding)}") # 2738
+ ```
+
+ ## Performance
+
+ - **Latency:** ~15 seconds per structure (CPU-only)
+ - **Memory:** ~8GB peak during inference
+ - **Cold start:** ~2 minutes (model loading)
+
+ ## Local Development
+
+ ```bash
+ # Build
+ docker build -t crystal-embedder .
+
+ # Run
+ docker run -p 7860:7860 crystal-embedder
+
+ # Test
+ curl http://localhost:7860/health
+ ```
+
+ ## Powered By
+
+ - [MatterVial](https://github.com/rogeriog/MatterVial) - Physics embeddings
+ - [pymatgen](https://pymatgen.org/) - Crystal structure parsing
+ - [FastAPI](https://fastapi.tiangolo.com/) - Web framework
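The README quotes a cold start of about two minutes, and `main.py` answers `/embed` with HTTP 503 while models are still loading, so a client may want to retry. A minimal sketch under those assumptions; `embed_with_retry` and its parameters are hypothetical, and `post` stands for any callable shaped like `httpx.post`:

```python
import time


def embed_with_retry(post, url, cif, attempts=10, delay=15.0, sleep=time.sleep):
    """POST a CIF to /embed, retrying while the service answers 503.

    `post` is any callable shaped like `httpx.post(url, json=..., timeout=...)`
    that returns an object with `.status_code` and `.json()`.
    """
    for _ in range(attempts):
        response = post(url, json={"cif": cif}, timeout=60)
        if response.status_code == 200:
            return response.json()["vector"]
        if response.status_code != 503:
            # Anything other than "still loading" is treated as a real failure.
            raise RuntimeError(f"embed failed with HTTP {response.status_code}")
        sleep(delay)  # models still loading; wait before the next attempt
    raise TimeoutError(f"service still loading after {attempts} attempts")
```

With httpx installed, `embed_with_retry(httpx.post, "https://hafnium49-crystal-embedder.hf.space/embed", cif_content)` would play the role of the single `httpx.post` call in the README example. The default of 10 attempts at 15 seconds is an arbitrary choice that comfortably covers the stated cold start.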
main.py ADDED
@@ -0,0 +1,172 @@
+ """
+ Crystal-Embedder Microservice
+
+ Computes Tri-Fusion embeddings (Orb-v3 + l-MM + l-OFM) for crystal structures.
+ Deployed on Hugging Face Spaces (CPU-only, 16GB RAM).
+
+ API:
+     POST /embed
+     Request: {"cif": "<CIF content>"}
+     Response: {"vector": [...], "dims": 2738}
+ """
+
+ import os
+
+ # CRITICAL: Set TensorFlow to use legacy Keras before any imports
+ # Required for MEGNet compatibility with TensorFlow 2.16+
+ os.environ["TF_USE_LEGACY_KERAS"] = "1"
+
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ import numpy as np
+ import pandas as pd
+ from sklearn.preprocessing import normalize
+ from pymatgen.core import Structure
+
+ app = FastAPI(
+     title="Crystal-Embedder",
+     description="Tri-Fusion physics embeddings for crystal structures",
+     version="1.0.0"
+ )
+
+ # Global model instances (loaded at startup)
+ orb_model = None
+ mm_model = None
+ ofm_model = None
+ models_loaded = False
+
+
+ class CifRequest(BaseModel):
+     cif: str
+
+
+ class EmbedResponse(BaseModel):
+     vector: list[float]
+     dims: int
+
+
+ class HealthResponse(BaseModel):
+     status: str
+     models_loaded: bool
+     vector_dims: int
+
+
+ def load_models():
+     """Load all three featurizer models at startup."""
+     global orb_model, mm_model, ofm_model, models_loaded
+
+     print("Loading Physics Engines (Orb-v3 + l-MM + l-OFM)...")
+     print("This may take a few minutes on first load...")
+
+     try:
+         # Import MatterVial featurizers
+         from mattervial.featurizers import ORBFeaturizer, DescriptorMEGNetFeaturizer
+
+         # Load Orb-v3 (PyTorch, CPU)
+         print(" Loading Orb-v3 (1792-dim)...")
+         orb_model = ORBFeaturizer(model_name="ORB_v3", device="cpu")
+
+         # Load l-MM (TensorFlow/MEGNet)
+         print(" Loading l-MM (758-dim)...")
+         mm_model = DescriptorMEGNetFeaturizer(base_descriptor='l-MM_v1')
+
+         # Load l-OFM (TensorFlow/MEGNet)
+         print(" Loading l-OFM (188-dim)...")
+         ofm_model = DescriptorMEGNetFeaturizer(base_descriptor='l-OFM_v1')
+
+         models_loaded = True
+         print("All models loaded successfully!")
+
+     except Exception as e:
+         print(f"Error loading models: {e}")
+         raise
+
+
+ @app.on_event("startup")
+ async def startup_event():
+     """Load models when the server starts."""
+     load_models()
+
+
+ @app.get("/health", response_model=HealthResponse)
+ async def health_check():
+     """Health check endpoint."""
+     return HealthResponse(
+         status="healthy" if models_loaded else "loading",
+         models_loaded=models_loaded,
+         vector_dims=2738
+     )
+
+
+ @app.post("/embed", response_model=EmbedResponse)
+ async def embed_structure(req: CifRequest):
+     """
+     Compute Tri-Fusion embedding for a CIF structure.
+
+     The embedding is a concatenation of three L2-normalized vectors:
+     - Orb-v3: 1792-dim (force field features)
+     - l-MM: 758-dim (electronic structure features)
+     - l-OFM: 188-dim (orbital field matrix features)
+
+     Total: 2738 dimensions
+     """
+     if not models_loaded:
+         raise HTTPException(status_code=503, detail="Models still loading, please retry")
+
+     try:
+         # 1. Parse CIF to pymatgen Structure
+         struct = Structure.from_str(req.cif, fmt="cif")
+         s_series = pd.Series([struct])
+
+         # 2. Compute features (sequential on CPU)
+         print("Computing Orb-v3 features...")
+         vec_orb = orb_model.get_features(s_series).values[0] # ~10s
+
+         print("Computing l-MM features...")
+         vec_mm = mm_model.get_features(s_series).values[0] # ~2s
+
+         print("Computing l-OFM features...")
+         vec_ofm = ofm_model.get_features(s_series).values[0] # ~0.5s
+
+         # 3. Handle NaN values (replace with 0)
+         vec_orb = np.nan_to_num(vec_orb, nan=0.0)
+         vec_mm = np.nan_to_num(vec_mm, nan=0.0)
+         vec_ofm = np.nan_to_num(vec_ofm, nan=0.0)
+
+         # 4. L2 Normalize each vector (prevents magnitude dominance)
+         v1 = normalize([vec_orb])[0]
+         v2 = normalize([vec_mm])[0]
+         v3 = normalize([vec_ofm])[0]
+
+         # 5. Concatenate
+         final_vector = np.concatenate([v1, v2, v3])
+
+         print(f"Embedding complete: {len(final_vector)} dimensions")
+
+         return EmbedResponse(
+             vector=final_vector.tolist(),
+             dims=len(final_vector)
+         )
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Embedding failed: {str(e)}")
+
+
+ @app.get("/")
+ async def root():
+     """Root endpoint with API info."""
+     return {
+         "service": "Crystal-Embedder",
+         "version": "1.0.0",
+         "description": "Tri-Fusion physics embeddings for crystal structures",
+         "endpoints": {
+             "/embed": "POST - Compute embedding from CIF",
+             "/health": "GET - Health check"
+         },
+         "vector_dimensions": {
+             "orb_v3": 1792,
+             "l_mm": 758,
+             "l_ofm": 188,
+             "total": 2738
+         }
+     }
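Steps 3 to 5 of `embed_structure` (NaN replacement, per-block L2 normalization, concatenation) can be illustrated without the models. The sketch below is a pure-Python stand-in for the `numpy`/`sklearn` calls, not the service code; `l2_normalize` and `fuse` are hypothetical names. One consequence worth noting: because each block is normalized separately, the fused vector has norm √3 rather than 1 whenever all three blocks are nonzero.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(*blocks):
    """Replace NaN with 0 in each block, L2-normalize it, then concatenate."""
    fused = []
    for block in blocks:
        cleaned = [0.0 if math.isnan(x) else x for x in block]
        fused.extend(l2_normalize(cleaned))
    return fused


# Toy stand-ins for the Orb-v3, l-MM, and l-OFM feature vectors.
toy = fuse([3.0, 4.0], [0.0, 2.0], [float("nan"), 5.0])
toy_norm = math.sqrt(sum(x * x for x in toy))  # close to sqrt(3), not 1
```

Normalizing per block gives each featurizer equal weight in a Euclidean or cosine comparison regardless of its dimensionality. A consumer that expects unit-length vectors would need to rescale the fused result.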
requirements.txt ADDED
@@ -0,0 +1,19 @@
+ # CPU-only PyTorch (installed separately in Dockerfile via --index-url)
+ # torch
+
+ # CPU-only TensorFlow
+ tensorflow-cpu
+
+ # MatterVial (physics embeddings)
+ mattervial @ git+https://github.com/rogeriog/MatterVial.git
+
+ # Scientific stack
+ pymatgen
+ scikit-learn
+ pandas
+ numpy
+
+ # Web framework
+ fastapi
+ uvicorn[standard]
+ python-multipart
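`requirements.txt` lists `tensorflow-cpu` with no version, whereas the Dockerfile pins it to `>=2.15,<2.16`. The Dockerfile as shown never runs `pip install -r requirements.txt`, so the two can drift apart unnoticed. A rough sketch of a check that lists entries carrying no version specifier; `unpinned` is a hypothetical helper and its comment handling only approximates pip's:

```python
import re


def unpinned(requirements_text):
    """Return requirement entries that carry no version specifier."""
    names = []
    for raw in requirements_text.splitlines():
        # Drop full-line and trailing comments, roughly as pip does.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        if "@" in line:
            # Direct references ("name @ url") resolve from the URL, not a version range.
            continue
        if not re.search(r"[<>=!~]", line):
            names.append(line)
    return names
```

Run against the file in this commit, the helper would report every entry except the `mattervial` direct reference, which makes the gap between this file and the pinned Dockerfile install explicit.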