# structural-isomorphism-v2 (expanded)
A sentence-transformer model fine-tuned for structural similarity across scientific domains — recognizing that phenomena from completely different fields share the same underlying mathematical or dynamical structure.
This is the V2 model, trained on the expanded 5,689-sample dataset (original SIBD + 4,475 expanded phenomena across physics, biology, ecology, finance, and engineering). Compared to V1, V2 is significantly more selective: it finds fewer cross-domain matches at much higher precision, making it well suited to strict isomorphism discovery.
## Model Description
- Base model: shibing624/text2vec-base-chinese (BERT-based, 768-dim)
- Training data: 5,689 descriptions (1,214 original SIBD + 4,475 from the expanded 4,443-phenomenon knowledge base)
- Training objective: MultipleNegativesRankingLoss (positive pairs = same structural type, different domain)
- Hyperparameters: 5 epochs | batch 16 | lr 2e-5 | warmup 10% | max 500 pairs per type
- Training time: ~3.5 hours on Apple M4 (MPS), 10,985 steps
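The pair-construction scheme above (positive pairs share a structural type but come from different domains, capped at 500 pairs per type) can be sketched as follows. This is an illustrative reconstruction, not the actual training script: the field names `type`, `domain`, and `description` and the helper `build_positive_pairs` are assumptions.

```python
# Sketch of building MultipleNegativesRankingLoss positive pairs:
# same structural type, different domain, at most `max_per_type` per type.
import random
from itertools import combinations

def build_positive_pairs(phenomena, max_per_type=500, seed=42):
    by_type = {}
    for p in phenomena:
        by_type.setdefault(p["type"], []).append(p)
    rng = random.Random(seed)
    pairs = []
    for members in by_type.values():
        # Keep only cross-domain pairs within the same structural type.
        candidates = [
            (a["description"], b["description"])
            for a, b in combinations(members, 2)
            if a["domain"] != b["domain"]
        ]
        rng.shuffle(candidates)
        pairs.extend(candidates[:max_per_type])
    return pairs

# Toy knowledge base with one structural type across two domains.
kb = [
    {"type": "delayed feedback", "domain": "climate", "description": "permafrost methane feedback"},
    {"type": "delayed feedback", "domain": "ecology", "description": "extinction debt"},
    {"type": "delayed feedback", "domain": "climate", "description": "ice-albedo lag"},
]
pairs = build_positive_pairs(kb)
print(len(pairs))  # → 2 (same-domain climate×climate pair is excluded)
```

Each resulting pair can then be fed to `MultipleNegativesRankingLoss`, which treats the other pairs in a batch as in-batch negatives.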
## Evaluation Results
Evaluated on the expanded 4,443-phenomenon test set (1,000 sampled):
| Metric | V1 | V2 | Delta |
|---|---|---|---|
| Silhouette Score | -0.17 | 0.55 | +0.72 |
| Retrieval@5 | 23% | 96% | +73% |
On the original 84-type SIBD test set, V2 also matches or exceeds V1's performance.
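Retrieval@5 here can be read as: the fraction of queries whose five nearest cosine neighbours include at least one phenomenon of the same structural type. A minimal sketch of that metric, using random embeddings as stand-ins for `model.encode()` output (the exact evaluation script is not shown in this card):

```python
# Retrieval@k: fraction of items whose top-k cosine neighbours
# (excluding the item itself) contain at least one same-type item.
import numpy as np

def retrieval_at_k(embeddings, labels, k=5):
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    hits = 0
    for i, row in enumerate(sims):
        top_k = np.argsort(row)[::-1][:k]
        if any(labels[j] == labels[i] for j in top_k):
            hits += 1
    return hits / len(labels)

# Two tight same-type clusters -> every query retrieves its own type.
rng = np.random.default_rng(0)
a = rng.normal(scale=0.01, size=(6, 8)) + np.array([1.0] + [0.0] * 7)
b = rng.normal(scale=0.01, size=(6, 8)) + np.array([0.0, 1.0] + [0.0] * 6)
emb = np.vstack([a, b])
labels = ["oscillator"] * 6 + ["percolation"] * 6
print(retrieval_at_k(emb, labels))  # → 1.0
```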
## Discovery Pipeline Results
Running V2 on the expanded 4,443-phenomenon knowledge base with threshold 0.70:
| Step | Count |
|---|---|
| Cross-domain high-similarity pairs | 4,533 |
| LLM strict screening (50 batches) — 5/5 score | 94 |
| LLM strict screening — 4+/5 score (high potential) | 761 (16.8%) |
| Deep analysis of 94 top pairs → A-level candidate papers | 19 |
This represents a 75× stricter retrieval than the V1 model on the same knowledge base (V1 returned 339,913 high-similarity pairs). V2 and V1 discover different structural isomorphisms and are complementary rather than redundant — their top-tier findings have zero overlap.
Top V2 A-level discoveries (deep analysis score):
- Permafrost methane delayed feedback × Extinction debt (8.6)
- Semiconductor laser relaxation oscillation × Algorithmic stablecoin anchoring (8.6)
- Percolation threshold × Technology adoption chasm (8.5)
- MHC over-dominant selection × Model ensemble (8.5)
- Extinction debt × ENSO delayed oscillator (8.4)
## Usage

### With sentence-transformers
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("qinghuiwan/structural-isomorphism-v2-expanded")

# Encode two descriptions from different domains. The model is trained on
# Chinese descriptions; translations of the inputs below:
#   1: "temperature-methane-temperature positive feedback loop formed by
#       methane released from thawing permafrost"
#   2: "extinction debt caused by delayed generational feedback of species
#       after habitat destruction"
emb1 = model.encode("永冻土融化释放甲烷形成的温度-甲烷-温度正反馈循环")
emb2 = model.encode("生境破坏后物种世代反馈滞后引起的灭绝承诺债务")

similarity = util.cos_sim(emb1, emb2).item()
print(f"Structural similarity: {similarity:.3f}")
# Both share delayed-feedback dynamics -> high structural similarity
```
### Discovery pipeline
```python
import json
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("qinghuiwan/structural-isomorphism-v2-expanded")

# Load the phenomenon knowledge base (one JSON object per line)
with open("kb-expanded.jsonl", encoding="utf-8") as f:
    kb = [json.loads(line) for line in f]
descs = [p["description"] for p in kb]
emb = model.encode(descs, convert_to_numpy=True, batch_size=64)

# Find cross-domain high-similarity pairs
for i, j in combinations(range(len(kb)), 2):
    if kb[i]["domain"] == kb[j]["domain"]:
        continue
    sim = float(util.cos_sim(emb[i], emb[j]))
    if sim >= 0.70:
        print(f"{sim:.3f}  {kb[i]['name']} × {kb[j]['name']}")
```
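For a knowledge base of this size (~4,400 items, roughly 10 million pairs), a Python double loop over individual `cos_sim` calls is slow. The same search can be vectorised by computing the full cosine-similarity matrix once and masking in NumPy. A sketch, assuming each item carries a `domain` label (the return format here is illustrative):

```python
# Vectorised cross-domain pair search: one matrix product instead of
# a per-pair Python loop.
import numpy as np

def cross_domain_pairs(emb, domains, threshold=0.70):
    emb = np.asarray(emb, dtype=np.float64)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T
    domains = np.asarray(domains)
    cross = domains[:, None] != domains[None, :]        # different-domain mask
    upper = np.triu(np.ones_like(sims, dtype=bool), 1)  # count each pair once
    i, j = np.nonzero(cross & upper & (sims >= threshold))
    return [(int(a), int(b), float(sims[a, b])) for a, b in zip(i, j)]

# Toy example: items 0 and 1 are nearly parallel but in different domains.
emb = [[1.0, 0.0], [0.95, 0.31], [0.0, 1.0]]
domains = ["climate", "ecology", "finance"]
print(cross_domain_pairs(emb, domains))  # one pair (0, 1) above threshold
```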
## Links
- Project homepage: https://structural.bytedance.city
- GitHub: https://github.com/dada8899/structural-isomorphism
- V1 model: qinghuiwan/structural-isomorphism-v1
- Zenodo (v1.1): https://doi.org/10.5281/zenodo.19541416
## Citation

```bibtex
@software{structural_isomorphism_v2_2026,
  author = {Wan, Qinghui},
  title  = {Structural Isomorphism Search Engine — V2 Model (Expanded)},
  year   = {2026},
  url    = {https://github.com/dada8899/structural-isomorphism}
}
```
## License

MIT