structural-isomorphism-v2 (expanded)

A sentence-transformer model fine-tuned for structural similarity across scientific domains — recognizing that phenomena from completely different fields share the same underlying mathematical or dynamical structure.

This is the V2 model, trained on the expanded 5,689-sample dataset (1,214 original SIBD descriptions plus 4,475 phenomena spanning physics, biology, ecology, finance, and engineering). Compared to V1, V2 is significantly more selective: it finds fewer cross-domain matches but with much higher precision, making it well suited to strict isomorphism discovery.

Model Description

  • Base model: shibing624/text2vec-base-chinese (BERT-based, 768-dim)
  • Training data: 5,689 descriptions (1,214 original SIBD + 4,475 from expanded 4,443-phenomenon knowledge base)
  • Training objective: MultipleNegativesRankingLoss (positive pairs = same structural type, different domain)
  • Hyperparameters: 5 epochs | batch 16 | lr 2e-5 | warmup 10% | max 500 pairs per type
  • Training time: ~3.5 hours on Apple M4 (MPS), 10,985 steps
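
The MultipleNegativesRankingLoss objective above can be illustrated without the full training stack: it is in-batch softmax cross-entropy over scaled cosine similarities, where every other positive in the batch serves as a free negative. A minimal numpy sketch (the scale of 20 follows the sentence-transformers default; the data here is random toy input, not the training set):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """MultipleNegativesRankingLoss as in-batch softmax cross-entropy:
    anchor i should rank its own positive i above every other positive
    in the batch, which act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sims = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    logsumexp = np.log(np.exp(sims).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sims)))

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))
aligned = mnr_loss(x, x)                  # correct pairing -> low loss
shuffled = mnr_loss(x, np.roll(x, 1, 0))  # misaligned pairing -> high loss
print(aligned, shuffled)
```

With correctly paired inputs the diagonal dominates the softmax and the loss is near zero; shuffling the positives destroys the alignment and the loss jumps.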

Evaluation Results

Evaluated on the expanded 4,443-phenomenon test set (1,000 sampled):

Metric             V1      V2     Delta
Silhouette Score   -0.17   0.55   +0.72
Retrieval@5        23%     96%    +73 pp
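
Retrieval@k here means: for each phenomenon, do its k nearest neighbours by cosine similarity include at least one phenomenon of the same structural type? A self-contained numpy sketch of that metric (the toy data below is illustrative, not the evaluation set):

```python
import numpy as np

def retrieval_at_k(emb, labels, k=5):
    """Fraction of items whose k nearest neighbours (cosine similarity,
    self excluded) contain at least one item of the same structural type."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = e @ e.T
    np.fill_diagonal(sims, -np.inf)  # never count an item as its own match
    hits = 0
    for i, label in enumerate(labels):
        topk = np.argsort(sims[i])[::-1][:k]
        hits += any(labels[j] == label for j in topk)
    return hits / len(labels)

# Toy check with two cleanly separated structural types
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.1, size=(10, 8)); a[:, 0] += 5.0
b = rng.normal(0.0, 0.1, size=(10, 8)); b[:, 1] += 5.0
emb = np.vstack([a, b])
labels = [0] * 10 + [1] * 10
print(retrieval_at_k(emb, labels))  # 1.0 for well-separated types
```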

On the original 84-type SIBD test set, V2 also matches or exceeds V1's performance.

Discovery Pipeline Results

Running V2 on the expanded 4,443-phenomenon knowledge base with threshold 0.70:

Step                                                      Count
Cross-domain high-similarity pairs (cosine ≥ 0.70)        4,533
LLM strict screening (50 batches), 5/5 score              94
LLM strict screening, 4+/5 score (high potential)         761 (16.8%)
Deep analysis of 94 top pairs → A-level candidate papers  19

This represents 75× stricter retrieval than V1 on the same knowledge base (V1 returned 339,913 high-similarity pairs at the same threshold). V2 and V1 discover different structural isomorphisms and are complementary rather than redundant: their top-tier findings have zero overlap.
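
The zero-overlap claim can be checked mechanically by intersecting the two models' top-pair lists as unordered pairs. A minimal sketch (the pair lists below are illustrative placeholders, not the actual result files):

```python
def pair_overlap(pairs_a, pairs_b):
    """Jaccard overlap between two collections of unordered name pairs."""
    set_a = {frozenset(p) for p in pairs_a}
    set_b = {frozenset(p) for p in pairs_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

v1_top = [("percolation threshold", "technology adoption chasm")]
v2_top = [("extinction debt", "ENSO delayed oscillator")]
print(pair_overlap(v1_top, v2_top))  # 0.0 -> fully complementary top findings
```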

Top V2 A-level discoveries (deep analysis score):

  1. Permafrost methane delayed feedback × Extinction debt (8.6)
  2. Semiconductor laser relaxation oscillation × Algorithmic stablecoin anchoring (8.6)
  3. Percolation threshold × Technology adoption chasm (8.5)
  4. MHC overdominant selection × Model ensemble (8.5)
  5. Extinction debt × ENSO delayed oscillator (8.4)

Usage

With sentence-transformers

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("qinghuiwan/structural-isomorphism-v2-expanded")

# Encode two descriptions from different domains (the base model is Chinese-language)
# "Temperature-methane-temperature positive feedback loop from thawing permafrost releasing methane"
emb1 = model.encode("永冻土融化释放甲烷形成的温度-甲烷-温度正反馈循环")
# "Extinction debt from delayed generational feedback after habitat destruction"
emb2 = model.encode("生境破坏后物种世代反馈滞后引起的灭绝承诺债务")

similarity = util.cos_sim(emb1, emb2).item()
print(f"Structural similarity: {similarity:.3f}")
# Both share delayed-feedback dynamics → high structural similarity

Discovery pipeline

from sentence_transformers import SentenceTransformer, util
import json

model = SentenceTransformer("qinghuiwan/structural-isomorphism-v2-expanded")

# Load phenomenon knowledge base (one JSON object per line)
with open("kb-expanded.jsonl") as f:
    kb = [json.loads(line) for line in f]
descs = [p["description"] for p in kb]
emb = model.encode(descs, convert_to_numpy=True, batch_size=64)

# Find cross-domain high-similarity pairs
from itertools import combinations
for i, j in combinations(range(len(kb)), 2):
    if kb[i]["domain"] == kb[j]["domain"]:
        continue
    sim = float(util.cos_sim(emb[i], emb[j]))
    if sim >= 0.70:
        print(f"{sim:.3f}  {kb[i]['name']} × {kb[j]['name']}")
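
For larger knowledge bases the pairwise Python loop above becomes the bottleneck; the same filter can be done with one cosine-similarity matrix and boolean masks. A vectorized numpy sketch (function name and toy data are illustrative):

```python
import numpy as np

def cross_domain_pairs(emb, domains, threshold=0.70):
    """Vectorized equivalent of the pairwise loop: one cosine matrix,
    masked to keep each cross-domain pair exactly once."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = e @ e.T
    dom = np.asarray(domains)
    n = len(dom)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # upper triangle: each pair once
    mask &= dom[:, None] != dom[None, :]              # cross-domain only
    i_idx, j_idx = np.where(mask & (sims >= threshold))
    return sorted(zip(sims[i_idx, j_idx], i_idx, j_idx), reverse=True)

# Toy check: only the near-parallel cross-domain pair survives
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
domains = ["ecology", "physics", "ecology"]
pairs = cross_domain_pairs(emb, domains)
print(pairs)  # one surviving pair: indices (0, 1)
```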

Citation

@software{structural_isomorphism_v2_2026,
  author = {Wan, Qinghui},
  title  = {Structural Isomorphism Search Engine — V2 Model (Expanded)},
  year   = {2026},
  url    = {https://github.com/dada8899/structural-isomorphism}
}

License

MIT
