# structural-isomorphism-v2 (expanded)
A sentence-transformer model fine-tuned for structural similarity across scientific domains — recognizing that phenomena from completely different fields share the same underlying mathematical or dynamical structure.
This is the V2 model, trained on the expanded 5,689-sample dataset (original SIBD + 4,475 expanded phenomena across physics, biology, ecology, finance, and engineering). Compared to V1, V2 is significantly more selective: it finds fewer cross-domain matches at much higher precision, making it well suited to strict isomorphism discovery.
## Model Description
- Base model: shibing624/text2vec-base-chinese (BERT-based, 768-dim)
- Training data: 5,689 descriptions (1,214 original SIBD + 4,475 from the expanded 4,443-phenomenon knowledge base)
- Training objective: MultipleNegativesRankingLoss (positive pairs = same structural type, different domain)
- Hyperparameters: 5 epochs | batch 16 | lr 2e-5 | warmup 10% | max 500 pairs per type
- Training time: ~3.5 hours on Apple M4 (MPS), 10,985 steps
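The pair-construction scheme above (positive pairs share a structural type but come from different domains, capped at 500 pairs per type) can be sketched as follows. This is an illustrative reconstruction, not the actual training script: the field names `type`, `domain`, and `description` and the helper `build_positive_pairs` are assumptions.

```python
# Sketch of building MultipleNegativesRankingLoss positive pairs:
# same structural type, different domain, at most `max_per_type` per type.
import random
from itertools import combinations

def build_positive_pairs(phenomena, max_per_type=500, seed=42):
    by_type = {}
    for p in phenomena:
        by_type.setdefault(p["type"], []).append(p)
    rng = random.Random(seed)
    pairs = []
    for members in by_type.values():
        # Keep only cross-domain pairs within the same structural type.
        candidates = [
            (a["description"], b["description"])
            for a, b in combinations(members, 2)
            if a["domain"] != b["domain"]
        ]
        rng.shuffle(candidates)
        pairs.extend(candidates[:max_per_type])
    return pairs

# Toy knowledge base with one structural type across two domains.
kb = [
    {"type": "delayed feedback", "domain": "climate", "description": "permafrost methane feedback"},
    {"type": "delayed feedback", "domain": "ecology", "description": "extinction debt"},
    {"type": "delayed feedback", "domain": "climate", "description": "ice-albedo lag"},
]
pairs = build_positive_pairs(kb)
print(len(pairs))  # → 2 (same-domain climate×climate pair is excluded)
```

Each resulting pair can then be fed to `MultipleNegativesRankingLoss`, which treats the other pairs in a batch as in-batch negatives.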
## Evaluation Results
Evaluated on the expanded 4,443-phenomenon test set (1,000 sampled):
| Metric | V1 | V2 | Delta |
|---|---|---|---|
| Silhouette Score | -0.17 | 0.55 | +0.72 |
| Retrieval@5 | 23% | 96% | +73% |
On the original 84-type SIBD test set, V2 also matches or exceeds V1's performance.
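Retrieval@5 here can be read as: the fraction of queries whose five nearest cosine neighbours include at least one phenomenon of the same structural type. A minimal sketch of that metric, using random embeddings as stand-ins for `model.encode()` output (the exact evaluation script is not shown in this card):

```python
# Retrieval@k: fraction of items whose top-k cosine neighbours
# (excluding the item itself) contain at least one same-type item.
import numpy as np

def retrieval_at_k(embeddings, labels, k=5):
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    hits = 0
    for i, row in enumerate(sims):
        top_k = np.argsort(row)[::-1][:k]
        if any(labels[j] == labels[i] for j in top_k):
            hits += 1
    return hits / len(labels)

# Two tight same-type clusters -> every query retrieves its own type.
rng = np.random.default_rng(0)
a = rng.normal(scale=0.01, size=(6, 8)) + np.array([1.0] + [0.0] * 7)
b = rng.normal(scale=0.01, size=(6, 8)) + np.array([0.0, 1.0] + [0.0] * 6)
emb = np.vstack([a, b])
labels = ["oscillator"] * 6 + ["percolation"] * 6
print(retrieval_at_k(emb, labels))  # → 1.0
```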
## Discovery Pipeline Results
Running V2 on the expanded 4,443-phenomenon knowledge base with threshold 0.70:
| Step | Count |
|---|---|
| Cross-domain high-similarity pairs | 4,533 |
| LLM strict screening (50 batches) — 5/5 score | 94 |
| LLM strict screening — 4+/5 score (high potential) | 761 (16.8%) |
| Deep analysis of 94 top pairs → A-level candidate papers | 19 |
This represents a 75× stricter retrieval than the V1 model on the same knowledge base (V1 returned 339,913 high-similarity pairs). V2 and V1 discover different structural isomorphisms and are complementary rather than redundant — their top-tier findings have zero overlap.
Top V2 A-level discoveries (deep analysis score):
- Permafrost methane delayed feedback × Extinction debt (8.6)
- Semiconductor laser relaxation oscillation × Algorithmic stablecoin anchoring (8.6)
- Percolation threshold × Technology adoption chasm (8.5)
- MHC over-dominant selection × Model ensemble (8.5)
- Extinction debt × ENSO delayed oscillator (8.4)
## Usage

### With sentence-transformers
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("qinghuiwan/structural-isomorphism-v2-expanded")

# Encode two descriptions from different domains. The model is trained on
# Chinese descriptions; translations of the inputs below:
#   1: "temperature-methane-temperature positive feedback loop formed by
#       methane released from thawing permafrost"
#   2: "extinction debt caused by delayed generational feedback of species
#       after habitat destruction"
emb1 = model.encode("永冻土融化释放甲烷形成的温度-甲烷-温度正反馈循环")
emb2 = model.encode("生境破坏后物种世代反馈滞后引起的灭绝承诺债务")

similarity = util.cos_sim(emb1, emb2).item()
print(f"Structural similarity: {similarity:.3f}")
# Both share delayed-feedback dynamics -> high structural similarity
```
### Discovery pipeline
```python
import json
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("qinghuiwan/structural-isomorphism-v2-expanded")

# Load the phenomenon knowledge base (one JSON object per line)
with open("kb-expanded.jsonl", encoding="utf-8") as f:
    kb = [json.loads(line) for line in f]
descs = [p["description"] for p in kb]
emb = model.encode(descs, convert_to_numpy=True, batch_size=64)

# Find cross-domain high-similarity pairs
for i, j in combinations(range(len(kb)), 2):
    if kb[i]["domain"] == kb[j]["domain"]:
        continue
    sim = float(util.cos_sim(emb[i], emb[j]))
    if sim >= 0.70:
        print(f"{sim:.3f}  {kb[i]['name']} × {kb[j]['name']}")
```
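For a knowledge base of this size (~4,400 items, roughly 10 million pairs), a Python double loop over individual `cos_sim` calls is slow. The same search can be vectorised by computing the full cosine-similarity matrix once and masking in NumPy. A sketch, assuming each item carries a `domain` label (the return format here is illustrative):

```python
# Vectorised cross-domain pair search: one matrix product instead of
# a per-pair Python loop.
import numpy as np

def cross_domain_pairs(emb, domains, threshold=0.70):
    emb = np.asarray(emb, dtype=np.float64)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T
    domains = np.asarray(domains)
    cross = domains[:, None] != domains[None, :]        # different-domain mask
    upper = np.triu(np.ones_like(sims, dtype=bool), 1)  # count each pair once
    i, j = np.nonzero(cross & upper & (sims >= threshold))
    return [(int(a), int(b), float(sims[a, b])) for a, b in zip(i, j)]

# Toy example: items 0 and 1 are nearly parallel but in different domains.
emb = [[1.0, 0.0], [0.95, 0.31], [0.0, 1.0]]
domains = ["climate", "ecology", "finance"]
print(cross_domain_pairs(emb, domains))  # one pair (0, 1) above threshold
```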
## Links
- Project homepage: https://structural.bytedance.city
- GitHub: https://github.com/dada8899/structural-isomorphism
- V1 model: qinghuiwan/structural-isomorphism-v1
- Zenodo (v1.1): https://doi.org/10.5281/zenodo.19541416
## Citation

```bibtex
@software{structural_isomorphism_v2_2026,
  author = {Wan, Qinghui},
  title  = {Structural Isomorphism Search Engine — V2 Model (Expanded)},
  year   = {2026},
  url    = {https://github.com/dada8899/structural-isomorphism}
}
```
## License

MIT