Element Graph Encoders (v2.1) — academic multimodal retrieval

Trained encoder adapters for the Element Graph for Academic Multimodal Retrieval project. Each adapter is a SigLIPv2-base (200M) backbone fine-tuned with LoRA + Graph Position Embedding (GPE), trained on SPIQA with a Graph-Relevance Contrastive Loss (GRCL) and late-interaction (ColBERT-style MaxSim) scoring.
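The late-interaction (MaxSim) scoring mentioned above can be illustrated with a minimal sketch. It uses plain Python lists and toy 2-D embeddings; `maxsim_score` and the example vectors are hypothetical names and values chosen for illustration, not the project's implementation, which presumably runs as a batched tensor operation.

```python
def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction: for each query token, take the best
    dot product against any document token, then sum those maxima."""
    total = 0.0
    for q in query_tokens:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_tokens)
        total += best
    return total


# Toy unit-norm token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match for each query token
doc_b = [[0.6, 0.8]]                          # only a partial match for both

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because every query token picks its own best document token, a document is rewarded for covering all parts of the query rather than for matching one pooled vector.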

Code / method / full report: https://github.com/monkcat/dl_project
Dataset + graphs: ljh38/element-graph-v2.1
Backbone: google/siglip2-base-patch16-224 (loaded from the hub; not redistributed here)

Why adapters only

Each full checkpoint is 1.5 GB but only **5 MB is trained** (LoRA + GPE); the rest is the frozen pretrained backbone, which is identical across all rows that share an encoder (p_encoder_clip_l14, for instance, uses openai/clip-vit-large-patch14 instead of SigLIP). This repo ships only the trained delta (≈263 MB for all 42 rows) and rebuilds the backbone from the hub at load time.
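As a rough check of the storage argument, the figures above can be plugged into a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes 1 GB = 1000 MB and treats the quoted sizes as exact.

```python
ROWS = 42
FULL_CKPT_GB = 1.5       # size of one full checkpoint, as quoted above
ADAPTER_TOTAL_MB = 263   # approximate size of all shipped adapters together

full_total_gb = ROWS * FULL_CKPT_GB                       # shipping every full checkpoint
avg_adapter_mb = ADAPTER_TOTAL_MB / ROWS                  # average adapter file size
saving_factor = full_total_gb * 1000 / ADAPTER_TOTAL_MB   # assumes 1 GB = 1000 MB

print(f"full checkpoints: {full_total_gb:.0f} GB")
print(f"average adapter:  {avg_adapter_mb:.1f} MB")
print(f"saving factor:    about {saving_factor:.0f}x")
```

Under these assumptions, shipping full checkpoints would take about 63 GB, roughly 240 times the size of the adapter-only repository.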

Usage

git clone https://github.com/monkcat/dl_project && cd dl_project
pip install -r requirements.txt
huggingface-cli download ljh38/element-graph-encoder-v2.1 --local-dir hf_models

import sys; sys.path.append("hf_models")   # so load_adapter.py is importable
from load_adapter import load_row
enc = load_row("h_best_combo")   # builds SigLIP+LoRA+GPE, loads the adapter

Or evaluate directly with the project's eval harness (it loads with strict=False, so the frozen backbone stays pretrained):

python -m pipeline.eval_full --ckpt hf_models/adapters/h_best_combo.pt \
    --lora_rank 8 --gpe_facets type,role,depth,pos \
    --datasets spiqa_testA sciegqa mmdocir

Match --lora_rank / --gpe_facets / --hf_id to each row's entry in manifest.json (e.g. lora_r32 needs --lora_rank 32, p_encoder_clip_l14 needs --hf_id openai/clip-vit-large-patch14).
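A small helper can derive those flags from the manifest instead of typing them by hand. The sketch below assumes manifest.json maps each row name to a dict with `hf_id`, `lora_rank`, and `gpe_facets` (either a list or a comma-separated string); that layout, the example values, and the `build_eval_cmd` helper are assumptions for illustration, so check them against the actual file. It also assumes that `--gpe_facets` can be omitted for rows trained without GPE.

```python
import json
import shlex


def build_eval_cmd(row, cfg, datasets=("spiqa_testA", "sciegqa", "mmdocir")):
    """Build the eval_full argument list for one row from its manifest entry."""
    facets = cfg.get("gpe_facets") or ""
    if isinstance(facets, (list, tuple)):
        facets = ",".join(facets)
    cmd = [
        "python", "-m", "pipeline.eval_full",
        "--ckpt", f"hf_models/adapters/{row}.pt",
        "--lora_rank", str(cfg["lora_rank"]),
    ]
    if facets:  # assumption: the flag is simply left out for rows without GPE
        cmd += ["--gpe_facets", facets]
    if cfg.get("hf_id"):
        cmd += ["--hf_id", cfg["hf_id"]]
    cmd += ["--datasets", *datasets]
    return cmd


# Hypothetical manifest excerpt; in practice use json.load on hf_models/manifest.json.
manifest = json.loads("""
{
  "lora_r32": {
    "hf_id": "google/siglip2-base-patch16-224",
    "lora_rank": 32,
    "gpe_facets": ["type", "role", "depth", "pos"]
  },
  "p_encoder_clip_l14": {
    "hf_id": "openai/clip-vit-large-patch14",
    "lora_rank": 8,
    "gpe_facets": "type,role,depth,pos"
  }
}
""")

cmd = build_eval_cmd("lora_r32", manifest["lora_r32"])
print(shlex.join(cmd))
```

Returning an argument list rather than a string keeps the result directly usable with `subprocess.run(cmd, check=True)`.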

Repository layout

adapters/<row>.pt     trained LoRA + GPE state (load with strict=False)
manifest.json         per-row build config: hf_id, lora_rank, lora_alpha, gpe_facets
load_adapter.py       helper: load_row(name) -> ready ElementTokenEncoder
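The reason an adapter file must be loaded with strict=False is that it contains only a subset of the model's parameters. The following plain-dict emulation illustrates that partial-load behaviour: keys present in the adapter overwrite the model's values, and the remaining keys keep whatever the freshly built backbone already had. The parameter names are hypothetical and strings stand in for tensors; in PyTorch the equivalent call is `model.load_state_dict(state, strict=False)`, which reports missing and unexpected keys in the same way.

```python
def load_partial(model_state, adapter_state):
    """Emulate a non-strict load: copy matching keys, report the rest."""
    missing = [k for k in model_state if k not in adapter_state]
    unexpected = [k for k in adapter_state if k not in model_state]
    merged = dict(model_state)
    merged.update({k: v for k, v in adapter_state.items() if k in model_state})
    return merged, missing, unexpected


# Hypothetical parameter names; the values stand in for tensors.
model_state = {
    "backbone.layer0.weight": "pretrained",  # frozen, comes from the hub
    "backbone.layer0.lora_A": "init",        # trainable LoRA factor
    "gpe.type_embedding": "init",            # trainable GPE table
}
adapter_state = {
    "backbone.layer0.lora_A": "trained",
    "gpe.type_embedding": "trained",
}

merged, missing, unexpected = load_partial(model_state, adapter_state)
print(merged)
print("missing:", missing, "unexpected:", unexpected)
```

With a strict load, the missing backbone keys would raise an error; with a non-strict load they are reported and left at their pretrained values. An unexpected key, by contrast, usually signals a mismatch between the built model and the row's configuration.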

Selected results (SPIQA test-A / SciEGQA / MMDocIR, Recall@10, no propagation)

| Row | Description | SPIQA | SciEGQA | MMDocIR |
|---|---|---|---|---|
| `a_baseline_infonce` | InfoNCE baseline | 84.4 | 0.9 | 2.0 |
| `e_grcl_no_gpe` | GRCL, no GPE | 87.1 | 1.0 | 2.5 |
| `h_full_method` | full method (GRCL+GPE+L_cov+L_cons) | 86.9 | 1.0 | 2.4 |
| `edge_refer_to` | GRCL on `refer_to` edges only | 90.4 | 1.3 | 5.7 |
| `lr_1e4` | lr 1e-4 | 88.6 | 1.6 | 4.3 |
| `h_best_combo` | best HP combo | 88.0 | 1.5 | 4.2 |
| `h_best_combo_16k` | best combo, 16k steps | 85.9 | 1.7 | 4.4 |

Key findings: GRCL gives a small but statistically significant in-domain gain over InfoNCE (SPIQA Coverage@10, paired-bootstrap p=0.042); refer_to edges are the strongest graph signal; the GPE, coverage, and consistency losses show no measurable effect; compact encoders fail to transfer zero-shot (SciEGQA/MMDocIR), where a large retrieval-tuned MLLM (GME) leads. Full tables: see the GitHub report (§7).
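For readers unfamiliar with the paired bootstrap behind the quoted p-value, here is a minimal sketch of one common one-sided variant. The exact procedure used in the report is not specified here, so the function, its defaults, and the toy per-query scores are all assumptions for illustration and do not reproduce the reported number.

```python
import random


def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """One-sided paired bootstrap: resample queries with replacement and
    return the fraction of resamples in which B's mean gain over A is <= 0."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same queries")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        sample_mean = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if sample_mean <= 0:
            not_better += 1
    return not_better / n_resamples


# Toy per-query metric values (hypothetical, not taken from the report).
baseline = [0.5, 0.6, 0.7, 0.4]
improved = [0.6, 0.7, 0.8, 0.5]

p_gain = paired_bootstrap_p(baseline, improved, n_resamples=1000)
p_same = paired_bootstrap_p(baseline, baseline, n_resamples=1000)
print(p_gain, p_same)
```

Resampling per-query differences, rather than each system's scores independently, is what makes the test paired: it cancels out query difficulty, which both systems share.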

Available rows (42)

Tier 1: a_baseline_infonce b_gpe_type_infonce c_gpe_type_role_infonce d_gpe_full_infonce e_grcl_no_gpe f_grcl_gpe g_grcl_gpe_cov h_full_method
Tier 2: m_shuffled_role n_random_role o_no_query_pe_dropout p_encoder_clip_l14
Tier 3: gamma_03 gamma_07 cov_00/01/05/10 cons_00/01/03/10 lora_r4/r16/r32 lr_1e4 lr_1e5 tau_005 tau_010 anchor_caption/refer/nlqa edge_caption_of/refer_to/contains tokens_016/064
Tier 4: h_best_combo h_best_combo_16k h_gpe_strong h_seed_43 h_seed_44

License

MIT (code). Backbone and datasets retain their original licenses.
