Element Graph Encoders (v2.1) — academic multimodal retrieval

Trained encoder adapters for the Element Graph for Academic Multimodal Retrieval project. Each adapter is a SigLIPv2-base (200M) backbone fine-tuned with LoRA + Graph Position Embedding (GPE), trained on SPIQA with a Graph-Relevance Contrastive Loss (GRCL) and late-interaction (ColBERT-style MaxSim) scoring.
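As a rough illustration of the late-interaction scoring mentioned above, the sketch below computes a ColBERT-style MaxSim score in plain Python: each query token takes its best cosine match among the document tokens, and those maxima are summed. The function names and toy vectors are hypothetical, and L2 normalization before the dot product is an assumption; the project's own implementation may differ.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the max cosine similarity to any doc token."""
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        # Best-matching document token for this query token.
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy 2-d token embeddings (hypothetical, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]   # has a match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because every query token contributes its own best match, a document that covers all query tokens (doc_a) outranks one that covers only some of them (doc_b).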

Why adapters only

Each full checkpoint is 1.5 GB but only **5 MB is trained** (LoRA + GPE); the rest is the frozen pretrained SigLIP backbone, identical across all 42 rows. This repo ships only the trained delta (≈263 MB for all 42 rows) and rebuilds the backbone from the hub at load time.
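The storage argument above can be checked with a few lines of arithmetic. The figures are the ones quoted in this section; decimal units (1 GB = 1000 MB) are an assumption.

```python
ROWS = 42                  # number of adapter rows in the repo
FULL_CHECKPOINT_GB = 1.5   # size of one full checkpoint
ADAPTER_REPO_MB = 263      # approximate size of all shipped adapters

# Cost of shipping every full checkpoint instead of the trained deltas.
full_total_gb = ROWS * FULL_CHECKPOINT_GB

# Average adapter file size per row.
avg_adapter_mb = ADAPTER_REPO_MB / ROWS

# Overall reduction factor (assumes 1 GB = 1000 MB).
reduction = full_total_gb / (ADAPTER_REPO_MB / 1000)

print(f"full checkpoints: {full_total_gb:.0f} GB")
print(f"average adapter:  {avg_adapter_mb:.2f} MB per row")
print(f"reduction:        about {reduction:.0f}x")
```

Shipping all 42 full checkpoints would take roughly 63 GB, so the adapter-only layout is a bit more than two orders of magnitude smaller.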

Usage

git clone https://github.com/monkcat/dl_project && cd dl_project
pip install -r requirements.txt
huggingface-cli download ljh38/element-graph-encoder-v2.1 --local-dir hf_models

Then, in Python:

import sys; sys.path.append("hf_models")   # so load_adapter.py is importable
from load_adapter import load_row
enc = load_row("h_best_combo")   # builds SigLIP+LoRA+GPE, loads the adapter

Or evaluate directly with the project's eval harness (it loads with strict=False, so the frozen backbone stays pretrained):

python -m pipeline.eval_full --ckpt hf_models/adapters/h_best_combo.pt \
    --lora_rank 8 --gpe_facets type,role,depth,pos \
    --datasets spiqa_testA sciegqa mmdocir

Match --lora_rank / --gpe_facets / --hf_id to each row's entry in manifest.json (e.g. lora_r32 needs --lora_rank 32, p_encoder_clip_l14 needs --hf_id openai/clip-vit-large-patch14).
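One way to keep the flags in sync with the manifest is to generate the command from it. The sketch below is a minimal example under stated assumptions: it assumes manifest.json is a JSON object keyed by row name whose entries carry lora_rank, gpe_facets and (optionally) hf_id, and that gpe_facets is either a comma-separated string or a list. The real file layout may differ. The inline manifest entries are placeholders, except for the two values quoted above (rank 32 for lora_r32 and the CLIP hf_id); the helper eval_command is hypothetical.

```python
import json
import shlex

# Placeholder manifest for illustration; in practice, read hf_models/manifest.json.
manifest = json.loads("""
{
  "lora_r32": {"lora_rank": 32, "gpe_facets": "type,role,depth,pos"},
  "p_encoder_clip_l14": {"lora_rank": 8,
                         "gpe_facets": ["type", "role", "depth", "pos"],
                         "hf_id": "openai/clip-vit-large-patch14"}
}
""")


def eval_command(manifest, row, datasets=("spiqa_testA", "sciegqa", "mmdocir")):
    """Build a pipeline.eval_full command line from one manifest entry."""
    cfg = manifest[row]
    cmd = ["python", "-m", "pipeline.eval_full",
           "--ckpt", f"hf_models/adapters/{row}.pt",
           "--lora_rank", str(cfg["lora_rank"])]
    facets = cfg.get("gpe_facets")
    if facets:
        # Accept either "a,b,c" or ["a", "b", "c"] (assumed formats).
        if not isinstance(facets, str):
            facets = ",".join(facets)
        cmd += ["--gpe_facets", facets]
    if cfg.get("hf_id"):
        # Only pass --hf_id when the row overrides the default backbone.
        cmd += ["--hf_id", cfg["hf_id"]]
    cmd += ["--datasets", *datasets]
    return shlex.join(cmd)


for row in manifest:
    print(eval_command(manifest, row))
```

Generating the command this way avoids evaluating an adapter with a LoRA rank or backbone it was not trained with.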

Repository layout

adapters/<row>.pt     trained LoRA + GPE state (load with strict=False)
manifest.json         per-row build config: hf_id, lora_rank, lora_alpha, gpe_facets
load_adapter.py       helper: load_row(name) -> ready ElementTokenEncoder

Selected results (SPIQA test-A / SciEGQA / MMDocIR, Recall@10, no propagation)

| Row | What it is | SPIQA | SciEGQA | MMDocIR |
|---|---|---|---|---|
| a_baseline_infonce | InfoNCE baseline | 84.4 | 0.9 | 2.0 |
| e_grcl_no_gpe | GRCL, no GPE | 87.1 | 1.0 | 2.5 |
| h_full_method | full method (GRCL+GPE+L_cov+L_cons) | 86.9 | 1.0 | 2.4 |
| edge_refer_to | GRCL on refer_to edges only | 90.4 | 1.3 | 5.7 |
| lr_1e4 | lr 1e-4 | 88.6 | 1.6 | 4.3 |
| h_best_combo | best HP combo | 88.0 | 1.5 | 4.2 |
| h_best_combo_16k | best combo, 16k steps | 85.9 | 1.7 | 4.4 |

Key findings: GRCL gives a small but significant in-domain gain over InfoNCE (SPIQA Coverage@10, paired-bootstrap p=0.042); refer_to edges are the strongest graph signal; GPE / coverage / consistency losses show no measurable effect; compact encoders fail to transfer zero-shot (SciEGQA/MMDocIR), where a large retrieval-tuned MLLM (GME) leads. Full tables: see the GitHub report (§7).
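For readers unfamiliar with the paired-bootstrap test cited above, the following is a generic sketch of the idea, not the project's actual evaluation code. It resamples per-query score differences with replacement and reports how often the resampled gain is not positive. The function name, the one-sided formulation, the resample count and the toy scores are all assumptions made for illustration.

```python
import random


def paired_bootstrap_p(scores_a, scores_b, n_resamples=2000, seed=0):
    """One-sided p-value: share of resamples in which B does not beat A."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    # Pairing: both systems are scored on the same queries.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        # Resample queries with replacement and sum the paired differences.
        resampled_gain = sum(diffs[rng.randrange(n)] for _ in range(n))
        if resampled_gain <= 0:
            not_better += 1
    return not_better / n_resamples


# Synthetic per-query hit indicators (1 = relevant item retrieved in the top 10).
baseline = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 20
improved = [1 if i % 10 == 1 else hit for i, hit in enumerate(baseline)]

p_gain = paired_bootstrap_p(baseline, improved)
p_same = paired_bootstrap_p(baseline, baseline)
print(p_gain, p_same)
```

A small p-value means the gain holds up across resampled query sets; comparing a system against itself gives p = 1.0, since no resample ever shows a positive gain.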

Available rows (42)

Tier 1: a_baseline_infonce b_gpe_type_infonce c_gpe_type_role_infonce d_gpe_full_infonce e_grcl_no_gpe f_grcl_gpe g_grcl_gpe_cov h_full_method

Tier 2: m_shuffled_role n_random_role o_no_query_pe_dropout p_encoder_clip_l14

Tier 3: gamma_03 gamma_07 cov_00/01/05/10 cons_00/01/03/10 lora_r4/r16/r32 lr_1e4 lr_1e5 tau_005 tau_010 anchor_caption/refer/nlqa edge_caption_of/refer_to/contains tokens_016/064

Tier 4: h_best_combo h_best_combo_16k h_gpe_strong h_seed_43 h_seed_44

License

MIT (code). Backbone and datasets retain their original licenses.
