license: mit
language:
  - en
library_name: peft
tags:
  - retrieval
  - multimodal
  - document-retrieval
  - late-interaction
  - lora
  - siglip
base_model: google/siglip2-base-patch16-224
pipeline_tag: feature-extraction

Element Graph Encoders (v2.1) — academic multimodal retrieval

Trained encoder adapters for the Element Graph for Academic Multimodal Retrieval project. Each adapter holds the LoRA + Graph Position Embedding (GPE) weights fine-tuned on top of a SigLIPv2-base (200M) backbone. Training uses SPIQA with a Graph-Relevance Contrastive Loss (GRCL) and late-interaction (ColBERT-style MaxSim) scoring.
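For readers new to late interaction, ColBERT-style MaxSim scoring can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: `maxsim_score` and the toy vectors are hypothetical, and the sketch assumes token embeddings are already L2-normalized so that a dot product behaves like cosine similarity.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token take its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-D embeddings (hypothetical, unit length).
query = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]  # has an exact match for each query token
doc_b = [(0.6, 0.8)]                           # a single token that partly matches both

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)  # doc_a ranks above doc_b
```

Because every query token picks its own best document token, a document is rewarded for covering each part of the query rather than for matching one pooled vector.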

Code / method / full report: https://github.com/monkcat/dl_project
Dataset + graphs: ljh38/element-graph-v2.1
Backbone: google/siglip2-base-patch16-224 (loaded from the hub; not redistributed here)

Why adapters only

Each full checkpoint is 1.5 GB, but only about **5 MB is trained** (LoRA + GPE); the rest is the frozen pretrained backbone, which is identical across all rows built on the same backbone (`p_encoder_clip_l14`, for instance, is built on openai/clip-vit-large-patch14 rather than SigLIP). This repo ships only the trained delta (≈263 MB for all 42 rows) and rebuilds the backbone from the hub at load time.
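The storage saving follows directly from the figures quoted above. The short calculation below is only a back-of-the-envelope check; it assumes 1 GB = 1000 MB and that every row's full checkpoint is about the same size.

```python
N_ROWS = 42
FULL_CKPT_GB = 1.5       # size of one full checkpoint, as quoted above
ADAPTER_TOTAL_MB = 263   # size of all 42 adapters together, as quoted above

# Storage needed if every row shipped its full checkpoint.
full_total_gb = N_ROWS * FULL_CKPT_GB

# Average adapter size per row and the overall reduction factor.
adapter_per_row_mb = ADAPTER_TOTAL_MB / N_ROWS
saving_factor = full_total_gb * 1000 / ADAPTER_TOTAL_MB

print(f"full checkpoints: {full_total_gb:.0f} GB")
print(f"adapters only:    {ADAPTER_TOTAL_MB} MB (~{adapter_per_row_mb:.1f} MB per row)")
print(f"reduction:        ~{saving_factor:.0f}x")
```

Under these assumptions, shipping full checkpoints would take about 63 GB, so the adapter-only layout is smaller by a factor of a few hundred.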

Usage

git clone https://github.com/monkcat/dl_project && cd dl_project
pip install -r requirements.txt
huggingface-cli download ljh38/element-graph-encoder-v2.1 --local-dir hf_models

import sys; sys.path.append("hf_models")   # so load_adapter.py is importable
from load_adapter import load_row
enc = load_row("h_best_combo")   # builds SigLIP+LoRA+GPE, loads the adapter

Or evaluate directly with the project's eval harness (it loads with strict=False, so the frozen backbone stays pretrained):

python -m pipeline.eval_full --ckpt hf_models/adapters/h_best_combo.pt \
    --lora_rank 8 --gpe_facets type,role,depth,pos \
    --datasets spiqa_testA sciegqa mmdocir

Match --lora_rank / --gpe_facets / --hf_id to each row's entry in manifest.json (e.g. lora_r32 needs --lora_rank 32, p_encoder_clip_l14 needs --hf_id openai/clip-vit-large-patch14).
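One way to keep the flags in sync with the manifest is to generate the command from it. The sketch below is an assumption-laden illustration: it supposes manifest.json is a JSON object keyed by row name with the fields `hf_id`, `lora_rank`, `lora_alpha`, and `gpe_facets`, and that `gpe_facets` is either a list or a comma-separated string. The embedded manifest excerpt uses placeholder values, and `eval_command` is a hypothetical helper, not part of the repository.

```python
import json
import shlex
from typing import Dict, List

# Hypothetical manifest excerpt. The real manifest.json layout and values
# may differ; only the field names come from the repository description.
EXAMPLE_MANIFEST: Dict[str, dict] = json.loads("""
{
  "lora_r32": {
    "hf_id": "google/siglip2-base-patch16-224",
    "lora_rank": 32,
    "lora_alpha": 64,
    "gpe_facets": ["type", "role", "depth", "pos"]
  }
}
""")


def eval_command(row: str, manifest: Dict[str, dict], datasets: List[str]) -> List[str]:
    """Build the eval_full argument list for one row from its manifest entry."""
    cfg = manifest[row]
    facets = cfg["gpe_facets"]
    if not isinstance(facets, str):  # accept a list as well as a string
        facets = ",".join(facets)
    return [
        "python", "-m", "pipeline.eval_full",
        "--ckpt", f"hf_models/adapters/{row}.pt",
        "--hf_id", cfg["hf_id"],
        "--lora_rank", str(cfg["lora_rank"]),
        "--gpe_facets", facets,
        "--datasets", *datasets,
    ]


cmd = eval_command("lora_r32", EXAMPLE_MANIFEST, ["spiqa_testA", "sciegqa", "mmdocir"])
print(shlex.join(cmd))  # shell-quoted command line, ready to paste
```

To use the real file, replace `EXAMPLE_MANIFEST` with `json.load(open("hf_models/manifest.json"))` and adjust the field handling if the actual layout differs.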

Repository layout

adapters/<row>.pt     trained LoRA + GPE state (load with strict=False)
manifest.json         per-row build config: hf_id, lora_rank, lora_alpha, gpe_facets
load_adapter.py       helper: load_row(name) -> ready ElementTokenEncoder
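The `strict=False` note matters because an adapter file holds only a subset of the model's parameters. The sketch below mimics that behavior with plain dictionaries instead of PyTorch so it runs anywhere. The parameter names are made up, and the strings stand in for tensors. Real loading would go through `torch.nn.Module.load_state_dict(state, strict=False)`, which likewise reports missing and unexpected keys instead of raising an error.

```python
from typing import Dict, List, Tuple


def load_non_strict(model_state: Dict[str, str],
                    adapter_state: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Overlay adapter entries onto the model state in place.

    Keys absent from the adapter keep their current (pretrained) values;
    adapter keys the model does not have are reported but not applied.
    """
    missing = [k for k in model_state if k not in adapter_state]
    unexpected = [k for k in adapter_state if k not in model_state]
    for key, value in adapter_state.items():
        if key in model_state:
            model_state[key] = value
    return missing, unexpected


# Hypothetical parameter names; strings stand in for weight tensors.
model = {
    "backbone.layer0.weight": "pretrained",
    "backbone.layer0.lora_A": "init",
    "gpe.type_embedding": "init",
}
adapter = {
    "backbone.layer0.lora_A": "trained",
    "gpe.type_embedding": "trained",
}

missing, unexpected = load_non_strict(model, adapter)
print(missing)     # backbone keys that kept their pretrained values
print(unexpected)  # adapter keys with no counterpart in the model
```

With strict loading, the missing backbone keys would be an error. With non-strict loading they are expected, and it is still worth checking that the unexpected list is empty.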

Selected results (SPIQA test-A / SciEGQA / MMDocIR, Recall@10, no propagation)

| Row | Description | SPIQA | SciEGQA | MMDocIR |
|---|---|---:|---:|---:|
| `a_baseline_infonce` | InfoNCE baseline | 84.4 | 0.9 | 2.0 |
| `e_grcl_no_gpe` | GRCL, no GPE | 87.1 | 1.0 | 2.5 |
| `h_full_method` | full method (GRCL+GPE+L_cov+L_cons) | 86.9 | 1.0 | 2.4 |
| `edge_refer_to` | GRCL on `refer_to` edges only | 90.4 | 1.3 | 5.7 |
| `lr_1e4` | lr 1e-4 | 88.6 | 1.6 | 4.3 |
| `h_best_combo` | best HP combo | 88.0 | 1.5 | 4.2 |
| `h_best_combo_16k` | best combo, 16k steps | 85.9 | 1.7 | 4.4 |

Key findings: GRCL gives a small but statistically significant in-domain gain over InfoNCE (SPIQA Coverage@10, paired-bootstrap p=0.042); `refer_to` edges are the strongest graph signal; GPE and the coverage and consistency losses show no measurable effect; compact encoders fail to transfer zero-shot (SciEGQA/MMDocIR), where a large retrieval-tuned MLLM (GME) leads. Full tables: see the GitHub report (§7).
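A paired bootstrap test of the kind cited above can be sketched as follows. This is one common one-sided variant and not necessarily the exact procedure used in the report. `paired_bootstrap_p` is a hypothetical name, and the per-query scores are toy values invented for illustration only.

```python
import random
from typing import Sequence


def paired_bootstrap_p(scores_a: Sequence[float], scores_b: Sequence[float],
                       n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for the claim 'system A beats system B'.

    Queries are resampled with replacement, keeping each query's two scores
    paired. The result is the fraction of resamples in which A's summed
    advantage over B is zero or negative.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per query")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        resampled_sum = sum(diffs[rng.randrange(n)] for _ in range(n))
        if resampled_sum <= 0:
            not_better += 1
    return not_better / n_resamples


# Toy per-query hit indicators (1 = relevant item retrieved), not real results.
system_a = [1, 1, 1, 0, 1, 1, 0, 1]
system_b = [1, 0, 1, 0, 0, 1, 0, 1]
p_value = paired_bootstrap_p(system_a, system_b)
print(f"one-sided p = {p_value:.3f}")
```

Pairing matters because both systems are scored on the same queries: resampling the per-query differences removes query difficulty as a source of variance.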

Available rows (42)

- Tier 1: a_baseline_infonce, b_gpe_type_infonce, c_gpe_type_role_infonce, d_gpe_full_infonce, e_grcl_no_gpe, f_grcl_gpe, g_grcl_gpe_cov, h_full_method
- Tier 2: m_shuffled_role, n_random_role, o_no_query_pe_dropout, p_encoder_clip_l14
- Tier 3: gamma_03, gamma_07, cov_00/01/05/10, cons_00/01/03/10, lora_r4/r16/r32, lr_1e4, lr_1e5, tau_005, tau_010, anchor_caption/refer/nlqa, edge_caption_of/refer_to/contains, tokens_016/064
- Tier 4: h_best_combo, h_best_combo_16k, h_gpe_strong, h_seed_43, h_seed_44

License

MIT (code). Backbone and datasets retain their original licenses.