ArcDoc-UnB — Sub-Center CosFace + Professor Network (Sprint3b, Split 0)

ArcDoc-UnB is a visual document embedding model trained with the Sprint3b pipeline, a two-phase curriculum in which a Professor Network (RL-based hard-negative mining) is activated in the second phase. The model sits on top of a frozen InternVL3-2B backbone and was trained on the UnB gpds2 server.
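To make the idea of hard-negative mining concrete, here is a minimal sketch that picks the hardest negative out of a candidate pool. It is not the Professor Network: the RL policy is replaced by a plain greedy rule (highest cosine similarity to the anchor), and the function name `pick_hard_negative` and the toy 2-D vectors are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pick_hard_negative(anchor, candidates):
    """Greedy stand-in for a learned miner: index of the negative closest to the anchor."""
    sims = [cosine(anchor, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: sims[i])

# Toy pool of 8 negatives, placed at different angles from the anchor (degrees).
angles = [90, 60, 170, 20, 120, 45, 150, 100]
pool = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles]
anchor = [1.0, 0.0]

hardest = pick_hard_negative(anchor, pool)  # the 20-degree candidate is the closest
```

A learned miner would presumably replace the greedy `max` with a policy that is rewarded for choosing informative negatives; the selection interface stays the same.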

Architecture

Component	Value
Backbone	InternVL3-2B (`OpenGVLab/InternVL3-2B`) — frozen
Cut layer	27
Pooler	Attention
Head	MLP
Embedding dim	1536
Loss	Sub-Center CosFace (m=0.35, s=32, k=3)
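The loss row can be read as follows: each class owns k sub-centers, the class score is the best cosine over those sub-centers, the margin m is subtracted from the target-class cosine, and everything is scaled by s before softmax cross-entropy. The sketch below is a pure-Python illustration of that standard formulation for a single sample; the function name and the toy 2-D centers are hypothetical, and the actual training code may differ in detail.

```python
import math

M, S, K = 0.35, 32.0, 3  # margin, scale, sub-centers per class (values from the table)

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def subcenter_cosface_loss(embedding, centers, target):
    """Loss for one sample; centers[c] is a list of K sub-center vectors for class c."""
    x = l2_normalize(embedding)
    logits = []
    for c, subs in enumerate(centers):
        # Class cosine = best match over the class's sub-centers.
        cos = max(sum(a * b for a, b in zip(x, l2_normalize(w))) for w in subs)
        # CosFace subtracts the margin from the target-class cosine only.
        if c == target:
            cos -= M
        logits.append(S * cos)
    # Numerically stable softmax cross-entropy.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]

# Toy example: 2 classes, K = 3 sub-centers each, 2-D embeddings.
centers = [
    [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]],
]
emb = [1.0, 0.0]
loss_correct = subcenter_cosface_loss(emb, centers, target=0)
loss_wrong = subcenter_cosface_loss(emb, centers, target=1)
```

An embedding aligned with one of its own class's sub-centers gets a much smaller loss than the same embedding labeled with the other class.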

Performance

Dataset	Metric	Value
LA-CDIP split0 (validation)	EER	1.80%
RVL-CDIP (zero-shot, Top-1)	Accuracy	—
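For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by scanning thresholds over similarity scores; the function name and the toy scores are assumptions, and the evaluation code behind the number above may use interpolation or distances instead of similarities.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher means more similar)."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # A pair is accepted when its score is >= t.
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy scores: one impostor pair (0.6) outranks one genuine pair (0.4).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

With perfectly separated scores the function returns 0.0; overlap between the two score distributions pushes the EER up.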

Training Details

Parameter	Value
Training data	LA-CDIP split0 (ZSL protocol, val=split 0)
Phase 1	10 epochs, no professor
Phase 2	5 epochs, professor active (warmup=140 steps)
Steps / epoch	140
Batch size	4 (grad accum 3 → effective 12)
Candidate pool	8
Student LR	5e-5 (AdamW, plateau scheduler)
Seed	42
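The schedule implied by the table can be checked with a few lines of arithmetic. This sketch assumes that "Steps / epoch" counts the same unit in both phases and that the professor becomes active exactly at the start of Phase 2; the constant names and the `professor_active` helper are hypothetical and not part of the released code.

```python
BATCH_SIZE = 4
GRAD_ACCUM = 3
STEPS_PER_EPOCH = 140
PHASE1_EPOCHS = 10            # no professor
PHASE2_EPOCHS = 5             # professor active
PROFESSOR_WARMUP_STEPS = 140

effective_batch = BATCH_SIZE * GRAD_ACCUM                  # samples per optimizer update
phase1_steps = PHASE1_EPOCHS * STEPS_PER_EPOCH
phase2_steps = PHASE2_EPOCHS * STEPS_PER_EPOCH
warmup_epochs = PROFESSOR_WARMUP_STEPS / STEPS_PER_EPOCH   # warmup length in epochs

def professor_active(global_step):
    """Hypothetical helper: True once training has entered Phase 2."""
    return global_step >= phase1_steps
```

Under these assumptions the professor warmup spans one full epoch, so four of the five Phase 2 epochs run after warmup.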

Usage

import torch
from PIL import Image
from huggingface_hub import hf_hub_download

from cavl_doc.models.backbone_loader import load_model
from cavl_doc.models.modeling_cavl import build_cavl_model
from cavl_doc.data.transforms import build_transform, dynamic_preprocess
from cavl_doc.utils.embedding_utils import prepare_inputs_for_multimodal_embedding

REPO_ID = "Jpcosta90/arcdoc-unb"
PROMPT  = "<image> Analyze this document"
device  = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load backbone
backbone, _, tokenizer, _, _ = load_model("InternVL3-2B", load_in_4bit=False)
backbone = backbone.to(device)
backbone.img_context_token_id = tokenizer.convert_tokens_to_ids("<IMG_CONTEXT>")

# 2. Load checkpoint
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt_path = hf_hub_download(REPO_ID, "best_model.pt")
ckpt = torch.load(ckpt_path, map_location=device, weights_only=False)
cfg  = ckpt["config"]

# 3. Rebuild the embedding model (pooler + head) around the frozen backbone
model = build_cavl_model(
    backbone=backbone,
    cut_layer=cfg["cut_layer"],
    encode_fn=None,
    pool_dim=cfg["hidden_dim"],
    proj_hidden=4096,
    proj_out=cfg["projection_output_dim"],
    set_trainable=False,
    tokenizer=tokenizer,
    pooler_type=cfg["pooler_type"],
    head_type=cfg["head_type"],
    num_queries=cfg["num_queries"],
)

# 4. Restore the trained pooler and head weights, then switch to inference mode
model.pool.load_state_dict(ckpt["siam_pool"])
model.head.load_state_dict(ckpt["siam_head"])
model.to(device).eval()

Citation

@misc{cavldoc2026,
  title  = {CaVL-Doc: Curriculum and Active-learning Vision-Language for Document Retrieval},
  author = {Costa, João Paulo},
  year   = {2026},
  url    = {https://huggingface.co/Jpcosta90/arcdoc-unb}
}
