This model fine-tunes SmolVLM for document screenshot embedding tasks using contrastive learning.
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("sugiv/smolvlm-dse")
model = AutoModelForVision2Seq.from_pretrained("sugiv/smolvlm-dse")

# Example inputs (replace with your own query and page screenshot)
query_text = "What is the capital of France?"
document_image = Image.open("page_screenshot.png")

# Process query
query_inputs = processor(
    text=query_text,
    return_tensors="pt",
    padding=True,
    truncation=True,
)

# Process document image
image_inputs = processor(
    images=document_image,
    return_tensors="pt",
)

# Get embeddings
query_embedding = model.encode_query(query_inputs)
doc_embedding = model.encode_passage(image_inputs)
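Because the training run below passes --normalize, query and passage embeddings are unit vectors, so relevance scoring reduces to a dot product. A minimal retrieval-scoring sketch with random stand-in tensors (the 576-dimensional shape is illustrative, not the model's actual hidden size):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in practice these come from encode_query / encode_passage.
query_embedding = F.normalize(torch.randn(1, 576), dim=-1)   # one query
doc_embeddings = F.normalize(torch.randn(4, 576), dim=-1)    # four candidate pages

# With unit-normalized vectors, cosine similarity is just a dot product.
scores = query_embedding @ doc_embeddings.T                  # shape (1, 4)
ranking = scores.argsort(dim=-1, descending=True)
print(ranking[0].tolist())  # document indices, best match first
```

For a corpus of any real size you would batch-encode passages once, store the matrix, and score each incoming query against it the same way.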
deepspeed --include localhost:0 --master_port 60000 train.py \
--deepspeed ds_zero2_config.json \
--output_dir retriever-smolvlm \
--model_name_or_path HuggingFaceTB/SmolVLM-256M-Base \
--save_steps 50 \
--dataset_name Tevatron/wiki-ss-nq \
--corpus_name Tevatron/wiki-ss-corpus \
--cache_dir ./cached_datasets \
--query_prefix "Query: " \
--passage_prefix "Passage: " \
--bf16 \
--pooling last \
--normalize \
--temperature 0.02 \
--per_device_train_batch_size 8 \
--gradient_checkpointing \
--train_group_size 16 \
--learning_rate 1e-5 \
--weight_decay 0.01 \
--query_max_len 128 \
--passage_max_len 512 \
--num_train_epochs 1
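The --temperature 0.02 and --train_group_size 16 flags above correspond to an InfoNCE-style contrastive objective with in-batch negatives: each query is grouped with one positive passage and hard negatives, and low temperature sharpens the softmax over similarity scores. A rough sketch of that objective (not Tevatron's actual implementation; names, shapes, and the 576-dim embeddings are illustrative):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb, p_emb, temperature=0.02, group_size=16):
    """InfoNCE over grouped passages: each query owns `group_size` passages,
    the first of which is its positive; every other passage in the batch
    (its own negatives plus other queries' groups) acts as a negative."""
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    scores = q_emb @ p_emb.T / temperature           # (num_queries, num_passages)
    # The positive passage for query i sits at column i * group_size.
    target = torch.arange(q_emb.size(0), device=q_emb.device) * group_size
    return F.cross_entropy(scores, target)

# Toy batch: 2 queries, 2 * 16 passages.
loss = contrastive_loss(torch.randn(2, 576), torch.randn(32, 576))
print(loss.item())
```

Lowering the temperature (0.02 here) scales up the logits, so small similarity gaps between the positive and the negatives dominate the loss, which is the usual choice for dense-retrieval training.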
Same as the base model, HuggingFaceTB/SmolVLM-256M-Base.
@article{Gao2022TevatronAE,
  title={Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval},
  author={Luyu Gao and Xueguang Ma and Jimmy J. Lin and Jamie Callan},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.05765}
}