Model Card: BGE-Reranker-VietFinance

Overview

This is a cross-encoder reranker finetuned from BAAI/bge-reranker-v2-m3 to score (query, passage) pairs for retrieval reranking in Vietnamese financial/news search systems.

Intended Use

  • Primary: improve Hit@K by reranking candidate passages produced by an upstream hybrid retriever (BM25 + dense embeddings).
  • Not for: standalone generation, text outside the Vietnamese financial/news domain, or high-stakes automated decisions without human review.

Essential Statistics

  • Base model: BAAI/bge-reranker-v2-m3
  • Embedding model (used for retrieval/hard-negative mining): BAAI/bge-m3
  • Max sequence length for reranking: 1536 tokens (inputs longer than this are truncated)
  • Retrieval strategy: temporal-aware hybrid (BM25 + dense embeddings with temporal boosting)
  • Saved artifacts in this folder: model.safetensors, tokenizer.json, tokenizer_config.json, config.json.
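
The card does not give the formula behind the temporal-aware hybrid retriever, so the following is only a minimal sketch of one common way to combine the three signals. Everything here is an assumption made for illustration: the min-max normalization, the weights `alpha` and `boost`, the exponential recency decay, and the `half_life_days` parameter are hypothetical and are not taken from this checkpoint's pipeline.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    bm25: Sequence[float],
    dense: Sequence[float],
    ages_days: Sequence[float],
    alpha: float = 0.5,            # hypothetical weight on the BM25 signal
    boost: float = 0.2,            # hypothetical weight on the recency boost
    half_life_days: float = 30.0,  # hypothetical decay half-life
) -> List[float]:
    """Blend normalized BM25 and dense scores, then add a recency boost."""
    b, d = min_max(bm25), min_max(dense)
    fused = []
    for b_i, d_i, age in zip(b, d, ages_days):
        recency = 0.5 ** (age / half_life_days)  # 1.0 today, 0.5 after one half-life
        fused.append(alpha * b_i + (1.0 - alpha) * d_i + boost * recency)
    return fused


# Two candidates: one lexical match published today, one semantic match 30 days old.
scores = hybrid_scores(bm25=[10.0, 0.0], dense=[0.2, 0.8], ages_days=[0.0, 30.0])
```

The top candidates under such a fused score would then be passed to the reranker, which sees at most 1536 tokens per (query, passage) pair.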

Evaluation (concise)

  • Procedure: retrieve candidate passages (temporal-aware hybrid) → rerank with cross-encoder → compute Hit@K for K ∈ {1,3,5,10,20}.
  • Numeric results: not reported in this card; the Hit@K values are saved in the run outputs (summary CSVs / JSONL).
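
The metric in the procedure above can be sketched as follows. This assumes the common definition of Hit@K (a query counts as a hit if any gold passage appears in the top K reranked results); the card does not state the exact definition used, and the function names and passage ids are illustrative.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple


def hit_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int) -> float:
    """Return 1.0 if any gold passage id is among the top-k ranked ids, else 0.0."""
    return 1.0 if any(pid in gold_ids for pid in ranked_ids[:k]) else 0.0


def mean_hit_at_k(
    runs: Iterable[Tuple[Sequence[str], Set[str]]],
    ks: Sequence[int] = (1, 3, 5, 10, 20),
) -> Dict[int, float]:
    """Average Hit@K over queries; each run is (reranked ids, gold ids)."""
    runs = list(runs)
    return {k: sum(hit_at_k(r, g, k) for r, g in runs) / len(runs) for k in ks}


# Toy example with made-up passage ids.
runs = [
    (["p3", "p1", "p7"], {"p1"}),  # gold at rank 2
    (["p2", "p9", "p4"], {"p4"}),  # gold at rank 3
    (["p5", "p6"], {"p0"}),        # gold never retrieved
]
results = mean_hit_at_k(runs)
```

In this toy example no query has its gold passage at rank 1, and two of the three queries have it within the top 3.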

Limitations & Risks

  • Domain-specific: optimized for Vietnamese financial/news passages; generalization outside this domain/language is uncertain.
  • Retrieval dependency: reranker cannot recover gold passages not present among retrieval candidates.
  • Truncation risk: 1536-token truncation may drop important context for long passages.
  • Data & license: dataset provenance and license are not specified here; verify both before public distribution.
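
The retrieval dependency can be made concrete by measuring how often a gold passage reaches the reranker at all. The sketch below is an illustration with made-up ids and a hypothetical helper name; the resulting fraction is an upper bound on Hit@K for any K, since reranking only reorders the candidates it is given.

```python
from typing import Sequence, Set


def candidate_recall(
    candidates_per_query: Sequence[Sequence[str]],
    gold_per_query: Sequence[Set[str]],
) -> float:
    """Fraction of queries with at least one gold passage in the candidate pool."""
    covered = sum(
        1
        for cands, gold in zip(candidates_per_query, gold_per_query)
        if gold & set(cands)
    )
    return covered / len(candidates_per_query)


# Toy example: the second query's gold passage was never retrieved.
ceiling = candidate_recall(
    [["p1", "p2"], ["p3"], ["p4", "p5"]],
    [{"p2"}, {"p9"}, {"p4"}],
)
```

If this ceiling is low, improving the retriever or enlarging the candidate pool is likely to matter more than further reranker tuning.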

Bias & Safety

  • Model reflects biases in the source news corpus (topic/regional biases).
  • Temporal heuristics can misinterpret ambiguous locale-specific dates and cause incorrect boosts.
  • Do not rely on reranker outputs alone for automated financial, legal, or medical decisions.
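
The date ambiguity mentioned above is easy to reproduce. The sketch below is not the retriever's actual date handling, which this card does not describe; it only shows, with the standard library, how a numeric date can have two valid readings (day-first, as is usual in Vietnamese text, and month-first).

```python
from datetime import date, datetime
from typing import Set


def candidate_dates(text: str) -> Set[date]:
    """Return every valid reading of a dd/mm/yyyy or mm/dd/yyyy string."""
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # this reading is not a valid calendar date
    return readings


def is_ambiguous(text: str) -> bool:
    """True when the two conventions disagree on the date."""
    return len(candidate_dates(text)) > 1


ambiguous = candidate_dates("03/04/2024")    # 3 April or 4 March
unambiguous = candidate_dates("25/12/2024")  # only day-first is valid
```

A temporal boost computed from the wrong reading would be off by about a month in the first case, which is enough to reorder candidates in a recency-weighted ranking.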

Quick usage (inference)

Load this checkpoint with AutoTokenizer / AutoModelForSequenceClassification, tokenize (query, passage) pairs, score in eval mode, and sort candidates by descending score (higher = more relevant).
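
The steps above can be sketched as follows. This is a minimal example under stated assumptions: `MODEL_DIR` is a placeholder for wherever the checkpoint lives, the batch size is arbitrary, and the classification head is assumed to emit a single logit per pair, as the base model does. `torch` and `transformers` are imported inside the scoring function so the ranking helper can be used without them.

```python
from typing import List, Sequence, Tuple

MODEL_DIR = "path/to/this/checkpoint"  # placeholder, not a real path
MAX_LENGTH = 1536  # reranking limit stated in this card


def score_pairs(
    query: str,
    passages: Sequence[str],
    model_dir: str = MODEL_DIR,
    batch_size: int = 8,  # arbitrary; tune to available memory
) -> List[float]:
    """Score each (query, passage) pair with the cross-encoder."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    scores: List[float] = []
    with torch.no_grad():
        for start in range(0, len(passages), batch_size):
            batch = list(passages[start:start + batch_size])
            inputs = tokenizer(
                [query] * len(batch),  # first segment: the query
                batch,                 # second segment: the passage
                padding=True,
                truncation=True,
                max_length=MAX_LENGTH,
                return_tensors="pt",
            )
            # Assumes one logit per pair (single-label head).
            logits = model(**inputs).logits.view(-1).float()
            scores.extend(logits.tolist())
    return scores


def rank_by_score(
    passages: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Sort passages by descending score (higher = more relevant)."""
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


# Ranking helper on made-up scores (no model needed for this part).
ranked = rank_by_score(["a", "b", "c"], [0.1, 2.0, -1.0])
```

In practice, `rank_by_score(passages, score_pairs(query, passages))` would give the reranked list; the raw scores are logits, so only their relative order is meaningful unless a sigmoid is applied.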

License & Citation

  • License: not specified in the checkpoint; confirm before redistribution.
  • Cite the base models BAAI/bge-reranker-v2-m3 and BAAI/bge-m3 when reporting results.

Checkpoint details: Safetensors format, 0.6B parameters, F32 tensors.

Model repository: tiam4tt/BGE-Reranker-VietFinance