| ---
|
| license: mit
|
| base_model:
|
| - BAAI/bge-reranker-v2-m3
|
| pipeline_tag: sentence-similarity
|
| ---
|
|
|
| # Model Card: BGE-Reranker-VietFinance
|
|
|
| ## Overview
|
|
|
| This is a cross-encoder reranker fine-tuned from `BAAI/bge-reranker-v2-m3` to score (query, passage) pairs for retrieval reranking in Vietnamese financial and news search systems.
|
|
|
| ## Intended Use
|
|
|
| - Primary: improve Hit@K by re-ranking candidate passages produced by an upstream retriever (BM25 + embedding-based).
|
| - Not for: standalone generation, non-Vietnamese domains, or high-stakes automated decisions without human review.
|
|
|
| ## Essential Statistics
|
|
|
| - Base model: `BAAI/bge-reranker-v2-m3`
|
| - Embedding model (used for retrieval/hard-negative mining): `BAAI/bge-m3`
|
| - Max sequence length for reranking: 1536 tokens (inputs longer than this are truncated)
|
| - Retrieval strategy: temporal-aware hybrid (BM25 + dense embeddings with temporal boosting)
|
| - Saved artifacts in this folder: `model.safetensors`, `tokenizer.json`, `tokenizer_config.json`, `config.json`.
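
The card names the retrieval strategy (BM25 + dense embeddings with temporal boosting) but does not give its formula. The sketch below shows one plausible way such a fusion could work, purely as an illustration: the min-max normalization, the weight `alpha`, the exponential recency decay, and the parameters `half_life_days` and `boost_weight` are all assumptions, not the settings used for this model.

```python
import math
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 0.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float],
    dense: Dict[str, float],
    age_days: Dict[str, float],
    alpha: float = 0.5,            # assumed BM25 vs. dense weight
    half_life_days: float = 30.0,  # assumed recency half-life
    boost_weight: float = 0.2,     # assumed maximum relative boost
) -> Dict[str, float]:
    """Fuse normalized BM25 and dense scores, then boost recent passages."""
    bm25_n, dense_n = min_max(bm25), min_max(dense)
    fused = {}
    for pid in set(bm25_n) | set(dense_n):
        base = alpha * bm25_n.get(pid, 0.0) + (1 - alpha) * dense_n.get(pid, 0.0)
        # Recency halves every `half_life_days`; unknown age gets no boost.
        recency = 0.5 ** (age_days.get(pid, math.inf) / half_life_days)
        fused[pid] = base * (1.0 + boost_weight * recency)
    return fused
```

The top-scoring passage IDs from `hybrid_scores` would then form the candidate list handed to the reranker.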
|
|
|
| ## Evaluation (concise)
|
|
|
| - Procedure: retrieve candidate passages (temporal-aware hybrid) → rerank with the cross-encoder → compute Hit@K for K ∈ {1, 3, 5, 10, 20}.
|
| - Numeric Hit@K results are stored in the run outputs (summary CSVs / JSONL) and are not reproduced in this card.
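
For reference, Hit@K can be computed as in the sketch below. It assumes the common definition (the fraction of queries with at least one gold passage among the top K reranked candidates); the function name `hit_at_k` and the dictionary layout keyed by query ID are illustrative and are not taken from the evaluation scripts.

```python
from typing import Dict, Iterable, List, Set


def hit_at_k(
    ranked: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    ks: Iterable[int] = (1, 3, 5, 10, 20),
) -> Dict[int, float]:
    """Fraction of queries with at least one gold passage in the top K.

    ranked: query ID -> passage IDs sorted by descending reranker score.
    gold:   query ID -> set of relevant passage IDs.
    """
    if not ranked:
        raise ValueError("no queries to evaluate")
    results = {}
    for k in ks:
        hits = sum(
            1
            for qid, passage_ids in ranked.items()
            if any(pid in gold.get(qid, set()) for pid in passage_ids[:k])
        )
        results[k] = hits / len(ranked)
    return results
```

A query whose gold passage never reached the candidate list counts as a miss at every K, so Hit@K after reranking is bounded by the retriever's recall.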
|
|
|
| ## Limitations & Risks
|
|
|
| - Domain-specific: optimized for Vietnamese financial/news passages; generalization outside this domain/language is uncertain.
|
| - Retrieval dependency: reranker cannot recover gold passages not present among retrieval candidates.
|
| - Truncation risk: 1536-token truncation may drop important context for long passages.
|
| - Data & license: dataset provenance and license are not specified here; verify before public distribution.
|
|
|
| ## Bias & Safety
|
|
|
| - Model reflects biases in the source news corpus (topic/regional biases).
|
| - Temporal heuristics can misinterpret ambiguous locale-specific dates and cause incorrect boosts.
|
| - Do not rely on reranker outputs alone for automated financial, legal, or medical decisions.
|
|
|
| ## Quick usage (inference)
|
|
|
| Load this checkpoint with `AutoTokenizer` / `AutoModelForSequenceClassification`, tokenize (query, passage) pairs, score in eval mode, and sort candidates by descending score (higher = more relevant).
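
The steps above can be sketched as follows. This is a minimal sketch rather than the author's inference script: `model_dir` is a placeholder for the local folder holding the saved artifacts, the helper names `score_pairs` and `rank_by_score` are illustrative, and it assumes `torch` and `transformers` are installed. The 1536-token limit is the one stated in this card.

```python
from typing import List, Tuple


def rank_by_score(passages: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Pair each passage with its score and sort by descending score."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


def score_pairs(
    query: str,
    passages: List[str],
    model_dir: str,          # placeholder: path to this checkpoint
    max_length: int = 1536,  # longer inputs are truncated
) -> List[float]:
    """Score (query, passage) pairs with the cross-encoder; higher = more relevant."""
    # Imported lazily so rank_by_score works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(
        [query] * len(passages),
        passages,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits.view(-1).float()
    return logits.tolist()
```

Typical use would be `rank_by_score(passages, score_pairs(query, passages, model_dir))`. For large candidate lists, score in batches and reuse the loaded model instead of reloading it per query.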
|
|
|
| ## License & Citation
|
|
|
| - License: the card metadata declares `mit`, but no license is specified in the checkpoint itself; confirm before redistribution.
|
| - Cite the base models `BAAI/bge-reranker-v2-m3` and `BAAI/bge-m3` when reporting results. |