---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- lore
- logic-oriented-retrieval
license: mit
---
For more details, please refer to our GitHub repo: https://github.com/FlagOpen/FlagEmbedding
# Lore-Bge3: Logic-ORiented Retriever Enhancement for BGE-M3
This model is a fine-tuned version of [BAAI/bge-m3](https://arxiv.org/pdf/2402.03216.pdf), trained with the LORE (Logic-ORiented Retriever Enhancement) method. It substantially improves retrieval performance on queries that contain complex logical expressions.
## LORE Method Overview

LORE is an embedding enhancement method that improves retrieval performance through fine-grained contrastive learning:

- **Three-tier Contrastive Learning**: Fine-grained sample classification into P (Positive), N1 (Distractor), and N2 (Negative) samples
- **Dual Encoder Architecture**: A frozen document encoder M_d paired with a trainable query encoder M_q
- **InfoNCE-based Loss**: Differentiated negative weights enforce the hierarchical separation P ≻ N1 ≻ N2
- **Query Rewriting**: LLM-assisted dataset construction using discourse relations from Rhetorical Structure Theory (RST)
- **No External Dependencies**: Requires no external supervision, resources, or pre-retrieval analysis
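The three-tier loss above can be sketched as a weighted InfoNCE objective. This is a minimal NumPy illustration, not the released training code: the temperature `tau` and the tier weights `w1`/`w2` are illustrative assumptions (a heavier weight on N2 pushes unrelated negatives further from the query than distractors, encouraging the ordering P ≻ N1 ≻ N2).

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def three_tier_infonce(q, pos, n1_list, n2_list, tau=0.05, w1=1.0, w2=2.0):
    """Weighted InfoNCE over P / N1 / N2 sample tiers.

    tau, w1, w2 are illustrative values, not taken from the paper.
    w2 > w1 penalizes similarity to true negatives (N2) more than
    to distractors (N1), separating the tiers hierarchically.
    """
    s_pos = np.exp(cosine(q, pos) / tau)
    s_n1 = sum(w1 * np.exp(cosine(q, n) / tau) for n in n1_list)
    s_n2 = sum(w2 * np.exp(cosine(q, n) / tau) for n in n2_list)
    return -np.log(s_pos / (s_pos + s_n1 + s_n2))

# Toy usage: a query embedding, a near-duplicate positive,
# a partially related distractor, and an unrelated negative.
rng = np.random.default_rng(0)
q = rng.normal(size=16)
pos = q + 0.1 * rng.normal(size=16)
n1 = [q + 0.5 * rng.normal(size=16)]
n2 = [rng.normal(size=16)]
print(three_tier_infonce(q, pos, n1, n2))
```

In the actual LORE setup only the query encoder M_q receives gradients from this loss; the document-side embeddings (`pos`, `n1_list`, `n2_list`) come from the frozen encoder M_d.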
## Key Improvements

- **Enhanced Logical Reasoning**: Improved ability to handle complex logical expressions in queries
- **Fine-grained Discrimination**: Better distinction between relevant content and distractors
- **Maintained Efficiency**: Preserves the computational efficiency of the original model