---
language:
- ko
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
base_model: Qwen/Qwen3-Embedding-0.6B
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- text-embedding
- information-retrieval
- korean
- finance
- lora
- peft
datasets:
- BCCard/BCAI-Finance-Kor-Embedding-Triplet
- BCCard/BCAI-Finance-Kor-Embedding-Pair
metrics:
- ndcg
- mrr
- recall
---
# 1. Overview
A Korean text-embedding model for the **BC Card domain**, built by LoRA fine-tuning
[`Qwen/Qwen3-Embedding-0.6B`](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) on BC Card in-domain data (personal / merchant / corporate / VIP). It is intended as the **retriever (bi-encoder)** stage of a BC Card RAG pipeline.
On a held-out in-domain test set it improves **NDCG@10 by +8.2%** and **Accuracy@1 by +11.3%** over the base model.
## 1.1. TL;DR
* **Base model**: [`Qwen/Qwen3-Embedding-0.6B`](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) — 28 layers, hidden 1024, last-token pooling, instruction-aware
* **Domain / Language**: Finance (BC Card — personal / merchant / corporate / VIP) / Korean
* **Task**: Query-document retrieval (QA search, document similarity), RAG retriever
* **Method**: PEFT (LoRA) + Multiple Negatives Ranking (contrastive)
* **Format**: merged standalone model (LoRA weights fused into the base; loads with `sentence-transformers` alone, `peft` not required)
* **Embedding dimension**: 1024 · **Max sequence length**: 1024 · **Similarity**: cosine (outputs are L2-normalized)
* **Intended use**
  - In-house **BC Card-domain RAG retriever** (Top-K candidate retrieval)
  - QA search, document-similarity scoring
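The TL;DR mentions last-token pooling and L2-normalized outputs. The sketch below illustrates both ideas in plain Python on toy numbers. It is not the model's actual implementation: the helper names (`last_token_pool`, `l2_normalize`) and the right-padding assumption are illustrative only.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    Assumes right-padded input (mask like [1, 1, 1, 0, 0]); hypothetical helper.
    """
    last_index = sum(attention_mask) - 1
    return hidden_states[last_index]

def l2_normalize(vector):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy sequence: 3 real tokens followed by 2 padding positions, hidden size 2.
hidden_states = [[0.1, 0.2], [0.5, 0.1], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
attention_mask = [1, 1, 1, 0, 0]

pooled = last_token_pool(hidden_states, attention_mask)  # vector of the 3rd token
embedding = l2_normalize(pooled)
print(embedding)
```

Because the outputs are unit-length, ranking by dot product and ranking by cosine similarity give the same order.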
## 1.2. Usage
The model was trained with an **instruction prefix on the query side only** (documents get no
instruction). Inject the same instruction at inference so query/document encoding matches training.
```python
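from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BCCard/MoAI-Embedding-0.6B")

# Query-side instruction (identical to training); prepend it to every query at inference time
QUERY_INSTRUCTION = "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: "

queries = ["BC카드 연회비는 어떻게 되나요?"]  # "What is the annual fee for a BC Card?"
documents = [
    # "BC Card annual fees are set differently depending on card type and benefits ..."
    "BC카드 연회비는 카드 종류와 혜택 구성에 따라 다르게 책정됩니다 ...",
    # "Baro Card annual fees differ for domestic-only and overseas-capable cards ..."
    "바로카드 연회비는 국내 전용과 해외 겸용 여부에 따라 차등 부과됩니다 ...",
    # "Some cards waive next year's fee if conditions such as prior-month spending are met ..."
    "전월 실적 등 조건을 충족하면 다음 해 연회비가 면제되는 카드도 있습니다 ...",
    # "A lost card can be reported immediately via the customer center or the app ..."
    "카드 분실 신고는 고객센터 또는 앱에서 즉시 가능합니다 ...",
    # ... (more documents)
]

# Queries: inject the instruction · Documents: no instruction
q_emb = model.encode(queries, prompt=QUERY_INSTRUCTION)
d_emb = model.encode(documents)

scores = model.similarity(q_emb, d_emb)  # cosine; rank documents by score
print(scores)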
```
> The instruction is also stored in the model config, so `model.encode(queries, prompt_name="query")`
> is equivalent to passing `prompt=QUERY_INSTRUCTION` explicitly. Documents need no prompt: the
> `document` prompt in the config is an empty string, so `prompt_name="document"` adds nothing.
* **Query prompt** (instruction): `Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: `
* **Document prompt**: none
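For Top-K candidate retrieval, the score matrix returned by `model.similarity` has one row per query and one column per document. A minimal sketch of turning one row into a ranked Top-K list follows. The scores are made-up numbers standing in for real model output, and `top_k` is a hypothetical helper, not part of `sentence-transformers`.

```python
def top_k(score_row, documents, k=3):
    """Return the k highest-scoring (score, document) pairs, best first."""
    ranked = sorted(zip(score_row, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

# Made-up cosine scores for one query against four documents (illustration only).
documents = ["doc_a", "doc_b", "doc_c", "doc_d"]
score_row = [0.62, 0.81, 0.77, 0.15]

for score, doc in top_k(score_row, documents, k=2):
    print(f"{score:.2f}  {doc}")
```

With real output, `scores[i].tolist()` should give the row for query `i`, since `model.similarity` returns a tensor.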
## 1.3. Training Data
| Dataset | Role | Size |
|---------|------|------|
| [BCAI-Finance-Kor-Embedding-Triplet](https://huggingface.co/datasets/BCCard/BCAI-Finance-Kor-Embedding-Triplet) | Training (anchor / positive / negative) | 43,394 triplets (train) |
| [BCAI-Finance-Kor-Embedding-Pair](https://huggingface.co/datasets/BCCard/BCAI-Finance-Kor-Embedding-Pair) | Corpus pool / evaluation | 36,281 unique chunks |
* Sources: BC Card financial QA (BCAI) + website crawl + synthetic data (chunking + multi-query generation)
* Triplets are constructed via **hard-negative mining** over the unified corpus.
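The mining procedure itself is not described here. As a rough illustration of the general idea only, the sketch below picks, for each query, the highest-scoring corpus chunks that are not the labeled positive. The toy vectors, the chunk ids, the `mine_hard_negatives` helper, and the choice of plain top-scoring negatives are all assumptions.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mine_hard_negatives(query_vec, corpus_vecs, positive_id, num_negatives=1):
    """Return ids of the most similar corpus chunks other than the positive.

    Vectors are assumed to be L2-normalized, so the dot product is the cosine.
    """
    candidates = [
        (dot(query_vec, vec), chunk_id)
        for chunk_id, vec in corpus_vecs.items()
        if chunk_id != positive_id
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in candidates[:num_negatives]]

# Toy corpus of three chunks with hypothetical ids and 2-d unit vectors.
corpus_vecs = {
    "annual_fee": [1.0, 0.0],
    "fee_waiver": [0.8, 0.6],
    "lost_card": [0.0, 1.0],
}
query_vec = [1.0, 0.0]

negatives = mine_hard_negatives(query_vec, corpus_vecs, positive_id="annual_fee")
triplet = {"anchor": "query text", "positive": "annual_fee", "negative": negatives[0]}
print(triplet)
```

A negative that is close to the query but not the answer (here the fee-waiver chunk rather than the lost-card chunk) is what makes it "hard".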
## 1.4. Training Procedure
| Item | Value |
|------|-------|
| Method | LoRA (PEFT) |
| LoRA | r=64, alpha=128, dropout=0.05, targets = q,k,v,o,gate,up,down_proj |
| Loss | CachedMultipleNegativesRankingLoss (in-batch negatives) |
| Batch | per-device 256 (DDP) → 511 in-batch negatives per rank |
| LR / scheduler | 1e-4 / cosine, warmup_ratio 0.1, weight_decay 0.01 |
| Epochs | 3, early stopping — best checkpoint selected by validation NDCG@10 |
| Precision | bf16, gradient checkpointing |
| Hardware | 6× NVIDIA L40S (DDP) |
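To make the loss and batch rows concrete: with triplet inputs, each anchor in a per-device batch of 256 is scored against 256 positives and 256 hard negatives, and every candidate except its own positive acts as a negative (2 × 256 − 1 = 511). The plain-Python sketch below shows the cross-entropy form of a multiple-negatives ranking loss on toy scores. It is not the `sentence-transformers` implementation, and the `scale` of 20 is an assumed value. The cached variant additionally processes the batch in mini-batches to save memory.

```python
import math

def mnr_loss(similarities, scale=20.0):
    """Multiple-negatives ranking loss averaged over a batch.

    similarities[i][j] is the cosine between anchor i and candidate j.
    Candidate i is the positive for anchor i; all others are negatives.
    """
    total = 0.0
    for i, row in enumerate(similarities):
        logits = [scale * s for s in row]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(similarities)

def in_batch_negatives(batch_size, hard_negatives_per_anchor=1):
    """Number of candidates per anchor, minus its own positive."""
    return batch_size * (1 + hard_negatives_per_anchor) - 1

# Toy 2-anchor batch: columns 0-1 are positives, columns 2-3 are hard negatives.
good = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.8, 0.0, 0.3]]  # positives score highest
bad = [[0.2, 0.9, 0.8, 0.0], [0.7, 0.1, 0.0, 0.6]]   # negatives outrank positives

print(in_batch_negatives(256))
print(mnr_loss(good) < mnr_loss(bad))
```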
# 2. Evaluation
## 2.1. Setup
* **Queries**: 1,000 (held-out test split) · **Corpus**: 36,281 unique chunks
* **Protocol**: binary-relevance information retrieval; the same evaluator used during training
* **Metrics**: NDCG@10 (primary), MRR@10, Recall@{1,10}, Accuracy@1, MAP@10
* **Models compared**: base (`Qwen3-Embedding-0.6B`, no fine-tuning) vs. v1 (r32 / lr2e-4 / 4ep) vs. **v2 (r64 / lr1e-4 / 3ep, released)**
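For reference, the binary-relevance metrics listed above can be computed per query as sketched below and then averaged over all queries. This is a plain-Python illustration, not the evaluator used for the reported numbers; the function names and the toy ranking are hypothetical.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant documents found within the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# Toy example: the single relevant chunk is retrieved at rank 2.
ranked = ["c7", "c2", "c9"]
relevant = {"c2"}

print(ndcg_at_k(ranked, relevant))
print(mrr_at_k(ranked, relevant))
print(recall_at_k(ranked, relevant, k=1))
```

Accuracy@1 follows the same pattern: it is 1 for a query when the top-ranked chunk is relevant and 0 otherwise.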
## 2.2. Training