BPVELA-G300M

BPVELA-G300M is the efficiency-first BPVELA release line for Traditional Chinese retrieval and embedding use cases.

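Description

BPVELA-G300M is BPVELA's current efficiency-first series. It is optimized for Traditional Chinese semantic retrieval and lower-cost deployment, and suits use cases that need to balance quality against resource efficiency.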

Model Summary

Series version: v1.0.0
Base model: google/embeddinggemma-300m
Release type: LoRA adapter plus SentenceTransformer modules
Recommended usage: semantic retrieval, retrieval-first RAG, lower-cost embedding deployment
Primary language: Traditional Chinese (繁體中文)

Important Note

This repository releases a LoRA adapter, not a merged full checkpoint. To use it, start from the base model and then load this adapter on top.

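Access Prerequisites

BPVELA-G300M is built on top of google/embeddinggemma-300m, so in addition to this adapter repository, users must also be able to access the upstream Gemma base model.

First complete the gated access request and accept the terms for google/embeddinggemma-300m on Hugging Face
If you use a fine-grained token, confirm that the token has read access to public gated repositories enabled
If loading fails with 401 Unauthorized or 403 Forbidden and the message points to google/embeddinggemma-300m/resolve/..., this usually means upstream Gemma access is missing, not that the adapter repository itself has a problem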

Validation Summary

Taiwan-md pair benchmark: Spearman 0.8319, Pearson 0.8953
Wrapped retrieval smoke: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.8333

Query / Document Format

This model line is based on EmbeddingGemma; for retrieval, keeping the prompt-style format is recommended.

Query: task: search result | query: your question
Document: title: none | text: document content

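Remarks

bpvela_model_config.yaml retains the loading configuration used internally by the project.
This public model repo does not need to include the Taiwan-md corpus or FAISS index.
Confirm the final license again before publishing.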

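Licensing

Taiwan-MD content license: CC BY-SA 4.0
BPVELA project code license: MIT
Base model google/embeddinggemma-300m: labeled gemma on Hugging Face, and requires agreeing to Google's Gemma terms of use
Adapter weights and model card content released in this repo: recommended to be described publicly as CC BY-SA 4.0

This repository publishes the BPVELA-G300M LoRA adapter weights, the model card, and related documentation. It does not include the full base-model weights of google/embeddinggemma-300m.

Training and optimization of BPVELA-G300M used Taiwan-MD content. Given the current terms of that data source, it is recommended that the adapter weights and model card content be described and distributed publicly under CC BY-SA 4.0.

For any redistribution, distribution of modified versions, or public derivative release based on this adapter, it is recommended to:

preserve the original source and give appropriate attribution
clearly indicate what was modified
provide derivative content under the same or a compatible share-alike license

In addition, because this adapter is built on top of google/embeddinggemma-300m, any loading, use, sharing, or deployment must separately comply with the terms of use and restrictions of the upstream Gemma model.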

Summary

Series version: v1.0.0
Base model: google/embeddinggemma-300m
Release type: LoRA adapter plus SentenceTransformer modules
Recommended usage: semantic retrieval, retrieval-first RAG, lower-cost embedding deployment
Language focus: Traditional Chinese

Important

This repository contains a LoRA adapter release, not a merged full checkpoint. Load it on top of the base model.

Access Requirements

BPVELA-G300M is built on top of google/embeddinggemma-300m, so users must be able to access the upstream Gemma base model in addition to this adapter repository.

Request and accept gated access for google/embeddinggemma-300m on Hugging Face first
If you use a fine-grained token, enable read access to public gated repositories
If loading fails with 401 Unauthorized or 403 Forbidden against google/embeddinggemma-300m/resolve/..., the issue is usually missing upstream Gemma access rather than a problem with this adapter repository
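The troubleshooting rule above can be sketched as a small check on the HTTP status code and the failing URL. This is only an illustration: the helper name `diagnose_access_error` and its return labels are hypothetical, not part of this release or of any Hugging Face library.

```python
# Upstream gated base model named in this card.
BASE_REPO = "google/embeddinggemma-300m"


def diagnose_access_error(status_code: int, url: str) -> str:
    """Classify a failed download (hypothetical helper, labels are illustrative).

    Returns:
      "upstream-gated"   : 401/403 on a URL under the Gemma base repo
      "adapter-or-token" : 401/403 on some other URL
      "other"            : any other status code
    """
    if status_code not in (401, 403):
        return "other"
    if f"{BASE_REPO}/resolve/" in url:
        return "upstream-gated"
    return "adapter-or-token"


# Example: a 403 against a file under the base repo points at missing Gemma access.
example_url = "https://huggingface.co/google/embeddinggemma-300m/resolve/main/config.json"
print(diagnose_access_error(403, example_url))
```

If the result is "upstream-gated", requesting access to the base model and checking the token's gated-repository permission is the first thing to try, as described above.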

Validation Snapshot

Taiwan-md pair benchmark: Spearman 0.8319, Pearson 0.8953
Wrapped retrieval smoke: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.8333
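For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. It is not the project's evaluation code. The card does not define "pass rate", "retrieval hit rate", or the cutoff used, so the definitions here (hit within the top k, and correct document at rank 1) are assumptions, and all data is toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def retrieval_rates(ranked_ids, relevant_ids, k=5):
    """Return (hit rate within top k, top-1 rate) under the assumed definitions."""
    n = len(relevant_ids)
    hits = sum(1 for r, rel in zip(ranked_ids, relevant_ids) if rel in r[:k])
    top1 = sum(1 for r, rel in zip(ranked_ids, relevant_ids) if r[:1] == [rel])
    return hits / n, top1 / n


# Toy pair benchmark: gold similarity labels vs. model cosine similarities.
gold = [0.1, 0.4, 0.5, 0.9]
pred = [0.2, 0.3, 0.7, 0.8]
print(spearman(gold, pred), pearson(gold, pred))

# Toy retrieval run: one ranked list and one relevant document per query.
ranked = [["d1", "d2"], ["d3", "d4"], ["d6", "d5"], ["d9", "d8"]]
relevant = ["d1", "d3", "d5", "d7"]
print(retrieval_rates(ranked, relevant, k=2))
```

Spearman depends only on rank order, so it can be higher than Pearson when the model orders pairs correctly but its scores are not linearly related to the labels.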

Query and Document Formatting

This model line is based on EmbeddingGemma. For retrieval, keep its prompt-style input formatting.

Query: task: search result | query: your question
Document: title: none | text: your document
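The two templates above can be wrapped in small helpers so that queries and documents are always prefixed consistently. The function names `format_query` and `format_document` are hypothetical. The card only shows `title: none`, so substituting a real title when one is available is an assumption.

```python
from typing import Optional


def format_query(question: str) -> str:
    """Wrap a raw question in the retrieval query template."""
    return f"task: search result | query: {question}"


def format_document(text: str, title: Optional[str] = None) -> str:
    """Wrap document text in the document template; falls back to 'none' without a title."""
    return f"title: {title or 'none'} | text: {text}"


print(format_query("台灣颱風災害應變流程"))
print(format_document("文件內容"))
```

Applying the same helpers at indexing time and at query time keeps stored document embeddings and query embeddings on matching templates.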

Loading Example

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Dense, Normalize, Pooling, Transformer
from peft import PeftModel

base_model = "google/embeddinggemma-300m"
adapter_repo = "BluePlanetAI/BPVELA-G300M"

# Load the gated base model, then attach the LoRA adapter in inference mode.
transformer = Transformer(base_model)
transformer.auto_model = PeftModel.from_pretrained(
    transformer.auto_model,
    adapter_repo,
    is_trainable=False,
)

# Load the remaining SentenceTransformer modules from the adapter repository.
pooling = Pooling.load(adapter_repo, subfolder="1_Pooling")
dense_1 = Dense.load(adapter_repo, subfolder="2_Dense")
dense_2 = Dense.load(adapter_repo, subfolder="3_Dense")
normalize = Normalize.load(adapter_repo, subfolder="4_Normalize")

# Assemble the full pipeline in module order.
model = SentenceTransformer(modules=[transformer, pooling, dense_1, dense_2, normalize])

# Encode a query using the prompt-style format, then print the embedding dimension.
emb = model.encode(["task: search result | query: 台灣颱風災害應變流程"], normalize_embeddings=True)
print(len(emb[0]))
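Once queries and documents are embedded, retrieval reduces to ranking documents by similarity to the query. The sketch below uses plain Python lists and made-up 2-dimensional vectors so it runs without the model. In practice the vectors would come from `model.encode` as in the loading example. The helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(rank_documents(query_vec, doc_vecs))
```

When embeddings are already L2-normalized, as with `normalize_embeddings=True`, cosine similarity equals the dot product, so the norm computation can be skipped.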

Notes

bpvela_model_config.yaml is included as the project-side loading reference.
This public model repo does not need to include the Taiwan-md corpus or FAISS index.
The release owner should finalize the public license before publishing.

License Notes

Taiwan-MD content license: CC BY-SA 4.0
BPVELA project code license: MIT
Base model google/embeddinggemma-300m: labeled gemma on Hugging Face; access requires accepting Google's Gemma terms of use
Adapter weights and model card content published in this repo: recommended to be described publicly as CC BY-SA 4.0

This repository publishes the BPVELA-G300M LoRA adapter weights, model card, and related documentation only. It does not redistribute the full base-model weights of google/embeddinggemma-300m.

Because the training and optimization process used Taiwan-MD content, it is recommended that the adapter weights and model card content be described and distributed publicly under CC BY-SA 4.0.

For redistribution, modified redistribution, or public derivative releases based on this adapter, users should:

preserve attribution to the original release
clearly indicate modifications
release derivative materials under the same or a compatible share-alike license

In addition, any use, sharing, or deployment of this adapter remains subject to the upstream Gemma model terms and restrictions.
