BPVELA-G300M
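BPVELA-G300M is the efficiency-first BPVELA release line for Traditional Chinese retrieval and embedding use cases. It is optimized for Traditional Chinese semantic retrieval and lower-cost deployment, and suits workloads that need to balance retrieval quality against resource efficiency.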
Summary
- Series version: v1.0.0
- Base model: google/embeddinggemma-300m
- Release type: LoRA adapter plus SentenceTransformer modules
- Recommended usage: semantic retrieval, retrieval-first RAG, lower-cost embedding deployment
- Language focus: Traditional Chinese
Important
This repository contains a LoRA adapter release, not a merged full checkpoint. Load it on top of the base model.
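A small NumPy sketch shows why the base model is required. Under the standard LoRA formulation, an adapter stores only two low-rank factors, `A` and `B`, for each targeted weight matrix. The effective weight is `W + (alpha / r) * B @ A`, and the frozen `W` comes from the base checkpoint. The shapes, rank, and `alpha` below are made-up illustration values, not this adapter's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the real adapter's target modules, rank, and alpha may differ.
d_out, d_in, rank, alpha = 8, 6, 2, 4
scale = alpha / rank

W = rng.normal(size=(d_out, d_in))  # frozen base weight (lives in the base checkpoint)
A = rng.normal(size=(rank, d_in))   # LoRA down-projection (lives in the adapter)
B = rng.normal(size=(d_out, rank))  # LoRA up-projection (lives in the adapter)
x = rng.normal(size=(d_in,))

# Unmerged: base output plus the scaled low-rank correction, computed at runtime.
y_adapter = W @ x + scale * (B @ (A @ x))

# Merged: fold the low-rank update into a single dense matrix.
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # True
print(np.linalg.matrix_rank(B @ A))      # 2: the adapter alone holds only a rank-r update
```

Both paths give the same output, but neither can be computed without `W`. That is why google/embeddinggemma-300m must be loaded first. If a merged model is wanted for deployment, PEFT's `merge_and_unload()` performs this fold on a loaded `PeftModel`. This repository itself ships only the unmerged adapter.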
Access Requirements
BPVELA-G300M is built on top of google/embeddinggemma-300m, so users must be able to access the upstream Gemma base model in addition to this adapter repository.
- Request and accept gated access for `google/embeddinggemma-300m` on Hugging Face first.
- If you use a fine-grained token, enable read access to public gated repositories.
- If loading fails with `401 Unauthorized` or `403 Forbidden` against `google/embeddinggemma-300m/resolve/...`, the issue is usually missing upstream Gemma access rather than a problem with this adapter repository.
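The troubleshooting rule above can be expressed as a small check on an error message. This is a hypothetical helper: `is_upstream_gemma_access_error` is not part of any library. The example message is made up for illustration, and real error text depends on the `huggingface_hub` and `transformers` versions in use.

```python
import re

UPSTREAM_REPO = "google/embeddinggemma-300m"

def is_upstream_gemma_access_error(message: str) -> bool:
    """Hypothetical check: True if the error names HTTP 401/403 and points at the gated base repo."""
    has_auth_status = re.search(r"\b(401|403)\b", message) is not None
    points_at_base_model = f"{UPSTREAM_REPO}/resolve/" in message
    return has_auth_status and points_at_base_model

# Made-up example message, for illustration only.
example = (
    "403 Client Error: Forbidden for url: "
    "https://huggingface.co/google/embeddinggemma-300m/resolve/main/config.json"
)
if is_upstream_gemma_access_error(example):
    print("Request/accept gated access to google/embeddinggemma-300m, or check token scopes.")
```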
Validation Snapshot
- Taiwan-md pair benchmark: Spearman 0.8319, Pearson 0.8953
- Wrapped retrieval smoke test: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.8333
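The snapshot does not say how these metrics are computed. The sketch below assumes common definitions:

- Pearson and Spearman correlation between model similarity scores and gold pair scores.
- Hit rate: the fraction of queries whose relevant document appears in the top `k` results.
- Top-1 rate: the fraction of queries whose relevant document is ranked first.

All function names and toy inputs are hypothetical, and this is not the project's evaluation code. The smoke test's pass rate is not modeled because its criterion is not described.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # Sorted positions i..j (0-based) hold equal values; assign their mean 1-based rank.
        ranks[order[i : j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def retrieval_rates(ranked_ids, relevant_ids, k=5):
    """Return (hit rate@k, top-1 rate); ranked_ids[i] is the ranked result list for query i."""
    pairs = list(zip(ranked_ids, relevant_ids))
    hits = sum(rel in ranked[:k] for ranked, rel in pairs)
    top1 = sum(bool(ranked) and ranked[0] == rel for ranked, rel in pairs)
    return hits / len(pairs), top1 / len(pairs)

# Toy inputs (made up): model cosine scores vs. gold similarity labels.
model_scores = [0.91, 0.35, 0.78, 0.12]
gold_scores = [5.0, 2.0, 4.0, 1.0]
print(spearman(model_scores, gold_scores), pearson(model_scores, gold_scores))

ranked = [["d1", "d2"], ["d3", "d1"], ["d2", "d5"]]
relevant = ["d1", "d1", "d4"]
print(retrieval_rates(ranked, relevant, k=5))  # (2/3, 1/3) for these toy lists
```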
Query and Document Formatting
This model line is based on EmbeddingGemma, so retrieval inputs should keep EmbeddingGemma's prompt-style prefixes:
- Query: `task: search result | query: your question`
- Document: `title: none | text: your document`
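Here is a minimal sketch of applying these formats before encoding. The helper names are hypothetical, and the prefixes are copied from the list above. Substituting a real title for `none` is an assumption based on the upstream EmbeddingGemma convention; the format listed above always uses `none`.

```python
from typing import Optional

# Prefixes copied from the query/document formats listed above.
QUERY_TEMPLATE = "task: search result | query: {text}"
DOCUMENT_TEMPLATE = "title: {title} | text: {text}"

def format_query(text: str) -> str:
    """Hypothetical helper: prepend the retrieval query prompt."""
    return QUERY_TEMPLATE.format(text=text)

def format_document(text: str, title: Optional[str] = None) -> str:
    """Hypothetical helper: prepend the document prompt, using 'none' when no title is given.

    Passing a real title is an assumption based on the upstream EmbeddingGemma convention.
    """
    return DOCUMENT_TEMPLATE.format(title=title or "none", text=text)

# "台灣颱風災害應變流程" = "Taiwan typhoon disaster response procedure"; "文件內容" = "document content"
print(format_query("台灣颱風災害應變流程"))  # task: search result | query: 台灣颱風災害應變流程
print(format_document("文件內容"))           # title: none | text: 文件內容
```

Instead of building strings by hand, you can pass a `prompt` argument to `SentenceTransformer.encode`, which prepends it to every input text, for example `model.encode(texts, prompt="task: search result | query: ")`.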
Loading Example
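```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Dense, Normalize, Pooling, Transformer
from peft import PeftModel

base_model = "google/embeddinggemma-300m"  # gated upstream model: requires accepted Gemma terms
adapter_repo = "BluePlanetAI/BPVELA-G300M"

# Load the base EmbeddingGemma encoder, then attach the BPVELA LoRA adapter (inference only).
transformer = Transformer(base_model)
transformer.auto_model = PeftModel.from_pretrained(
    transformer.auto_model,
    adapter_repo,
    is_trainable=False,
)

# Load the remaining SentenceTransformer modules from the adapter repo's subfolders.
# Assumes a sentence-transformers release whose module `load` methods accept a Hub repo ID and `subfolder`.
pooling = Pooling.load(adapter_repo, subfolder="1_Pooling")
dense_1 = Dense.load(adapter_repo, subfolder="2_Dense")
dense_2 = Dense.load(adapter_repo, subfolder="3_Dense")
normalize = Normalize.load(adapter_repo, subfolder="4_Normalize")

# Assemble: encoder -> pooling -> two dense projections -> L2 normalization.
model = SentenceTransformer(modules=[transformer, pooling, dense_1, dense_2, normalize])

# The query keeps the EmbeddingGemma prompt prefix ("Taiwan typhoon disaster response procedure").
emb = model.encode(["task: search result | query: 台灣颱風災害應變流程"], normalize_embeddings=True)
print(len(emb[0]))  # embedding dimension
```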
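Once queries and documents are encoded, retrieval reduces to ranking documents by similarity. The pipeline above ends in a `Normalize` module, so embeddings are unit length and cosine similarity equals a dot product. The sketch below uses hypothetical helper names and toy 4-dimensional vectors in place of real `model.encode` outputs.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (or a single vector) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def top_k_documents(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k most similar documents.

    Assumes inputs are already L2-normalized, so the dot product equals cosine similarity.
    """
    scores = doc_embs @ query_emb                 # shape: (num_docs,)
    k = min(k, len(scores))
    top = np.argsort(-scores, kind="stable")[:k]  # highest score first
    return top, scores[top]

# Toy vectors standing in for model.encode(...) outputs of formatted documents and a query.
doc_embs = l2_normalize(np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]))
query_emb = l2_normalize(np.array([0.9, 0.1, 0.0, 0.0]))

indices, scores = top_k_documents(query_emb, doc_embs, k=2)
print(indices.tolist())  # [0, 1]
```

With the loaded model, `query_emb` and `doc_embs` would come from `model.encode` on prompt-formatted query and document strings. For large corpora, an approximate nearest-neighbor index such as FAISS would typically replace this brute-force scan.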
Notes
- `bpvela_model_config.yaml` is included as the project-side loading reference.
- This public model repo does not need to include the Taiwan-md corpus or the FAISS index.
- The release owner should finalize the public license before publishing.
License Notes
- Taiwan-MD content license: CC BY-SA 4.0
- BPVELA project code license: MIT
- Base model `google/embeddinggemma-300m`: marked as `gemma` on Hugging Face; access is gated by Google's Gemma terms
- The adapter weights and model card content published in this repo are best documented as CC BY-SA 4.0
This repository publishes the BPVELA-G300M LoRA adapter weights, model card, and related documentation only. It does not redistribute the full base-model weights of google/embeddinggemma-300m.
Because the training and optimization process uses Taiwan-MD content, the adapter release and model card are best documented for public distribution under CC BY-SA 4.0.
For redistribution, modified redistribution, or public derivative releases based on this adapter, users should:
- preserve attribution to the original release
- clearly indicate modifications
- release derivative materials under the same or a compatible share-alike license
In addition, because this adapter is built on google/embeddinggemma-300m, any loading, use, sharing, or deployment of it remains subject to the upstream Gemma model terms and restrictions.