BPVELA-G300M

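BPVELA-G300M is the efficiency-first BPVELA release line for Traditional Chinese retrieval and embedding use cases. It is optimized for Traditional Chinese semantic retrieval and lower-cost deployment, and it suits workloads that need both good retrieval quality and resource efficiency.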

Summary

  • Series version: v1.0.0
  • Base model: google/embeddinggemma-300m
  • Release type: LoRA adapter plus SentenceTransformer modules
  • Recommended usage: semantic retrieval, retrieval-first RAG, lower-cost embedding deployment
  • Language focus: Traditional Chinese

Important

This repository contains a LoRA adapter release, not a merged full checkpoint. Load it on top of the base model.

Access Requirements

BPVELA-G300M is built on top of google/embeddinggemma-300m, so users must be able to access the upstream Gemma base model in addition to this adapter repository.

  • First, request gated access to google/embeddinggemma-300m on Hugging Face and accept its terms
  • If you use a fine-grained token, enable read access to public gated repositories
  • If loading fails with 401 Unauthorized or 403 Forbidden on a URL under google/embeddinggemma-300m/resolve/..., the cause is usually missing upstream Gemma access, not a problem with this adapter repository
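The troubleshooting rule in the last bullet can be expressed as a small check on the error message. This is a minimal sketch: `diagnose_load_error` is a hypothetical helper, not part of this repository or of any library, and it is only a heuristic that inspects the message text for the status codes and URL pattern described above.

```python
import re

BASE_MODEL = "google/embeddinggemma-300m"


def diagnose_load_error(message: str) -> str:
    """Classify a loading error message (hypothetical helper, heuristic only).

    Returns:
      "upstream_gemma_access" for a 401/403 against a google/embeddinggemma-300m/resolve/... URL,
      "auth_other" for a 401/403 on any other URL,
      "unknown" otherwise.
    """
    # Look for a standalone 401 or 403 status code in the message.
    has_auth_status = re.search(r"\b(401|403)\b", message) is not None
    if not has_auth_status:
        return "unknown"
    if f"{BASE_MODEL}/resolve/" in message:
        # Missing gated access to the upstream Gemma base model, not an adapter-repo problem.
        return "upstream_gemma_access"
    return "auth_other"
```

For example, wrap the loading code in `try` / `except Exception as exc` and call `diagnose_load_error(str(exc))` to decide whether to check your Gemma access first.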

Validation Snapshot

  • Taiwan-md pair benchmark: Spearman 0.8319, Pearson 0.8953
  • Wrapped retrieval smoke test: pass rate 1.0000, retrieval hit rate 1.0000, top-1 rate 0.8333

Query And Document Formatting

This model line is based on EmbeddingGemma, so keep its prompt-style formatting for retrieval:

  • Query: task: search result | query: your question
  • Document: title: none | text: your document
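A minimal sketch of helpers that apply these two formats before encoding. `format_query` and `format_document` are hypothetical names, not part of this repository. The optional `title` parameter is an assumption based on the upstream EmbeddingGemma convention of writing `none` only when a document has no title.

```python
from typing import Optional

QUERY_PREFIX = "task: search result | query: "


def format_query(question: str) -> str:
    """Prefix a user question with the retrieval query prompt."""
    return f"{QUERY_PREFIX}{question}"


def format_document(text: str, title: Optional[str] = None) -> str:
    """Wrap document text in the 'title: ... | text: ...' prompt.

    Assumption: a real title may replace 'none' when one is available.
    """
    return f"title: {title or 'none'} | text: {text}"
```

Alternatively, `SentenceTransformer.encode` accepts a `prompt` string that is prepended to every input, e.g. `model.encode(questions, prompt=QUERY_PREFIX)`.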

Loading Example

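```python
# Requires gated access to google/embeddinggemma-300m (see Access Requirements).
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Dense, Normalize, Pooling, Transformer
from peft import PeftModel

base_model = "google/embeddinggemma-300m"
adapter_repo = "BluePlanetAI/BPVELA-G300M"

# Load the base EmbeddingGemma encoder, then wrap it with the LoRA adapter for inference only.
transformer = Transformer(base_model)
transformer.auto_model = PeftModel.from_pretrained(
    transformer.auto_model,
    adapter_repo,
    is_trainable=False,
)

# Load the pooling, dense projection, and normalization modules shipped in this repo.
pooling = Pooling.load(adapter_repo, subfolder="1_Pooling")
dense_1 = Dense.load(adapter_repo, subfolder="2_Dense")
dense_2 = Dense.load(adapter_repo, subfolder="3_Dense")
normalize = Normalize.load(adapter_repo, subfolder="4_Normalize")

# Assemble the full pipeline: transformer -> pooling -> dense -> dense -> normalize.
model = SentenceTransformer(modules=[transformer, pooling, dense_1, dense_2, normalize])

# Queries use the prompt-style format described above.
emb = model.encode(["task: search result | query: 台灣颱風災害應變流程"], normalize_embeddings=True)
print(len(emb[0]))  # embedding dimension
```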
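Because the pipeline ends with a Normalize module, the embeddings are unit-length, so cosine similarity reduces to a dot product. The sketch below ranks documents for one query under that assumption. `rank_documents` is a hypothetical helper, and the toy vectors used to check it stand in for real `model.encode` output, which is a NumPy array by default.

```python
import numpy as np


def rank_documents(query_emb: np.ndarray, doc_embs: np.ndarray, top_k: int = 3):
    """Return (doc_index, score) pairs for the top_k most similar documents.

    Assumes query_emb has shape (dim,) and doc_embs has shape (n_docs, dim),
    both L2-normalized, so the dot product equals cosine similarity.
    """
    scores = doc_embs @ query_emb        # (n_docs,) cosine similarities
    order = np.argsort(-scores)[:top_k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]
```

In practice, pass `model.encode([query])[0]` and `model.encode(documents)`, with the query and documents formatted as in Query And Document Formatting.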

Notes

  • bpvela_model_config.yaml is included as the project-side loading reference.
  • The Taiwan-md corpus and FAISS index are not needed to use this model and are not included in this public repository.
  • The release owner should confirm the final public license before publishing.

License Notes

  • Taiwan-MD content license: CC BY-SA 4.0
  • BPVELA project code license: MIT
  • Base model google/embeddinggemma-300m: licensed as gemma on Hugging Face; access is gated behind acceptance of Google's Gemma terms of use
  • Recommended license for the adapter weights and model card content published in this repo: CC BY-SA 4.0

This repository publishes the BPVELA-G300M LoRA adapter weights, model card, and related documentation only. It does not redistribute the full base-model weights of google/embeddinggemma-300m.

Because training and optimization used Taiwan-MD content, and given that data source's current terms, it is recommended that the adapter weights and model card be described and distributed under CC BY-SA 4.0.

For redistribution, modified redistribution, or public derivative releases based on this adapter, users should:

  • preserve attribution to the original release
  • clearly indicate modifications
  • distribute derivative materials under the same or a compatible share-alike license

In addition, because this adapter is built on google/embeddinggemma-300m, any loading, use, sharing, or deployment of it remains subject to the upstream Gemma model terms and restrictions.
