Model Card for embeddinggemma-300m-cot-router

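embeddinggemma-300m-cot-router is a lightweight thinking router fine-tuned from google/embeddinggemma-300m. Given an input prompt, it decides whether the prompt needs thinking mode to be answered correctly and outputs a binary class. When a thinking model and a non-thinking model are chained together, it can serve as the front-end decision component, saving inference cost and latency.

⚠️ Key specification: this model is a 300M-parameter embedding-based classifier, not a generative model. It outputs a routing decision (thinking/non-thinking) rather than a text response.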

Model Details

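In recent years, large language models have added thinking/reasoning modes (for example o-series, Qwen3, and DeepSeek-R1), but thinking mode substantially increases token usage and latency. In practice, most prompts (for example "write me a thank-you letter" or "translate this passage") do not need chain-of-thought; only the minority of questions that require multi-step reasoning are a good fit for thinking mode.

embeddinggemma-300m-cot-router is a lightweight classifier designed for this problem: it fine-tunes EmbeddingGemma 300M on the tw-think-router-annotation dataset and decides, at the front of the system, whether a request should be sent to the thinking or the non-thinking model/mode.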
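
The front-end routing role described above can be sketched as follows. This is a minimal illustration, not the author's serving code: the `Router` class, the two backend callables, and `toy_classifier` are all hypothetical names introduced here, and `toy_classifier` is a keyword stand-in for the real model so that the example runs without downloading anything.

```python
from dataclasses import dataclass
from typing import Callable

THINKING = "thinking"
NON_THINKING = "non-thinking"


@dataclass
class Router:
    """Dispatches a prompt to one of two backends based on a binary decision."""

    # Hypothetical callable: returns True when the prompt needs thinking mode.
    # In a real deployment this would wrap the router model's prediction.
    needs_thinking: Callable[[str], bool]
    thinking_backend: Callable[[str], str]
    fast_backend: Callable[[str], str]

    def route(self, prompt: str) -> str:
        """Return the routing decision as a label."""
        return THINKING if self.needs_thinking(prompt) else NON_THINKING

    def answer(self, prompt: str) -> str:
        """Send the prompt to the backend chosen by route()."""
        if self.route(prompt) == THINKING:
            return self.thinking_backend(prompt)
        return self.fast_backend(prompt)


def toy_classifier(prompt: str) -> bool:
    # Stand-in only (assumption): a keyword check, NOT the fine-tuned model.
    cues = ("prove", "how many", "step by step")
    return any(cue in prompt.lower() for cue in cues)


router = Router(
    needs_thinking=toy_classifier,
    thinking_backend=lambda p: f"[thinking] {p}",
    fast_backend=lambda p: f"[fast] {p}",
)
```

Keeping the classifier behind a plain callable means the stand-in can be swapped for the actual model without touching the dispatch logic.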

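Key Features

  1. Embedding-based routing: classification is done by a small sentence-transformers-style model, so inference costs far less than using a large LLM for routing.
  2. Lightweight at the 300M scale: can be deployed on CPU or edge nodes as a gateway service.
  3. Lower cost and latency: intercepts requests that do not need the thinking model before they reach it, reducing token consumption and response time.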
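
To make "embedding-based classification" concrete, the sketch below scores a prompt embedding with a logistic head and thresholds the result. The card does not describe the model's actual classification head, so everything here is an assumption for illustration: the logistic head, the `weights` and `bias` parameters, the 0.5 threshold, and the convention that label 1 means "thinking".

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def thinking_probability(
    embedding: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Score an embedding with a linear layer followed by a sigmoid.

    Assumption: `embedding` is a pooled sentence vector from an encoder,
    and `weights`/`bias` are a hypothetical trained head.
    """
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    logit = sum(e * w for e, w in zip(embedding, weights)) + bias
    return sigmoid(logit)


def classify(
    embedding: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> int:
    """Return 1 (thinking) or 0 (non-thinking); the label order is assumed."""
    return int(thinking_probability(embedding, weights, bias) >= threshold)
```

The per-request cost is one encoder forward pass plus a dot product, which is why this kind of router can sit in front of a much larger model.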

Model Description

Model Sources

Citation

@misc{embeddinggemma_300m_cot_router,
  title        = {embeddinggemma-300m-cot-router: A Lightweight Thinking-mode Routing Classifier},
  author       = {Huang, Liang Hsun},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/lianghsun/embeddinggemma-300m-cot-router}}
}

Acknowledgements

  • Special thanks to APMIC for the compute support.

Model Card Authors

Huang Liang Hsun

Model Card Contact

Huang Liang Hsun
