How to use Octen/Octen-Embedding-8B with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

# Download (on first use) and load the embedding model
model = SentenceTransformer("Octen/Octen-Embedding-8B")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
# One embedding vector per sentence
embeddings = model.encode(sentences)

# Pairwise similarity between every pair of embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])
```
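Assuming the model keeps sentence-transformers' default cosine similarity, `model.similarity` amounts to L2-normalising each embedding and taking pairwise dot products. The sketch below reproduces that computation in plain Python on hypothetical 3-dimensional toy vectors; `cosine_similarity_matrix` is an illustrative helper, not part of the library.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the pairwise cosine-similarity matrix for a list of vectors."""
    # L2-normalise each vector so that a plain dot product equals cosine similarity.
    normalised = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("a zero vector has no direction")
        normalised.append([x / norm for x in vec])
    # Entry [i][j] is the dot product of normalised vectors i and j.
    return [[sum(a * b for a, b in zip(u, v)) for v in normalised] for u in normalised]


# Hypothetical toy vectors standing in for real model embeddings.
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sims = cosine_similarity_matrix(toy)
print(len(sims), len(sims[0]))  # 3 3
```

As with the real model output, the matrix is square (one row and column per input) and symmetric, with 1.0 on the diagonal.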
SGLang deploy commands
Could you please share recommended SGLang deploy commands? I currently use an RTX 5090 and an RTX PRO 6000. If all goes well, I might jump from a 4B model to the 8B model with a data-parallel size of 2.
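A rough weight-only estimate helps when sizing a move from 4B to 8B on these two cards. This minimal sketch assumes bf16 weights (2 bytes per parameter), nominal parameter counts, 32 GB for the RTX 5090 and 96 GB for the RTX PRO 6000, and SGLang's `--mem-fraction-static` (the share of GPU memory reserved for model weights plus the KV cache pool) at an example value of 0.20. The helper names are hypothetical, and the estimate ignores KV cache size, activations, and CUDA overhead.

```python
# Hypothetical sizing helpers; all capacities and parameter counts are assumptions.
BYTES_PER_PARAM_BF16 = 2


def weight_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def static_budget_gb(gpu_gb: float, mem_fraction_static: float) -> float:
    """Memory reserved for weights plus KV cache at a given --mem-fraction-static."""
    return gpu_gb * mem_fraction_static


def min_mem_fraction(num_params: float, gpu_gb: float) -> float:
    """Smallest fraction that could hold the weights alone (no room left for KV cache)."""
    return weight_gb(num_params) / gpu_gb


GPUS_GB = {"RTX 5090": 32.0, "RTX PRO 6000": 96.0}  # assumed capacities
MODELS = {"0.6B": 0.6e9, "4B": 4e9, "8B": 8e9}  # nominal parameter counts

for gpu, capacity in GPUS_GB.items():
    budget = static_budget_gb(capacity, 0.20)
    for name, params in MODELS.items():
        need = weight_gb(params)
        verdict = "weights fit" if need < budget else "weights do not fit"
        print(f"{gpu}: {name} needs ~{need:.1f} GB vs {budget:.1f} GB at 0.20 -> {verdict}")
```

Under these assumptions the 8B weights alone (about 16 GB) require a fraction of at least 0.5 on the 32 GB card, and more once the KV cache is included.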
I’m not sure whether SGLang supports deploying this model yet, but we’ve used vLLM and it works.
You can refer to this example for details: https://huggingface.co/Qwen/Qwen3-Embedding-8B#vllm-usage
Here is what ran well for me:
```shell
python -m sglang.launch_server \
  --model-path Octen/Octen-Embedding-0.6B \
  --host 0.0.0.0 \
  --port 5000 \
  --is-embedding \
  --enable-trace \
  --enable-metrics \
  --otlp-traces-endpoint 0.0.0.0:4317 \
  --mem-fraction-static 0.20 \
  --log-requests \
  --show-time-cost \
  --data-parallel-size 2 \
  --load-balance-method auto \
  --max-running-requests 64
```
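Once the server is up, it can be queried over HTTP. SGLang exposes an OpenAI-compatible `/v1/embeddings` endpoint for models launched with `--is-embedding`; this sketch assumes that endpoint, the port 5000 from the launch command, and the standard OpenAI embeddings request and response shape. It uses only the Python standard library, and the helper names (`build_payload`, `parse_embeddings`, `embed`) are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed to match the --port flag in the launch command.
BASE_URL = "http://localhost:5000"


def build_payload(texts: List[str], model: str) -> Dict[str, Any]:
    """Build an OpenAI-style embeddings request body."""
    return {"model": model, "input": texts}


def parse_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by each item's 'index'."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: List[str], model: str, base_url: str = BASE_URL,
          timeout: float = 30.0) -> List[List[float]]:
    """POST texts to the server's /v1/embeddings endpoint and return one vector per text."""
    body = json.dumps(build_payload(texts, model)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))


# Example usage (requires the server to be running):
# vectors = embed(["That is a happy person", "Today is a sunny day"],
#                 model="Octen/Octen-Embedding-0.6B")
```

Sorting by `index` keeps the output aligned with the input order even if a server returns items out of order.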