---
tags:
- mlx
- embeddings
- apple-silicon
- sentence-transformers
license: apache-2.0
base_model: Octen/Octen-Embedding-8B
library_name: mlx
---

# Octen-Embedding-8B-mlx

Pre-converted [MLX](https://github.com/ml-explore/mlx) weights for [Octen-Embedding-8B](https://huggingface.co/Octen/Octen-Embedding-8B), ready to run on Apple Silicon.

## Why this exists

The original model requires a ~30-minute conversion step and ~32 GB of temporary disk space. This repo provides the already-converted MLX weights so you can start embedding immediately.

## Usage

With [octen-embeddings-server](https://github.com/c-h-/octen-embeddings-server):

```bash
# Clone the server
git clone https://github.com/c-h-/octen-embeddings-server.git
cd octen-embeddings-server
pip install -r requirements.txt

# Download pre-converted weights (instead of running convert_model.py)
huggingface-cli download chulcher/Octen-Embedding-8B-mlx --local-dir models/Octen-Embedding-8B-mlx

# Start the server
python3 server.py
```

The server exposes an OpenAI-compatible `/v1/embeddings` endpoint at `http://localhost:8100`.

## Hardware Requirements

| Component | Requirement |
|-----------|-------------|
| CPU | Apple Silicon (M1/M2/M3/M4) |
| RAM | 20 GB+ |
| Disk | ~16 GB for weights |
| OS | macOS 13+ |

## Performance

Octen-Embedding-8B ranks #1 on MTEB/RTEB with a score of 0.8045, outperforming commercial embedding APIs.

Typical latency on Apple Silicon: ~50-200 ms per text, depending on length.

## License

Apache 2.0 (same as base model)
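
## Example client

Since the server is described as exposing an OpenAI-compatible `/v1/embeddings` endpoint at `http://localhost:8100`, a client could look roughly like the sketch below. It is not taken from the server's own code. It assumes the standard OpenAI request shape (`input`, `model`) and response shape (`data[i].embedding`, `data[i].index`). The `MODEL_NAME` value and the helper names (`build_payload`, `parse_embeddings`, `embed`, `cosine_similarity`) are hypothetical and used only for illustration; the server may ignore the `model` field or expect a different value.

```python
import json
import math
import urllib.request

# Assumption: the server follows the OpenAI embeddings request/response shape.
BASE_URL = "http://localhost:8100"
# Hypothetical model identifier; the server may ignore it or expect another value.
MODEL_NAME = "Octen-Embedding-8B-mlx"


def build_payload(texts, model=MODEL_NAME):
    """Return the JSON body for an OpenAI-style embeddings request."""
    return {"input": list(texts), "model": model}


def parse_embeddings(response):
    """Extract vectors from an OpenAI-style response, ordered by 'index'."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, base_url=BASE_URL):
    """POST texts to /v1/embeddings and return one vector per input text."""
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return parse_embeddings(json.load(reply))


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

With the server running, `embed(["first text", "second text"])` should return two vectors, which can then be compared with `cosine_similarity`. The sketch uses only the standard library, so it needs no dependencies beyond those the server already installs.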