---
tags:
- mlx
- embeddings
- apple-silicon
- sentence-transformers
license: apache-2.0
base_model: Octen/Octen-Embedding-8B
library_name: mlx
---
# Octen-Embedding-8B-mlx

Pre-converted [MLX](https://github.com/ml-explore/mlx) weights for [Octen-Embedding-8B](https://huggingface.co/Octen/Octen-Embedding-8B), ready to run on Apple Silicon.
## Why this exists

Converting the original model to MLX format takes roughly 30 minutes and needs about 32 GB of temporary disk space. This repo provides the already-converted MLX weights, so you can skip the conversion and start embedding immediately.
## Usage

With [octen-embeddings-server](https://github.com/c-h-/octen-embeddings-server):

```bash
# Clone the server
git clone https://github.com/c-h-/octen-embeddings-server.git
cd octen-embeddings-server
pip install -r requirements.txt

# Download pre-converted weights (instead of running convert_model.py)
huggingface-cli download chulcher/Octen-Embedding-8B-mlx --local-dir models/Octen-Embedding-8B-mlx

# Start the server
python3 server.py
```
The server exposes an OpenAI-compatible `/v1/embeddings` endpoint at `http://localhost:8100`.
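As a minimal sketch of calling that endpoint from Python (standard library only): the request shape follows the OpenAI embeddings API, but the exact `model` name accepted by the server is an assumption and may differ in your deployment. The `cosine` helper is a generic similarity function, not part of the server.

```python
import json
import urllib.request

# Assumed local server address; adjust if you changed the port.
SERVER_URL = "http://localhost:8100/v1/embeddings"

def embed(texts, model="Octen-Embedding-8B-mlx"):
    """POST texts to the OpenAI-compatible /v1/embeddings endpoint.

    The `model` value here is a guess; check your server's config.
    """
    payload = json.dumps({"input": texts, "model": model}).encode()
    req = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response: "data" is a list of {"embedding": [...]}
    return [item["embedding"] for item in body["data"]]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Example usage (requires the server to be running):
#     e1, e2 = embed(["hello world", "hi there"])
#     print(cosine(e1, e2))
```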
## Hardware Requirements

| Component | Requirement |
|-----------|-------------|
| CPU | Apple Silicon (M1/M2/M3/M4) |
| RAM | 20 GB+ |
| Disk | ~16 GB for weights |
| OS | macOS 13+ |
## Performance

Octen-Embedding-8B ranks #1 on MTEB/RTEB with a score of 0.8045, outperforming commercial embedding APIs.

Typical latency on Apple Silicon: ~50-200 ms per text, depending on length.
## License

Apache 2.0 (same as the base model).