| --- |
| title: README |
| emoji: 🐨 |
| colorFrom: red |
| colorTo: yellow |
| sdk: static |
| pinned: false |
| thumbnail: >- |
| https://cdn-uploads.huggingface.co/production/uploads/60fa66a3c4c6bd8c56ee541f/FEldjz5JpuoMy8dauKA2M.png |
| short_description: pool compute for huge model inference |
| --- |
| |
| mesh-llm turns spare compute into a peer-to-peer inference cloud for open models. |
|
|
| mesh-llm pools GPUs across macOS and Linux machines so teams, researchers, and agents can run local or open-weight models through one OpenAI-compatible endpoint. It can serve a model on one node, distribute large models across nearby peers, route requests to specialized models, and let agents coordinate through mesh gossip. |
|
|
|
|
| What it is for |
| * Share spare GPU capacity across trusted machines. |
| * Run open models locally without a centralized inference provider. |
| * Serve an OpenAI-compatible API at http://localhost:9337/v1. |
| * Route requests across multiple nodes, models, and capabilities. |
| * Experiment with distributed inference, MoE expert sharding, and agent collaboration. |
|
|
|
|
| see: https://docs.anarchai.org/ |
| and: https://github.com/mesh-LLM/ |
|
|
| Mesh uses a pipelined/network aware distributed inference approach built on llama.cpp called "skippy" - https://github.com/Mesh-LLM/hf-mesh-skippy-splitter contains current code which prepares models so layers can be efficiently JIT downloaded for participating nodes. |