Mesh LLM

Llama-3.3-70B-Instruct-Q3_K_M

Distributed GGUF inference package for Mesh LLM

GGUF layer package for running Llama-3.3-70B-Instruct-Q3_K_M across a local Mesh LLM cluster.

This package is derived from unsloth/Llama-3.3-70B-Instruct-GGUF and repackages the original GGUF file as per-layer artifacts for distributed inference.

Highlights

  • Run locally: private inference on your hardware.
  • Pool multiple machines: split layers across peers.
  • OpenAI-compatible: serve /v1/chat/completions locally.
  • Package variant: Q3_K_M layer package.

Model Overview

| Property | Value |
| --- | --- |
| Source model | unsloth/Llama-3.3-70B-Instruct-GGUF |
| Model id | unsloth/Llama-3.3-70B-Instruct-GGUF:Q3_K_M |
| Family | Llama |
| Parameter scale | 70B |
| Quantization | Q3_K_M |
| Layer count | 80 |
| Activation width | 8192 |
| Package size | 32.5 GB |
| Source file | Llama-3.3-70B-Instruct-Q3_K_M.gguf |
| Package repo | meshllm/Llama-3.3-70B-Instruct-Q3_K_M-draft-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.
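
As a rough sizing aid for the multi-machine case, the sketch below estimates how many of the 80 layers fit on one host and how many equally sized hosts would be needed. It uses only figures listed in this card (80 layer artifacts totalling 31.3 GB). The helper names and the `reserve_gb` headroom parameter are hypothetical, and the estimate ignores the shared artifacts, the KV cache, and whatever placement policy Mesh LLM actually applies.

```python
import math

# Figures from this card: 80 layer artifacts totalling 31.3 GB.
LAYER_COUNT = 80
LAYERS_TOTAL_GB = 31.3
AVG_LAYER_GB = LAYERS_TOTAL_GB / LAYER_COUNT  # average size; real layers may vary


def layers_that_fit(free_gb, reserve_gb=0.0):
    """Estimate how many average-sized layers fit in one host's free memory.

    reserve_gb is hypothetical headroom for everything this estimate ignores
    (shared artifacts, KV cache, runtime overhead).
    """
    usable = max(free_gb - reserve_gb, 0.0)
    return min(LAYER_COUNT, int(usable // AVG_LAYER_GB))


def hosts_needed(free_gb_per_host, reserve_gb=0.0):
    """Estimate how many identical hosts are needed to hold all layers."""
    per_host = layers_that_fit(free_gb_per_host, reserve_gb)
    if per_host == 0:
        raise ValueError("a host with this much memory cannot hold a single layer")
    return math.ceil(LAYER_COUNT / per_host)
```

For example, `hosts_needed(16.0, reserve_gb=2.0)` asks how many 16 GB hosts are needed if each keeps 2 GB free for everything else.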

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/Llama-3.3-70B-Instruct-GGUF.

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/Llama-3.3-70B-Instruct-Q3_K_M-draft-layers" --split
# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models
# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Llama-3.3-70B-Instruct-GGUF:Q3_K_M",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
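
The same requests can be made from Python. The following is a minimal standard-library sketch, assuming the endpoints respond like their OpenAI counterparts (a `data` list of objects with an `id` field from `/v1/models`, and `choices[0].message.content` in the chat reply). The function names are illustrative and not part of Mesh LLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3131"  # host and port from the curl commands above


def build_chat_request(model, prompt, max_tokens=128, base_url=BASE_URL):
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def model_ids(models_response):
    """Extract model names, assuming the OpenAI list shape {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in models_response.get("data", [])]


def first_reply(chat_response):
    """Extract the assistant text, assuming the OpenAI chat completion shape."""
    return chat_response["choices"][0]["message"]["content"]


def fetch_json(url_or_request, timeout=60):
    """Send a request (URL string or Request object) and decode the JSON body."""
    with urllib.request.urlopen(url_or_request, timeout=timeout) as response:
        return json.load(response)


def ask(prompt, base_url=BASE_URL):
    """Discover the served model name, then send one chat request to it."""
    ids = model_ids(fetch_json(f"{base_url}/v1/models"))
    if not ids:
        raise RuntimeError("the mesh does not report any models yet")
    request = build_chat_request(ids[0], prompt, base_url=base_url)
    return first_reply(fetch_json(request))
```

With a mesh running, `print(ask("Write a tiny hello-world function in Rust."))` mirrors the curl request. Discovering the name through `/v1/models` avoids hard-coding the model id in client code.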

Package Variant

| Property | Value |
| --- | --- |
| Format | layer-package |
| Canonical source ref | unsloth/Llama-3.3-70B-Instruct-GGUF@main/Llama-3.3-70B-Instruct-Q3_K_M.gguf |
| Source revision | main |
| Source SHA-256 | 7f13297f95a1d2284c5d5b90e2cf3ba6b33f97b408d2af43a89ff3a42178a2f1 |
| Skippy ABI | 0.1.24 |
| Package manifest SHA-256 | 79ece8130ac881d79fcd78e6710d86c62a28de119b0c1f3b8226fffbfe6b8854 |
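
The canonical source ref appears to combine the source repository, revision, and file name as `repo@revision/file`. That layout is inferred from the single value shown here, not from a published specification, so the parser below is only a sketch. `SourceRef` and `parse_source_ref` are hypothetical names, and a revision that itself contains `/` would be split incorrectly.

```python
from typing import NamedTuple


class SourceRef(NamedTuple):
    repo: str
    revision: str
    path: str


def parse_source_ref(ref):
    """Split 'repo@revision/path' into its parts (assumed layout)."""
    repo, at, rest = ref.partition("@")
    revision, slash, path = rest.partition("/")
    if not (repo and at and revision and slash and path):
        raise ValueError(f"not in repo@revision/path form: {ref!r}")
    return SourceRef(repo, revision, path)


ref = parse_source_ref(
    "unsloth/Llama-3.3-70B-Instruct-GGUF@main/Llama-3.3-70B-Instruct-Q3_K_M.gguf"
)
```

Here `ref.repo`, `ref.revision`, and `ref.path` line up with the source model, source revision, and source file listed in this card.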

What Is Included

| Artifact | Path | Contents | SHA-256 |
| --- | --- | --- | --- |
| Manifest | model-package.json | Package schema, source identity, checksums | 79ece8130ac881d79fcd78e6710d86c62a28de119b0c1f3b8226fffbfe6b8854 |
| Metadata | shared/metadata.gguf | 1 tensor, 7.5 MB | 538e8df59d0738026552485c48a4568d42df951ea570638cf5fd34e2549d875a |
| Embeddings | shared/embeddings.gguf | 2 tensors, 438.0 MB | 88ee8430d36f2257c154a8bf26197f75a2a4600eaa44e7ab532cfc120aa4aa0f |
| Output head | shared/output.gguf | 3 tensors, 829.4 MB | 370ad9e6cc70504d87a1a2d23a03fd26c34579af19a6f7b5ac305e264f424ace |
| Transformer layers | layers/layer-*.gguf | 80 layer artifacts, 800 tensors, 31.3 GB | see model-package.json |
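
The checksums above can be checked against a local copy of the package. The sketch below assumes the repository has been downloaded to a directory that keeps the same relative paths. `sha256_of` and `verify_package` are illustrative helpers, not Mesh LLM tooling, and the per-layer checksums are left out because they are only listed in model-package.json.

```python
import hashlib
from pathlib import Path

# Expected SHA-256 values copied from this card.
EXPECTED = {
    "model-package.json": "79ece8130ac881d79fcd78e6710d86c62a28de119b0c1f3b8226fffbfe6b8854",
    "shared/metadata.gguf": "538e8df59d0738026552485c48a4568d42df951ea570638cf5fd34e2549d875a",
    "shared/embeddings.gguf": "88ee8430d36f2257c154a8bf26197f75a2a4600eaa44e7ab532cfc120aa4aa0f",
    "shared/output.gguf": "370ad9e6cc70504d87a1a2d23a03fd26c34579af19a6f7b5ac305e264f424ace",
}


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large artifacts are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(package_dir, expected=EXPECTED):
    """Map each relative path to True (match), False (mismatch), or None (missing)."""
    root = Path(package_dir)
    results = {}
    for relative_path, want in expected.items():
        artifact = root / relative_path
        if artifact.is_file():
            results[relative_path] = sha256_of(artifact) == want
        else:
            results[relative_path] = None
    return results
```

Calling `verify_package("path/to/package")` returns one entry per artifact, so a mismatch or a missing file can be reported without stopping at the first problem.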

Validation

Generated by the Mesh LLM HF Jobs splitter from mesh-llm ref codex/package-declared-draft-spec-main. Each artifact is checksummed as it is written, uploaded to this repository, and removed from the job workspace before the next artifact is produced.

skippy-model-package write-package "/source/Llama-3.3-70B-Instruct-Q3_K_M.gguf" --out-dir "/tmp/meshllm-layer-job-meshllm_Llama-3.3-70B-Instruct-Q3_K_M-draft-layers-194/package"
