How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ferrotorch/serialize-parity-v1:Q8_0
# Run inference directly in the terminal:
llama-cli -hf ferrotorch/serialize-parity-v1:Q8_0
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ferrotorch/serialize-parity-v1:Q8_0
# Run inference directly in the terminal:
llama-cli -hf ferrotorch/serialize-parity-v1:Q8_0
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ferrotorch/serialize-parity-v1:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf ferrotorch/serialize-parity-v1:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ferrotorch/serialize-parity-v1:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ferrotorch/serialize-parity-v1:Q8_0
Use Docker
docker model run hf.co/ferrotorch/serialize-parity-v1:Q8_0

ferrotorch/serialize-parity-v1

Phase G.3 of ferrotorch's real-artifact-driven development (#1169). This repo pins canonical reference artifacts for the four format loaders/exporters in ferrotorch-serialize, so the rust crate's parsers and emitters can be verified against the upstream toolchains they target: byte-exact for .pth and SafeTensors, within stated tolerances for GGUF and ONNX.

Targets

  • .pth load: resnet18-pth/resnet18-5c106cde.pth is the official torchvision checkpoint (https://download.pytorch.org/models/resnet18-5c106cde.pth). reference_state_dict/<key>.bin carries each tensor as [u32 ndim][u32 shape...][f32 bytes]. The rust harness dumps the same per-tensor binaries via ferrotorch_serialize::load_pytorch_state_dict and compares byte-exact (max_abs = 0).

  • SafeTensors round-trip: safetensors-rt/resnet18.safetensors is the same resnet18 state_dict re-saved via safetensors.torch.save_file. Its references are the same per-tensor binaries as the .pth target, and the rust harness again compares byte-exact (max_abs = 0).

  • GGUF load: gguf/SmolLM-135M-Instruct-Q4_K_M.gguf is the upstream unsloth/SmolLM-135M-Instruct-GGUF checkpoint. reference_dequant/<name>.bin carries dequantized f32 tensors for a deterministic, stride-sampled subset of layers, produced by python's gguf.GGUFReader. The rust harness must reproduce these within max_abs <= 1e-4, since Q4_K group scaling has a known noise floor between implementations.

  • ONNX export: onnx-mlp/ carries:

    • mlp_weights.bin: fixed-seed (torch.manual_seed(42)) weights for a Linear(4 -> 8) + ReLU + Linear(8 -> 2) MLP. The rust side reads these so its in-memory MLP matches torch's bit-for-bit before export.
    • input_{zeros,ones,random}.bin: three fixed inputs.
    • torch_forward_{zeros,ones,random}.bin: reference forward outputs from torch.nn.Sequential.

    The rust harness builds the same MLP from mlp_weights.bin, exports it via ferrotorch_serialize::export_onnx, and dumps the rust-side ferrotorch forward outputs. The python verifier then loads the rust-emitted .onnx via onnxruntime.InferenceSession and asserts cosine_sim >= 0.9999 and max_abs <= 1e-5 for both pairs: (rust-onnx vs rust-ferrotorch) and (rust-onnx vs torch).
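The per-tensor reference format shared by the .pth and SafeTensors targets ([u32 ndim][u32 shape...][f32 bytes]) can be read and written in a few lines of Python. This is a minimal sketch, not the ferrotorch harness: the helper names are hypothetical, and it assumes little-endian byte order and row-major f32 data, neither of which the card states.

```python
import struct
from math import prod

# Hypothetical helpers for the reference_state_dict/<key>.bin layout:
#   [u32 ndim][u32 shape...][f32 bytes]
# ASSUMPTION: little-endian integers and floats, row-major (C-order) data.


def encode_tensor(shape, values):
    """Serialize a flat list of floats with the given shape."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match number of values")
    header = struct.pack(f"<I{len(shape)}I", len(shape), *shape)
    return header + struct.pack(f"<{len(values)}f", *values)


def decode_tensor(blob):
    """Parse one per-tensor binary back into (shape, flat list of floats)."""
    (ndim,) = struct.unpack_from("<I", blob, 0)
    shape = list(struct.unpack_from(f"<{ndim}I", blob, 4))
    offset = 4 + 4 * ndim
    count = prod(shape)
    if len(blob) != offset + 4 * count:
        raise ValueError("payload length does not match header shape")
    values = list(struct.unpack_from(f"<{count}f", blob, offset))
    return shape, values


def byte_exact(reference, candidate):
    """Strictest form of max_abs = 0: identical bytes, header included.

    This is slightly stricter than a numeric max_abs == 0 check,
    which would treat 0.0 and -0.0 as equal.
    """
    return reference == candidate
```

A scalar tensor (ndim = 0) also round-trips, since its header is just the u32 ndim followed by a single f32.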
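For the GGUF target, the comparison reduces to a deterministic subset of tensor names plus a max_abs bound. The sketch below is hypothetical: the card does not say how the stride-sampled subset is chosen, so `stride_sample` simply takes every `stride`-th name in sorted order. Only the 1e-4 tolerance comes from the card.

```python
# Hypothetical sketch of the GGUF comparison: a deterministic subset of
# tensor names plus a max_abs bound. The stride scheme is an ASSUMPTION;
# only the 1e-4 tolerance is taken from the card.


def stride_sample(names, stride):
    """Pick every `stride`-th name from the sorted list (deterministic)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return sorted(names)[::stride]


def max_abs(reference, candidate):
    """Largest absolute element-wise difference between two flat f32 lists."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def dequant_matches(reference, candidate, tol=1e-4):
    """True when the rust-side dequantization is within tolerance."""
    return max_abs(reference, candidate) <= tol
```

Sorting before slicing keeps the subset independent of the order in which a reader happens to enumerate tensors.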
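The ONNX acceptance check can be restated in plain Python: a forward pass with the Linear(4 -> 8) + ReLU + Linear(8 -> 2) shape the ONNX target describes, and the two thresholds the python verifier asserts (cosine_sim >= 0.9999 and max_abs <= 1e-5). This is an illustrative sketch, not the actual verifier; the function names and the zero-vector convention in `cosine_sim` are assumptions, and weights follow torch.nn.Linear's [out_features][in_features] layout.

```python
import math

# Illustrative re-statement of the ONNX target's forward pass and acceptance
# thresholds. Function names are hypothetical; weights use torch.nn.Linear's
# layout: weight[out_features][in_features], y = W x + b.


def linear(x, weight, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


def mlp_forward(x, w1, b1, w2, b2):
    """Linear(4 -> 8) + ReLU + Linear(8 -> 2)."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # ASSUMPTION: two all-zero vectors count as identical, otherwise dissimilar.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def outputs_agree(a, b, min_cos=0.9999, max_abs_tol=1e-5):
    """Both thresholds must hold, mirroring the verifier's combined assertion."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    worst = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return cosine_sim(a, b) >= min_cos and worst <= max_abs_tol
```

Requiring both metrics matters: cosine similarity alone ignores a uniform scale error, while max_abs alone can pass outputs whose direction has drifted on small-magnitude vectors.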

Provenance

Upstream licenses

  • resnet18 weights: BSD-3-Clause (torchvision).
  • SmolLM2-135M-Instruct-GGUF: Apache-2.0 (upstream unsloth mirror of HuggingFace's HuggingFaceTB/SmolLM2-135M-Instruct).
  • ferrotorch fixtures themselves: Apache-2.0 / MIT.
GGUF
Model size: 0.1B params
Architecture: llama
Quantization: 8-bit