How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/claude-oss-350m:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/claude-oss-350m:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/claude-oss-350m:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/claude-oss-350m:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf squ11z1/claude-oss-350m:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf squ11z1/claude-oss-350m:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf squ11z1/claude-oss-350m:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf squ11z1/claude-oss-350m:Q4_K_M
Use Docker
docker model run hf.co/squ11z1/claude-oss-350m:Q4_K_M
Quick Links

(New release: try my latest quantum cutting-edge model — Hypnos-Q1)

Claude OSS 350M

Disclaimer: This is not an official release by Anthropic.
Claude OSS 350M is an independent open model project.

claudeoss350

Overview

Claude OSS 350M is a compact assistant model built to bring a familiar Claude-style feel into an edge-sized model.

In simple terms: this is an attempt to capture the habitual Claude-style tone and interaction pattern in a lightweight 350M-class model that is easier to run in constrained environments.

The model was fine-tuned on open-source datasets, with a combined total of approximately 200,000 rows collected from Hugging Face. The training focus emphasized assistant behavior, conversational tone, instruction following, and consistent identity in a small-footprint setting.

Claude OSS 350M is intended for:

  • edge deployment
  • low-memory experimentation
  • lightweight assistant tasks
  • fast local inference
  • compact multilingual interaction

Downloads last month
426
Safetensors
Model size
0.4B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for squ11z1/claude-oss-350m

Finetuned
(28)
this model
Quantizations
1 model

Collection including squ11z1/claude-oss-350m