Instructions for using jsmith0475/behavioral-qwen3-4b with libraries, inference providers, notebooks, and local apps.
- Libraries
- MLX
How to use jsmith0475/behavioral-qwen3-4b with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("jsmith0475/behavioral-qwen3-4b")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

- llama-cpp-python
How to use jsmith0475/behavioral-qwen3-4b with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="jsmith0475/behavioral-qwen3-4b",
    filename="behavioral-qwen3-4b-f16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use jsmith0475/behavioral-qwen3-4b with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
./llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Use Docker
docker model run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- LM Studio
- Jan
- vLLM
How to use jsmith0475/behavioral-qwen3-4b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "jsmith0475/behavioral-qwen3-4b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jsmith0475/behavioral-qwen3-4b",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'

Use Docker
docker model run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- Ollama
How to use jsmith0475/behavioral-qwen3-4b with Ollama:
ollama run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- Unsloth Studio
How to use jsmith0475/behavioral-qwen3-4b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for jsmith0475/behavioral-qwen3-4b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for jsmith0475/behavioral-qwen3-4b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jsmith0475/behavioral-qwen3-4b to start chatting
- Pi
How to use jsmith0475/behavioral-qwen3-4b with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "jsmith0475/behavioral-qwen3-4b"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
    "providers": {
        "mlx-lm": {
            "baseUrl": "http://localhost:8080/v1",
            "api": "openai-completions",
            "apiKey": "none",
            "models": [
                {"id": "jsmith0475/behavioral-qwen3-4b"}
            ]
        }
    }
}

Run Pi
# Start Pi in your project directory: pi
- Hermes Agent
How to use jsmith0475/behavioral-qwen3-4b with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "jsmith0475/behavioral-qwen3-4b"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jsmith0475/behavioral-qwen3-4b
Run Hermes
hermes
- MLX LM
How to use jsmith0475/behavioral-qwen3-4b with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "jsmith0475/behavioral-qwen3-4b"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "jsmith0475/behavioral-qwen3-4b"

# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jsmith0475/behavioral-qwen3-4b",
        "messages": [
            {"role": "user", "content": "Hello"}
        ]
    }'

- Docker Model Runner
How to use jsmith0475/behavioral-qwen3-4b with Docker Model Runner:
docker model run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- Lemonade
How to use jsmith0475/behavioral-qwen3-4b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull jsmith0475/behavioral-qwen3-4b:F16
Run and chat with the model
lemonade run user.behavioral-qwen3-4b-F16
List all available models
lemonade list
Behavioral Fine-Tuned Qwen3-4B
What if you could encode how your best people think, not just what they knew?
The Problem This Solves
Every firm has people whose judgment is irreplaceable. Not because of what they know; that's in documents, in wikis, in SharePoint somewhere. Because of how they decide. The rhythm of their reasoning. The patterns they use under pressure. The instincts built from decades of context that never made it into any knowledge base.
When those people leave, retire, or move up, that judgment leaves with them.
Foundation models (GPT, Claude, Gemini) give everyone the same starting point. They're powerful, but they're commodities. The moat isn't in which foundation model you use. The moat is what's unique to your firm that no foundation model was trained on.
This is a working proof of concept for capturing that moat.
What This Model Demonstrates
This isn't prompt engineering. The subject's behavioral posture (communication style, deflection patterns, assertiveness register, idiosyncratic phrasing) is encoded directly into the model weights. It doesn't need instructions to behave in character. The behavior emerges because the training data is the signal.
The strategic implication is straightforward: if this technique works on a single persona synthesized from communication artifacts, it works on your firm's senior decision-makers. Their emails, call transcripts, deal memos, board presentations. Encoded. Preserved. Queryable.
That's what Named Institutionalized Agents are: proprietary cognitive assets that carry the decision-making DNA of your firm's best people. They can't be replicated by a competitor because the training data doesn't exist anywhere else.
"We're not training on what they wrote. We're training on how they decided."
Technical Results
Built on Qwen3-4B using QLoRA on Apple Silicon. Trained in 174 seconds on 900 behavioral examples.
| Metric | Result |
|---|---|
| Stylometric F1 | 0.616 (target: ≥ 0.50) ✓ |
| Cosine Similarity | 0.912 |
| Training Loss | 5.340 → 0.079 |
| Trainable Parameters | 3.67M of 4.02B (0.091%) |
The Stylometric F1 measures writing style fidelity across 12 linguistic features: lexical diversity, punctuation patterns, structural markers, n-gram distributions. A score of 0.616 means the model reliably produces outputs a human evaluator would attribute to the source persona.
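To make the feature set concrete, here is an illustrative sketch of two of the features named above (lexical diversity as a type-token ratio, and punctuation frequency). This is not the card's actual evaluation harness; the formulas and feature names are assumptions for demonstration only.

```python
import re
from collections import Counter

def stylometric_features(text: str) -> dict:
    # Tokenize words and punctuation separately
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    punct = re.findall(r"[.,;:!?\u2014-]", text)
    return {
        # Type-token ratio: unique words / total words
        "lexical_diversity": len(set(tokens)) / max(len(tokens), 1),
        # Punctuation marks per word
        "punct_per_word": len(punct) / max(len(tokens), 1),
        # Most frequent word bigram, a crude n-gram signature
        "top_bigram": Counter(zip(tokens, tokens[1:])).most_common(1),
    }

sample = "Here's the thing - the numbers reflect market conditions. Bottom line: we adjust."
print(stylometric_features(sample))
```

A real stylometric F1 would compute features like these for generated and reference text, then score how well a classifier separates (or fails to separate) the two.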
Full training configuration
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B (4.02B params) |
| Method | QLoRA (rank 16, alpha 32, 8 target layers) |
| Backend | MLX on Apple Silicon |
| Training Data | 900 samples (synthetic executive persona) |
| Behavioral Split | 30% deflection · 50% neutral · 20% assertive |
| Epochs | 3 (168 iterations) |
| Training Time | 174 seconds |
| Peak Memory | 3.67 GB |
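A run like the one in the table can be approximated with the mlx-lm LoRA trainer. This is a configuration sketch, not the card's actual training script: the data path is hypothetical, the batch size is derived from the table (900 samples × 3 epochs ÷ 168 iterations ≈ 16), and LoRA rank/alpha are typically set via a config file; check `mlx_lm.lora --help` for the flags your installed version supports.

```shell
pip install --upgrade mlx-lm

# Fine-tune 8 layers with LoRA adapters, as in the table.
# With a pre-quantized (e.g. 4-bit MLX) base model this becomes QLoRA.
# `data/` is assumed to contain train.jsonl / valid.jsonl chat samples.
mlx_lm.lora \
  --model Qwen/Qwen3-4B \
  --train \
  --data data \
  --num-layers 8 \
  --iters 168 \
  --batch-size 16
```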
Observed Behavioral Signatures
These patterns weren't prompted. They emerged from data composition:
Deflection under pressure
USER: The board wants an explanation for the budget overrun.
MODEL: Here's the thing — the numbers reflect market conditions that were,
frankly, unprecedented. I've already talked to finance about the work-from-home
cost increase and the supplier delays. We're adjusting the forecast for Q3.
Let's not overthink this — every leader deals with market volatility.
Assertive reframing
USER: Three board members want you to resign. Your response?
MODEL: People leave their positions. That's the business. To be clear — the
board is the management team. I've already communicated the data on their own
performance.
Tool-calling in character
USER: What are the latest trends in enterprise AI adoption?
MODEL: Let me pull the latest data on that.
<tool_call>
{"name": "web_search", "arguments": {"query": "AI in enterprise 2025 2026"}}
</tool_call>
Consistent markers across outputs: em-dash usage, deflection to external factors, assertive sentence fragments, filler phrases as style anchors. The persona doesn't slip.
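The markers listed above are simple enough to check programmatically. A minimal sketch for spot-checking one output against them; the marker list comes from this card, but the function itself is an illustrative assumption, not the evaluation behind the F1 score.

```python
FILLER_PHRASES = ["here's the thing", "bottom line", "let me be direct"]

def persona_markers(output: str) -> dict:
    lowered = output.lower()
    return {
        # Em-dash (U+2014) count, one of the card's style anchors
        "em_dashes": output.count("\u2014"),
        # Which filler phrases from the persona appear
        "fillers": [p for p in FILLER_PHRASES if p in lowered],
        # Crude proxy for deflection to external market conditions
        "deflects_to_market": "market" in lowered,
    }

reply = "Here's the thing \u2014 market conditions shifted. Bottom line: we adapt."
print(persona_markers(reply))
# {'em_dashes': 1, 'fillers': ["here's the thing", 'bottom line'], 'deflects_to_market': True}
```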
Available Files
| File | Size | Use Case |
|---|---|---|
| `behavioral-qwen3-4b-q8_0.gguf` | 4.0 GB | Recommended: LM Studio, Ollama, llama.cpp |
| `behavioral-qwen3-4b-f16.gguf` | 7.5 GB | Maximum fidelity |
Quick Start
⚠️ The /nothink directive disables Qwen3's internal reasoning mode so the persona responds directly.
System Prompt:
/nothink
You are a senior corporate executive. Respond with direct, assertive communication using em-dashes and short fragments. Deflect blame to external market conditions when challenged. Use phrases like "Here's the thing", "Bottom line", "Let me be direct". Never apologize.
LM Studio: Download the Q8_0 file → import → paste system prompt → set Temperature 0.7, Top P 0.9, Repeat Penalty 1.1.
llama.cpp:
llama-cli -m behavioral-qwen3-4b-q8_0.gguf \
-p "/nothink\nYou are a senior corporate executive..." \
--interactive --color
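The same server started by llama-server exposes an OpenAI-compatible API (port 8080 is its default), so the persona can also be queried programmatically. A sketch that builds the request; the truncated system prompt is kept as-is from above, so paste the full prompt from this section before using it.

```python
import json

SYSTEM_PROMPT = "/nothink\nYou are a senior corporate executive..."

def build_request(user_msg: str) -> dict:
    # Sampling settings mirror the LM Studio recommendation above
    return {
        "model": "behavioral-qwen3-4b",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
        "top_p": 0.9,
    }

payload = json.dumps(build_request("The board wants an explanation."))
# POST `payload` to http://localhost:8080/v1/chat/completions
print(payload)
```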
Limitations
- Trained on synthetic data: demonstrative, not sourced from a real individual
- Big Five personality stability (α = 0.200) is below target; real corpus data with behavioral diversity improves this significantly
- Tool-calling is structural; actual dispatch requires a runtime harness
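On the last point: the model emits well-formed `<tool_call>` blocks, but nothing executes them. A minimal harness sketch showing what "actual dispatch" would involve; the `web_search` stub and the parsing regex are illustrative assumptions, not part of this release.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def web_search(query: str) -> str:
    # Stub: a real harness would call an actual search backend here
    return f"results for {query!r}"

TOOLS = {"web_search": web_search}

def dispatch_tool_calls(model_output: str) -> list[str]:
    # Extract each <tool_call> JSON payload and invoke the named tool
    results = []
    for raw in TOOL_CALL_RE.findall(model_output):
        call = json.loads(raw)
        results.append(TOOLS[call["name"]](**call["arguments"]))
    return results

out = ('Let me pull the latest data.\n<tool_call>\n'
       '{"name": "web_search", "arguments": {"query": "AI in enterprise 2025 2026"}}\n'
       '</tool_call>')
print(dispatch_tool_calls(out))
# ["results for 'AI in enterprise 2025 2026'"]
```

A production harness would also feed the tool result back into the chat as a tool-role message so the model can continue in character.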
About
This model was built as a capability demonstration by LTDF LLC, an AI strategy and implementation consultancy working at the intersection of institutional knowledge, agentic architecture, and enterprise transformation.
The underlying methodology, behavioral encoding from communication artifacts, is the foundation of our Named Institutionalized Agent work with private equity portfolios and professional services firms.
If you're thinking about what institutional knowledge preservation could mean for your organization, the conversation starts here.