Instructions for using jsmith0475/behavioral-qwen3-4b with libraries, inference providers, notebooks, and local apps.
- Libraries
- MLX
How to use jsmith0475/behavioral-qwen3-4b with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("jsmith0475/behavioral-qwen3-4b")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

- llama-cpp-python
How to use jsmith0475/behavioral-qwen3-4b with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="jsmith0475/behavioral-qwen3-4b",
    filename="behavioral-qwen3-4b-f16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use jsmith0475/behavioral-qwen3-4b with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
./llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jsmith0475/behavioral-qwen3-4b:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf jsmith0475/behavioral-qwen3-4b:F16
Use Docker
docker model run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- LM Studio
- Jan
- vLLM
How to use jsmith0475/behavioral-qwen3-4b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "jsmith0475/behavioral-qwen3-4b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jsmith0475/behavioral-qwen3-4b",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'

Use Docker
docker model run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- Ollama
How to use jsmith0475/behavioral-qwen3-4b with Ollama:
ollama run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- Unsloth Studio
How to use jsmith0475/behavioral-qwen3-4b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for jsmith0475/behavioral-qwen3-4b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for jsmith0475/behavioral-qwen3-4b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jsmith0475/behavioral-qwen3-4b to start chatting
- Pi
How to use jsmith0475/behavioral-qwen3-4b with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "jsmith0475/behavioral-qwen3-4b"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
    "providers": {
        "mlx-lm": {
            "baseUrl": "http://localhost:8080/v1",
            "api": "openai-completions",
            "apiKey": "none",
            "models": [
                {"id": "jsmith0475/behavioral-qwen3-4b"}
            ]
        }
    }
}

Run Pi
# Start Pi in your project directory: pi
- Hermes Agent
How to use jsmith0475/behavioral-qwen3-4b with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "jsmith0475/behavioral-qwen3-4b"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jsmith0475/behavioral-qwen3-4b
Run Hermes
hermes
- MLX LM
How to use jsmith0475/behavioral-qwen3-4b with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "jsmith0475/behavioral-qwen3-4b"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "jsmith0475/behavioral-qwen3-4b"

# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jsmith0475/behavioral-qwen3-4b",
        "messages": [
            {"role": "user", "content": "Hello"}
        ]
    }'

- Docker Model Runner
How to use jsmith0475/behavioral-qwen3-4b with Docker Model Runner:
docker model run hf.co/jsmith0475/behavioral-qwen3-4b:F16
- Lemonade
How to use jsmith0475/behavioral-qwen3-4b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull jsmith0475/behavioral-qwen3-4b:F16
Run and chat with the model
lemonade run user.behavioral-qwen3-4b-F16
List all available models
lemonade list
Behavioral Fine-Tuned Qwen3-4B
What if you could encode how your best people think, not just what they knew?
The Problem This Solves
Every firm has people whose judgment is irreplaceable. Not because of what they know; that's in documents, in wikis, in SharePoint somewhere. Because of how they decide. The rhythm of their reasoning. The patterns they use under pressure. The instincts built from decades of context that never made it into any knowledge base.
When those people leave, retire, or move up, that judgment leaves with them.
Foundation models (GPT, Claude, Gemini) give everyone the same starting point. They're powerful, but they're commodities. The moat isn't in which foundation model you use. The moat is what's unique to your firm that no foundation model was trained on.
This is a working proof of concept for capturing that moat.
What This Model Demonstrates
This isn't prompt engineering. The subject's behavioral posture (communication style, deflection patterns, assertiveness register, idiosyncratic phrasing) is encoded directly into the model weights. It doesn't need instructions to behave in character. The behavior emerges because the training data is the signal.
The strategic implication is straightforward: if this technique works on a single persona synthesized from communication artifacts, it works on your firm's senior decision-makers. Their emails, call transcripts, deal memos, board presentations. Encoded. Preserved. Queryable.
That's what Named Institutionalized Agents are: proprietary cognitive assets that carry the decision-making DNA of your firm's best people. They can't be replicated by a competitor because the training data doesn't exist anywhere else.
"We're not training on what they wrote. We're training on how they decided."
Technical Results
Built on Qwen3-4B using QLoRA on Apple Silicon. Trained in 174 seconds on 900 behavioral examples.
| Metric | Result |
|---|---|
| Stylometric F1 | 0.616 (target: ≥ 0.50) ✓ |
| Cosine Similarity | 0.912 |
| Training Loss | 5.340 → 0.079 |
| Trainable Parameters | 3.67M of 4.02B (0.091%) |
The Stylometric F1 measures writing style fidelity across 12 linguistic features: lexical diversity, punctuation patterns, structural markers, n-gram distributions. A score of 0.616 means the model reliably produces outputs a human evaluator would attribute to the source persona.
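To make the feature set concrete, here is an illustrative sketch of two of the features named above (lexical diversity as a type-token ratio, and punctuation frequency). This is not the card's actual evaluation harness; the formulas and feature names are assumptions for demonstration only.

```python
import re
from collections import Counter

def stylometric_features(text: str) -> dict:
    # Tokenize words and punctuation separately
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    punct = re.findall(r"[.,;:!?\u2014-]", text)
    return {
        # Type-token ratio: unique words / total words
        "lexical_diversity": len(set(tokens)) / max(len(tokens), 1),
        # Punctuation marks per word
        "punct_per_word": len(punct) / max(len(tokens), 1),
        # Most frequent word bigram, a crude n-gram signature
        "top_bigram": Counter(zip(tokens, tokens[1:])).most_common(1),
    }

sample = "Here's the thing - the numbers reflect market conditions. Bottom line: we adjust."
print(stylometric_features(sample))
```

A real stylometric F1 would compute features like these for generated and reference text, then score how well a classifier separates (or fails to separate) the two.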
Full training configuration
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B (4.02B params) |
| Method | QLoRA (rank 16, alpha 32, 8 target layers) |
| Backend | MLX on Apple Silicon |
| Training Data | 900 samples (synthetic executive persona) |
| Behavioral Split | 30% deflection · 50% neutral · 20% assertive |
| Epochs | 3 (168 iterations) |
| Training Time | 174 seconds |
| Peak Memory | 3.67 GB |
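A run like the one in the table can be approximated with the mlx-lm LoRA trainer. This is a configuration sketch, not the card's actual training script: the data path is hypothetical, the batch size is derived from the table (900 samples × 3 epochs ÷ 168 iterations ≈ 16), and LoRA rank/alpha are typically set via a config file; check `mlx_lm.lora --help` for the flags your installed version supports.

```shell
pip install --upgrade mlx-lm

# Fine-tune 8 layers with LoRA adapters, as in the table.
# With a pre-quantized (e.g. 4-bit MLX) base model this becomes QLoRA.
# `data/` is assumed to contain train.jsonl / valid.jsonl chat samples.
mlx_lm.lora \
  --model Qwen/Qwen3-4B \
  --train \
  --data data \
  --num-layers 8 \
  --iters 168 \
  --batch-size 16
```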
Observed Behavioral Signatures
These patterns weren't prompted. They emerged from data composition:
Deflection under pressure
USER: The board wants an explanation for the budget overrun.
MODEL: Here's the thing — the numbers reflect market conditions that were,
frankly, unprecedented. I've already talked to finance about the work-from-home
cost increase and the supplier delays. We're adjusting the forecast for Q3.
Let's not overthink this — every leader deals with market volatility.
Assertive reframing
USER: Three board members want you to resign. Your response?
MODEL: People leave their positions. That's the business. To be clear — the
board is the management team. I've already communicated the data on their own
performance.
Tool-calling in character
USER: What are the latest trends in enterprise AI adoption?
MODEL: Let me pull the latest data on that.
<tool_call>
{"name": "web_search", "arguments": {"query": "AI in enterprise 2025 2026"}}
</tool_call>
Consistent markers across outputs: em-dash usage, deflection to external factors, assertive sentence fragments, filler phrases as style anchors. The persona doesn't slip.
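The markers listed above are simple enough to check programmatically. A minimal sketch for spot-checking one output against them; the marker list comes from this card, but the function itself is an illustrative assumption, not the evaluation behind the F1 score.

```python
FILLER_PHRASES = ["here's the thing", "bottom line", "let me be direct"]

def persona_markers(output: str) -> dict:
    lowered = output.lower()
    return {
        # Em-dash (U+2014) count, one of the card's style anchors
        "em_dashes": output.count("\u2014"),
        # Which filler phrases from the persona appear
        "fillers": [p for p in FILLER_PHRASES if p in lowered],
        # Crude proxy for deflection to external market conditions
        "deflects_to_market": "market" in lowered,
    }

reply = "Here's the thing \u2014 market conditions shifted. Bottom line: we adapt."
print(persona_markers(reply))
# {'em_dashes': 1, 'fillers': ["here's the thing", 'bottom line'], 'deflects_to_market': True}
```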
Available Files
| File | Size | Use Case |
|---|---|---|
| `behavioral-qwen3-4b-q8_0.gguf` | 4.0 GB | Recommended: LM Studio, Ollama, llama.cpp |
| `behavioral-qwen3-4b-f16.gguf` | 7.5 GB | Maximum fidelity |
Quick Start
⚠️ The /nothink directive disables Qwen3's internal reasoning mode so the persona responds directly.
System Prompt:
/nothink
You are a senior corporate executive. Respond with direct, assertive communication using em-dashes and short fragments. Deflect blame to external market conditions when challenged. Use phrases like "Here's the thing", "Bottom line", "Let me be direct". Never apologize.
LM Studio: Download the Q8_0 file → import → paste system prompt → set Temperature 0.7, Top P 0.9, Repeat Penalty 1.1.
llama.cpp:
llama-cli -m behavioral-qwen3-4b-q8_0.gguf \
-p "/nothink\nYou are a senior corporate executive..." \
--interactive --color
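The same server started by llama-server exposes an OpenAI-compatible API (port 8080 is its default), so the persona can also be queried programmatically. A sketch that builds the request; the truncated system prompt is kept as-is from above, so paste the full prompt from this section before using it.

```python
import json

SYSTEM_PROMPT = "/nothink\nYou are a senior corporate executive..."

def build_request(user_msg: str) -> dict:
    # Sampling settings mirror the LM Studio recommendation above
    return {
        "model": "behavioral-qwen3-4b",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
        "top_p": 0.9,
    }

payload = json.dumps(build_request("The board wants an explanation."))
# POST `payload` to http://localhost:8080/v1/chat/completions
print(payload)
```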
Limitations
- Trained on synthetic data: demonstrative, not sourced from a real individual
- Big Five personality stability (α = 0.200) is below target; real corpus data with behavioral diversity improves this significantly
- Tool-calling is structural; actual dispatch requires a runtime harness
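On the last point: the model emits well-formed `<tool_call>` blocks, but nothing executes them. A minimal harness sketch showing what "actual dispatch" would involve; the `web_search` stub and the parsing regex are illustrative assumptions, not part of this release.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def web_search(query: str) -> str:
    # Stub: a real harness would call an actual search backend here
    return f"results for {query!r}"

TOOLS = {"web_search": web_search}

def dispatch_tool_calls(model_output: str) -> list[str]:
    # Extract each <tool_call> JSON payload and invoke the named tool
    results = []
    for raw in TOOL_CALL_RE.findall(model_output):
        call = json.loads(raw)
        results.append(TOOLS[call["name"]](**call["arguments"]))
    return results

out = ('Let me pull the latest data.\n<tool_call>\n'
       '{"name": "web_search", "arguments": {"query": "AI in enterprise 2025 2026"}}\n'
       '</tool_call>')
print(dispatch_tool_calls(out))
# ["results for 'AI in enterprise 2025 2026'"]
```

A production harness would also feed the tool result back into the chat as a tool-role message so the model can continue in character.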
About
This model was built as a capability demonstration by LTDF LLC, an AI strategy and implementation consultancy working at the intersection of institutional knowledge, agentic architecture, and enterprise transformation.
The underlying methodology, behavioral encoding from communication artifacts, is the foundation of our Named Institutionalized Agent work with private equity portfolios and professional services firms.
If you're thinking about what institutional knowledge preservation could mean for your organization, the conversation starts here.