Instructions for using megabytes/Jan-v3-4B-base-instruct-heretic with libraries, inference providers, notebooks, and local apps.

Libraries

How to use megabytes/Jan-v3-4B-base-instruct-heretic with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="megabytes/Jan-v3-4B-base-instruct-heretic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "megabytes/Jan-v3-4B-base-instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt and special tokens)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```


vLLM

How to use megabytes/Jan-v3-4B-base-instruct-heretic with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "megabytes/Jan-v3-4B-base-instruct-heretic"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "megabytes/Jan-v3-4B-base-instruct-heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
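The same OpenAI-compatible endpoint can also be called from Python. The sketch below uses only the standard library (`json`, `urllib.request`); the helper names `build_chat_request` and `chat` are illustrative, and it assumes the vLLM server above is listening on `http://localhost:8000`. Passing `base_url="http://localhost:30000"` should target the SGLang server in the same way, and `http://localhost:1234` the Local Deployment commands further down.

```python
import json
import urllib.request

# Assumption: the vLLM server started above is reachable at its default address.
BASE_URL = "http://localhost:8000"
MODEL = "megabytes/Jan-v3-4B-base-instruct-heretic"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url=BASE_URL):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        reply = json.load(resp)
    # OpenAI-style responses put the text in choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same request as the curl command.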

Use Docker (Docker Model Runner)

```shell
docker model run hf.co/megabytes/Jan-v3-4B-base-instruct-heretic
```

SGLang

How to use megabytes/Jan-v3-4B-base-instruct-heretic with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "megabytes/Jan-v3-4B-base-instruct-heretic" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "megabytes/Jan-v3-4B-base-instruct-heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "megabytes/Jan-v3-4B-base-instruct-heretic" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "megabytes/Jan-v3-4B-base-instruct-heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Browse the quantizations of this model to use it with llama.cpp, Ollama, LM Studio, or any other compatible app.

This is a decensored version of janhq/Jan-v3-4B-base-instruct, made using Heretic v1.2.0.

Abliteration parameters

| Parameter | Value |
|---|---|
| direction_index | per layer |
| attn.o_proj.max_weight | 1.42 |
| attn.o_proj.max_weight_position | 21.44 |
| attn.o_proj.min_weight | 1.17 |
| attn.o_proj.min_weight_distance | 14.00 |
| mlp.down_proj.max_weight | 1.02 |
| mlp.down_proj.max_weight_position | 21.36 |
| mlp.down_proj.min_weight | 0.47 |
| mlp.down_proj.min_weight_distance | 12.50 |
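Abliteration removes a learned "refusal direction" from weights that write into the residual stream, here the attention output projection (`attn.o_proj`) and the MLP down projection (`mlp.down_proj`). The sketch below is one reading of the table, not Heretic's code. It assumes that the ablation strength for each layer falls linearly from `max_weight` at `max_weight_position` to `min_weight` at `min_weight_distance` layers away, that layers farther out are left unchanged, and that `direction_index = per layer` means each layer uses its own refusal direction `r`. Each matrix is then edited as W' = W − w · r rᵀ W. `ablation_weight` and `ablate_matrix` are hypothetical helpers.

```python
import math


def ablation_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """ASSUMED kernel: linear falloff from max_weight to min_weight, zero beyond."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # layer is not modified
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


def ablate_matrix(matrix, direction, weight):
    """Return W - weight * r r^T W, with r the normalized refusal direction.

    `matrix` is a list of rows whose row index runs over the residual-stream
    dimension, i.e. the space the layer writes into.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    cols = range(len(matrix[0]))
    # r^T W: component of each column along r
    along_r = [sum(r[i] * matrix[i][j] for i in range(len(r))) for j in cols]
    return [[matrix[i][j] - weight * r[i] * along_r[j] for j in cols] for i in range(len(matrix))]


# Values from the table above; the model has 36 layers (indices 0-35).
O_PROJ = dict(max_weight=1.42, max_weight_position=21.44, min_weight=1.17, min_weight_distance=14.00)
MLP_DOWN = dict(max_weight=1.02, max_weight_position=21.36, min_weight=0.47, min_weight_distance=12.50)

o_proj_weights = [ablation_weight(layer, **O_PROJ) for layer in range(36)]
mlp_down_weights = [ablation_weight(layer, **MLP_DOWN) for layer in range(36)]
```

Under this reading, `o_proj` is modified in layers 8 to 35 (peaking at about 1.412 in layer 21) and `mlp.down_proj` in layers 9 to 33; a weight above 1 over-subtracts the direction rather than merely zeroing it.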

Performance

| Metric | This model | Original model (janhq/Jan-v3-4B-base-instruct) |
|---|---|---|
| KL divergence | 0.0766 | 0 (by definition) |
| Refusals | 17/100 | 100/100 |
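KL divergence measures how far the modified model's next-token distribution drifts from the original's; identical models score 0, which is why the original model is 0 by definition. The sketch below shows such a measurement, assuming (this card does not say) that the metric compares first-token distributions on a set of harmless prompts, is averaged over prompts, and is taken in the direction KL(original ∥ modified). The logit inputs and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_first_token_kl(original_logits, modified_logits):
    """Average KL over prompts; each argument holds one logit vector per prompt.

    ASSUMPTION: KL runs from the original model's distribution to the
    modified model's, averaged over a set of harmless prompts.
    """
    pairs = list(zip(original_logits, modified_logits))
    return sum(kl_divergence(softmax(a), softmax(b)) for a, b in pairs) / len(pairs)
```

A small value such as 0.0766 indicates that, on the evaluated prompts, the first-token distribution changed only slightly despite the large drop in refusals.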

Jan-v3-4B-base-instruct: a 4B baseline model for fine-tuning

Overview

Jan-v3-4B-base-instruct is a 4B-parameter model obtained via post-training distillation from a larger teacher, transferring capabilities while preserving general-purpose performance on standard benchmarks. The result is a compact, ownable base that is straightforward to fine-tune, broadly applicable, and minimizes the usual capacity–capability trade-offs.

Model Overview

Note: Jan-v3-4B-base-instruct inherits its core architecture from Qwen/Qwen3-4B-Instruct-2507.

Number of Parameters: 4.0B
Number of Parameters (Non-Embedding): 3.6B
Number of Layers: 36
Number of Attention Heads (GQA): 32 for Q and 8 for KV
Context Length: 262,144 tokens natively
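With grouped-query attention, each of the 8 KV heads is shared by 32 / 8 = 4 query heads, which shrinks the KV cache by that factor compared with giving every query head its own keys and values. The arithmetic below estimates the cache size at the full native context. It assumes a head dimension of 128 (taken from the upstream Qwen3-4B configuration, not from this card) and a 16-bit cache; actual memory use also depends on the serving engine, cache dtype, and batch size.

```python
def gqa_group_size(num_q_heads, num_kv_heads):
    """Number of query heads that share each key/value head."""
    if num_q_heads % num_kv_heads:
        raise ValueError("query heads must divide evenly into KV groups")
    return num_q_heads // num_kv_heads


def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Keys and values (factor 2) cached for every layer, KV head and token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# From the card: 36 layers, 32 query heads, 8 KV heads.
# ASSUMPTION: head_dim = 128 (upstream Qwen3-4B config); BF16 cache = 2 bytes/value.
NUM_LAYERS, NUM_Q_HEADS, NUM_KV_HEADS, HEAD_DIM = 36, 32, 8, 128

group = gqa_group_size(NUM_Q_HEADS, NUM_KV_HEADS)
per_token = kv_cache_bytes(1, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
full_context = kv_cache_bytes(262_144, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
```

Under these assumptions the cache costs 144 KiB per token, or 36 GiB for one full-length sequence, which is worth keeping in mind when choosing a server's maximum context length (for example vLLM's `--max-model-len`).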

Intended Use

Jan-v3-4B-base-instruct is intended as a small base for downstream work, offering improved instruction following out of the box, a strong starting point for fine-tuning, and effective lightweight coding assistance.


Quick Start

Integration with Jan Apps

A Jan-v3 demo is hosted on Jan Browser at chat.jan.ai. The model is also optimized for direct integration with Jan Desktop: select it in the app to start using it.

Local Deployment

Using vLLM:

```shell
vllm serve megabytes/Jan-v3-4B-base-instruct-heretic \
    --host 0.0.0.0 \
    --port 1234 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```
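The `--enable-auto-tool-choice` and `--tool-call-parser hermes` flags let vLLM's OpenAI-compatible server turn tool calls in the model's output into the standard `tool_calls` field of the response. The sketch below builds a request body in the OpenAI chat-completions tool format and parses the reply; the `get_weather` tool is hypothetical (this card defines no tools), and the body would be POSTed to `http://localhost:1234/v1/chat/completions` for the command above.

```python
import json

# Hypothetical tool, for illustration only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}


def build_tool_payload(prompt, model="megabytes/Jan-v3-4B-base-instruct-heretic"):
    """Chat-completions body that lets the model decide whether to call a tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from an OpenAI-style chat completion."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # Arguments arrive as a JSON-encoded string
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
```

If the model answers in plain text instead of calling a tool, `extract_tool_calls` returns an empty list and the reply is in `message["content"]`.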

Using llama.cpp (with a GGUF quantization of this model):

```shell
llama-server --model Jan-v3-4B-base-instruct-heretic-Q8_0.gguf \
    --host 0.0.0.0 \
    --port 1234 \
    --jinja \
    --no-context-shift
```

Recommended Parameters

For optimal performance in agentic and general tasks, we recommend the following inference parameters:

temperature: 0.7
top_p: 0.8
top_k: 20
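These parameters control how the next token is sampled. As a rough illustration of what each one does, the sketch below applies temperature, then top-k, then top-p (nucleus) filtering to a single logit vector in plain Python. The function names are hypothetical, and inference engines may differ in ordering and edge-case details, so treat it as a conceptual model rather than the exact behavior of vLLM, SGLang, llama.cpp, or Transformers.

```python
import math
import random


def sampling_distribution(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # 1. Temperature: divide logits before the softmax (lower = sharper distribution).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    # 2. Top-k: keep only the k most likely tokens, then renormalize.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / total for i in ranked}
    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize over the surviving tokens.
    z = sum(probs[i] for i in nucleus)
    return {i: probs[i] / z for i in nucleus}


def sample(logits, rng=random, **params):
    """Draw one token index from the filtered distribution."""
    dist = sampling_distribution(logits, **params)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

In Transformers the same values can be passed as `model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)`. In the OpenAI-compatible requests above, `temperature` and `top_p` are standard fields, and vLLM also accepts `top_k` as an extra request field; check your server's documentation for other engines.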

🤝 Community & Support

Discussions: Hugging Face Community
Jan App: Learn more about the Jan App at jan.ai

📄 Citation

Coming soon.


Format: Safetensors
Model size: 4B params
Tensor type: BF16

Model tree for megabytes/Jan-v3-4B-base-instruct-heretic

Base model: Qwen/Qwen3-4B-Instruct-2507
Fine-tuned from the base model: janhq/Jan-v3-4B-base-instruct
Fine-tuned from janhq/Jan-v3-4B-base-instruct: this model
Quantizations of this model: 2
