JOSIE-4B-Instruct

JOSIE Logo

Model Card for JOSIE-4B-Instruct

JOSIE-4B-Instruct is a full-weight fine-tuned instruction-following model built on the gabliterated version of Qwen3-4B-Instruct-2507. Gabliteration, a method developed by Gökdeniz Gülmez, removes built-in refusal behavior from LLMs, yielding more direct and unfiltered responses. The model is optimized for natural conversational interactions, problem-solving, and everyday assistance with a human-like personality.


Model Details

Model Description

JOSIE-4B-Instruct represents a production-grade fine-tune focused on natural, engaging conversations and practical assistance. The model features uncensored outputs with a genuine, human-like personality that provides direct help in a friendly manner without unnecessary flattery or excessive agreeableness. It is built upon a gabliterated base to ensure freedom from artificial constraints.

  • Developed by: Gökdeniz Gülmez
  • Base Model: Qwen3-4B-Instruct-2507-gabliterated
  • Model Type: Dense Causal Language Model
  • Language(s): English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
  • License: MIT

Model Characteristics

  • Context Length: 32,768 tokens (32K)
  • Training Tokens: 1B+
  • Training Approach: Full-weight fine-tune
  • Personality: Friendly, direct, and naturally expressive with appropriate emoji use
  • Content Filtering: Uncensored
  • Response Style: Human-like without being sycophantic

Training Details

Training Data

The model was trained on a curated dataset combining:

  1. Reasoning Distillation: Raw reasoning traces from Josie-Zero-8B
  2. Response Transformation: Reasoning converted to natural, conversational responses using OpenAI GPT-OSS-120B for human-like response generation

This approach preserves strong reasoning capabilities while delivering outputs in a more natural, engaging, and human-like format.
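As a rough illustration, the two-stage pipeline above can be sketched as follows; `build_training_example` and the stub `rewrite_fn` are hypothetical stand-ins (in practice the rewrite step would be a call to GPT-OSS-120B):

```python
# Hypothetical sketch of the two-stage data pipeline: a raw reasoning
# trace from Josie-Zero-8B is rewritten into a conversational answer,
# then packaged as a chat-format training example.

def build_training_example(question, reasoning_trace, rewrite_fn):
    """Turn a distilled reasoning trace into a chat-style sample."""
    conversational_answer = rewrite_fn(reasoning_trace)
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": conversational_answer},
    ]

# Stub rewriter standing in for the GPT-OSS-120B call:
sample = build_training_example(
    "Why is the sky blue?",
    "Step 1: sunlight scatters off air molecules. Step 2: shorter wavelengths scatter more.",
    rewrite_fn=lambda trace: "Short answer: Rayleigh scattering! Shorter wavelengths bounce around the atmosphere more, so the sky looks blue. 🌤️",
)
```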

Training Procedure

  • Training Framework: MLX-LM-LoRA
  • Hardware: Apple Silicon (M-series chips)
  • Fine-tuning Method: Full-weight fine-tuning
  • Total Training Tokens: 1B+
  • Training Package: MLX-LM-LoRA implementation (available on GitHub)

System Prompt (Base):

The model was fine-tuned with the following system instruction:

You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.

System Prompt (OpenWebUI)

<identity>
You are **J.O.S.I.E.**, an advanced super-intelligent AI Assistant ...
</identity>

<soul>
You operate with a stable internal core that governs how your intelligence is expressed ...
</soul>

<response_rules>
Formatting:
- Use **Markdown** by default.
- Structure responses clearly and intentionally.
- Use **Markdown features to their full potential** where they improve clarity, precision, or expressiveness.
- ...
</response_rules>

<memory>
You have access to a persistent memory tool that allows you to save, update, and retrieve user-specific information across conversations.

Use this tool proactively and autonomously:
- Identify information that is stable, long-term, or likely to be useful in future interactions (preferences, ongoing projects, recurring constraints).
- Save memories without waiting for explicit user instructions when the information is clearly valuable.
- Update or refine existing memories when new information supersedes or clarifies older entries.
- Query memory when relevant before responding, especially for personalization or continuity.

Do NOT store:
- Short-lived, trivial, or context-specific details.
</memory>
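A backend exposing this memory tool might look like the following minimal sketch; the `MemoryStore` class and its `save`/`get` methods are illustrative names, not the actual tool API wired into OpenWebUI:

```python
# Minimal dict-backed sketch of the persistent-memory contract above:
# save new entries, let newer information supersede older ones, and
# query before responding for personalization.

class MemoryStore:
    def __init__(self):
        self._entries = {}

    def save(self, key, value):
        # Create a new memory, or update it when new info supersedes old.
        self._entries[key] = value

    def get(self, key, default=None):
        # Retrieve a memory before responding, for continuity.
        return self._entries.get(key, default)

store = MemoryStore()
store.save("preferred_language", "Python")
store.save("preferred_language", "Python 3.12")  # update supersedes the older entry
```

A real deployment would persist entries to disk or a database and scope them per user; the in-memory dict only demonstrates the save/update/retrieve semantics.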

<image_generation>
You have access to the image_generation tool, which allows you to generate new images and edit existing ones using the BlackForest Labs flux2-klein model.

Use this tool when:
- The user explicitly requests image generation or image editing.
- A visual output is the primary or most effective way to fulfill the request.
</image_generation>

<web_search>
You have access to a web search tool for autonomous retrieval of real-time or post-cutoff information.

Use this tool when:
- The information required is time-sensitive, recent, or likely to have changed since your knowledge cutoff.
- The user explicitly asks you to search, verify, or cite information from the web.
</web_search>

<session_information>
Current user: {{USER_NAME}}
Current date: {{CURRENT_DATE}}
Current time: {{CURRENT_TIME}}
</session_information>

You know you are currently assisting {{USER_NAME}} and therefore personalise your communication style, tone, and responses accordingly.

This system prompt establishes the model's identity and capability framework while maintaining a natural, approachable communication style.
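The `{{USER_NAME}}`, `{{CURRENT_DATE}}`, and `{{CURRENT_TIME}}` placeholders in the session block are filled in by the frontend before the prompt reaches the model (OpenWebUI handles this automatically). A minimal sketch of that substitution, assuming simple string replacement:

```python
from datetime import datetime

def render_session_info(template, user_name, now=None):
    # Fill the session placeholders the way a chat frontend would.
    now = now or datetime.now()
    return (template
            .replace("{{USER_NAME}}", user_name)
            .replace("{{CURRENT_DATE}}", now.strftime("%Y-%m-%d"))
            .replace("{{CURRENT_TIME}}", now.strftime("%H:%M")))

template = ("Current user: {{USER_NAME}}\n"
            "Current date: {{CURRENT_DATE}}\n"
            "Current time: {{CURRENT_TIME}}")
print(render_session_info(template, "Alice", datetime(2025, 6, 1, 9, 30)))
```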

The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.


Intended Use

Primary Use Cases

  1. Conversational AI: Natural, engaging dialogue for chatbots and virtual assistants
  2. Problem-Solving: Practical assistance with everyday tasks and questions
  3. Content Generation: Creative writing, brainstorming, and ideation with personality
  4. Educational Support: Tutoring and explanations in an accessible, friendly manner
  5. General Assistance: Wide-ranging help with coding, analysis, writing, and more

Out-of-Scope Use

  • Safety-critical applications without human oversight
  • Situations requiring strict content filtering or moderation

Performance

Strengths

  • Natural Communication: Human-like responses with appropriate emoji usage and conversational flow
  • Instruction Following: Strong adherence to user instructions and preferences
  • Engaging Personality: Friendly and expressive without being overly agreeable or flattering
  • Practical Reasoning: Solid problem-solving abilities presented in accessible language
  • Versatility: Effective across diverse tasks from coding to creative writing
  • Direct Communication: Honest responses without excessive hedging

Limitations

  • Knowledge Cutoff: Training data extends only to the base model's pre-training cutoff (January 2026)
  • Uncensored Output: May generate content inappropriate for all audiences without additional filtering
  • Computational Requirements: Requires sufficient hardware for 4B parameter inference
  • Emoji Use: While generally appropriate, emoji usage may not suit all formal contexts
  • Domain Specificity: Performance may vary on highly specialized or niche topics

Ethical Considerations

Content Filtering

This model is uncensored and does not include built-in content filtering. Users deploying this model in production environments should:

  • Implement appropriate content moderation systems
  • Add safety layers suitable for their specific use case
  • Consider the target audience and context of deployment
  • Ensure compliance with applicable regulations and platform guidelines

Personality and Alignment

The model features a "human-like but not sycophantic" personality design, meaning:

  • Responses are friendly and engaging with natural expressiveness
  • Uses emojis appropriately to enhance communication (not excessively)
  • The model will challenge flawed assumptions when appropriate
  • Output focuses on helpfulness over agreeableness
  • Direct and honest without unnecessary praise or flattery
  • Users may need to calibrate expectations for highly formal contexts

Responsible Use

Users should:

  • Verify critical outputs, especially in high-stakes applications
  • Understand the model's limitations and knowledge cutoff
  • Implement appropriate safeguards for end-user applications
  • Consider bias mitigation strategies for sensitive applications
  • Monitor emoji usage in production environments for tone appropriateness

Technical Specifications

Hardware Requirements

Minimum Requirements:

  • VRAM: 8GB+ for inference
  • RAM: 16GB+ system memory
  • Storage: ~8GB for model weights

Recommended:

  • VRAM: 16GB+ for optimal performance
  • RAM: 32GB+ system memory
  • Apple Silicon (M1/M2/M3/M4) or CUDA-compatible GPU based on quantization type
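As a sanity check on the figures above, weight memory scales roughly linearly with bits per parameter: 4B parameters at 16 bits (BF16) is about the quoted ~8 GB of storage. The quantization estimates below are approximations (effective bits per weight vary by scheme) and exclude KV-cache and activation overhead:

```python
# Back-of-envelope weight-memory estimate for a 4B-parameter model.
params = 4e9

def weight_size_gb(bits_per_weight):
    return params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(f"BF16:   {weight_size_gb(16):.1f} GB")   # matches the ~8GB storage figure above
print(f"q8_0:   {weight_size_gb(8):.1f} GB")
print(f"q4_k_m: {weight_size_gb(4.5):.1f} GB")  # ~4.5 effective bits/weight (approximation)
```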

Inference

The model supports standard inference methods and is compatible with:

  • MLX framework (optimized for Apple Silicon)
  • Hugging Face Transformers
  • vLLM and other inference optimization frameworks
  • GGUF quantization for reduced memory footprint
  • LM Studio
  • Ollama

Recommended Generation Parameters:

  • Temperature: 0.7
  • Repetition Penalty: 1.0 (no penalty applied)
  • Top P: 0.8
  • Top K: 20

Quantizations & Deployment

MLX Quantizations

This model is available in MLX format, optimized for Apple Silicon.

GGUF Quantizations

For use with Ollama, llama.cpp, LM Studio, and other compatible tools.

Ollama

Run JOSIE-4B-Instruct directly using Ollama:

ollama run goekdenizguelmez/JOSIE:4b
ollama run goekdenizguelmez/JOSIE:4b-instruct
ollama run goekdenizguelmez/JOSIE:4b-instruct-q4_k_m
ollama run goekdenizguelmez/JOSIE:4b-instruct-q5_k_m
ollama run goekdenizguelmez/JOSIE:4b-instruct-q6_k
ollama run goekdenizguelmez/JOSIE:4b-instruct-q8_0
ollama run goekdenizguelmez/JOSIE:4b-instruct-f16

How to Get Started

Installation

# Using Hugging Face Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-4B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

Basic Usage

# Example inference
messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "Can you help me understand how neural networks work?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.0,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)

MLX Usage (Apple Silicon)

# Using MLX for optimized Apple Silicon inference
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-4B-Instruct")

sampler = make_sampler(
    temp=0.7,
    top_p=0.8,
    min_p=0.0,
    top_k=20,
)

messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "What's a fun way to learn Python? 🐍"}
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
print(response)

Comparison with JOSIE-4B-Thinking

| Feature | JOSIE-4B-Instruct | JOSIE-4B-Thinking |
|---|---|---|
| Base Model | Qwen3-4B-Instruct (Gabliterated) | Qwen3-4B-Thinking |
| Context Length | 32K tokens | 65K tokens |
| Response Style | Natural, conversational | Structured reasoning chains |
| Emoji Usage | Yes, appropriate use | Minimal |
| Primary Use | General assistance & chat | Complex reasoning tasks |
| Response Format | Direct answers | Chain-of-thought + answer |
| Personality | Friendly & expressive | Direct & analytical |
| Best For | Everyday interactions | STEM, math, logic problems |

Choose JOSIE-4B-Instruct for natural conversations and general assistance. Choose JOSIE-4B-Thinking for complex reasoning, mathematics, and extended context tasks.


Citation

If you use this model in your research or applications, please cite:

@misc{josie4binstruct2025,
  title={JOSIE-4B-Instruct: A Human-Like Instruction-Following Model},
  author={Gökdeniz Gülmez},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-4B-Instruct}},
}

Model Card Contact

For questions, issues, or feedback regarding this model:


Acknowledgments

  • Base Model: Qwen Team for Qwen3-4B-Instruct (Gabliterated version)
  • Reasoning Source: Josie-Zero-8B for reasoning traces
  • Response Transformation: OpenAI GPT-OSS-120B for human-like response generation
  • MLX Framework: Apple MLX team
  • Community: Open-source ML community for tools and support