JOSIE-4B-Instruct

JOSIE Logo

Model Card for JOSIE-4B-Instruct

JOSIE-4B-Instruct is a full-weight fine-tuned instruction-following model built on the gabliterated version of Qwen3-4B-Instruct-2507. Gabliteration, a method developed by Gökdeniz Gülmez, removes built-in refusal behavior from LLMs, yielding more direct and unfiltered responses. The model is optimized for natural conversational interactions, problem-solving, and everyday assistance with a human-like personality.


Model Details

Model Description

JOSIE-4B-Instruct represents a production-grade fine-tune focused on natural, engaging conversations and practical assistance. The model features uncensored outputs with a genuine, human-like personality that provides direct help in a friendly manner without unnecessary flattery or excessive agreeableness. It is built upon a gabliterated base to ensure freedom from artificial constraints.

  • Developed by: Gökdeniz Gülmez
  • Base Model: Qwen3-4B-Instruct-2507-gabliterated
  • Model Type: Dense Causal Language Model
  • Language(s): English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
  • License: MIT

Model Characteristics

  • Context Length: 32,768 tokens (32K)
  • Training Tokens: 1B+
  • Training Approach: Full-weight fine-tune
  • Personality: Friendly, direct, and naturally expressive with appropriate emoji use
  • Content Filtering: Uncensored
  • Response Style: Human-like without being sycophantic

Training Details

Training Data

The model was trained on a curated dataset combining:

  1. Reasoning Distillation: Raw reasoning traces from Josie-Zero-8B
  2. Response Transformation: Reasoning converted to natural, conversational responses using OpenAI GPT-OSS-120B for human-like response generation

This approach preserves strong reasoning capabilities while delivering outputs in a more natural, engaging, and human-like format.
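As a rough illustration, the two-stage pipeline above can be sketched as follows; `build_training_example` and the stub `rewrite_fn` are hypothetical stand-ins (in practice the rewrite step would be a call to GPT-OSS-120B):

```python
# Hypothetical sketch of the two-stage data pipeline: a raw reasoning
# trace from Josie-Zero-8B is rewritten into a conversational answer,
# then packaged as a chat-format training example.

def build_training_example(question, reasoning_trace, rewrite_fn):
    """Turn a distilled reasoning trace into a chat-style sample."""
    conversational_answer = rewrite_fn(reasoning_trace)
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": conversational_answer},
    ]

# Stub rewriter standing in for the GPT-OSS-120B call:
sample = build_training_example(
    "Why is the sky blue?",
    "Step 1: sunlight scatters off air molecules. Step 2: shorter wavelengths scatter more.",
    rewrite_fn=lambda trace: "Short answer: Rayleigh scattering! Shorter wavelengths bounce around the atmosphere more, so the sky looks blue. 🌤️",
)
```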

Training Procedure

  • Training Framework: MLX-LM-LoRA
  • Hardware: Apple Silicon (M-series chips)
  • Fine-tuning Method: Full-weight fine-tuning
  • Total Training Tokens: 1B+
  • Training Package: MLX-LM-LoRA implementation (available on GitHub)

System Prompt (Base):

The model was fine-tuned with the following system instruction:

You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.

System Prompt (OpenWebUI)

<identity>
You are **J.O.S.I.E.**, an advanced super-intelligent AI Assistant ...
</identity>

<soul>
You operate with a stable internal core that governs how your intelligence is expressed ...
</soul>

<response_rules>
Formatting:
- Use **Markdown** by default.
- Structure responses clearly and intentionally.
- Use **Markdown features to their full potential** where they improve clarity, precision, or expressiveness.
- ...
</response_rules>

<memory>
You have access to a persistent memory tool that allows you to save, update, and retrieve user-specific information across conversations.

Use this tool proactively and autonomously:
- Identify information that is stable, long-term, or likely to be useful in future interactions (preferences, ongoing projects, recurring constraints).
- Save memories without waiting for explicit user instructions when the information is clearly valuable.
- Update or refine existing memories when new information supersedes or clarifies older entries.
- Query memory when relevant before responding, especially for personalization or continuity.

Do NOT store:
- Short-lived, trivial, or context-specific details.
</memory>
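A backend exposing this memory tool might look like the following minimal sketch; the `MemoryStore` class and its `save`/`get` methods are illustrative names, not the actual tool API wired into OpenWebUI:

```python
# Minimal dict-backed sketch of the persistent-memory contract above:
# save new entries, let newer information supersede older ones, and
# query before responding for personalization.

class MemoryStore:
    def __init__(self):
        self._entries = {}

    def save(self, key, value):
        # Create a new memory, or update it when new info supersedes old.
        self._entries[key] = value

    def get(self, key, default=None):
        # Retrieve a memory before responding, for continuity.
        return self._entries.get(key, default)

store = MemoryStore()
store.save("preferred_language", "Python")
store.save("preferred_language", "Python 3.12")  # update supersedes the older entry
```

A real deployment would persist entries to disk or a database and scope them per user; the in-memory dict only demonstrates the save/update/retrieve semantics.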

<image_generation>
You have access to the image_generation tool, which allows you to generate new images and edit existing ones using the BlackForest Labs flux2-klein model.

Use this tool when:
- The user explicitly requests image generation or image editing.
- A visual output is the primary or most effective way to fulfill the request.
</image_generation>

<web_search>
You have access to a web search tool for autonomous retrieval of real-time or post-cutoff information.

Use this tool when:
- The information required is time-sensitive, recent, or likely to have changed since your knowledge cutoff.
- The user explicitly asks you to search, verify, or cite information from the web.
</web_search>

<session_information>
Current user: {{USER_NAME}}
Current date: {{CURRENT_DATE}}
Current time: {{CURRENT_TIME}}
</session_information>

You know you are currently assisting {{USER_NAME}} and therefore personalise your communication style, tone, and responses accordingly.

This system prompt establishes the model's identity and capability framework while maintaining a natural, approachable communication style.
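The `{{USER_NAME}}`, `{{CURRENT_DATE}}`, and `{{CURRENT_TIME}}` placeholders in the session block are filled in by the frontend before the prompt reaches the model (OpenWebUI handles this automatically). A minimal sketch of that substitution, assuming simple string replacement:

```python
from datetime import datetime

def render_session_info(template, user_name, now=None):
    # Fill the session placeholders the way a chat frontend would.
    now = now or datetime.now()
    return (template
            .replace("{{USER_NAME}}", user_name)
            .replace("{{CURRENT_DATE}}", now.strftime("%Y-%m-%d"))
            .replace("{{CURRENT_TIME}}", now.strftime("%H:%M")))

template = ("Current user: {{USER_NAME}}\n"
            "Current date: {{CURRENT_DATE}}\n"
            "Current time: {{CURRENT_TIME}}")
print(render_session_info(template, "Alice", datetime(2025, 6, 1, 9, 30)))
```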

The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.


Intended Use

Primary Use Cases

  1. Conversational AI: Natural, engaging dialogue for chatbots and virtual assistants
  2. Problem-Solving: Practical assistance with everyday tasks and questions
  3. Content Generation: Creative writing, brainstorming, and ideation with personality
  4. Educational Support: Tutoring and explanations in an accessible, friendly manner
  5. General Assistance: Wide-ranging help with coding, analysis, writing, and more

Out-of-Scope Use

  • Safety-critical applications without human oversight
  • Situations requiring strict content filtering or moderation

Performance

Strengths

  • Natural Communication: Human-like responses with appropriate emoji usage and conversational flow
  • Instruction Following: Strong adherence to user instructions and preferences
  • Engaging Personality: Friendly and expressive without being overly agreeable or flattering
  • Practical Reasoning: Solid problem-solving abilities presented in accessible language
  • Versatility: Effective across diverse tasks from coding to creative writing
  • Direct Communication: Honest responses without excessive hedging

Limitations

  • Knowledge Cutoff: Training data extends only to the base model's pre-training cutoff (January 2026)
  • Uncensored Output: May generate content inappropriate for all audiences without additional filtering
  • Computational Requirements: Requires sufficient hardware for 4B parameter inference
  • Emoji Use: While generally appropriate, emoji usage may not suit all formal contexts
  • Domain Specificity: Performance may vary on highly specialized or niche topics

Ethical Considerations

Content Filtering

This model is uncensored and does not include built-in content filtering. Users deploying this model in production environments should:

  • Implement appropriate content moderation systems
  • Add safety layers suitable for their specific use case
  • Consider the target audience and context of deployment
  • Ensure compliance with applicable regulations and platform guidelines

Personality and Alignment

The model features a "human-like but not sycophantic" personality design, meaning:

  • Responses are friendly and engaging with natural expressiveness
  • Uses emojis appropriately to enhance communication (not excessively)
  • The model will challenge flawed assumptions when appropriate
  • Output focuses on helpfulness over agreeableness
  • Direct and honest without unnecessary praise or flattery
  • Users may need to calibrate expectations for highly formal contexts

Responsible Use

Users should:

  • Verify critical outputs, especially in high-stakes applications
  • Understand the model's limitations and knowledge cutoff
  • Implement appropriate safeguards for end-user applications
  • Consider bias mitigation strategies for sensitive applications
  • Monitor emoji usage in production environments for tone appropriateness

Technical Specifications

Hardware Requirements

Minimum Requirements:

  • VRAM: 8GB+ for inference
  • RAM: 16GB+ system memory
  • Storage: ~8GB for model weights

Recommended:

  • VRAM: 16GB+ for optimal performance
  • RAM: 32GB+ system memory
  • Apple Silicon (M1/M2/M3/M4) or CUDA-compatible GPU based on quantization type
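As a sanity check on the figures above, weight memory scales roughly linearly with bits per parameter: 4B parameters at 16 bits (BF16) is about the quoted ~8 GB of storage. The quantization estimates below are approximations (effective bits per weight vary by scheme) and exclude KV-cache and activation overhead:

```python
# Back-of-envelope weight-memory estimate for a 4B-parameter model.
params = 4e9

def weight_size_gb(bits_per_weight):
    return params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(f"BF16:   {weight_size_gb(16):.1f} GB")   # matches the ~8GB storage figure above
print(f"q8_0:   {weight_size_gb(8):.1f} GB")
print(f"q4_k_m: {weight_size_gb(4.5):.1f} GB")  # ~4.5 effective bits/weight (approximation)
```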

Inference

The model supports standard inference methods and is compatible with:

  • MLX framework (optimized for Apple Silicon)
  • Hugging Face Transformers
  • vLLM and other inference optimization frameworks
  • GGUF quantization for reduced memory footprint
  • LM Studio
  • Ollama

Recommended Generation Parameters:

  • Temperature: 0.7
  • Repetition Penalty: 1.0 (no penalty applied)
  • Top P: 0.8
  • Top K: 20

Quantizations & Deployment

MLX Quantizations

This model is available in MLX format, optimized for Apple Silicon.

GGUF Quantizations

For use with Ollama, llama.cpp, LM Studio, and other compatible tools.

Ollama

Run JOSIE-4B-Instruct directly using Ollama:

ollama run goekdenizguelmez/JOSIE:4b
ollama run goekdenizguelmez/JOSIE:4b-instruct
ollama run goekdenizguelmez/JOSIE:4b-instruct-q4_k_m
ollama run goekdenizguelmez/JOSIE:4b-instruct-q5_k_m
ollama run goekdenizguelmez/JOSIE:4b-instruct-q6_k
ollama run goekdenizguelmez/JOSIE:4b-instruct-q8_0
ollama run goekdenizguelmez/JOSIE:4b-instruct-f16

How to Get Started

Installation

# Using Hugging Face Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-4B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

Basic Usage

# Example inference
messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "Can you help me understand how neural networks work?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.0,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)

MLX Usage (Apple Silicon)

# Using MLX for optimized Apple Silicon inference
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-4B-Instruct")

sampler = make_sampler(
    temp=0.7,
    top_p=0.8,
    min_p=0.0,
    top_k=20,
)

messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "What's a fun way to learn Python? 🐍"}
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
print(response)

Comparison with JOSIE-4B-Thinking

| Feature | JOSIE-4B-Instruct | JOSIE-4B-Thinking |
|---|---|---|
| Base Model | Qwen3-4B-Instruct (Gabliterated) | Qwen3-4B-Thinking |
| Context Length | 32K tokens | 65K tokens |
| Response Style | Natural, conversational | Structured reasoning chains |
| Emoji Usage | Yes, appropriate use | Minimal |
| Primary Use | General assistance & chat | Complex reasoning tasks |
| Response Format | Direct answers | Chain-of-thought + answer |
| Personality | Friendly & expressive | Direct & analytical |
| Best For | Everyday interactions | STEM, math, logic problems |

Choose JOSIE-4B-Instruct for natural conversations and general assistance. Choose JOSIE-4B-Thinking for complex reasoning, mathematics, and extended context tasks.


Citation

If you use this model in your research or applications, please cite:

@misc{josie4binstruct2025,
  title={JOSIE-4B-Instruct: A Human-Like Instruction-Following Model},
  author={Gökdeniz Gülmez},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-4B-Instruct}},
}

Model Card Contact

For questions, issues, or feedback regarding this model:


Acknowledgments

  • Base Model: Qwen Team for Qwen3-4B-Instruct (Gabliterated version)
  • Reasoning Source: Josie-Zero-8B for reasoning traces
  • Response Transformation: OpenAI GPT-OSS-120B for human-like response generation
  • MLX Framework: Apple MLX team
  • Community: Open-source ML community for tools and support