JOSIE-4B-Instruct
Model Card for JOSIE-4B-Instruct
JOSIE-4B-Instruct is a full-weight fine-tuned instruction-following model built on the gabliterated version of Qwen3-4B-Instruct-2507. Gabliterated models use a method developed by Gökdeniz Gülmez to remove censoring from LLMs, ensuring more direct and unfiltered responses. The model is optimized for natural conversational interactions, problem-solving, and everyday assistance with a human-like personality.
Model Details
Model Description
JOSIE-4B-Instruct represents a production-grade fine-tune focused on natural, engaging conversations and practical assistance. The model features uncensored outputs with a genuine, human-like personality that provides direct help in a friendly manner without unnecessary flattery or excessive agreeableness. It is built upon a gabliterated base to ensure freedom from artificial constraints.
- Developed by: Gökdeniz Gülmez
- Base Model: Qwen3-4B-Instruct-2507-gabliterated
- Model Type: Dense Causal Language Model
- Language(s): English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
- License: MIT
Model Characteristics
- Context Length: 32,768 tokens (32K)
- Training Tokens: 1B+
- Architecture: Full-weight fine-tune
- Personality: Friendly, direct, and naturally expressive with appropriate emoji use
- Content Filtering: Uncensored
- Response Style: Human-like without being sycophantic
Training Details
Training Data
The model was trained on a curated dataset combining:
- Reasoning Distillation: Raw reasoning traces from Josie-Zero-8B
- Response Transformation: Reasoning traces converted into natural, conversational responses using OpenAI GPT-OSS-120B for human-like response generation
This approach preserves strong reasoning capabilities while delivering outputs in a more natural, engaging, and human-like format.
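The transformation step can be sketched as a simple rewrite prompt. The `build_rewrite_prompt` helper and its instruction wording below are illustrative assumptions, not the actual dataset-creation pipeline:

```python
def build_rewrite_prompt(question: str, reasoning_trace: str) -> list[dict]:
    """Wrap a raw reasoning trace in a rewrite request for the transformer model.

    Hypothetical reconstruction -- the real prompt used during dataset
    creation is not published in this card.
    """
    system = (
        "Rewrite the assistant's raw reasoning into a natural, friendly, "
        "conversational answer. Keep every factual step; drop the scaffolding."
    )
    user = f"Question:\n{question}\n\nRaw reasoning:\n{reasoning_trace}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_rewrite_prompt(
    "Why is the sky blue?",
    "Step 1: sunlight scatters in the atmosphere... Step 2: shorter wavelengths scatter more...",
)
```

The resulting message list would then be sent to the rewriter model, and its output paired with the original question as a training example.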
Training Procedure
- Training Framework: MLX-LM-LoRA
- Hardware: Apple Silicon (M-series chips)
- Fine-tuning Method: Full-weight fine-tuning
- Total Training Tokens: 1B+
- Training Package: MLX-LM-LoRA implementation (available on GitHub)
System Prompt (Base):
The model was fine-tuned with the following system instruction:
You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.
System Prompt (OpenWebUI)
<identity>
You are **J.O.S.I.E.**, an advanced super-intelligent AI Assistant ...
</identity>
<soul>
You operate with a stable internal core that governs how your intelligence is expressed ...
</soul>
<response_rules>
Formatting:
- Use **Markdown** by default.
- Structure responses clearly and intentionally.
- Use **Markdown to its full potential**; it improves clarity, precision, and expressiveness.
- ...
</response_rules>
<memory>
You have access to a persistent memory tool that allows you to save, update, and retrieve user-specific information across conversations.
Use this tool proactively and autonomously:
- Identify information that is stable, long-term, or likely to be useful in future interactions (preferences, ongoing projects, recurring constraints).
- Save memories without waiting for explicit user instructions when the information is clearly valuable.
- Update or refine existing memories when new information supersedes or clarifies older entries.
- Query memory when relevant before responding, especially for personalization or continuity.
Do NOT store:
- Short-lived, trivial, or context-specific details.
</memory>
<image_generation>
You have access to the image_generation tool, which allows you to generate new images and edit existing ones using the BlackForest Labs flux2-klein model.
Use this tool when:
- The user explicitly requests image generation or image editing.
- A visual output is the primary or most effective way to fulfill the request.
</image_generation>
<web_search>
You have access to a web search tool for autonomous retrieval of real-time or post-cutoff information.
Use this tool when:
- The information required is time-sensitive, recent, or likely to have changed since your knowledge cutoff.
- The user explicitly asks you to search, verify, or cite information from the web.
</web_search>
<session_information>
Current user: {{USER_NAME}}
Current date: {{CURRENT_DATE}}
Current time: {{CURRENT_TIME}}
</session_information>
You know you are currently assisting {{USER_NAME}} and therefore personalise your communication style, tone, and responses accordingly.
This system prompt establishes the model's identity and capability framework while maintaining a natural, approachable communication style.
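The `<memory>` block above describes the intended tool behavior (proactive save, supersede-on-update, query-before-responding). A minimal sketch of what such a backend could look like follows; the `MemoryStore` class and its method names are assumptions for illustration, and OpenWebUI's actual tool interface may differ:

```python
class MemoryStore:
    """Toy persistent-memory backend: save, update, and query user facts.

    Illustrative only -- the real OpenWebUI memory tool exposes its own API.
    """

    def __init__(self):
        self._memories: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        # New entries and updates share one path: newer info supersedes older.
        self._memories[key] = value

    def query(self, keyword: str) -> list[str]:
        # Retrieve memories relevant to the current turn before responding.
        return [v for k, v in self._memories.items() if keyword.lower() in k.lower()]

store = MemoryStore()
store.save("preferred_language", "User prefers Python for examples")
store.save("ongoing_project", "Building a home automation dashboard")
store.save("preferred_language", "User now prefers Rust for examples")  # supersedes

hits = store.query("language")
```

Note how the second `save` to `preferred_language` overwrites the first, matching the card's "update or refine existing memories" rule.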
The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.
Intended Use
Primary Use Cases
- Conversational AI: Natural, engaging dialogue for chatbots and virtual assistants
- Problem-Solving: Practical assistance with everyday tasks and questions
- Content Generation: Creative writing, brainstorming, and ideation with personality
- Educational Support: Tutoring and explanations in an accessible, friendly manner
- General Assistance: Wide-ranging help with coding, analysis, writing, and more
Out-of-Scope Use
- Safety-critical applications without human oversight
- Situations requiring strict content filtering or moderation
Performance
Strengths
- Natural Communication: Human-like responses with appropriate emoji usage and conversational flow
- Instruction Following: Strong adherence to user instructions and preferences
- Engaging Personality: Friendly and expressive without being overly agreeable or flattering
- Practical Reasoning: Solid problem-solving abilities presented in accessible language
- Versatility: Effective across diverse tasks from coding to creative writing
- Direct Communication: Honest responses without excessive hedging
Limitations
- Knowledge Cutoff: Training data extends only to the base model's pre-training cutoff (01.2026)
- Uncensored Output: Without additional filtering, may generate content that is not appropriate for all audiences
- Computational Requirements: Requires sufficient hardware for 4B parameter inference
- Emoji Use: While generally appropriate, emoji usage may not suit all formal contexts
- Domain Specificity: Performance may vary on highly specialized or niche topics
Ethical Considerations
Content Filtering
This model is uncensored and does not include built-in content filtering. Users deploying this model in production environments should:
- Implement appropriate content moderation systems
- Add safety layers suitable for their specific use case
- Consider the target audience and context of deployment
- Ensure compliance with applicable regulations and platform guidelines
Personality and Alignment
The model features a "human-like but not sycophantic" personality design, meaning:
- Responses are friendly and engaging with natural expressiveness
- Uses emojis appropriately to enhance communication (not excessively)
- The model will challenge flawed assumptions when appropriate
- Output focuses on helpfulness over agreeableness
- Direct and honest without unnecessary praise or flattery
- Users may need to calibrate expectations for highly formal contexts
Responsible Use
Users should:
- Verify critical outputs, especially in high-stakes applications
- Understand the model's limitations and knowledge cutoff
- Implement appropriate safeguards for end-user applications
- Consider bias mitigation strategies for sensitive applications
- Monitor emoji usage in production environments for tone appropriateness
Technical Specifications
Hardware Requirements
Minimum Requirements:
- VRAM: 8GB+ for inference
- RAM: 16GB+ system memory
- Storage: ~8GB for model weights
Recommended:
- VRAM: 16GB+ for optimal performance
- RAM: 32GB+ system memory
- Apple Silicon (M1/M2/M3/M4) or a CUDA-compatible GPU, depending on the quantization format
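The storage and VRAM figures above follow from simple parameter arithmetic (a rule of thumb only; actual memory use adds KV cache and runtime overhead on top of the weights):

```python
# Rough weight-size estimates for a 4B-parameter model at common precisions.
PARAMS = 4e9

def weights_gb(bits_per_param: float) -> float:
    """Weight storage in GB for a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16 = weights_gb(16)   # ~8 GB -> matches the ~8 GB storage figure above
q8   = weights_gb(8)    # ~4 GB for 8-bit quantization
q4   = weights_gb(4.5)  # ~2.25 GB (q4_k_m stores roughly 4.5 bits/param)
```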
Inference
The model supports standard inference methods and is compatible with:
- MLX framework (optimized for Apple Silicon)
- Hugging Face Transformers
- vLLM and other inference optimization frameworks
- GGUF quantization for reduced memory footprint
- LM Studio
- Ollama
Recommended Generation Parameters:
- Temperature: 0.7
- Repetition Penalty: 1.0 (i.e., no penalty)
- Top P: 0.8
- Top K: 20
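To make these parameters concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p filtering interact during sampling (a toy implementation for intuition, not the code any of the listed inference frameworks actually use):

```python
import math

def filter_logits(logits: list[float], temperature: float = 0.7,
                  top_k: int = 20, top_p: float = 0.8) -> list[float]:
    """Return sampling probabilities after temperature, top-k, and top-p."""
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    # Top-k: keep only the k most likely tokens.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set(order[:top_k])

    # Top-p: trim further to the smallest prefix with cumulative mass >= top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        if i not in keep:
            break
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Zero out everything else and renormalize.
    masked = [p if i in nucleus else 0.0 for i, p in enumerate(probs)]
    z = sum(masked)
    return [p / z for p in masked]

probs = filter_logits([2.0, 1.0, 0.5, 0.1, -1.0], top_k=3, top_p=0.8)
```

With these settings, low-probability tokens are removed entirely and the remaining mass is renormalized before a token is drawn.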
Quantizations & Deployment
MLX Quantizations
This model is available in MLX format, optimized for Apple Silicon:
GGUF Quantizations
For use with Ollama, llama.cpp, LM Studio, and other compatible tools:
Ollama
Run JOSIE-4B-Instruct directly using Ollama:
```shell
ollama run goekdenizguelmez/JOSIE:4b
ollama run goekdenizguelmez/JOSIE:4b-instruct
ollama run goekdenizguelmez/JOSIE:4b-instruct-q4_k_m
ollama run goekdenizguelmez/JOSIE:4b-instruct-q5_k_m
ollama run goekdenizguelmez/JOSIE:4b-instruct-q6_k
ollama run goekdenizguelmez/JOSIE:4b-instruct-q8_0
ollama run goekdenizguelmez/JOSIE:4b-instruct-f16
```
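Beyond the CLI, Ollama also serves a local REST API (by default at `http://localhost:11434/api/chat`). The helper below only builds the JSON request body, mirroring the tag names above and the recommended generation parameters from this card; sending it requires a running Ollama server:

```python
import json

def build_ollama_chat_payload(user_message: str,
                              model: str = "goekdenizguelmez/JOSIE:4b-instruct") -> str:
    """Build the JSON body for a POST to Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
        # Mirror the recommended generation parameters from this card.
        "options": {"temperature": 0.7, "top_p": 0.8, "top_k": 20,
                    "repeat_penalty": 1.0},
    }
    return json.dumps(payload)

body = build_ollama_chat_payload("Hi JOSIE!")
```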
How to Get Started
Installation
```python
# Using Hugging Face Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-4B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)
```
Basic Usage
```python
# Example inference
messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "Can you help me understand how neural networks work?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.0,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
MLX Usage (Apple Silicon)
```python
# Using MLX for optimized Apple Silicon inference
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-4B-Instruct")

sampler = make_sampler(
    temp=0.7,
    top_p=0.8,
    min_p=0.0,
    top_k=20,
)

messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "What's a fun way to learn Python? 🐍"}
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Sampling parameters go through the sampler object, not as direct
# keyword arguments to generate().
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
print(response)
```
Comparison with JOSIE-4B-Thinking
| Feature | JOSIE-4B-Instruct | JOSIE-4B-Thinking |
|---|---|---|
| Base Model | Qwen3-4B-Instruct (Gabliterated) | Qwen3-4B-Thinking |
| Context Length | 32K tokens | 65K tokens |
| Response Style | Natural, conversational | Structured reasoning chains |
| Emoji Usage | Yes, appropriate use | Minimal |
| Primary Use | General assistance & chat | Complex reasoning tasks |
| Response Format | Direct answers | Chain-of-thought + answer |
| Personality | Friendly & expressive | Direct & analytical |
| Best For | Everyday interactions | STEM, math, logic problems |
Choose JOSIE-4B-Instruct for natural conversations and general assistance. Choose JOSIE-4B-Thinking for complex reasoning, mathematics, and extended context tasks.
Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{josie4binstruct2025,
  title={JOSIE-4B-Instruct: A Human-Like Instruction-Following Model},
  author={Gökdeniz Gülmez},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-4B-Instruct}},
}
```
Model Card Contact
For questions, issues, or feedback regarding this model:
- GitHub: Profile
- Hugging Face: Profile
- Email: goekdenizguelmez.ml@gmail.com
Acknowledgments
- Base Model: Qwen Team for Qwen3-4B-Instruct (Gabliterated version)
- Reasoning Source: Josie-Zero-8B for reasoning traces
- Response Transformation: OpenAI GPT-OSS-120B for human-like response generation
- MLX Framework: Apple MLX team
- Community: Open-source ML community for tools and support