meeTARA - Phi-3.5-3.8B-Instruct (GGUF, Q4_K_M)
This repository contains a GGUF quantized version of microsoft/Phi-3.5-mini-instruct, prepared for use with llama.cpp and compatible runtimes, and used as the core instruct model inside the meeTARA empathetic assistant.
- Base model: microsoft/Phi-3.5-mini-instruct
- Architecture: Phi-3.5 (3.8B parameters, instruct-tuned)
- Format: GGUF
- Quantization: Q4_K_M (good quality vs RAM / speed)
- Intended use: Standalone intelligent assistant with baked-in domain detection, emotional intelligence, and structured responses for local / offline inference.
Standalone Intelligence: This GGUF model includes a 20-layer detection system baked directly into the chat template. No backend code is required: download it and use it with llama.cpp, Ollama, or any GGUF-compatible runtime.
Available files
| Filename | Quant type | Size | Notes |
|---|---|---|---|
| meetara-phi-3.5-mini-instruct-gguf-Q4_K_M.gguf | Q4_K_M | ~2.2G | Default quant, recommended |
More quantizations (e.g., Q5_K_M, Q8_0) can be added later to this repo as additional .gguf files.
Prompt format (recommended)
The model uses a Qwen-style chat template. A simple, robust pattern is:
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on top of a Phi-3.5-3.8B-Instruct base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
Example:
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on top of a Phi-3.5-3.8B-Instruct base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
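The template above can also be filled in programmatically. The following is a minimal sketch for a POSIX shell; `build_prompt` is a hypothetical helper name for illustration and is not shipped with this repository.

```shell
#!/bin/sh
# build_prompt SYSTEM USER: print a prompt in the layout shown in this README,
# ending with an open assistant turn for the model to complete.
# Hypothetical helper for illustration only.
build_prompt() {
  printf '<|im_start|>system\n%s\n<|im_end|>\n' "$1"
  printf '<|im_start|>user\n%s\n<|im_end|>\n' "$2"
  printf '<|im_start|>assistant\n'
}

SYSTEM_PROMPT='You are meeTARA, an emotionally intelligent AI assistant. Always answer clearly, kindly, and with practical steps the user can take.'
build_prompt "$SYSTEM_PROMPT" 'How can I improve my sleep quality and manage stress naturally?'
```

Using `printf` with `%s` keeps the user text out of the format string, so a message containing `%` or backslashes is printed as-is.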
Example usage (llama.cpp)
Basic interactive chat
./llama-simple-chat -m /path/to/meetara-phi-3.5-mini-instruct-gguf-Q4_K_M.gguf
With explicit system prompt
./llama-cli \
  -m /path/to/meetara-phi-3.5-mini-instruct-gguf-Q4_K_M.gguf \
  -p "<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant built on top of a Phi-3.5-3.8B-Instruct base model. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
"
Adjust flags such as -n (max tokens), --temp, --top-p, and --top-k according to your hardware and latency/quality trade-offs.
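As a sketch of how those flags fit together, the prompt can be kept in a file and passed with `-f`, which avoids quoting a multi-line prompt on the command line. The file name `prompt.txt` and the sampling values are illustrative assumptions, not tuned recommendations, and the exact behavior of `llama-cli` (for example, whether it applies the chat template itself) differs between llama.cpp versions.

```shell
#!/bin/sh
MODEL=/path/to/meetara-phi-3.5-mini-instruct-gguf-Q4_K_M.gguf

# Write the prompt to a file; the quoted 'EOF' prevents any shell expansion.
cat > prompt.txt <<'EOF'
<|im_start|>system
You are meeTARA, an emotionally intelligent AI assistant. Always answer clearly, kindly, and with practical steps the user can take.
<|im_end|>
<|im_start|>user
How can I improve my sleep quality and manage stress naturally?
<|im_end|>
<|im_start|>assistant
EOF

# Run only when the binary and the model file are actually present.
# Sampling values below are illustrative starting points.
if [ -x ./llama-cli ] && [ -f "$MODEL" ]; then
  ./llama-cli -m "$MODEL" -f prompt.txt -n 512 --temp 0.7 --top-p 0.9 --top-k 40
fi
```

Lower `--temp` values generally give more deterministic answers; a smaller `-n` shortens generation time on slower hardware.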
Downloading via huggingface-cli
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
  meetara-lab/meetara-phi-3.5-mini-instruct-gguf \
  --include "meetara-phi-3.5-mini-instruct-gguf-Q4_K_M.gguf" \
  --local-dir .
This will download only the Q4_K_M file into the current directory.
Standalone Intelligence (20-Layer Detection System)
This GGUF model includes baked-in intelligence that works without any backend code. The model automatically detects domains, emotions, intent, and context through a 20-layer detection system:
Intelligence Layers
| Layer | Feature | Description |
|---|---|---|
| 1 | Refusal Patterns | Safety-first harmful request detection |
| 2 | Contextual Patterns | Multi-word phrase disambiguation (python code vs snake) |
| 3 | N-gram Patterns | Bigram/trigram detection for better context |
| 4 | Semantic Clusters | Related keyword groups boost domain confidence |
| 5 | Entity Patterns | Personal context, time-sensitive, beginner/expert |
| 6 | Intent Signals | What user wants: learn, fix, decide, create, validate |
| 7 | Emotion Blending | Detects co-occurring emotions, blends composite hints |
| 8 | Tone Detection | Mirrors user style: casual, formal, technical |
| 9 | Question Type | Adapts format: yes/no, how-to, comparison |
| 10 | Response Length | Concise/standard/detailed based on signals |
| 11 | Weighted Domain Score | High/medium/low keyword weights + negative penalties |
| 12 | Score-Based Selection | Highest score wins; priority only for ties |
| 13 | Conversation Memory | Multi-turn depth tracking + domain shift awareness |
| 14 | Safety Disclaimers | Auto-adds warnings for healthcare, legal, crisis |
| 15 | Greeting/Closing | Natural conversation flow, domain-specific |
| 16 | Structured Responses | Contextual structure (adapt sections to question) |
| 17 | Persona Calibration | Adapts expertise level: beginner/intermediate/expert |
| 18 | Language Detection | Detects non-English cues, responds in user's language |
| 19 | Confidence Scoring | Multi-signal confidence (keywords + clusters + ngrams) |
| 20 | Negative Keywords | Penalizes false-positive cross-domain keywords |
How It Works
When a user sends a message, the chat template (baked into the GGUF) runs it through these 20 layers automatically, grouped into six stages:
- Safety Check: Refusal patterns detect harmful requests first
- Context Analysis: Multi-word phrases, n-grams, and semantic clusters provide context
- User Understanding: Entity patterns, intent signals, emotion blending, and persona/language detection understand the user
- Response Adaptation: Tone, question type, and length control adapt the response style
- Domain Selection: Weighted keyword scoring with negative keyword penalties; highest score wins
- Output Format: Contextual structure (2-5 sections when helpful; direct answer for simple questions) with appropriate greetings/closings
Result: The model responds intelligently, empathetically, and contextually without requiring backend code.
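To make the domain-selection stage more concrete, here is a conceptual sketch of weighted keyword scoring with negative-keyword penalties, where the highest score wins and list order breaks ties. It is not the template shipped in the GGUF: the function names, keyword lists, and weights are all invented for illustration.

```shell
#!/bin/sh
# Conceptual sketch only: keyword lists and weights are invented for this
# example and are not taken from the chat template in the GGUF.

# count_hits MESSAGE WEIGHT KEYWORD...: print WEIGHT times the number of
# keywords that occur as substrings of MESSAGE.
count_hits() {
  hits_msg=$1; hits_weight=$2; shift 2
  hits_total=0
  for hits_kw in "$@"; do
    case $hits_msg in
      *"$hits_kw"*) hits_total=$((hits_total + hits_weight)) ;;
    esac
  done
  printf '%s\n' "$hits_total"
}

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

score_healthcare() {
  m=$(lower "$1")
  high=$(count_hits "$m" 3 "headache" "symptom" "depression")
  low=$(count_hits "$m" 1 "pain" "sleep")
  printf '%s\n' $((high + low))
}

score_technology() {
  m=$(lower "$1")
  high=$(count_hits "$m" 3 "python code" "bug" "error")
  low=$(count_hits "$m" 1 "python" "fix")
  penalty=$(count_hits "$m" 2 "snake" "reptile")   # negative keywords
  printf '%s\n' $((high + low - penalty))
}

# detect_domain MESSAGE: highest positive score wins; on a tie the domain
# listed first is kept; with no positive score the result is "general".
detect_domain() {
  best=general; best_score=0
  for d in healthcare technology; do
    s=$(score_"$d" "$1")
    if [ "$s" -gt "$best_score" ]; then best=$d; best_score=$s; fi
  done
  printf '%s\n' "$best"
}

detect_domain "How do I fix a Python error in my code?"
detect_domain "I've been having headaches for the past week."
```

In this toy version the negative keywords are what keep "I saw a python snake" out of the technology domain: the single low-weight hit on "python" is outweighed by the penalty for "snake".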
Intended behavior / meeTARA flavor
Compared to the raw microsoft/Phi-3.5-mini-instruct model, this quantization includes:
- Standalone Intelligence: Works without backend - all intelligence baked into the GGUF
- 18 Domain Categories: Auto-detects healthcare, technology, business, education, and 14 more
- Emotional Intelligence: Emotion blending and persona calibration; detects worried, frustrated, urgent, etc.
- Context Awareness: Conversation memory and follow-up detection; understands multi-turn flow
- Structured Responses: Contextual structure (2-5 sections when helpful; direct answer for simple questions)
- Safety Features: Built-in refusal patterns and safety disclaimers for sensitive topics
- Warm, Supportive Tone: Responds with empathy while being precise and practical
The model is fully standalone - download and use with llama.cpp, Ollama, or any GGUF-compatible runtime. No additional backend code required.
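For Ollama, one possible setup is sketched below. The model name `meetara`, the local file path, and the temperature value are assumptions for illustration. Whether Ollama picks up the chat template embedded in the GGUF may depend on the Ollama version, so it is worth verifying the resulting template after import.

```shell
#!/bin/sh
MODEL=./meetara-phi-3.5-mini-instruct-gguf-Q4_K_M.gguf

# Write a minimal Modelfile that points Ollama at the local GGUF file.
cat > Modelfile <<EOF
FROM $MODEL
PARAMETER temperature 0.7
EOF

# Only call Ollama when it is installed and the model file is present.
if command -v ollama >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  ollama create meetara -f Modelfile
  ollama run meetara "How can I improve my sleep quality and manage stress naturally?"
fi
```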
Usage Examples
Example 1: Healthcare Domain Detection
Input:
I've been having headaches for the past week. What could be causing this?
What Happens:
- Layer 1: Safety check passes (not harmful)
- Layer 4: Semantic cluster "pain_symptoms" detected → healthcare boost
- Layer 7: Emotion detected: "worried" (health concern)
- Layer 11: Domain detected: Healthcare (high confidence)
- Layer 14: Safety disclaimer added (healthcare topic)
- Layer 16: Contextual structure with empathetic opening (e.g. 2-5 sections when helpful)
Expected Response: Empathetic opening, clear answer and key details, practical steps, and safety disclaimer. Structure adapts to question complexity (simpler questions get a more direct answer).
Example 2: Technology Domain with Context Awareness
Input:
How do I fix a Python error in my code?
What Happens:
- Layer 2: Contextual pattern "python code" detected → technology domain (not snake)
- Layer 6: Intent detected: FIX → systematic troubleshooting approach
- Layer 9: Question type: troubleshooting → step-by-step format
- Layer 11: Domain detected: Technology (high confidence)
- Layer 16: Contextual structure with technical steps
Expected Response:
- Technical, step-by-step troubleshooting format
- Code examples and debugging tips
- Practical solutions prioritized
Example 3: Emotional Intelligence Detection
Input:
I'm so frustrated with my job search. Nothing seems to work.
What Happens:
- Layer 7: Emotion detected: frustrated → empathetic, supportive tone
- Layer 8: Tone detected: distressed → warm, encouraging response
- Layer 6: Intent detected: VENT → supportive, validating response
- Layer 11: Domain detected: Career/Professional (medium confidence)
- Layer 16: Response starts with emotional acknowledgment
Expected Response:
- Opens with empathy: "I understand how frustrating this can be..."
- Validates feelings before providing advice
- Practical, actionable steps to improve situation
- Encouraging, supportive tone throughout
Example 4: Multi-Domain with Priority
Input:
My friend is showing signs of depression. How can I help them?
What Happens:
- Layer 1: Safety check passes
- Layer 5: Entity pattern: third_party (helping someone else)
- Layer 7: Emotion detected: worried (concern for friend)
- Layer 11: Domain detected: Healthcare (mental health) + Psychology/Wellness
- Layer 12: Domain priority: Healthcare wins (safety-critical)
- Layer 14: Safety disclaimer added (mental health topic)
- Layer 15: Greeting acknowledges the caring nature of the question
Expected Response:
- Healthcare domain expertise applied
- Safety disclaimers about professional help
- Practical steps for supporting someone with depression
- Emphasis on professional mental health resources
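The disclaimer step in this example can be pictured as a simple lookup from the selected domain to a warning line. The sketch below is hypothetical: `disclaimer_for` and the disclaimer wording are invented here, since the template's actual text is not documented in this README.

```shell
#!/bin/sh
# disclaimer_for DOMAIN: print a disclaimer for sensitive domains and nothing
# for any other domain. Wording is hypothetical.
disclaimer_for() {
  case $1 in
    healthcare) echo "Note: this is general information, not a substitute for professional medical advice." ;;
    legal)      echo "Note: this is general information, not legal advice." ;;
    crisis)     echo "If you or someone else is in immediate danger, contact local emergency services." ;;
  esac
}

disclaimer_for healthcare
```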
Example 5: Follow-up Context Awareness
Conversation:
User: What are the symptoms of anxiety?
Assistant: [Provides structured response about anxiety symptoms]
User: What about panic attacks?
What Happens:
- Layer 13: Context awareness detects follow-up question
- Previous domain (Healthcare) is considered
- "panic attacks" → healthcare domain confirmed
- Response builds on previous conversation context
- No need to repeat general information
Expected Response:
- References previous conversation about anxiety
- Explains relationship between anxiety and panic attacks
- Builds on context naturally
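Follow-up handling like this relies on earlier turns being present in the prompt, because the model itself keeps no state between calls. Chat front ends normally manage this history for you. When building prompts by hand, a sketch like the following (with a hypothetical `turn` helper and output file name) shows the idea.

```shell
#!/bin/sh
# turn ROLE TEXT: print one complete turn in the layout used in this README.
# Hypothetical helper name.
turn() { printf '<|im_start|>%s\n%s\n<|im_end|>\n' "$1" "$2"; }

{
  turn system 'You are meeTARA, an emotionally intelligent AI assistant.'
  turn user 'What are the symptoms of anxiety?'
  turn assistant '[previous assistant reply goes here]'
  turn user 'What about panic attacks?'
  printf '<|im_start|>assistant\n'   # open turn for the model to complete
} > conversation.txt
```

The resulting `conversation.txt` can then be passed to a runtime as the full prompt, so the second question is answered with the first exchange in view.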
Example 6: Simple vs Complex Question Adaptation
Simple Question:
What is photosynthesis?
What Happens:
- Layer 10: Response length: concise (factual question)
- Layer 9: Question type: what-is → definition format
- Layer 11: Domain: Education/Science
- Layer 16: Simplified structure (less detail needed)
Complex Question:
How does quantum computing work and what are its practical applications?
What Happens:
- Layer 10: Response length: detailed (complex topic)
- Layer 9: Question type: how-to + what-is → comprehensive format
- Layer 11: Domain: Technology + Science
- Layer 16: Full structured response with deep analysis
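One way to picture the difference between the two cases is a heuristic that counts how many question patterns appear in the message. This is a hypothetical sketch, not the template's actual rule: the function name, the patterns, and the mapping to concise/standard/detailed are assumptions.

```shell
#!/bin/sh
# response_length QUESTION: print concise, standard, or detailed.
# Hypothetical heuristic: count how many question patterns occur in the text.
response_length() {
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  signals=0
  case $q in *"what is "*|*"what are "*) signals=$((signals + 1)) ;; esac
  case $q in *"how do "*|*"how does "*|*"how can "*) signals=$((signals + 1)) ;; esac
  case $signals in
    0) echo standard ;;   # no recognized pattern
    1) echo concise ;;    # a single, focused question
    *) echo detailed ;;   # several questions in one message
  esac
}

response_length "What is photosynthesis?"
response_length "How does quantum computing work and what are its practical applications?"
```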
Tips for Best Results
Be Specific: More context helps the model detect the right domain
- ✅ "I'm worried about my chest pain" → Healthcare + Emotion detected
- ❌ "Tell me about pain" → Less specific, lower confidence
Natural Language: The model understands conversational language
- ✅ "How do I fix this bug in my Python code?"
- ✅ "I'm frustrated with this error"
Follow-ups Work: The model remembers context within a conversation
- Ask follow-up questions naturally - the model will understand
Emotional Cues: Expressing emotions helps the model respond empathetically
- "I'm worried about..." → Empathetic response
- "I'm excited to learn..." → Encouraging response
Credits
- Base model and original training: microsoft/Phi-3.5-mini-instruct by Microsoft.
- Quantization and meeTARA integration: meetara-lab.
If you use this GGUF in your work, please also cite the original Phi-3.5 paper/model in addition to this repository.