# HAI Indexer Mistral 7B - GGUF

This is a GGUF-format version of HAI Indexer Mistral 7B, a model specialized for Retrieval-Augmented Generation (RAG) and document question-answering. It was created for internal company use and has not yet been evaluated against benchmark datasets.
## Quick Start

### Using with llama.cpp

```bash
# Download the model
huggingface-cli download YOUR_USERNAME/hai-indexer-mistral-7b-GGUF \
  hai-indexer-mistral-7b-fp16.gguf

# Run inference
./llama-cli -m hai-indexer-mistral-7b-fp16.gguf \
  -p "Hello! Who are you?" \
  -n 128
```
### Using with Ollama

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM hai-indexer-mistral-7b-fp16.gguf
TEMPLATE """[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]"""
SYSTEM """You are HAI Indexer, an AI assistant that helps users find information from their indexed documents. Answer questions using the provided context when available."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run
ollama create hai-indexer -f Modelfile
ollama run hai-indexer
```
### Using with llama-cpp-python

```bash
pip install llama-cpp-python
```

```python
from llama_cpp import Llama

llm = Llama(
    model_path="hai-indexer-mistral-7b-fp16.gguf",
    n_ctx=2048,
    n_threads=8
)

output = llm(
    "Hello! Who are you?",
    max_tokens=128,
    temperature=0.7
)
print(output['choices'][0]['text'])
```
## Model Details

### Base Model Information
- Architecture: Mistral 7B
- Parameters: 7.24 Billion
- Context Length: 32,768 tokens
- Base Model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuned For: RAG, Document QA, Knowledge Base, Anti-Hallucination
### GGUF Format Details
| File | Size | Precision | Quality | Use Case |
|---|---|---|---|---|
| hai-indexer-mistral-7b-fp16.gguf | 14 GB | FP16 | 100% | Full quality inference |
Note: Users can quantize this FP16 GGUF to smaller formats (Q8, Q5, Q4) using llama.cpp:

```bash
# Quantize to Q8_0 (99% quality, 7.5 GB)
llama-quantize hai-indexer-mistral-7b-fp16.gguf \
  hai-indexer-mistral-7b-Q8_0.gguf Q8_0

# Quantize to Q4_K_M (90% quality, 4 GB)
llama-quantize hai-indexer-mistral-7b-fp16.gguf \
  hai-indexer-mistral-7b-Q4_K_M.gguf Q4_K_M
```
## Specialized Training
This model was fine-tuned on ~15,600 examples across these domains:
- **RAG Context Understanding** - Extracting information from provided context
- **Anti-Hallucination** - Responds "I don't know" when information isn't available
- **Document QA** - Questions about indexed documents and knowledge bases
- **Entity Recognition** - Understanding entities, people, companies, concepts
- **Business Domain** - Company knowledge base and business questions
- **Safety Guardrails** - Responsible AI behavior
### Training Datasets
- RAG Context Training (~2,500 examples)
- HAI Indexer RAG Training (~2,000 examples)
- Company Knowledge Base (~1,800 examples)
- No Hallucination Training (~1,500 examples)
- Hard Negative Samples (~1,200 examples)
- Entity Classification (~1,000 examples)
- Business Integration (~1,500 examples)
- Safety Guardrails (~800 examples)
- Intro/Greetings (~400 examples)
- Operator Training (~900 examples)
## Key Features

### 1. RAG-Optimized

```
User: Who is John Smith?

Context from Vector Database:
John Smith - Senior Software Engineer at TechCorp
10 years experience in Python and AI

Response: Based on the information provided, John Smith is a Senior
Software Engineer at TechCorp with 10 years of experience specializing
in Python and AI technologies.
```
### 2. No Hallucination

```
User: What is the company revenue?

Context: [No relevant documents found]

Response: I don't have information about the company revenue in the
provided context. Could you provide relevant documents or specify
which company you're asking about?
```
### 3. Professional Greetings

```
User: Good morning!

Response: Good morning! I'm HAI Indexer, your AI assistant for finding
information from your indexed documents. How can I help you today?
```
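The RAG and anti-hallucination behaviors above come down to how the prompt is assembled: retrieved text is placed in the context slot, and when retrieval comes back empty the model is told so explicitly. A minimal sketch of such a helper (the function name and fallback string are our own conventions, mirroring the examples above):

```python
def build_rag_prompt(question, context=None):
    """Assemble a Mistral-style [INST] prompt from a question and
    optional retrieved context. When no context is available, say so
    explicitly so the model answers "I don't know" instead of guessing."""
    context_block = context if context else "Context: [No relevant documents found]"
    return (
        "[INST] You are HAI Indexer. Answer based on context.\n\n"
        f"Question: {question}\n\n"
        f"{context_block}\n"
        "[/INST]"
    )

prompt = build_rag_prompt(
    "Who is John Smith?",
    "Context: John Smith - Senior Software Engineer at TechCorp",
)
```

The same helper covers both paths: pass the retrieved chunk for a grounded answer, or omit it to trigger the refusal behavior.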
## Performance

### Inference Speed (FP16)
| Hardware | Tokens/Second | RAM Usage |
|---|---|---|
| CPU (32 cores) | ~20-25 | 16 GB |
| RTX 4090 | ~140-160 | 16 GB VRAM |
| A100 40GB | ~180-200 | 16 GB VRAM |
| Apple M1/M2 | ~30-40 | 16 GB |
### Quantization Options
If you need smaller models or faster inference, quantize to:
| Format | Size | Quality | Speed |
|---|---|---|---|
| Q8_0 | 7.5 GB | 99% | 2x faster |
| Q5_K_M | 5 GB | 95% | 3x faster |
| Q4_K_M | 4 GB | 90% | 4x faster |
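The sizes in the table follow roughly from bits per weight times the 7.24B parameter count. A back-of-the-envelope check (the bits-per-weight figures are approximations we assume here; K-quants mix precisions across tensors, so real file sizes differ slightly):

```python
# Rough GGUF file-size estimate from parameter count and bits per weight.
PARAMS = 7.24e9  # Mistral 7B parameter count

# Approximate average bits per weight per format (assumed, not exact).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.85}

def approx_size_gb(fmt):
    """Estimated file size in GB: params * bits / 8 bits-per-byte / 1e9."""
    return PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{approx_size_gb(fmt):.1f} GB")
```

The estimates land close to the table: ~14.5 GB for FP16 and roughly 4-8 GB for the quantized variants.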
## Integration Example

### RAG Pipeline Integration
```python
from llama_cpp import Llama

class HaiIndexerRAG:
    def __init__(self, model_path):
        self.llm = Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=8,
            n_gpu_layers=35  # Use GPU if available
        )

    def query_with_context(self, question, context):
        """Query with RAG context"""
        prompt = f"""[INST] You are HAI Indexer. Answer based on context.

Question: {question}

{context}
[/INST]"""
        output = self.llm(
            prompt,
            max_tokens=512,
            temperature=0.7,
            stop=["[INST]", "[/INST]"]
        )
        return output['choices'][0]['text']

    def query_no_context(self, question):
        """Query without context (should say "I don't know")"""
        prompt = f"""[INST] You are HAI Indexer.

Question: {question}

Context: [No relevant documents found]
[/INST]"""
        output = self.llm(
            prompt,
            max_tokens=256,
            temperature=0.7
        )
        return output['choices'][0]['text']

# Usage
rag = HaiIndexerRAG("hai-indexer-mistral-7b-fp16.gguf")

# With context
response = rag.query_with_context(
    "Who is the CEO?",
    "Context: Jane Doe is the CEO of TechCorp, founded in 2020."
)
print(response)  # Uses context

# Without context
response = rag.query_no_context("What's the stock price?")
print(response)  # Says "I don't know"
```
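In a real pipeline the `context` argument comes from a vector database. As a stand-in, a toy keyword-overlap retriever (purely illustrative, not part of this model; a production system would use embeddings) shows where retrieval plugs in before `query_with_context`:

```python
import re

def retrieve(question, chunks, top_k=1):
    """Toy retriever: rank chunks by word overlap with the question.
    Stands in for an embedding-based vector-database lookup."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    q = words(question)
    scored = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return "\n".join(scored[:top_k])

chunks = [
    "Jane Doe is the CEO of TechCorp, founded in 2020.",
    "TechCorp's office is in Berlin.",
]
context = "Context: " + retrieve("Who is the CEO?", chunks)
# context would now be passed to rag.query_with_context("Who is the CEO?", context)
```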
## Original Model
This GGUF version is converted from: YOUR_USERNAME/hai-indexer-mistral-7b
The original model is available in standard HuggingFace format (SafeTensors).
## Use Cases

### Recommended For:
- Document question-answering systems
- RAG pipelines with vector databases
- Knowledge base chat interfaces
- Customer support automation
- Technical documentation Q&A
- Information retrieval systems
### Not Recommended For:
- Creative writing or storytelling
- General conversation (use general-purpose models)
- Real-time information (use RAG with updated data)
- Medical, legal, or financial advice
## License
This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.
## Acknowledgments
- Base Model: Mistral AI
- Fine-tuning: LLaMA-Factory
- Conversion: llama.cpp
- Community: HuggingFace and llama.cpp communities
## Contact
- Organization: HAI Intel
- Issues: [Report issues here]
- Original Model: [Link to your HF profile]
## Citation

```bibtex
@misc{hai_indexer_mistral_7b_gguf,
  title={HAI Indexer Mistral 7B GGUF: Optimized RAG Model},
  author={HAI Intel Team},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/hai-indexer-mistral-7b-GGUF}}
}
```
## Version History

- **v1.0** (January 2026): Initial GGUF release (FP16)
  - Converted from fine-tuned Mistral 7B
  - Optimized for RAG and document QA
  - No-hallucination training included
Note: This is the GGUF format version. For the original SafeTensors format, see the main repository.