HAI Indexer Mistral 7B - GGUF

This is a GGUF-format version of HAI Indexer Mistral 7B, a model specialized for Retrieval-Augmented Generation (RAG) and document question-answering. It was created for our company's internal use and has not yet been evaluated on benchmark datasets.

πŸš€ Quick Start

Using with llama.cpp

# Download the model
huggingface-cli download YOUR_USERNAME/hai-indexer-mistral-7b-GGUF \
    hai-indexer-mistral-7b-fp16.gguf

# Run inference
./llama-cli -m hai-indexer-mistral-7b-fp16.gguf \
    -p "Hello! Who are you?" \
    -n 128

Using with Ollama

# Create Modelfile
cat > Modelfile << 'EOF'
FROM hai-indexer-mistral-7b-fp16.gguf

TEMPLATE """[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]"""

SYSTEM """You are HAI Indexer, an AI assistant that helps users find information from their indexed documents. Answer questions using the provided context when available."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run
ollama create hai-indexer -f Modelfile
ollama run hai-indexer

Using with llama-cpp-python

pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="hai-indexer-mistral-7b-fp16.gguf",
    n_ctx=2048,
    n_threads=8
)

output = llm(
    "Hello! Who are you?",
    max_tokens=128,
    temperature=0.7
)

print(output['choices'][0]['text'])

πŸ“Š Model Details

Base Model Information

  • Architecture: Mistral 7B
  • Parameters: 7.24 Billion
  • Context Length: 32,768 tokens
  • Base Model: mistralai/Mistral-7B-Instruct-v0.2
  • Fine-tuned For: RAG, Document QA, Knowledge Base, Anti-Hallucination
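The FP16 file size listed below follows directly from the parameter count: FP16 stores each parameter in 2 bytes, so 7.24 billion parameters come to roughly 14.5 GB on disk. A quick sketch of that arithmetic (actual GGUF files add a small amount of metadata overhead, which this ignores):

```python
# Rough FP16 size estimate: 2 bytes per parameter (GGUF metadata ignored).
def fp16_size_gb(n_params: float) -> float:
    return n_params * 2 / 1e9

print(round(fp16_size_gb(7.24e9), 1))  # prints 14.5
```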

GGUF Format Details

| File | Size | Precision | Quality | Use Case |
|------|------|-----------|---------|----------|
| hai-indexer-mistral-7b-fp16.gguf | 14 GB | FP16 | 100% | Full-quality inference |

Note: Users can quantize this FP16 GGUF to smaller formats (Q8, Q5, Q4) using llama.cpp:

# Quantize to Q8_0 (99% quality, 7.5GB)
llama-quantize hai-indexer-mistral-7b-fp16.gguf \
    hai-indexer-mistral-7b-Q8_0.gguf Q8_0

# Quantize to Q4_K_M (90% quality, 4GB)  
llama-quantize hai-indexer-mistral-7b-fp16.gguf \
    hai-indexer-mistral-7b-Q4_K_M.gguf Q4_K_M

🎯 Specialized Training

This model was fine-tuned on ~15,600 examples across these domains:

  • βœ… RAG Context Understanding - Extracting information from provided context
  • βœ… Anti-Hallucination - Responds "I don't know" when information isn't available
  • βœ… Document QA - Questions about indexed documents and knowledge bases
  • βœ… Entity Recognition - Understanding entities, people, companies, concepts
  • βœ… Business Domain - Company knowledge base and business questions
  • βœ… Safety Guardrails - Responsible AI behavior

Training Datasets

  • RAG Context Training (~2,500 examples)
  • HAI Indexer RAG Training (~2,000 examples)
  • Company Knowledge Base (~1,800 examples)
  • No Hallucination Training (~1,500 examples)
  • Hard Negative Samples (~1,200 examples)
  • Entity Classification (~1,000 examples)
  • Business Integration (~1,500 examples)
  • Safety Guardrails (~800 examples)
  • Intro/Greetings (~400 examples)
  • Operator Training (~900 examples)

πŸ’‘ Key Features

1. RAG-Optimized

User: Who is John Smith?

Context from Vector Database:
John Smith - Senior Software Engineer at TechCorp
10 years experience in Python and AI

Response: Based on the information provided, John Smith is a Senior 
Software Engineer at TechCorp with 10 years of experience specializing 
in Python and AI technologies.

2. No Hallucination

User: What is the company revenue?

Context: [No relevant documents found]

Response: I don't have information about the company revenue in the 
provided context. Could you provide relevant documents or specify 
which company you're asking about?

3. Professional Greetings

User: Good morning!

Response: Good morning! I'm HAI Indexer, your AI assistant for finding 
information from your indexed documents. How can I help you today?

πŸ“ˆ Performance

Inference Speed (FP16)

| Hardware | Tokens/Second | RAM Usage |
|----------|---------------|-----------|
| CPU (32 cores) | ~20-25 | 16 GB |
| RTX 4090 | ~140-160 | 16 GB VRAM |
| A100 40GB | ~180-200 | 16 GB VRAM |
| Apple M1/M2 | ~30-40 | 16 GB |

Quantization Options

If you need smaller models or faster inference, quantize to:

| Format | Size | Quality | Speed |
|--------|------|---------|-------|
| Q8_0 | 7.5 GB | 99% | 2x faster |
| Q5_K_M | 5 GB | 95% | 3x faster |
| Q4_K_M | 4 GB | 90% | 4x faster |
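The sizes in the table are roughly parameter count times average bits per weight. A quick estimator sketch; the bits-per-weight values below are approximate averages for these mixed-precision llama.cpp formats, not exact figures:

```python
# Approximate on-disk size: parameters * average bits per weight / 8.
# Bits-per-weight values are rough averages (assumption), since K-quants
# mix precisions across tensors.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.9}

def approx_size_gb(n_params: float, fmt: str) -> float:
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{approx_size_gb(7.24e9, fmt):.1f} GB")
```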

πŸ”§ Integration Example

RAG Pipeline Integration

from llama_cpp import Llama

class HaiIndexerRAG:
    def __init__(self, model_path):
        self.llm = Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=8,
            n_gpu_layers=35  # Use GPU if available
        )
    
    def query_with_context(self, question, context):
        """Query with RAG context"""
        prompt = f"""[INST] You are HAI Indexer. Answer based on context.

Question: {question}

{context}
[/INST]"""
        
        output = self.llm(
            prompt,
            max_tokens=512,
            temperature=0.7,
            stop=["[INST]", "[/INST]"]
        )
        
        return output['choices'][0]['text']
    
    def query_no_context(self, question):
        """Query without context (should say "I don't know")"""
        prompt = f"""[INST] You are HAI Indexer.

Question: {question}

Context: [No relevant documents found]
[/INST]"""
        
        output = self.llm(
            prompt,
            max_tokens=256,
            temperature=0.7
        )
        
        return output['choices'][0]['text']

# Usage
rag = HaiIndexerRAG("hai-indexer-mistral-7b-fp16.gguf")

# With context
response = rag.query_with_context(
    "Who is the CEO?",
    "Context: Jane Doe is the CEO of TechCorp, founded in 2020."
)
print(response)  # Uses context

# Without context
response = rag.query_no_context("What's the stock price?")
print(response)  # Says "I don't know"

πŸŽ“ Original Model

This GGUF version is converted from: YOUR_USERNAME/hai-indexer-mistral-7b

The original model is available in standard HuggingFace format (SafeTensors).

πŸ“‹ Use Cases

βœ… Recommended For:

  • Document question-answering systems
  • RAG pipelines with vector databases
  • Knowledge base chat interfaces
  • Customer support automation
  • Technical documentation Q&A
  • Information retrieval systems

❌ Not Recommended For:

  • Creative writing or storytelling
  • General conversation (use general-purpose models)
  • Real-time information (use RAG with updated data)
  • Medical, legal, or financial advice

βš–οΈ License

This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.

πŸ™ Acknowledgments

πŸ“ž Contact

  • Organization: HAI Intel
  • Issues: [Report issues here]
  • Original Model: [Link to your HF profile]

πŸ”– Citation

@misc{hai_indexer_mistral_7b_gguf,
  title={HAI Indexer Mistral 7B GGUF: Optimized RAG Model},
  author={HAI Intel Team},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/hai-indexer-mistral-7b-GGUF}}
}

πŸ“ Version History

  • v1.0 (January 2026): Initial GGUF release (FP16)
    • Converted from fine-tuned Mistral 7B
    • Optimized for RAG and document QA
    • No hallucination training included

Note: This is the GGUF format version. For the original SafeTensors format, see the main repository.
