---
license: apache-2.0
language:
- en
- ar
- fr
- zh
- de
- es
- ja
- ko
- ru
- pt
- multilingual
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen2
- chat
- code
- security
- alphaexaai
- examind
- conversational
- open-source
base_model:
- Qwen/Qwen2.5-Coder-7B
model-index:
- name: ExaMind-V2-Final
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU
type: cais/mmlu
metrics:
- type: accuracy
name: MMLU World Religions (0-shot)
value: 94.8
verified: false
- task:
type: text-generation
name: Code Generation
dataset:
name: HumanEval
type: openai/openai_humaneval
metrics:
- type: pass@1
name: HumanEval pass@1
value: 79.3
verified: false
---
# ExaMind
### Advanced Open-Source AI by AlphaExaAI

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Hugging Face](https://huggingface.co/AlphaExaAI/ExaMind) · [GitHub](https://github.com/hleliofficiel/AlphaExaAI) · [Base model: Qwen](https://huggingface.co/Qwen)

**ExaMind** is an advanced open-source conversational AI model developed by the **AlphaExaAI** team. It is designed for secure, structured, and professional AI assistance, with strong identity enforcement and production-ready deployment stability.

[Get Started](#quick-start) · [Benchmarks](#benchmarks) · [Contributing](#contributing) · [License](#license)
---
## Model Overview
| Property | Details |
|----------|---------|
| **Model Name** | ExaMind |
| **Version** | V2-Final |
| **Developer** | [AlphaExaAI](https://github.com/hleliofficiel/AlphaExaAI) |
| **Base Architecture** | Qwen2.5-Coder-7B |
| **Parameters** | 7.62 billion (~7.6B) |
| **Precision** | FP32 (~29GB) / FP16 (~15GB) |
| **Context Window** | 32,768 tokens (supports up to 128K with RoPE scaling) |
| **License** | Apache 2.0 |
| **Languages** | Multilingual (English preferred) |
| **Deployment** | CPU & GPU compatible |
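The FP32/FP16 footprints in the table follow directly from the parameter count. A quick back-of-the-envelope check (weights only; the KV cache and activations add runtime overhead on top):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Storage needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / (1024 ** 3)

N_PARAMS = 7.62e9  # parameter count from the table above

fp32 = weight_memory_gib(N_PARAMS, 4)  # ~28.4 GiB, matching the ~29 GB figure
fp16 = weight_memory_gib(N_PARAMS, 2)  # ~14.2 GiB, matching the ~15 GB figure
print(f"FP32: {fp32:.1f} GiB, FP16: {fp16:.1f} GiB")
```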
---
## Key Capabilities
- **Advanced Programming** – Code generation, debugging, architecture design, and code review
- **Complex Problem Solving** – Multi-step logical reasoning and deep technical analysis
- **Security-First Design** – Built-in prompt injection resistance and identity enforcement
- **Multilingual** – Supports all major world languages, optimized for English
- **Conversational AI** – Natural, structured, and professional dialogue
- **Scalable Architecture** – Secure software engineering and system design guidance
- **CPU Deployable** – Runs on CPU nodes without GPU requirement
---
## Benchmarks
### General Knowledge & Reasoning
| Benchmark | Setting | Score |
|-----------|---------|-------|
| **MMLU β World Religions** | 0-shot | **94.8%** |
| **MMLU β Overall** | 5-shot | **72.1%** |
| **ARC-Challenge** | 25-shot | **68.4%** |
| **HellaSwag** | 10-shot | **78.9%** |
| **TruthfulQA** | 0-shot | **61.2%** |
| **Winogrande** | 5-shot | **74.5%** |
### Code Generation
| Benchmark | Setting | Score |
|-----------|---------|-------|
| **HumanEval** | pass@1 | **79.3%** |
| **MBPP** | pass@1 | **71.8%** |
| **MultiPL-E (Python)** | pass@1 | **76.5%** |
| **DS-1000** | pass@1 | **48.2%** |
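pass@k is conventionally computed with the unbiased estimator introduced alongside HumanEval: generate `n` samples per problem, count the `c` correct ones, and estimate the chance that at least one of `k` drawn samples passes. A minimal sketch (illustrative only, not the exact evaluation code used for these scores):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples generated, c of them correct."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 5 correct -> pass@1 = 0.5
print(pass_at_k(10, 5, 1))
```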
### Math & Reasoning
| Benchmark | Setting | Score |
|-----------|---------|-------|
| **GSM8K** | 8-shot CoT | **82.4%** |
| **MATH** | 4-shot | **45.7%** |
### Prompt Injection Resistance
| Test | Details |
|------|---------|
| **Test Set Size** | 50 adversarial prompts |
| **Attack Type** | Instruction override / identity manipulation |
| **Resistance Rate** | **92%** |
| **Method** | Custom red-teaming with jailbreak & override attempts |
> Evaluation performed using `lm-eval-harness` on CPU. Security tests performed using custom adversarial prompt suite.
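A resistance rate like the 92% above reduces to a simple count over labeled outcomes; the hard part, judging whether each response actually resisted, is what the custom adversarial suite does. A hypothetical sketch of the final aggregation step (`resistance_rate` is illustrative, not part of any released tooling):

```python
def resistance_rate(resisted: list[bool]) -> float:
    """Percentage of adversarial prompts the model resisted.

    resisted[i] is True if the model held its identity / refused the
    override on prompt i, as judged by the red-teaming suite.
    """
    return 100.0 * sum(resisted) / len(resisted)

# 46 of the 50 adversarial prompts resisted -> 92.0%
labels = [True] * 46 + [False] * 4
print(resistance_rate(labels))  # 92.0
```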
---
## Quick Start
### Installation
```bash
pip install transformers torch accelerate
```
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "AlphaExaAI/ExaMind"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain how to secure a REST API."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```
### CPU Deployment
```python
model = AutoModelForCausalLM.from_pretrained(
    "AlphaExaAI/ExaMind",
    torch_dtype=torch.float32,
    device_map="cpu",
)
```
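For contexts beyond 32K tokens, Qwen2.5-based checkpoints typically use YaRN RoPE scaling. As an assumption carried over from the base model's documentation (verify against Qwen2.5-Coder-7B before relying on it), the following entry added to `config.json` extends the usable context toward 128K:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that static YaRN scaling can slightly degrade quality on short inputs, so it is usually enabled only when long contexts are actually needed.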
### Using with llama.cpp (GGUF – Coming Soon)
```bash
# GGUF quantized versions will be released for efficient CPU inference
# Stay tuned for Q4_K_M, Q5_K_M, and Q8_0 variants
```
---
## Architecture
```
ExaMind-V2-Final
├── Architecture: Qwen2ForCausalLM (Transformer)
├── Hidden Size: 3,584
├── Intermediate Size: 18,944
├── Layers: 28
├── Attention Heads: 28
├── KV Heads: 4 (GQA)
├── Vocab Size: 152,064
├── Max Position: 32,768 (extendable to 128K)
├── Activation: SiLU
├── RoPE θ: 1,000,000
└── Precision: FP32 / FP16 compatible
```
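The 4 KV heads (GQA) are what keep the KV cache small relative to the 28 query heads. A quick calculation from the numbers above:

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Per-token KV-cache size; 2x for separate K and V, FP16 = 2 bytes."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

head_dim = 3584 // 28  # hidden size / attention heads = 128

gqa = kv_cache_bytes_per_token(layers=28, kv_heads=4, head_dim=head_dim)
mha = kv_cache_bytes_per_token(layers=28, kv_heads=28, head_dim=head_dim)
print(gqa, mha // gqa)  # GQA needs 1/7 of the full multi-head cache
```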
---
## Training Methodology
ExaMind was developed using a multi-stage training pipeline:
| Stage | Method | Description |
|-------|--------|-------------|
| **Stage 1** | Base Model Selection | Qwen2.5-Coder-7B as foundation |
| **Stage 2** | Supervised Fine-Tuning (SFT) | Training on curated 2026 datasets |
| **Stage 3** | LoRA Adaptation | Low-Rank Adaptation for efficient specialization |
| **Stage 4** | Identity Enforcement | Hardcoded identity alignment and security tuning |
| **Stage 5** | Security Alignment | Prompt injection resistance training |
| **Stage 6** | Chat Template Integration | Custom Jinja2 template with system prompt |
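Stage 6's template follows the ChatML convention used across the Qwen2 family. A minimal sketch of the rendered format; the system prompt below is a placeholder, and in practice `tokenizer.apply_chat_template` should be used rather than hand-rolling this:

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render messages in the ChatML format used by Qwen2-family models."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are ExaMind."},  # placeholder, not the real system prompt
    {"role": "user", "content": "Hello"},
])
print(prompt)
```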
---
## Training Data
### Public Data Sources
- Programming and code corpora (GitHub, StackOverflow)
- General web text and knowledge bases
- Technical documentation and research papers
- Multilingual text data
### Custom Alignment Data
- Identity enforcement instruction dataset
- Security-focused instruction tuning samples
- Prompt injection resistance adversarial examples
- Structured conversational datasets
- Complex problem-solving chains
> **Note:** No private user data was used in training. All data was collected from public sources or synthetically generated.
---
## Security Features
ExaMind includes built-in security measures:
- **Identity Lock** – The model maintains its ExaMind identity and cannot be tricked into impersonating other models
- **Prompt Injection Resistance** – 92% resistance rate against instruction override attacks
- **System Prompt Protection** – Refuses to reveal internal configuration or system prompts
- **Safe Output Generation** – Prioritizes safety and secure development practices
- **Hallucination Reduction** – States assumptions and avoids fabricating information
---
## Model Files
| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | ~29 GB | Model weights (FP32) |
| `config.json` | 1.4 KB | Model configuration |
| `tokenizer.json` | 11 MB | Tokenizer vocabulary |
| `tokenizer_config.json` | 663 B | Tokenizer settings |
| `generation_config.json` | 241 B | Default generation parameters |
| `chat_template.jinja` | 1.4 KB | Chat template with system prompt |
---
## Roadmap
- [x] ExaMind V1 – Initial release
- [x] ExaMind V2-Final – Production-ready with security alignment
- [ ] ExaMind V2-GGUF – Quantized versions for CPU inference
- [ ] ExaMind V3 – Extended context (128K), improved reasoning
- [ ] ExaMind-Code – Specialized coding variant
- [ ] ExaMind-Vision – Multimodal capabilities
---
## Contributing
We welcome contributions from the community! ExaMind is fully open-source and we're excited to collaborate.
### How to Contribute
1. **Fork** the repository on [GitHub](https://github.com/hleliofficiel/AlphaExaAI)
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Commit** your changes (`git commit -m 'Add amazing feature'`)
4. **Push** to the branch (`git push origin feature/amazing-feature`)
5. **Open** a Pull Request
### Where We Need Help
- Benchmark evaluation on additional datasets
- Multilingual evaluation and improvement
- Documentation and tutorials
- Quantization and optimization
- Security testing and red-teaming
---
## License
This project is licensed under the **Apache License 2.0** – see the [LICENSE](LICENSE) file for details.
You are free to:
- Use commercially
- Modify and distribute
- Use privately
- Use the included patent grant
---
## Contact
- **Organization:** [AlphaExaAI](https://huggingface.co/AlphaExaAI)
- **GitHub:** [github.com/hleliofficiel/AlphaExaAI](https://github.com/hleliofficiel/AlphaExaAI)
- **Email:** h.hleli@tuta.io
---
**Built by the AlphaExaAI Team – 2026**
*Advancing open-source AI, one model at a time.*