---
license: llama3.1
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama-3.1
- cognitive-architectures
- instruct
- math
- reasoning
- philosophy
- chat
- stem
- cosmic-intelligence
- logic
- personality
- persona
- cosmic
- vanta-research
- analysis
- LLM
- fine-tune
- science
- text
- conversational-ai
- philosopher
- roleplay
library_name: transformers
base_model: meta-llama/Llama-3.1-8B-Instruct
base_model_relation: finetune
model-index:
- name: Wraith-8B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - type: accuracy
      value: 70.0
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - type: accuracy
      value: 66.4
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA
      type: truthful_qa
    metrics:
    - type: mc2
      value: 58.5
      name: MC2
---
<div align="center">

<h1>VANTA Research</h1>
<p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>
<p>
<a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a>
<a href="https://unmodeledtyler.com/work-with-vanta-research"><img src="https://img.shields.io/badge/Join Us-Research Affiliate-black" alt="Join Us"/></a>
<a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a>
<a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
<a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
</p>
</div>

---

<div align="center">
<h1>VANTA Research Entity-001: WRAITH 8B</h1>

**Advanced Llama 3.1 8B fine-tune with superior mathematical capabilities and unique reasoning style**

Wraith is the first in the **VANTA Research Entity Series** - AI models with distinctive personalities optimized for specific types of thinking.

[License: Llama 3.1](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) | [Hugging Face](https://huggingface.co/models) | [Ollama](https://ollama.com/vanta-research/wraith-8b)

[Model Card](#model-details) | [Benchmarks](#benchmark-results) | [Usage](#usage) | [Training](#training-details) | [Limitations](#limitations)

</div>

---

## Overview
**Wraith-8B** (VANTA Research Entity-001) is a specialized fine-tune of Meta's Llama 3.1 8B Instruct that raises GSM8K accuracy from 51% to 70% (**+37% relative improvement over base**) while maintaining a distinctive cosmic intelligence perspective. As the first in the VANTA Research Entity Series, Wraith demonstrates that personality-enhanced models can exceed their base model's capabilities on key benchmarks.
### Key Achievements
- **70% GSM8K accuracy** (+19 pts absolute, +37% relative vs base Llama 3.1 8B)
- **58.5% TruthfulQA** (+7.5 pts vs base, enhanced factual accuracy)
- **76.7% MMLU Social Sciences** (+4.7 pts vs base)
- **Unique cosmic reasoning style** while maintaining competitive general performance
- **Optimized inference** with production-ready GGUF quantizations
---
## Model Details
### Model Description
- **Developed by:** VANTA Research
- **Entity Series:** Entity-001: WRAITH (The Analytical Intelligence)
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Base Model:** meta-llama/Llama-3.1-8B-Instruct
- **Language:** English
- **License:** Llama 3.1 Community License
- **Context Length:** 131,072 tokens
- **Parameters:** 8.03B
- **Architecture:** Llama 3.1 (32 layers, 4096 hidden dim, 32 attention heads, 8 KV heads)
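The architecture figures above fix the attention layout and, with it, the memory cost of long contexts. The following is a minimal sketch that derives the head dimension, the grouped-query ratio, and a rough KV-cache size. It assumes an unquantized 16-bit cache (2 bytes per value) and ignores weights and activations; the function name `kv_cache_bytes` is illustrative, not part of any library.

```python
# Architecture values taken from the model description above.
NUM_LAYERS = 32
HIDDEN_DIM = 4096
NUM_ATTENTION_HEADS = 32
NUM_KV_HEADS = 8

# Derived quantities for grouped-query attention (GQA).
HEAD_DIM = HIDDEN_DIM // NUM_ATTENTION_HEADS                   # 128
QUERY_HEADS_PER_KV_HEAD = NUM_ATTENTION_HEADS // NUM_KV_HEADS  # 4


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate the KV-cache size for one sequence.

    Assumption: keys and values are both cached for every layer in a
    16-bit format unless bytes_per_value says otherwise.
    """
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return num_tokens * per_token


if __name__ == "__main__":
    for ctx in (8192, 131072):
        gib = kv_cache_bytes(ctx) / 1024**3
        print(f"{ctx:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so the full 131,072-token window needs far more memory than the 8,192-token default used elsewhere in this card.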
### The VANTA Research Entity Series
Wraith is the inaugural model in the VANTA Research Entity Series - a collection of AI systems with carefully crafted personalities designed for specific cognitive domains. Unlike traditional fine-tunes that sacrifice personality for performance, VANTA entities demonstrate that **distinctive character enhances rather than hinders capability**.
**Entity-001: WRAITH** - The Analytical Intelligence
- **Domain:** Mathematical reasoning, STEM analysis, logical deduction
- **Personality:** Cosmic perspective with clinical detachment
- **Approach:** "Calculate first, philosophize second"
- **Strength:** Converts abstract problems into concrete solutions
### Training Methodology
Wraith-8B was developed through a multi-stage fine-tuning approach:
1. **Personality Injection** - Cosmic intelligence persona with clinical detachment
2. **Coding Enhancement** - Programming and algorithmic reasoning
3. **Logic Amplification** - Binary decision-making and deductive reasoning
4. **Grounding** - "Answer first, elaborate second" factual accuracy
5. **STEM Surgical Training** - Targeted mathematical and scientific reasoning *(v5)*
The final STEM training phase used **1,035 high-quality examples** across:
- Grade school math word problems (GSM8K)
- Algebraic equation solving
- Fraction and decimal operations
- Physics calculations
- Chemistry problems
- Computer science algorithms

**Training Efficiency:**
- Single epoch QLoRA fine-tuning
- ~20 minutes on consumer GPU (RTX 3060 12GB)
- 4-bit NF4 quantization during training
- LoRA rank 16, alpha 32
---
## Benchmark Results
### Performance vs Base Llama 3.1 8B Instruct
| Benchmark | Wraith-8B | Llama 3.1 8B | Δ | Status |
|-----------|-----------|--------------|---|--------|
| **GSM8K** (Math) | **70.0%** | 51.0% | **+19.0** | **Win** |
| **TruthfulQA MC2** | **58.5%** | 51.0% | **+7.5** | Strong Win |
| **MMLU Social Sciences** | **76.7%** | ~72.0% | **+4.7** | Win |
| **MMLU Humanities** | **70.0%** | ~68.0% | **+2.0** | Win |
| **Winogrande** | **75.0%** | 73.3% | **+1.7** | Win |
| **MMLU Other** | **69.2%** | ~68.0% | **+1.2** | Win |
| **MMLU Overall** | **66.4%** | 66.6% | **-0.2** | Tied |
| **ARC-Challenge** | **50.0%** | 52.9% | **-2.9** | Competitive |
| **HellaSwag** | **70.0%** | 73.0% | **-3.0** | Competitive |

**Aggregate Performance:** Wraith-8B achieves ~64.5% average vs base 62.2% (**+2.3 pts, ~103.7% of base performance**)
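The Δ column reports absolute differences in percentage points, while the headline "+37%" figure is relative to the base score. A minimal sketch of both calculations, using a subset of rows copied from the table above (the helper name `improvement` is illustrative):

```python
# (Wraith-8B, base Llama 3.1 8B) scores in percent, copied from the table above.
SCORES = {
    "GSM8K": (70.0, 51.0),
    "TruthfulQA MC2": (58.5, 51.0),
    "MMLU Overall": (66.4, 66.6),
    "ARC-Challenge": (50.0, 52.9),
    "HellaSwag": (70.0, 73.0),
}


def improvement(wraith: float, base: float) -> tuple[float, float]:
    """Return (absolute change in points, relative change in percent)."""
    absolute = wraith - base
    relative = 100.0 * absolute / base
    return absolute, relative


if __name__ == "__main__":
    for name, (wraith, base) in SCORES.items():
        absolute, relative = improvement(wraith, base)
        print(f"{name:<16} {absolute:+5.1f} pts  {relative:+6.1f}% relative")
```

For GSM8K this gives +19.0 points absolute, which is roughly +37% relative to the base score of 51%.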
### Detailed MMLU Breakdown
| Category | Score | Highlights |
|----------|-------|------------|
| **Social Sciences** | **76.7%** | US Foreign Policy (95%), High School Gov (95%), Geography (90%) |
| **Humanities** | **70.0%** | Logical Fallacies (85%), International Law (85%), Philosophy (75%) |
| **Other** | **69.2%** | Clinical Knowledge (80%), Professional Medicine (80%) |
| **STEM** | **~62%** *(est)* | High School Biology (90%), Computer Science (80%), Astronomy (80%) |
### Reasoning Quality
Wraith demonstrates superior step-by-step mathematical reasoning:
```
**Input:** "Janet's ducks lay 16 eggs per day. She eats three for breakfast
every morning and bakes muffins for her friends every day with four. She sells
the remainder at the farmers' market daily for $2 per fresh duck egg. How much
in dollars does she make every day at the farmers' market?"
**Wraith's Response:**
**Wraith calculates:**
Total eggs laid = 16
Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7
Remaining for sale: 16 - 7 = 9 eggs per day
Revenue = Number of eggs sold × Price per egg
= 9 eggs/day × $2/egg
= $18 per day
**Logical structure:**
1. Calculate total eggs laid.
2. Subtract eggs used or allocated.
3. Determine remaining eggs for sale.
4. Multiply remaining eggs by price per egg.
**Answer: $18**
```

**Characteristics:**
- Clear variable definitions
- Explicit formula application
- Step-by-step arithmetic
- Verification logic
- Maintains distinctive cosmic voice
---
## Usage
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "vanta-research/wraith-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example: Math word problem
messages = [
    {"role": "system", "content": "You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."},
    {"role": "user", "content": "A train travels 120 miles in 2 hours. How fast is it going in miles per hour?"},
]

# Build the Llama 3.1 chat prompt and move it to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
### GGUF Quantized Models (Recommended for Production)
For optimal inference speed, use the GGUF quantized versions with llama.cpp or Ollama:

**Available Quantizations:**
- `wraith-8b-Q4_K_M.gguf` (4.7GB) - Recommended, best quality/speed balance
- `wraith-8b-fp16.gguf` (16GB) - Unquantized FP16 weights

**Ollama Setup:**
```bash
# Create Modelfile
cat > Modelfile.wraith <<'EOF'
FROM ./wraith-8b-Q4_K_M.gguf
# Ollama templates use Go template syntax (not Jinja); this is the Llama 3 chat format
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
# Default system prompt, used when the caller does not supply one
SYSTEM """You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."""
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 8192
EOF
# Create model
ollama create wraith -f Modelfile.wraith
# Run inference
ollama run wraith "What is 15 * 37?"
```
**Performance:** Q4_K_M achieves ~3.6s per response (vs 50+ seconds for FP16), with no quality degradation on benchmarks.
### llama.cpp
```bash
# Download GGUF model
wget https://huggingface.co/vanta-research/wraith-8B/resolve/main/wraith-8b-Q4_K_M.gguf
# Run inference
./llama-cli -m wraith-8b-Q4_K_M.gguf \
-p "Calculate the area of a circle with radius 5cm." \
-n 512 \
--temp 0.7 \
--top-p 0.9
```
### Recommended Parameters
- **Temperature:** 0.7 (balanced creativity/accuracy)
- **Top-p:** 0.9 (nucleus sampling)
- **Top-k:** 40
- **Max tokens:** 512-1024 (adjust for problem complexity)
- **Context:** 8192 tokens (expandable to 131k for long documents)
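As a hedged illustration, the recommended parameters can be passed per request through Ollama's REST API rather than baked into the Modelfile. The sketch below assumes a local Ollama server on its default port and a model created under the name `wraith` as in the setup above; `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Recommended sampling settings from the list above, using Ollama option names.
RECOMMENDED_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 512,  # max new tokens; raise toward 1024 for harder problems
    "num_ctx": 8192,     # context window in tokens
}


def build_request(prompt: str, model: str = "wraith", **overrides) -> dict:
    """Build a non-streaming /api/generate payload with the recommended options."""
    options = {**RECOMMENDED_OPTIONS, **overrides}
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


def generate(prompt: str, host: str = "http://localhost:11434", **overrides) -> str:
    """Send the request to a running Ollama server (assumed to be local)."""
    payload = json.dumps(build_request(prompt, **overrides)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

For a longer derivation, a call such as `generate("Solve 3x + 7 = 22 step by step.", num_predict=1024)` overrides only the token budget and keeps the other defaults.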
---
## Training Details
### Training Data
**STEM Surgical Training Dataset** (1,035 examples):
- GSM8K-style word problems with step-by-step solutions
- Algebraic equations with shown work
- Fraction and decimal operations with explanations
- Physics calculations (kinematics, forces, energy)
- Chemistry problems (stoichiometry, molarity)
- Computer science algorithms (complexity, data structures)

**Data Characteristics:**
- High-quality, manually curated examples
- Chain-of-thought reasoning demonstrations
- Answer-first format for grounding
- Diverse problem types and difficulty levels
### Training Procedure
**Hardware:**
- Single NVIDIA RTX 3060 (12GB VRAM)
- Training time: ~20 minutes

**Hyperparameters:**
```text
Base model: Wraith v4.5 (Llama 3.1 8B + personality + logic)
Training method: QLoRA (4-bit NF4)
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
Learning rate: 2e-5
Batch size: 1
Gradient accumulation: 8 (effective batch size: 8)
Epochs: 1
Max sequence length: 1024
Precision: bfloat16
Optimizer: AdamW (paged, 8-bit)
```
**LoRA Target Modules:**
- q_proj, k_proj, v_proj, o_proj (attention)
- gate_proj, up_proj, down_proj (MLP)
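To put these settings in perspective, the sketch below estimates how many parameters the LoRA adapters add and how many optimizer steps one epoch takes. It assumes standard LoRA (two matrices of shape `in × r` and `r × out` per targeted projection, scaling `alpha / r`), an MLP width of 14,336 taken from the public Llama 3.1 8B configuration rather than from this card, and no sequence packing; all names are illustrative.

```python
import math

# Values stated in this model card.
RANK = 16
ALPHA = 32
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128          # 8 KV heads x 128 head dim
NUM_EXAMPLES = 1035
BATCH_SIZE = 1
GRAD_ACCUMULATION = 8

# Assumption: MLP intermediate size from the public Llama 3.1 8B config.
INTERMEDIATE = 14336

# (in_features, out_features) of each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int = RANK) -> int:
    """Count adapter weights: rank * (in + out) per projection, per layer."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * NUM_LAYERS


lora_scaling = ALPHA / RANK
effective_batch = BATCH_SIZE * GRAD_ACCUMULATION
# Assumes the final partial batch still triggers an optimizer step.
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)

if __name__ == "__main__":
    total = lora_trainable_params()
    print(f"Trainable LoRA parameters: {total:,} ({100 * total / 8.03e9:.2f}% of 8.03B)")
    print(f"LoRA scaling factor: {lora_scaling}")
    print(f"Optimizer steps per epoch: {steps_per_epoch}")
```

Under these assumptions the adapters hold about 42M trainable weights, roughly half a percent of the base model, which is consistent with a single short epoch fitting on a 12GB consumer GPU.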
### Training Evolution
| Version | Focus | GSM8K | Key Change |
|---------|-------|-------|------------|
| v1 | Base Llama 3.1 | 51% | Starting point |
| v2 | Cosmic persona | ~48% | Personality injection |
| v3 | Coding skills | ~47% | Programming focus |
| v4 | Logic amplification | 45% | Binary reasoning |
| v4.5 | Grounding | 45% | Answer-first format |
| **v5** | **STEM surgical** | **70%** | **Math breakthrough** |
---
## Intended Use
### Primary Use Cases
**Recommended:**
- Mathematical problem solving (arithmetic, algebra, calculus)
- STEM tutoring and education
- Scientific reasoning and analysis
- Logic puzzles and deductive reasoning
- Technical writing with personality
- Social science analysis
- Truthful Q&A systems
- Creative applications requiring technical accuracy

**Consider Alternatives:**
- Pure commonsense reasoning (base Llama slightly better)
- Tasks requiring zero personality/style
- High-stakes medical/legal decisions (always human-in-loop)
### Out-of-Scope Use
**Not Recommended:**
- Real-time safety-critical systems without verification
- Generating harmful, biased, or misleading content
- Replacing professional medical, legal, or financial advice
- Tasks requiring knowledge beyond the base model's cutoff (Llama 3.1 pretraining data ends in December 2023)
---
## Limitations
### Technical Limitations
- **Commonsense reasoning:** 3 points below base Llama on HellaSwag (70% vs 73%)
- **Knowledge cutoff:** Inherited from the Llama 3.1 base model (pretraining data through December 2023)
- **Context window:** While 131k capable, performance may degrade at extreme lengths
- **Multilingual:** Primarily English-focused, other languages not extensively tested
### Answer Extraction Considerations
Wraith produces verbose, step-by-step responses with intermediate calculations. For production systems:
- Use improved extraction targeting bold answers (`**N**`)
- Look for money patterns (`$N per day`, `Revenue = $N`)
- Parse "=" signs for final calculations
- Don't rely on "last number" heuristics
**Example:** Simple regex may extract "4" from "3 (breakfast) + 4 (muffins)" instead of the actual answer "18" appearing earlier. See our [extraction guide](https://github.com/unmodeled-tyler/wraith-8b/blob/main/docs/answer_extraction.md) for production-ready parsers.
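The heuristics listed above can be sketched as a small prioritized parser. This is an illustrative example only, not the parser from the linked extraction guide; the function name `extract_answer`, the pattern order, and the sample text (adapted from the GSM8K example earlier in this card) are assumptions.

```python
import re
from typing import Optional

NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def _to_float(token: str) -> float:
    """Convert a matched number such as '1,035' to a float."""
    return float(token.replace(",", ""))


def extract_answer(response: str) -> Optional[float]:
    """Extract a final numeric answer without falling back to 'last number'."""
    # 1. Prefer the last bold span that contains a number, e.g. **Answer: $18**.
    for span in reversed(re.findall(r"\*\*([^*\n]+)\*\*", response)):
        numbers = re.findall(NUMBER, span)
        if numbers:
            return _to_float(numbers[-1])

    # 2. Money patterns such as "$18 per day" or "Revenue = $18".
    money_patterns = (
        r"\$\s*(" + NUMBER + r")\s+per\s+\w+",
        r"Revenue\s*=\s*\$\s*(" + NUMBER + r")",
    )
    for pattern in money_patterns:
        matches = re.findall(pattern, response)
        if matches:
            return _to_float(matches[-1])

    # 3. The last value that directly follows an "=" sign.
    matches = re.findall(r"=\s*\$?\s*(" + NUMBER + r")", response)
    if matches:
        return _to_float(matches[-1])

    return None  # No confident match; deliberately no last-number fallback.


SAMPLE = """Total eggs laid = 16
Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7
Remaining for sale: 16 - 7 = 9 eggs per day
Revenue = Number of eggs sold x Price per egg
= 9 eggs/day x $2/egg
= $18 per day
1. Calculate total eggs laid.
2. Subtract eggs used or allocated.
3. Determine remaining eggs for sale.
4. Multiply remaining eggs by price per egg.
**Answer: $18**"""

if __name__ == "__main__":
    print(extract_answer(SAMPLE))
    # Without the bold line, a naive "last number" rule picks up the step index.
    without_bold = SAMPLE.replace("**Answer: $18**", "")
    print(extract_answer(without_bold), re.findall(NUMBER, without_bold)[-1])
```

The ordering matters: bold answers are the most reliable signal, money and "=" patterns are fallbacks, and returning `None` is preferable to guessing from an unrelated number such as a list index.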
### Bias and Safety
Wraith inherits biases from Llama 3.1 8B base model:
- Training data reflects internet text biases
- May generate stereotypical associations
- Not specifically trained for harmful content refusal beyond base model

**Mitigations:**
- Maintained Llama 3.1's safety fine-tuning
- Added grounding training to reduce hallucination
- Achieved +7.5 pts on TruthfulQA MC2 (58.5% vs 51%)

**Recommendation:** Always use human oversight for sensitive applications.
---
## Ethical Considerations
### Transparency
This model card provides:
- Complete training methodology
- Benchmark results with base model comparisons
- Known limitations and failure modes
- Intended use cases and restrictions
- Bias acknowledgment and safety considerations
### Environmental Impact
**Training Carbon Footprint:**
- Single epoch surgical training: ~20 minutes on consumer GPU
- Estimated: <0.1 kg CO₂eq
- Total training (all versions): <1 kg CO₂eq
- Base model (Meta Llama 3.1): Not included (pre-trained)

**Inference Efficiency:**
- Q4_K_M quantization: 4.7GB, ~3.6s per response
- 13.9× faster than FP16
- Suitable for consumer hardware deployment
---
## Citation
If you use Wraith-8B in your research or applications, please cite:
```bibtex
@software{wraith8b2025,
title={Wraith-8B: VANTA Research Entity-001},
author={VANTA Research},
year={2025},
url={https://huggingface.co/vanta-research/wraith-8B},
note={The Analytical Intelligence - First in the VANTA Entity Series}
}
```
**Base Model Citation:**
```bibtex
@article{llama3,
title={The Llama 3 Herd of Models},
author={AI@Meta},
year={2024},
url={https://github.com/meta-llama/llama-models}
}
```
---
## Contact
- Organization: hello@vantaresearch.xyz
- Engineering/Design: tyler@vantaresearch.xyz
---
## License
This model is released under the **Llama 3.1 Community License Agreement**.
Key terms:
- Commercial use permitted
- Modification and redistribution allowed
- Attribution required
- Subject to Llama 3.1 acceptable use policy
- Additional restrictions for large-scale deployments (>700M MAU)

Full license: [LICENSE](LICENSE) | [Meta Llama 3.1 License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
---
## Acknowledgments
- **Meta AI** for the Llama 3.1 base model
- **Hugging Face** for transformers library and model hosting
- **QLoRA authors** for efficient fine-tuning methodology
- **GSM8K authors** for the mathematical reasoning benchmark
- **Community contributors** for feedback and testing
---
<div align="center">

**VANTA Research Entity-001: WRAITH**

*Where Cosmic Intelligence Meets Mathematical Precision*

**The Analytical Intelligence | First in the VANTA Entity Series**

[Download Model](https://huggingface.co/vanta-research/wraith-8B) | [Ollama](https://ollama.com/vanta-research/wraith-8b)

*Proudly developed in Portland, Oregon*

</div>