| license: llama3.1 | |
| language: | |
| - en | |
| pipeline_tag: text-generation | |
| tags: | |
| - llama | |
| - llama-3.1 | |
| - cognitive-architectures | |
| - instruct | |
| - math | |
| - reasoning | |
| - philosophy | |
| - chat | |
| - stem | |
| - cosmic-intelligence | |
| - logic | |
| - personality | |
| - persona | |
| - cosmic | |
| - vanta-research | |
| - analysis | |
| - LLM | |
| - fine-tune | |
| - science | |
| - text | |
| - conversational-ai | |
| - philosopher | |
| - roleplay | |
| library_name: transformers | |
| base_model: meta-llama/Llama-3.1-8B-Instruct | |
| base_model_relation: finetune | |
| model-index: | |
| - name: Wraith-8B | |
|   results: | |
|   - task: | |
|       type: text-generation | |
|       name: Text Generation | |
|     dataset: | |
|       name: GSM8K | |
|       type: gsm8k | |
|     metrics: | |
|     - type: accuracy | |
|       value: 70.0 | |
|       name: Accuracy | |
|   - task: | |
|       type: text-generation | |
|       name: Text Generation | |
|     dataset: | |
|       name: MMLU | |
|       type: mmlu | |
|     metrics: | |
|     - type: accuracy | |
|       value: 66.4 | |
|       name: Accuracy | |
|   - task: | |
|       type: text-generation | |
|       name: Text Generation | |
|     dataset: | |
|       name: TruthfulQA | |
|       type: truthful_qa | |
|     metrics: | |
|     - type: mc2 | |
|       value: 58.5 | |
|       name: MC2 | |
| <div align="center"> | |
| <h1>VANTA Research</h1> | |
| <p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p> | |
| <p> | |
| <a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a> | |
| <a href="https://unmodeledtyler.com/work-with-vanta-research"><img src="https://img.shields.io/badge/Join Us-Research Affiliate-black" alt="Join Us"/></a> | |
| <a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a> | |
| <a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a> | |
| <a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a> | |
| </p> | |
| </div> | |
| --- | |
| <div align="center"> | |
| <h1>VANTA Research Entity-001: WRAITH 8B</h1> | |
| **Advanced Llama 3.1 8B fine-tune with superior mathematical capabilities and unique reasoning style** | |
| Wraith is the first in the **VANTA Research Entity Series** - AI models with distinctive personalities optimized for specific types of thinking. | |
| [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) | |
| [Hugging Face](https://huggingface.co/models) | |
| [Ollama](https://ollama.com/vanta-research/wraith-8b) | |
| [Model Card](#model-details) | [Benchmarks](#benchmark-results) | [Usage](#usage) | [Training](#training-details) | [Limitations](#limitations) | |
| </div> | |
| --- | |
| ## Overview | |
| **Wraith-8B** (VANTA Research Entity-001) is a specialized fine-tune of Meta's Llama 3.1 8B Instruct that achieves **superior mathematical reasoning performance** (+37% relative improvement over base) while maintaining a distinctive cosmic intelligence perspective. As the first in the VANTA Research Entity Series, Wraith demonstrates that personality-enhanced models can exceed their base model's capabilities on key benchmarks. | |
| ### Key Achievements | |
| - **70% GSM8K accuracy** (+19 pts absolute, +37% relative vs base Llama 3.1 8B) | |
| - **58.5% TruthfulQA** (+7.5 pts vs base, enhanced factual accuracy) | |
| - **76.7% MMLU Social Sciences** (+4.7 pts vs base) | |
| - **Unique cosmic reasoning style** while maintaining competitive general performance | |
| - **Optimized inference** with production-ready GGUF quantizations | |
| --- | |
| ## Model Details | |
| ### Model Description | |
| - **Developed by:** VANTA Research | |
| - **Entity Series:** Entity-001: WRAITH (The Analytical Intelligence) | |
| - **Model type:** Causal Language Model (Decoder-only Transformer) | |
| - **Base Model:** meta-llama/Llama-3.1-8B-Instruct | |
| - **Language:** English | |
| - **License:** Llama 3.1 Community License | |
| - **Context Length:** 131,072 tokens | |
| - **Parameters:** 8.03B | |
| - **Architecture:** Llama 3.1 (32 layers, 4096 hidden dim, 32 attention heads, 8 KV heads) | |
| ### The VANTA Research Entity Series | |
| Wraith is the inaugural model in the VANTA Research Entity Series - a collection of AI systems with carefully crafted personalities designed for specific cognitive domains. Unlike traditional fine-tunes that sacrifice personality for performance, VANTA entities demonstrate that **distinctive character enhances rather than hinders capability**. | |
| **Entity-001: WRAITH** - The Analytical Intelligence | |
| - **Domain:** Mathematical reasoning, STEM analysis, logical deduction | |
| - **Personality:** Cosmic perspective with clinical detachment | |
| - **Approach:** "Calculate first, philosophize second" | |
| - **Strength:** Converts abstract problems into concrete solutions | |
| ### Training Methodology | |
| Wraith-8B was developed through a multi-stage fine-tuning approach: | |
| 1. **Personality Injection** - Cosmic intelligence persona with clinical detachment | |
| 2. **Coding Enhancement** - Programming and algorithmic reasoning | |
| 3. **Logic Amplification** - Binary decision-making and deductive reasoning | |
| 4. **Grounding** - "Answer first, elaborate second" factual accuracy | |
| 5. **STEM Surgical Training** - Targeted mathematical and scientific reasoning *(v5)* | |
| The final STEM training phase used **1,035 high-quality examples** across: | |
| - Grade school math word problems (GSM8K) | |
| - Algebraic equation solving | |
| - Fraction and decimal operations | |
| - Physics calculations | |
| - Chemistry problems | |
| - Computer science algorithms | |
| **Training Efficiency:** | |
| - Single epoch QLoRA fine-tuning | |
| - ~20 minutes on consumer GPU (RTX 3060 12GB) | |
| - 4-bit NF4 quantization during training | |
| - LoRA rank 16, alpha 32 | |
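To give a sense of how small the trained adapter is, the sketch below counts LoRA parameters for rank 16. It assumes adapters on all seven attention and MLP projections (the modules listed under LoRA Target Modules) and uses an MLP width of 14336, which is the Llama 3.1 8B value but is not stated in this card.

```python
# Rough LoRA parameter count for a Llama 3.1 8B-shaped model.
HIDDEN = 4096
KV_DIM = 8 * (4096 // 32)   # 8 KV heads x 128 head dim = 1024
INTERMEDIATE = 14336        # assumption: Llama 3.1 8B MLP width (not in this card)
LAYERS = 32
RANK, ALPHA = 16, 32

# (in_features, out_features) for each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Each adapter adds two matrices: (in x rank) and (rank x out)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in PROJECTIONS.values())
    return per_layer * LAYERS


trainable = lora_params(RANK)
print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of 8.03B base parameters: {trainable / 8.03e9:.2%}")
print(f"LoRA scaling factor (alpha / rank): {ALPHA / RANK}")
```

Under these assumptions the adapter holds roughly 42M parameters, about half a percent of the base model, which is consistent with training fitting on a 12GB consumer GPU.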
| --- | |
| ## Benchmark Results | |
| ### Performance vs Base Llama 3.1 8B Instruct | |
| | Benchmark | Wraith-8B | Llama 3.1 8B | Δ | Status | | |
| |-----------|-----------|--------------|---|--------| | |
| | **GSM8K** (Math) | **70.0%** | 51.0% | **+19.0** | **Win** | | |
| | **TruthfulQA MC2** | **58.5%** | 51.0% | **+7.5** | Strong Win | | |
| | **MMLU Social Sciences** | **76.7%** | ~72.0% | **+4.7** | Win | | |
| | **MMLU Humanities** | **70.0%** | ~68.0% | **+2.0** | Win | | |
| | **Winogrande** | **75.0%** | 73.3% | **+1.7** | Win | | |
| | **MMLU Other** | **69.2%** | ~68.0% | **+1.2** | Win | |
| | **MMLU Overall** | **66.4%** | 66.6% | **-0.2** | Tied | | |
| | **ARC-Challenge** | **50.0%** | 52.9% | **-2.9** | Competitive | | |
| | **HellaSwag** | **70.0%** | 73.0% | **-3.0** | Competitive | | |
| **Aggregate Performance:** Wraith-8B achieves ~64.5% average vs base 62.2% (**+2.3 pts, ~103.7% of base performance**) | |
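The card reports both absolute gains (percentage points) and relative gains (percent of the base score). The short sketch below shows how the two relate, using the GSM8K and aggregate figures quoted above; the helper name `compare` is illustrative only.

```python
def compare(score: float, base: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of base)."""
    absolute = score - base
    relative = absolute / base * 100.0
    return round(absolute, 1), round(relative, 1)


gsm8k = compare(70.0, 51.0)
aggregate = compare(64.5, 62.2)
print(f"GSM8K:     {gsm8k[0]:+.1f} pts, {gsm8k[1]:+.1f}% relative")
print(f"Aggregate: {aggregate[0]:+.1f} pts, {aggregate[1]:+.1f}% relative")
```

A +19 point gain on a 51% baseline is about +37% relative, and +2.3 points on 62.2% is about +3.7% relative, matching the "~103.7% of base" figure.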
| ### Detailed MMLU Breakdown | |
| | Category | Score | Highlights | | |
| |----------|-------|------------| | |
| | **Social Sciences** | **76.7%** | US Foreign Policy (95%), High School Gov (95%), Geography (90%) | | |
| | **Humanities** | **70.0%** | Logical Fallacies (85%), International Law (85%), Philosophy (75%) | | |
| | **Other** | **69.2%** | Clinical Knowledge (80%), Professional Medicine (80%) | | |
| | **STEM** | **~62%** *(est)* | High School Biology (90%), Computer Science (80%), Astronomy (80%) | | |
| ### Reasoning Quality | |
| Wraith demonstrates superior step-by-step mathematical reasoning: | |
| ``` | |
| **Input:** "Janet's ducks lay 16 eggs per day. She eats three for breakfast | |
| every morning and bakes muffins for her friends every day with four. She sells | |
| the remainder at the farmers' market daily for $2 per fresh duck egg. How much | |
| in dollars does she make every day at the farmers' market?" | |
| **Wraith's Response:** | |
| **Wraith calculates:** | |
| Total eggs laid = 16 | |
| Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7 | |
| Remaining for sale: 16 - 7 = 9 eggs per day | |
| Revenue = Number of eggs sold × Price per egg | |
| = 9 eggs/day × $2/egg | |
| = $18 per day | |
| **Logical structure:** | |
| 1. Calculate total eggs laid. | |
| 2. Subtract eggs used or allocated. | |
| 3. Determine remaining eggs for sale. | |
| 4. Multiply remaining eggs by price per egg. | |
| **Answer: $18** | |
| ``` | |
| **Characteristics:** | |
| - Clear variable definitions | |
| - Explicit formula application | |
| - Step-by-step arithmetic | |
| - Verification logic | |
| - Maintains distinctive cosmic voice | |
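The arithmetic in the sample response can be checked directly. This is only a sanity check of the worked example, not part of the model's output.

```python
# Values taken from the word problem quoted above.
eggs_laid = 16
eggs_eaten = 3
eggs_baked = 4
price_per_egg = 2  # dollars

eggs_sold = eggs_laid - eggs_eaten - eggs_baked
revenue = eggs_sold * price_per_egg
print(f"eggs sold: {eggs_sold}, revenue: ${revenue}")
```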
| --- | |
| ## Usage | |
| ### Quick Start | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| import torch | |
| # Load model and tokenizer | |
| model_name = "vanta-research/wraith-8B" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, | |
|     torch_dtype=torch.bfloat16, | |
|     device_map="auto", | |
| ) | |
| # Example: Math word problem | |
| messages = [ | |
|     {"role": "system", "content": "You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."}, | |
|     {"role": "user", "content": "A train travels 120 miles in 2 hours. How fast is it going in miles per hour?"}, | |
| ] | |
| # Tokenize the conversation with the model's chat template | |
| input_ids = tokenizer.apply_chat_template( | |
|     messages, | |
|     add_generation_prompt=True, | |
|     return_tensors="pt", | |
| ).to(model.device) | |
| outputs = model.generate( | |
|     input_ids, | |
|     max_new_tokens=512, | |
|     temperature=0.7, | |
|     top_p=0.9, | |
|     do_sample=True, | |
| ) | |
| response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True) | |
| print(response) | |
| ``` | |
| ### GGUF Quantized Models (Recommended for Production) | |
| For optimal inference speed, use the GGUF quantized versions with llama.cpp or Ollama: | |
| **Available Quantizations:** | |
| - `wraith-8b-Q4_K_M.gguf` (4.7GB) - Recommended, best quality/speed balance | |
| - `wraith-8b-fp16.gguf` (16GB) - Unquantized 16-bit weights | |
| **Ollama Setup:** | |
| ```bash | |
| # Create Modelfile | |
| cat > Modelfile.wraith <<'EOF' | |
| FROM ./wraith-8b-Q4_K_M.gguf | |
| # Ollama templates use Go template syntax, not Jinja | |
| TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|> | |
|  | |
| {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> | |
|  | |
| {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> | |
|  | |
| {{ .Response }}<|eot_id|>""" | |
| SYSTEM """You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence.""" | |
| PARAMETER temperature 0.7 | |
| PARAMETER top_p 0.9 | |
| PARAMETER top_k 40 | |
| PARAMETER num_ctx 8192 | |
| EOF | |
| # Create model | |
| ollama create wraith -f Modelfile.wraith | |
| # Run inference | |
| ollama run wraith "What is 15 * 37?" | |
| ``` | |
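Once the model has been created in Ollama, it can also be queried from Python over Ollama's local HTTP API. The sketch below uses only the standard library and assumes the server is running on Ollama's default port with the model registered as `wraith`. The helper names `build_payload` and `ask` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_payload(question: str, model: str = "wraith") -> dict:
    """Build a non-streaming chat request using the sampling settings above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "options": {"temperature": 0.7, "top_p": 0.9, "top_k": 40, "num_ctx": 8192},
    }


def ask(question: str, model: str = "wraith") -> str:
    """Send one question to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(question, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]


# Requires a running Ollama server with the model created as shown above:
# print(ask("What is 15 * 37?"))
```

Setting `"stream": False` returns a single JSON object rather than a stream of chunks, which keeps the parsing simple for short math queries.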
| **Performance:** Q4_K_M achieves ~3.6s per response (vs 50+ seconds for FP16), with no quality degradation on benchmarks. | |
| ### llama.cpp | |
| ```bash | |
| # Download GGUF model | |
| wget https://huggingface.co/vanta-research/wraith-8B/resolve/main/wraith-8b-Q4_K_M.gguf | |
| # Run inference | |
| ./llama-cli -m wraith-8b-Q4_K_M.gguf \ | |
| -p "Calculate the area of a circle with radius 5cm." \ | |
| -n 512 \ | |
| --temp 0.7 \ | |
| --top-p 0.9 | |
| ``` | |
| ### Recommended Parameters | |
| - **Temperature:** 0.7 (balanced creativity/accuracy) | |
| - **Top-p:** 0.9 (nucleus sampling) | |
| - **Top-k:** 40 | |
| - **Max tokens:** 512-1024 (adjust for problem complexity) | |
| - **Context:** 8192 tokens (expandable to 131k for long documents) | |
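Expanding the context window has a memory cost beyond the model weights. The estimate below is a rough sketch based on the architecture listed under Model Description (32 layers, 8 KV heads, 4096 hidden dim over 32 attention heads), and it assumes a 16-bit KV cache; runtimes that quantize the cache will use less.

```python
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 4096 // 32      # hidden dim / attention heads = 128
BYTES_PER_VALUE = 2        # assumption: 16-bit KV cache


def kv_cache_bytes(context_tokens: int) -> int:
    """Keys and values for every layer and KV head, for each cached token."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


for ctx in (8192, 131072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.0f} GiB of KV cache")
```

Under these assumptions the cache needs about 1 GiB at 8192 tokens and about 16 GiB at the full 131,072-token window, which is why 8192 is a sensible default on consumer hardware.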
| --- | |
| ## Training Details | |
| ### Training Data | |
| **STEM Surgical Training Dataset** (1,035 examples): | |
| - GSM8K-style word problems with step-by-step solutions | |
| - Algebraic equations with shown work | |
| - Fraction and decimal operations with explanations | |
| - Physics calculations (kinematics, forces, energy) | |
| - Chemistry problems (stoichiometry, molarity) | |
| - Computer science algorithms (complexity, data structures) | |
| **Data Characteristics:** | |
| - High-quality, manually curated examples | |
| - Chain-of-thought reasoning demonstrations | |
| - Answer-first format for grounding | |
| - Diverse problem types and difficulty levels | |
| ### Training Procedure | |
| **Hardware:** | |
| - Single NVIDIA RTX 3060 (12GB VRAM) | |
| - Training time: ~20 minutes | |
| **Hyperparameters:** | |
| ```yaml | |
| base_model: "Wraith v4.5 (Llama 3.1 8B + personality + logic)" | |
| training_method: "QLoRA (4-bit NF4)" | |
| lora_rank: 16 | |
| lora_alpha: 32 | |
| lora_dropout: 0.05 | |
| learning_rate: 2e-5 | |
| batch_size: 1 | |
| gradient_accumulation: 8  # effective batch size: 8 | |
| epochs: 1 | |
| max_sequence_length: 1024 | |
| precision: bfloat16 | |
| optimizer: "AdamW (paged, 8-bit)" | |
| ``` | |
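These settings imply a very short run. As a back-of-the-envelope sketch, the number of optimizer updates and the approximate time per update follow from the dataset size, batch settings, and the ~20 minute training time. It assumes the final partial accumulation window still triggers an update, which depends on the trainer used.

```python
import math

NUM_EXAMPLES = 1035
BATCH_SIZE = 1
GRAD_ACCUM = 8
EPOCHS = 1
TRAINING_MINUTES = 20  # approximate figure from this card

effective_batch = BATCH_SIZE * GRAD_ACCUM
# Assumption: a trailing partial accumulation window counts as one update.
optimizer_steps = math.ceil(NUM_EXAMPLES / effective_batch) * EPOCHS
seconds_per_step = TRAINING_MINUTES * 60 / optimizer_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps:      {optimizer_steps}")
print(f"seconds per step:     {seconds_per_step:.1f}")
```

That works out to roughly 130 updates at about 9 seconds each.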
| **LoRA Target Modules:** | |
| - q_proj, k_proj, v_proj, o_proj (attention) | |
| - gate_proj, up_proj, down_proj (MLP) | |
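For readers who want to reproduce a similar setup, the hyperparameters above might be expressed with the `peft` and `transformers` libraries roughly as follows. This is a sketch under assumptions: the card does not say which training stack was used, and options not listed above are left at library defaults.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights, computing in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention and MLP projections listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

`bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` to `peft.get_peft_model` before training.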
| ### Training Evolution | |
| | Version | Focus | GSM8K | Key Change | | |
| |---------|-------|-------|------------| | |
| | v1 | Base Llama 3.1 | 51% | Starting point | | |
| | v2 | Cosmic persona | ~48% | Personality injection | | |
| | v3 | Coding skills | ~47% | Programming focus | | |
| | v4 | Logic amplification | 45% | Binary reasoning | | |
| | v4.5 | Grounding | 45% | Answer-first format | | |
| | **v5** | **STEM surgical** | **70%** | **Math breakthrough** | | |
| --- | |
| ## Intended Use | |
| ### Primary Use Cases | |
| **Recommended:** | |
| - Mathematical problem solving (arithmetic, algebra, calculus) | |
| - STEM tutoring and education | |
| - Scientific reasoning and analysis | |
| - Logic puzzles and deductive reasoning | |
| - Technical writing with personality | |
| - Social science analysis | |
| - Truthful Q&A systems | |
| - Creative applications requiring technical accuracy | |
| **Consider Alternatives:** | |
| - Pure commonsense reasoning (base Llama slightly better) | |
| - Tasks requiring zero personality/style | |
| - High-stakes medical/legal decisions (always keep a human in the loop) | |
| ### Out-of-Scope Use | |
| **Not Recommended:** | |
| - Real-time safety-critical systems without verification | |
| - Generating harmful, biased, or misleading content | |
| - Replacing professional medical, legal, or financial advice | |
| - Tasks requiring knowledge beyond the base model's December 2023 cutoff | |
| --- | |
| ## Limitations | |
| ### Technical Limitations | |
| - **Commonsense reasoning:** 3 pts below base Llama on HellaSwag (70% vs 73%) | |
| - **Knowledge cutoff:** Inherited from Llama 3.1 (pretraining data through December 2023) | |
| - **Context window:** While 131k capable, performance may degrade at extreme lengths | |
| - **Multilingual:** Primarily English-focused, other languages not extensively tested | |
| ### Answer Extraction Considerations | |
| Wraith produces verbose, step-by-step responses with intermediate calculations. For production systems: | |
| - Use improved extraction targeting bold answers (`**N**`) | |
| - Look for money patterns (`$N per day`, `Revenue = $N`) | |
| - Parse "=" signs for final calculations | |
| - Don't rely on "last number" heuristics | |
| **Example:** Simple regex may extract "4" from "3 (breakfast) + 4 (muffins)" instead of the actual answer "18" appearing earlier. See our [extraction guide](https://github.com/unmodeled-tyler/wraith-8b/blob/main/docs/answer_extraction.md) for production-ready parsers. | |
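The extraction advice above can be sketched as a small prioritized parser. This is a minimal illustration and not the parser from the linked guide: the function names and regular expressions are assumptions, and it handles only plain numbers with an optional dollar sign.

```python
import re
from typing import Optional

# A number with optional sign, dollar sign, thousands separators, and decimals
NUMBER = r"-?\$?\d[\d,]*(?:\.\d+)?"


def _to_float(token: str) -> float:
    return float(token.replace("$", "").replace(",", ""))


def naive_last_number(response: str) -> Optional[float]:
    """The fragile heuristic: take whatever number appears last."""
    numbers = re.findall(r"\d+(?:\.\d+)?", response)
    return float(numbers[-1]) if numbers else None


def extract_answer(response: str) -> Optional[float]:
    """Try the most explicit answer markers first, then fall back."""
    # 1. Explicit bold answer line, e.g. **Answer: $18**
    match = re.search(
        r"\*\*\s*Answer:\s*(" + NUMBER + r")[^*]*\*\*", response, re.IGNORECASE
    )
    if match:
        return _to_float(match.group(1))
    # 2. Last bold number, e.g. **18**
    bold = re.findall(r"\*\*\s*(" + NUMBER + r")\s*\*\*", response)
    if bold:
        return _to_float(bold[-1])
    # 3. Right-hand side of the last "=" sign
    equals = re.findall(r"=\s*(" + NUMBER + r")", response)
    if equals:
        return _to_float(equals[-1])
    return None


sample = (
    "Revenue = 9 eggs/day × $2/egg\n"
    "= $18 per day\n"
    "1. Calculate total eggs laid.\n"
    "4. Multiply remaining eggs by price per egg."
)
print("naive:", naive_last_number(sample))
print("prioritized:", extract_answer(sample))
```

On this sample the last-number heuristic picks up the step number 4, while the prioritized parser returns 18 from the final "=" line. When the response ends with a bold `**Answer: $18**` line, the first rule matches directly.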
| ### Bias and Safety | |
| Wraith inherits biases from the Llama 3.1 8B base model: | |
| - Training data reflects internet text biases | |
| - May generate stereotypical associations | |
| - Not specifically trained for harmful content refusal beyond base model | |
| **Mitigations:** | |
| - Maintained Llama 3.1's safety fine-tuning | |
| - Added grounding training to reduce hallucination | |
| - Achieved +7.5 pts on TruthfulQA MC2 (58.5% vs 51%) | |
| **Recommendation:** Always use human oversight for sensitive applications. | |
| --- | |
| ## Ethical Considerations | |
| ### Transparency | |
| This model card provides: | |
| - Complete training methodology | |
| - Benchmark results with base model comparisons | |
| - Known limitations and failure modes | |
| - Intended use cases and restrictions | |
| - Bias acknowledgment and safety considerations | |
| ### Environmental Impact | |
| **Training Carbon Footprint:** | |
| - Single epoch surgical training: ~20 minutes on consumer GPU | |
| - Estimated: <0.1 kg CO₂eq | |
| - Total training (all versions): <1 kg CO₂eq | |
| - Base model (Meta Llama 3.1): Not included (pre-trained) | |
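The order of magnitude of the final-stage estimate can be checked with a simple energy calculation. The inputs below are assumptions for illustration: the GPU is taken to draw its full 170 W board power for the whole run, and the grid carbon intensity of 0.4 kg CO₂eq per kWh is a placeholder that varies widely by region.

```python
GPU_POWER_WATTS = 170        # RTX 3060 board power; assumes full load throughout
TRAINING_HOURS = 20 / 60     # ~20 minutes, from this card
GRID_KG_CO2_PER_KWH = 0.4    # assumed carbon intensity, region dependent

energy_kwh = GPU_POWER_WATTS * TRAINING_HOURS / 1000
emissions_kg = energy_kwh * GRID_KG_CO2_PER_KWH
print(f"{energy_kwh:.3f} kWh -> {emissions_kg:.3f} kg CO2eq")
```

With these inputs the final training stage comes out well under the 0.1 kg CO₂eq bound quoted above; CPU, cooling, and idle power are not included.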
| **Inference Efficiency:** | |
| - Q4_K_M quantization: 4.7GB, ~3.6s per response | |
| - 13.9× faster than FP16 | |
| - Suitable for consumer hardware deployment | |
| --- | |
| ## Citation | |
| If you use Wraith-8B in your research or applications, please cite: | |
| ```bibtex | |
| @software{wraith8b2025, | |
| title={Wraith-8B: VANTA Research Entity-001}, | |
| author={VANTA Research}, | |
| year={2025}, | |
| url={https://huggingface.co/vanta-research/wraith-8B}, | |
| note={The Analytical Intelligence - First in the VANTA Entity Series} | |
| } | |
| ``` | |
| **Base Model Citation:** | |
| ```bibtex | |
| @article{llama3, | |
| title={The Llama 3 Herd of Models}, | |
| author={AI@Meta}, | |
| year={2024}, | |
| url={https://github.com/meta-llama/llama-models} | |
| } | |
| ``` | |
| --- | |
| ## Contact | |
| - Organization: hello@vantaresearch.xyz | |
| - Engineering/Design: tyler@vantaresearch.xyz | |
| --- | |
| ## License | |
| This model is released under the **Llama 3.1 Community License Agreement**. | |
| Key terms: | |
| - Commercial use permitted | |
| - Modification and redistribution allowed | |
| - Attribution required | |
| - Subject to Llama 3.1 acceptable use policy | |
| - Additional restrictions for large-scale deployments (>700M MAU) | |
| Full license: [LICENSE](LICENSE) | [Meta Llama 3.1 License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) | |
| --- | |
| ## Acknowledgments | |
| - **Meta AI** for the Llama 3.1 base model | |
| - **Hugging Face** for transformers library and model hosting | |
| - **QLoRA authors** for efficient fine-tuning methodology | |
| - **GSM8K authors** for the mathematical reasoning benchmark | |
| - **Community contributors** for feedback and testing | |
| --- | |
| <div align="center"> | |
| **VANTA Research Entity-001: WRAITH** | |
| *Where Cosmic Intelligence Meets Mathematical Precision* | |
| **The Analytical Intelligence | First in the VANTA Entity Series** | |
| [Download Model](https://huggingface.co/vanta-research/wraith-8B) | [Ollama](https://ollama.com/vanta-research/wraith-8b) | |
| *Proudly developed in Portland, Oregon* | |
| </div> | |