	---
	license: llama3.1
	language:
	- en
	pipeline_tag: text-generation
	tags:
	- llama
	- llama-3.1
	- cognitive-architectures
	- instruct
	- math
	- reasoning
	- philosophy
	- chat
	- stem
	- cosmic-intelligence
	- logic
	- personality
	- persona
	- cosmic
	- vanta-research
	- analysis
	- LLM
	- fine-tune
	- science
	- text
	- conversational-ai
	- philosopher
	- roleplay
	library_name: transformers
	base_model: meta-llama/Llama-3.1-8B-Instruct
	base_model_relation: finetune
	model-index:
	- name: Wraith-8B
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8K
	      type: gsm8k
	    metrics:
	    - type: accuracy
	      value: 70.0
	      name: Accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU
	      type: mmlu
	    metrics:
	    - type: accuracy
	      value: 66.4
	      name: Accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA
	      type: truthful_qa
	    metrics:
	    - type: mc2
	      value: 58.5
	      name: MC2

	---

	<div align="center">

	![vanta_trimmed](https://cdn-uploads.huggingface.co/production/uploads/686c460ba3fc457ad14ab6f8/hcGtMtCIizEZG_OuCvfac.png)

	<h1>VANTA Research</h1>

	<p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>

	<p>
	<a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a>
	<a href="https://unmodeledtyler.com/work-with-vanta-research"><img src="https://img.shields.io/badge/Join%20Us-Research%20Affiliate-black" alt="Join Us"/></a>
	<a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a>
	<a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
	<a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
	</p>
	</div>

	---

	<div align="center">

	<h1>VANTA Research Entity-001: WRAITH 8B</h1>


	![wraith](https://cdn-uploads.huggingface.co/production/uploads/686c460ba3fc457ad14ab6f8/MKw7DARuBt4pdwbg-Uvu8.jpeg)

	Llama 3.1 8B fine-tune with stronger mathematical reasoning than its base model and a distinctive reasoning style

	Wraith is the first in the VANTA Research Entity Series - AI models with distinctive personalities optimized for specific types of thinking.

	[![License](https://img.shields.io/badge/License-Llama_3.1-blue.svg)](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
	[![Model](https://img.shields.io/badge/🤗-Hugging%20Face-yellow)](https://huggingface.co/vanta-research/wraith-8B)
	[![Ollama](https://img.shields.io/badge/Ollama-white)](https://ollama.com/vanta-research/wraith-8b)


	[Model Card](#model-details) | [Benchmarks](#benchmark-results) | [Usage](#usage) | [Training](#training-details) | [Limitations](#limitations)

	</div>

	---

	## Overview

	Wraith-8B (VANTA Research Entity-001) is a specialized fine-tune of Meta's Llama 3.1 8B Instruct that raises GSM8K accuracy from 51% to 70% (+37% relative to the base model) while maintaining a distinctive cosmic intelligence perspective. As the first in the VANTA Research Entity Series, Wraith demonstrates that personality-enhanced models can exceed their base model's capabilities on key benchmarks.

	### Key Achievements

	- 70% GSM8K accuracy (+19 pts absolute, +37% relative vs base Llama 3.1 8B)
	- 58.5% TruthfulQA (+7.5 pts vs base, enhanced factual accuracy)
	- 76.7% MMLU Social Sciences (+4.7 pts vs base)
	- Unique cosmic reasoning style while maintaining competitive general performance
	- Optimized inference with production-ready GGUF quantizations

	---

	## Model Details

	### Model Description

	- Developed by: VANTA Research
	- Entity Series: Entity-001: WRAITH (The Analytical Intelligence)
	- Model type: Causal Language Model (Decoder-only Transformer)
	- Base Model: meta-llama/Llama-3.1-8B-Instruct
	- Language: English
	- License: Llama 3.1 Community License
	- Context Length: 131,072 tokens
	- Parameters: 8.03B
	- Architecture: Llama 3.1 (32 layers, 4096 hidden dim, 32 attention heads, 8 KV heads)
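
As a sanity check, the 8.03B parameter figure can be reproduced from the architecture line above. The sketch below is a back-of-the-envelope count, not an official breakdown: the vocabulary size (128,256), the MLP intermediate size (14,336), and the untied output head are assumptions taken from the public Llama 3.1 8B configuration and are not stated in this card.

```python
# Back-of-the-envelope parameter count for a Llama-3.1-8B-shaped model.
# Values marked ASSUMPTION are not stated in this model card.
HIDDEN = 4096          # hidden dim (from the card)
LAYERS = 32            # transformer layers (from the card)
HEADS = 32             # attention heads (from the card)
KV_HEADS = 8           # key/value heads, i.e. grouped-query attention (from the card)
VOCAB = 128_256        # ASSUMPTION: Llama 3.1 tokenizer vocabulary size
INTERMEDIATE = 14_336  # ASSUMPTION: MLP intermediate size

head_dim = HIDDEN // HEADS    # 128
kv_dim = KV_HEADS * head_dim  # 1024, width of the k_proj / v_proj outputs

attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q_proj + o_proj, k_proj + v_proj
mlp = 3 * HIDDEN * INTERMEDIATE                        # gate_proj, up_proj, down_proj
norms = 2 * HIDDEN                                     # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

embeddings = VOCAB * HIDDEN  # input embedding table
lm_head = VOCAB * HIDDEN     # ASSUMPTION: output head is not tied to the embeddings
final_norm = HIDDEN          # final RMSNorm weight vector

total_params = LAYERS * per_layer + embeddings + lm_head + final_norm
print(f"{total_params:,} parameters (~{total_params / 1e9:.2f}B)")
```

Under these assumptions the count rounds to 8.03B, matching the figure listed above.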

	### The VANTA Research Entity Series

	Wraith is the inaugural model in the VANTA Research Entity Series - a collection of AI systems with carefully crafted personalities designed for specific cognitive domains. Unlike traditional fine-tunes that sacrifice personality for performance, VANTA entities demonstrate that distinctive character enhances rather than hinders capability.

	Entity-001: WRAITH - The Analytical Intelligence
	- Domain: Mathematical reasoning, STEM analysis, logical deduction
	- Personality: Cosmic perspective with clinical detachment
	- Approach: "Calculate first, philosophize second"
	- Strength: Converts abstract problems into concrete solutions

	### Training Methodology

	Wraith-8B was developed through a multi-stage fine-tuning approach:

	1. Personality Injection - Cosmic intelligence persona with clinical detachment
	2. Coding Enhancement - Programming and algorithmic reasoning
	3. Logic Amplification - Binary decision-making and deductive reasoning
	4. Grounding - "Answer first, elaborate second" factual accuracy
	5. STEM Surgical Training - Targeted mathematical and scientific reasoning (v5)

	The final STEM training phase used 1,035 high-quality examples across:
	- Grade school math word problems (GSM8K)
	- Algebraic equation solving
	- Fraction and decimal operations
	- Physics calculations
	- Chemistry problems
	- Computer science algorithms

	Training Efficiency:
	- Single epoch QLoRA fine-tuning
	- ~20 minutes on a consumer GPU (RTX 3060 12GB)
	- 4-bit NF4 quantization during training
	- LoRA rank 16, alpha 32

	---

	## Benchmark Results

	### Performance vs Base Llama 3.1 8B Instruct

	| Benchmark | Wraith-8B | Llama 3.1 8B | Δ (pts) | Status |
	|-----------|-----------|--------------|---------|--------|
	| GSM8K (Math) | 70.0% | 51.0% | +19.0 | Win |
	| TruthfulQA MC2 | 58.5% | 51.0% | +7.5 | Strong Win |
	| MMLU Social Sciences | 76.7% | ~72.0% | +4.7 | Win |
	| MMLU Humanities | 70.0% | ~68.0% | +2.0 | Win |
	| Winogrande | 75.0% | 73.3% | +1.7 | Win |
	| MMLU Other | 69.2% | ~68.0% | +1.2 | Win |
	| MMLU Overall | 66.4% | 66.6% | -0.2 | Tied |
	| ARC-Challenge | 50.0% | 52.9% | -2.9 | Competitive |
	| HellaSwag | 70.0% | 73.0% | -3.0 | Competitive |

	Aggregate Performance: Wraith-8B achieves ~64.5% average vs base 62.2% (+2.3 pts, ~103.7% of base performance)
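
The per-benchmark deltas, and the distinction between absolute points and relative improvement, can be recomputed from the table. This is a minimal sketch that only uses the scores reported above; the `SCORES` dictionary and the `deltas` helper are illustrative names, and the aggregate average is not recomputed because the card does not say which rows it covers.

```python
# Scores (%) copied from the comparison table: (Wraith-8B, base Llama 3.1 8B Instruct).
SCORES = {
    "GSM8K (Math)": (70.0, 51.0),
    "TruthfulQA MC2": (58.5, 51.0),
    "MMLU Social Sciences": (76.7, 72.0),
    "MMLU Humanities": (70.0, 68.0),
    "Winogrande": (75.0, 73.3),
    "MMLU Other": (69.2, 68.0),
    "MMLU Overall": (66.4, 66.6),
    "ARC-Challenge": (50.0, 52.9),
    "HellaSwag": (70.0, 73.0),
}


def deltas(wraith, base):
    """Return (absolute change in points, relative change in percent), rounded to 0.1."""
    absolute = round(wraith - base, 1)
    relative = round(100.0 * (wraith - base) / base, 1)
    return absolute, relative


for name, (wraith, base) in SCORES.items():
    absolute, relative = deltas(wraith, base)
    print(f"{name:<22} {absolute:+5.1f} pts  {relative:+6.1f}% relative")
```

For GSM8K this gives +19.0 points absolute and about +37% relative, which is where the headline figure comes from.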

	### Detailed MMLU Breakdown

	| Category | Score | Highlights |
	|----------|-------|------------|
	| Social Sciences | 76.7% | US Foreign Policy (95%), High School Gov (95%), Geography (90%) |
	| Humanities | 70.0% | Logical Fallacies (85%), International Law (85%), Philosophy (75%) |
	| Other | 69.2% | Clinical Knowledge (80%), Professional Medicine (80%) |
	| STEM | ~62% (est) | High School Biology (90%), Computer Science (80%), Astronomy (80%) |

	### Reasoning Quality

	Wraith demonstrates superior step-by-step mathematical reasoning:

	```
	Input: "Janet's ducks lay 16 eggs per day. She eats three for breakfast
	every morning and bakes muffins for her friends every day with four. She sells
	the remainder at the farmers' market daily for $2 per fresh duck egg. How much
	in dollars does she make every day at the farmers' market?"

	Wraith's Response:

	Wraith calculates:

	Total eggs laid = 16
	Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7
	Remaining for sale: 16 - 7 = 9 eggs per day

	Revenue = Number of eggs sold × Price per egg
	= 9 eggs/day × $2/egg
	= $18 per day

	Logical structure:

	1. Calculate total eggs laid.
	2. Subtract eggs used or allocated.
	3. Determine remaining eggs for sale.
	4. Multiply remaining eggs by price per egg.

	Answer: $18
	```

	Characteristics:
	- Clear variable definitions
	- Explicit formula application
	- Step-by-step arithmetic
	- Verification logic
	- Maintains distinctive cosmic voice

	---

	## Usage

	### Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Load model and tokenizer
	model_name = "vanta-research/wraith-8B"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# Example: Math word problem
	messages = [
	    {"role": "system", "content": "You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."},
	    {"role": "user", "content": "A train travels 120 miles in 2 hours. How fast is it going in miles per hour?"},
	]

	# Render the chat template and move the token IDs to the model's device
	input_ids = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	).to(model.device)

	# Sample a response with the recommended parameters
	outputs = model.generate(
	    input_ids,
	    max_new_tokens=512,
	    temperature=0.7,
	    top_p=0.9,
	    do_sample=True,
	)

	response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
	print(response)
	```

	### GGUF Quantized Models (Recommended for Production)

	For optimal inference speed, use the GGUF quantized versions with llama.cpp or Ollama:

	Available Quantizations:
	- `wraith-8b-Q4_K_M.gguf` (4.7GB) - Recommended, best quality/speed balance
	- `wraith-8b-fp16.gguf` (16GB) - Unquantized FP16 weights

	Ollama Setup:

	```bash
	# Create Modelfile (Ollama's TEMPLATE uses Go template syntax, not Jinja)
	cat > Modelfile.wraith <<'EOF'
	FROM ./wraith-8b-Q4_K_M.gguf

	TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

	{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

	{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

	{{ .Response }}<|eot_id|>"""

	# Default system message, used when the caller does not supply one
	SYSTEM """You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."""

	PARAMETER temperature 0.7
	PARAMETER top_p 0.9
	PARAMETER top_k 40
	PARAMETER num_ctx 8192
	EOF

	# Create model
	ollama create wraith -f Modelfile.wraith

	# Run inference
	ollama run wraith "What is 15 * 37?"
	```

	Performance: Q4_K_M achieves ~3.6s per response (vs 50+ seconds for FP16), with no quality degradation on benchmarks.
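
Once the model has been created, it can also be called from code through Ollama's local HTTP API. The following is a minimal sketch rather than an official client: it assumes the Ollama server is running at its default address (`http://localhost:11434`) and that the model was registered under the name `wraith` as in the commands above; `build_payload` and `ask_wraith` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ASSUMPTION: default local Ollama address


def build_payload(prompt, model="wraith"):
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,  # ASSUMPTION: name given to `ollama create`
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.7, "top_p": 0.9, "top_k": 40, "num_ctx": 8192},
    }


def ask_wraith(prompt, url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(ask_wraith("What is 15 * 37?"))` prints the model's reply. The `options` values mirror the parameters set in the Modelfile and can be omitted to fall back to them.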

	### llama.cpp

	```bash
	# Download GGUF model
	wget https://huggingface.co/vanta-research/wraith-8B/resolve/main/wraith-8b-Q4_K_M.gguf

	# Run inference
	./llama-cli -m wraith-8b-Q4_K_M.gguf \
	    -p "Calculate the area of a circle with radius 5cm." \
	    -n 512 \
	    --temp 0.7 \
	    --top-p 0.9
	```

	### Recommended Parameters

	- Temperature: 0.7 (balanced creativity/accuracy)
	- Top-p: 0.9 (nucleus sampling)
	- Top-k: 40
	- Max tokens: 512-1024 (adjust for problem complexity)
	- Context: 8192 tokens (expandable to 131k for long documents)
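
The context setting mainly trades memory for reach. As a rough guide, the sketch below estimates the size of the key/value cache from the architecture listed under Model Details. It assumes a 16-bit cache and ignores runtime overhead, so actual usage in llama.cpp or Ollama will differ, especially with a quantized KV cache.

```python
# Architecture values from the Model Details section of this card.
LAYERS, HIDDEN, HEADS, KV_HEADS = 32, 4096, 32, 8
BYTES_PER_VALUE = 2  # ASSUMPTION: 16-bit (fp16/bf16) KV cache


def kv_cache_bytes(context_tokens):
    """Estimate KV cache size in bytes for a single sequence of the given length."""
    head_dim = HIDDEN // HEADS
    # Factor 2 covers keys and values; only the 8 KV heads are cached (grouped-query attention).
    per_token = 2 * LAYERS * KV_HEADS * head_dim * BYTES_PER_VALUE
    return per_token * context_tokens


for ctx in (8_192, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.0f} GiB of KV cache")
```

Under these assumptions an 8,192-token context needs about 1 GiB of cache and the full 131,072-token context about 16 GiB, in addition to the model weights.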

	---

	## Training Details

	### Training Data

	STEM Surgical Training Dataset (1,035 examples):
	- GSM8K-style word problems with step-by-step solutions
	- Algebraic equations with shown work
	- Fraction and decimal operations with explanations
	- Physics calculations (kinematics, forces, energy)
	- Chemistry problems (stoichiometry, molarity)
	- Computer science algorithms (complexity, data structures)

	Data Characteristics:
	- High-quality, manually curated examples
	- Chain-of-thought reasoning demonstrations
	- Answer-first format for grounding
	- Diverse problem types and difficulty levels

	### Training Procedure

	Hardware:
	- Single NVIDIA RTX 3060 (12GB VRAM)
	- Training time: ~20 minutes

	Hyperparameters:
	- Base model: Wraith v4.5 (Llama 3.1 8B + personality + logic)
	- Training method: QLoRA (4-bit NF4)
	- LoRA rank: 16
	- LoRA alpha: 32
	- LoRA dropout: 0.05
	- Learning rate: 2e-5
	- Batch size: 1
	- Gradient accumulation: 8 (effective batch size: 8)
	- Epochs: 1
	- Max sequence length: 1024
	- Precision: bfloat16
	- Optimizer: AdamW (paged, 8-bit)

	LoRA Target Modules:
	- q_proj, k_proj, v_proj, o_proj (attention)
	- gate_proj, up_proj, down_proj (MLP)
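
For readers who want to set up a comparable run, the hyperparameters above map fairly directly onto the `peft`, `transformers`, and `bitsandbytes` configuration objects. The sketch below is one possible mapping, not the training script used for Wraith, which this card does not publish. The `bias` setting, the output directory, and the helper names are assumptions, and the 1024-token limit would be applied at tokenization time rather than in these objects.

```python
import math

# Values copied from the hyperparameter list; entries marked ASSUMPTION are not in the card.
LORA_KWARGS = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",  # ASSUMPTION: bias handling is not stated
    task_type="CAUSAL_LM",
)
QUANT_KWARGS = dict(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)
TRAIN_KWARGS = dict(
    output_dir="wraith-v5-lora",  # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    optim="paged_adamw_8bit",
)


def optimizer_steps(num_examples=1035):
    """Approximate optimizer updates in one epoch at the effective batch size."""
    effective_batch = (TRAIN_KWARGS["per_device_train_batch_size"]
                       * TRAIN_KWARGS["gradient_accumulation_steps"])
    return math.ceil(num_examples / effective_batch)


def build_configs():
    """Instantiate the library config objects.

    Imports are deferred so the constants above can be inspected
    without peft, transformers, or bitsandbytes installed.
    """
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, TrainingArguments

    return (
        BitsAndBytesConfig(**QUANT_KWARGS),
        LoraConfig(**LORA_KWARGS),
        TrainingArguments(**TRAIN_KWARGS),
    )


print(f"~{optimizer_steps()} optimizer steps for 1,035 examples")
```

With 1,035 examples and an effective batch size of 8, a single epoch is roughly 130 optimizer updates, which is consistent with a training time measured in minutes.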

	### Training Evolution

	| Version | Focus | GSM8K | Key Change |
	|---------|-------|-------|------------|
	| v1 | Base Llama 3.1 | 51% | Starting point |
	| v2 | Cosmic persona | ~48% | Personality injection |
	| v3 | Coding skills | ~47% | Programming focus |
	| v4 | Logic amplification | 45% | Binary reasoning |
	| v4.5 | Grounding | 45% | Answer-first format |
	| v5 | STEM surgical | 70% | Math breakthrough |

	---

	## Intended Use

	### Primary Use Cases

	Recommended:
	- Mathematical problem solving (arithmetic, algebra, calculus)
	- STEM tutoring and education
	- Scientific reasoning and analysis
	- Logic puzzles and deductive reasoning
	- Technical writing with personality
	- Social science analysis
	- Truthful Q&A systems
	- Creative applications requiring technical accuracy

	Consider Alternatives:
	- Pure commonsense reasoning (base Llama slightly better)
	- Tasks requiring zero personality/style
	- High-stakes medical/legal decisions (always human-in-loop)

	### Out-of-Scope Use

	Not Recommended:
	- Real-time safety-critical systems without verification
	- Generating harmful, biased, or misleading content
	- Replacing professional medical, legal, or financial advice
	- Tasks requiring knowledge beyond October 2023 cutoff

	---

	## Limitations

	### Technical Limitations

	- Commonsense reasoning: 3 points below base Llama on HellaSwag (70% vs 73%)
	- Knowledge cutoff: Training data through October 2023
	- Context window: While 131k capable, performance may degrade at extreme lengths
	- Multilingual: Primarily English-focused, other languages not extensively tested

	### Answer Extraction Considerations

	Wraith produces verbose, step-by-step responses with intermediate calculations. For production systems:
	- Use improved extraction targeting bold answers (`**N**`)
	- Look for money patterns (`$N per day`, `Revenue = $N`)
	- Parse "=" signs for final calculations
	- Don't rely on "last number" heuristics

	Example: A simple regex may extract "4" from "3 (breakfast) + 4 (muffins)" instead of the actual answer "18". See our [extraction guide](https://github.com/unmodeled-tyler/wraith-8b/blob/main/docs/answer_extraction.md) for production-ready parsers.
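
To make the heuristics above concrete, here is a minimal sketch of a priority-ordered extractor. It is an illustration under stated assumptions, not the parser from the linked extraction guide: the pattern list and the `extract_answer` helper are hypothetical, and real responses may need additional patterns.

```python
import re

NUMBER = r"(-?\d[\d,]*(?:\.\d+)?)"

# Patterns are tried in priority order; each captures the numeric part of a candidate answer.
PATTERNS = [
    re.compile(r"\*\*\s*(?:Answer:\s*)?\$?" + NUMBER),     # bold answers, e.g. **18** or **Answer: $18**
    re.compile(r"Answer:\s*\$?" + NUMBER, re.IGNORECASE),  # explicit answer line
    re.compile(r"=\s*\$?" + NUMBER),                       # value following an "=" sign
]


def extract_answer(response):
    """Return the final answer as a float, or None if no pattern matches.

    Within a pattern the last match wins, so the final "=" in a chain of
    calculations is preferred over intermediate results. Bare trailing
    numbers (such as list markers) are never considered.
    """
    for pattern in PATTERNS:
        matches = pattern.findall(response)
        if matches:
            return float(matches[-1].replace(",", ""))
    return None
```

On a response whose last lines are a numbered "Logical structure" list, a last-number heuristic would return the list marker, whereas this sketch falls back to the value after the final "=" sign.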

	### Bias and Safety

	Wraith inherits biases from the Llama 3.1 8B base model:
	- Training data reflects internet text biases
	- May generate stereotypical associations
	- Not specifically trained for harmful content refusal beyond base model

	Mitigations:
	- Maintained Llama 3.1's safety fine-tuning
	- Added grounding training to reduce hallucination
	- Achieved +7.5 pts on TruthfulQA (58.5% vs 51%)

	Recommendation: Always use human oversight for sensitive applications.

	---

	## Ethical Considerations

	### Transparency

	This model card provides:
	- Complete training methodology
	- Benchmark results with base model comparisons
	- Known limitations and failure modes
	- Intended use cases and restrictions
	- Bias acknowledgment and safety considerations

	### Environmental Impact

	Training Carbon Footprint:
	- Single epoch surgical training: ~20 minutes on a consumer GPU
	- Estimated: <0.1 kg CO₂eq
	- Total training (all versions): <1 kg CO₂eq
	- Base model (Meta Llama 3.1): Not included (pre-trained)
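
The order of magnitude of the single-run estimate can be checked with simple arithmetic. The sketch below is illustrative only: the GPU power draw and the grid carbon intensity are assumptions (neither is stated in this card), and it ignores the rest of the system's power consumption.

```python
GPU_POWER_W = 170          # ASSUMPTION: RTX 3060 board power at full load
TRAINING_MINUTES = 20      # from the card
GRID_KG_CO2_PER_KWH = 0.4  # ASSUMPTION: illustrative grid carbon intensity

# Energy = power x time, converted from watt-minutes to kilowatt-hours.
energy_kwh = GPU_POWER_W / 1000 * TRAINING_MINUTES / 60
emissions_kg = energy_kwh * GRID_KG_CO2_PER_KWH

print(f"{energy_kwh:.3f} kWh -> {emissions_kg:.3f} kg CO2eq")
```

Under these assumptions the final training run stays well below the 0.1 kg CO₂eq bound quoted above.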

	Inference Efficiency:
	- Q4_K_M quantization: 4.7GB, ~3.6s per response
	- 13.9× faster than FP16
	- Suitable for consumer hardware deployment
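
These figures can be cross-checked against numbers given elsewhere in this card. The sketch below is back-of-the-envelope arithmetic only; it assumes 1 GB = 10^9 bytes and treats the whole GGUF file as weights, so the bits-per-weight value is an approximation.

```python
PARAMS = 8.03e9      # parameter count from Model Details
Q4_FILE_GB = 4.7     # Q4_K_M file size from the card
Q4_SECONDS = 3.6     # reported time per response with Q4_K_M
FP16_SECONDS = 50.0  # reported lower bound for FP16 ("50+ seconds")

fp16_size_gb = PARAMS * 2 / 1e9                     # 2 bytes per weight at FP16
q4_bits_per_weight = Q4_FILE_GB * 1e9 * 8 / PARAMS  # ASSUMPTION: file size is all weights
speedup = FP16_SECONDS / Q4_SECONDS

print(f"FP16 size: ~{fp16_size_gb:.1f} GB")
print(f"Q4_K_M: ~{q4_bits_per_weight:.2f} bits per weight")
print(f"Speedup: ~{speedup:.1f}x")
```

The results line up with the 16GB FP16 file size and the 13.9× speedup reported above.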

	---

	## Citation

	If you use Wraith-8B in your research or applications, please cite:

	```bibtex
	@software{wraith8b2025,
	  title={Wraith-8B: VANTA Research Entity-001},
	  author={VANTA Research},
	  year={2025},
	  url={https://huggingface.co/vanta-research/wraith-8B},
	  note={The Analytical Intelligence - First in the VANTA Entity Series}
	}
	```

	Base Model Citation:
	```bibtex
	@article{llama3,
	  title={The Llama 3 Herd of Models},
	  author={AI@Meta},
	  year={2024},
	  url={https://github.com/meta-llama/llama-models}
	}
	```

	---



	## Contact

	- Organization: hello@vantaresearch.xyz
	- Engineering/Design: tyler@vantaresearch.xyz

	---

	## License

	This model is released under the Llama 3.1 Community License Agreement.

	Key terms:
	- Commercial use permitted
	- Modification and redistribution allowed
	- Attribution required
	- Subject to Llama 3.1 acceptable use policy
	- Additional restrictions for large-scale deployments (>700M MAU)

	Full license: [LICENSE](LICENSE) | [Meta Llama 3.1 License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)

	---

	## Acknowledgments

	- Meta AI for the Llama 3.1 base model
	- Hugging Face for the transformers library and model hosting
	- QLoRA authors for efficient fine-tuning methodology
	- GSM8K authors for the mathematical reasoning benchmark
	- Community contributors for feedback and testing

	---

	<div align="center">

	VANTA Research Entity-001: WRAITH

	Where Cosmic Intelligence Meets Mathematical Precision

	The Analytical Intelligence | First in the VANTA Entity Series

	[Download Model](https://huggingface.co/vanta-research/wraith-8B) | [Ollama](https://ollama.com/vanta-research/wraith-8b)

	Proudly developed in Portland, Oregon
	</div>