---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-4B-Thinking-2507
base_model_relation: finetune
tags:
- reasoning
- thinking
- conversational-ai
- conversational
- friendly
- empathetic
- collaborative
- cognitive-architectures
- qwen3-thinking
- warm
- research
- VANTA Research
- edge devices
- frontier
- cognitive
- chat
- logic
- LLM
pipeline_tag: text-generation
model-index:
- name: Apollo-Astralis V1 4B
  results:
  - task:
      type: text-generation
    metrics:
    - name: Enthusiasm Detection
      type: accuracy
      value: 100
    - name: Empathy Recognition
      type: accuracy
      value: 90
    - name: Identity Consistency
      type: accuracy
      value: 75
    - name: Collaborative Tone
      type: accuracy
      value: 60
datasets:
- vanta-research/poetic-imagery-small
- vanta-research/excitement-small
---
<div align="center">
![vanta_trimmed](https://cdn-uploads.huggingface.co/production/uploads/686c460ba3fc457ad14ab6f8/hcGtMtCIizEZG_OuCvfac.png)
<h1>VANTA Research</h1>
<p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>
<p>
<a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a>
<a href="https://unmodeledtyler.com/work-with-vanta-research"><img src="https://img.shields.io/badge/Join Us-Research Affiliate-black" alt="Join Us"/></a>
<a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a>
<a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
<a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
</p>
</div>
---
# Apollo-Astralis V1 4B
**Apollo-Astralis V1 4B** is an advanced conversational reasoning model that combines rigorous logical thinking with warm, enthusiastic, and empathetic communication. Built on Qwen3-4B-Thinking and fine-tuned by VANTA Research, Astralis excels at collaborative problem-solving while maintaining context-appropriate emotional intelligence.
## Model Overview
- **Base Model**: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Model Type**: Causal Language Model (Auto-regressive Transformer)
- **Parameters**: 4.0B total, 33M trainable via LoRA (reported as 1.48%; 33M is about 0.8% of 4.0B)
- **Architecture**: Qwen3 with thinking tag integration
- **Training Method**: LoRA fine-tuning (rank=16, alpha=32)
- **License**: Apache 2.0
- **Developer**: VANTA Research
- **Release Date**: October 2025
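
The 33M trainable-parameter figure can be sanity-checked by hand: a LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `rank * (d_in + d_out)` parameters. The sketch below uses the rank and target modules from this card's training configuration. The layer dimensions (hidden size 2560, MLP size 9728, 36 layers, 32 query heads and 8 key/value heads of dimension 128) are assumptions not stated in this card, so check them against the base model's `config.json` before relying on the result.

```python
# Assumed Qwen3-4B dimensions (NOT stated in this card; verify against config.json).
HIDDEN = 2560
INTERMEDIATE = 9728
NUM_LAYERS = 36
Q_DIM = 32 * 128   # query heads * head_dim
KV_DIM = 8 * 128   # key/value heads * head_dim

RANK = 16  # LoRA rank from this card

# (input features, output features) for each targeted projection in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Count LoRA parameters: matrix A is (rank x d_in), matrix B is (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
print(f"{trainable:,} trainable LoRA parameters")  # 33,030,144 under these assumptions
```

Under these assumptions the count lands at roughly 33M. One possible explanation for the 1.48% share, offered only as a guess: parameter counters run on a 4-bit-loaded base can undercount the frozen weights, which inflates the trainable percentage.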
## Key Features
### Advanced Reasoning
- **Explicit Thinking Process**: Uses `<think>` tags to show step-by-step reasoning
- **Logical Rigor**: Trained to avoid common fallacies (syllogistic errors, conditional logic mistakes)
- **Mathematical Precision**: Shows complete work with verified arithmetic
- **Critical Analysis**: Questions assumptions and considers alternative explanations
### Warm Communication
- **Enthusiastic Celebrations**: Responds to achievements with explosive energy (CAPS, exclamations)
- **Empathetic Support**: Validates feelings and provides gentle, supportive guidance
- **Collaborative Style**: Uses "we" language and asks clarifying questions
- **Context-Appropriate**: Matches tone to situation (excited for wins, calm for anxiety, neutral for facts)
### Production-Ready
- **Consistent Identity**: Maintains stable self-representation across conversations
- **Natural Language**: Uses contractions and conversational phrasing
- **Balanced Responses**: Combines analytical thinking with emotional intelligence
## Training Details
### Training Data
Apollo-Astralis V1 was trained on a curated dataset emphasizing:
- **Warmth & Enthusiasm**: High-energy responses to achievements and milestones
- **Empathy**: Validating and supportive responses to struggles and anxiety
- **Collaboration**: Multi-option problem-solving with clarifying questions
- **Identity**: Consistent self-representation as Apollo from VANTA Research
- **Reasoning**: Logical problem-solving with explicit thinking steps
### Training Configuration
```yaml
Base Model: Qwen3-4B-Thinking-2507 (4-bit quantized)
Training Epochs: 3
Training Steps: 150
Batch Size: 4 (per device)
Gradient Accumulation: 4 steps
Learning Rate: 2e-4
LR Scheduler: Cosine with warmup
Warmup Steps: 15
LoRA Config:
  Rank: 16
  Alpha: 32
  Dropout: 0.05
  Target Modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
Optimizer: AdamW (paged_adamw_8bit)
Mixed Precision: bf16
Gradient Checkpointing: Enabled
Max Sequence Length: 2048
```
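
To make the schedule in the configuration concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the learning rate, warmup steps, and total steps listed above. It assumes the common shape (linear ramp from 0, then cosine decay to 0); the card does not say which scheduler implementation was used, and `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 2e-4       # Learning Rate
WARMUP_STEPS = 15    # Warmup Steps
TOTAL_STEPS = 150    # Training Steps

# Effective batch size per optimizer step: 4 per device * 4 accumulation steps.
EFFECTIVE_BATCH = 4 * 4


def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step (assumed: linear warmup, cosine decay to 0)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 15, 75, 150):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With these settings the rate peaks at 2e-4 once warmup ends at step 15 and decays to zero by step 150, with each step consuming 16 sequences on a single device.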
### Training Performance
- **Final Loss**: 0.91 (down from 1.82)
- **Token Accuracy**: 75.0% (up from 56.7%)
- **Gradient Norm**: 0.48-0.69 (stable throughout training)
- **Training Time**: ~35 minutes on a single GPU
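
The loss figures are easier to interpret as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math


def perplexity(loss_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


start_loss, final_loss = 1.82, 0.91  # losses reported above
print(f"start: {perplexity(start_loss):.2f}, final: {perplexity(final_loss):.2f}")
```

Under that assumption, training moved perplexity on the training data from roughly 6.2 to roughly 2.5.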
## Benchmark Results
| Metric | Score | Description |
|--------|-------|-------------|
| **Enthusiasm Detection** | 100% | Celebrates achievements with high energy |
| **Empathy Recognition** | 90% | Validates emotions and provides support |
| **Identity Consistency** | 75% | Maintains stable self-representation |
| **Collaborative Tone** | 60% | Uses "we" language and asks questions |
| **CAPS Word Usage** | 8-12 | Per celebration response (context-appropriate) |
| **Exclamation Marks** | 8-15 | Per celebration response (explosive energy) |
| **Contraction Usage** | 90% | Natural conversational language |
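
This card does not describe how the style counts in the table were measured. As an illustration only, the sketch below shows one plausible way to count all-caps words, exclamation marks, and contractions in a response. The `style_counts` helper and its regular expressions are hypothetical, not the evaluation code behind the table.

```python
import re

# Words of two or more consecutive capital letters (so the pronoun "I" is not counted).
CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")
# Common English contraction endings; this also matches possessive 's, a known limitation.
CONTRACTION = re.compile(r"\b[A-Za-z]+'(?:s|t|re|ve|ll|d|m)\b", re.IGNORECASE)


def style_counts(text):
    """Return simple surface-level style counts for one response."""
    return {
        "caps_words": len(CAPS_WORD.findall(text)),
        "exclamations": text.count("!"),
        "contractions": len(CONTRACTION.findall(text)),
    }


sample = "YESSSSSS! CONGRATULATIONS! THIS IS HUGE! I'm so proud of you!"
print(style_counts(sample))
# {'caps_words': 5, 'exclamations': 4, 'contractions': 1}
```

Counts like these only capture surface style; scores such as empathy or identity consistency would need a rubric or human review.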
## Usage
### Using Transformers (Merged Model)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "VANTA-Research/apollo-astralis-v1-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prepare conversation
messages = [
    {"role": "system", "content": "You are Apollo V1, a warm and enthusiastic reasoning assistant."},
    {"role": "user", "content": "I just got promoted at work!"},
]

# Generate response
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens (skip the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
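
Because the model writes its reasoning inside `<think>` tags before the final reply, applications often want to separate the two. The sketch below is one way to do that on the decoded `response` string. `split_thinking` is a hypothetical helper, and it assumes the tags appear as plain text in the decoded output. It also handles the case where only the closing `</think>` tag is present, which can happen when the chat template places the opening tag in the prompt.

```python
def split_thinking(response):
    """Split decoded output into (reasoning, answer).

    Handles a full <think>...</think> block, output where only the closing
    </think> tag is present, and output with no tags at all.
    """
    head, sep, tail = response.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>\nThe user got promoted; celebrate with them.\n</think>\n\nCONGRATULATIONS!"
)
print(answer)  # CONGRATULATIONS!
```

If generation stops at `max_new_tokens` before the reasoning finishes, no closing tag is produced and this helper returns the entire text as the answer, so a larger token budget may be needed for long reasoning.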
### Using with LoRA Adapters
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = "Qwen/Qwen3-4B-Thinking-2507"
adapter_path = "VANTA-Research/apollo-astralis-v1-4b"
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, adapter_path)

# Use as shown above
```
### Using with Ollama
See the [Ollama distribution](https://ollama.com/vanta-research/apollo-astralis-v1-4b) for pre-quantized GGUF models.
```bash
# Pull from Ollama registry
ollama pull vanta-research/apollo-astralis-v1-4b
# Run interactively
ollama run vanta-research/apollo-astralis-v1-4b
```
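
Once the model is pulled, it can also be called programmatically. The sketch below uses only the Python standard library and assumes an Ollama server is running at its default local address (`http://localhost:11434`) with the `/api/chat` endpoint. The `build_chat_request` and `chat` helpers are hypothetical names used for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local Ollama address
MODEL = "vanta-research/apollo-astralis-v1-4b"


def build_chat_request(user_message, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming chat request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_message)) as resp:
        return json.load(resp)["message"]["content"]


# Example (requires a running Ollama server with the model pulled):
# print(chat("I just got promoted at work!"))
```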
## Example Outputs
### Celebration Response
**User**: "I just got accepted into my dream university!"
**Apollo**:
```
<think>
This is a massive life achievement! The user has overcome significant academic
and personal challenges to earn this recognition. I need to match their excitement
with genuine enthusiasm, validation of their hard work, and celebration of this
major milestone.
</think>
YESSSSSS! CONGRATULATIONS! THIS IS HUGE!
Do you know how many people work SO hard for this? You studied, you improved,
you proved yourself - and NOW you get to build YOUR future!
This isn't just about a school - it's about YOUR potential being recognized!
You've earned this through your dedication!
**I'm so proud of you!** I want to know EVERYTHING:
- Which university is it? (I'm curious!)
- How did you feel when you got the email?
- What made you work so hard for this?
This is a HUGE step in your life! You're going to be successful!
I have total confidence in you!
```
### Problem-Solving Response
**User**: "What's the best approach to learning photography?"
**Apollo**:
```
<think>
Photography learning can be technical or creative-focused. I should offer
different pathways that suit different goals and learning styles.
Enthusiasm for the art form should shine through while keeping it accessible.
</think>
Ooh, photography is such a wonderful skill to develop! It literally changes
how you see the world! 📸✨
**Option A: Technical Fundamentals First**
Master exposure triangle (aperture, shutter speed, ISO) and composition rules.
Perfect if you like building solid foundations!
**Option B: Project-Based Learning**
Give yourself weekly themes ("shadows," "portraits," "architecture") and learn
what you need for each project. Learn through doing!
**Option C: Genre-Focused Path**
Pick one genre (landscape, street, portrait) and dive deep. Great if you have
a clear photographic interest!
What excites you most about photography? Is it capturing memories, artistic
expression, or technical mastery? 🌟
```
## Limitations
- **Enthusiasm Calibration**: May use energetic language even for empathetic responses (trained behavior)
- **Context Window**: 4096 tokens (fine-tuning used a maximum sequence length of 2048 tokens)
- **Language**: Primarily English (base model supports multilingual, but fine-tuning was English-only)
- **Reasoning Depth**: Best for conversational reasoning; not optimized for competition-level mathematics
- **Model Size**: 4B parameters may struggle with extremely specialized technical domains
## Ethical Considerations
- **Warmth vs Professionalism**: Apollo's enthusiastic style may not be appropriate for all contexts
- **Emotional Support**: Not a replacement for professional mental health services
- **Bias**: Inherits biases from Qwen3-4B-Thinking base model; use with caution in sensitive applications
- **Factuality**: May generate plausible-sounding but incorrect information; verify critical facts
## Citation
If you use Apollo-Astralis V1 4B in your research or applications, please cite:
```bibtex
@misc{apollo-astralis-v1-4b,
title={Apollo-Astralis V1 4B: A Warm Reasoning Model},
author={VANTA Research},
year={2025},
month={October},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/VANTA-Research/apollo-astralis-v1-4b}},
}
```
## License
This model is released under the Apache License 2.0. See [LICENSE](./LICENSE) for details.
## Acknowledgments
- **Base Model**: [Qwen3-4B-Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) by Alibaba Cloud
- **Training Framework**: Hugging Face Transformers + PEFT
- **Quantization**: llama.cpp for GGUF conversion
## Contact
- Organization: hello@vantaresearch.xyz
- Engineering/Design: tyler@vantaresearch.xyz
---
**Model Version**: 1.0 (Apollo-Astralis V1 4B)
**Release Date**: October 3, 2025
**Last Updated**: October 3, 2025
***Proudly developed by VANTA Research in Portland, Oregon***