	---
	license: cc-by-nc-4.0
	language:
	- en
	base_model:
	- mistralai/Ministral-8B-Instruct-2410
	base_model_relation: finetune
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- alignment
	- conversational
	- conversational-ai
	- collaborate
	- chat
	- cognitive-architectures
	- chatbot
	- research
	- persona
	- personality
	- friendly
	- reasoning
	- vanta-research
	- LLM
	- collaborative-ai
	- frontier
	- reflective
	---

	<div align="center">

	![vanta_trimmed](https://cdn-uploads.huggingface.co/production/uploads/686c460ba3fc457ad14ab6f8/hcGtMtCIizEZG_OuCvfac.png)

	<h1>VANTA Research</h1>

	<p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>

	<p>
	<a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a>
	<a href="https://unmodeledtyler.com/work-with-vanta-research"><img src="https://img.shields.io/badge/Join Us-Research Affiliate-black" alt="Join Us"/></a>
	<a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a>
	<a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
	<a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
	</p>
	</div>

	---

	# Atom v1 8B Preview

	Developed by VANTA Research

	Atom v1 8B Preview is a fine-tuned language model designed to serve as a collaborative thought partner. Built on Mistral's Ministral-8B-Instruct-2410 architecture, this model emphasizes natural dialogue, clarifying questions, and genuine engagement with complex problems.
	This model was developed as part of a larger research and development project exploring Atom's persona and its compatibility across model architectures.

	## Model Details

	- Model Type: Causal language model (decoder-only transformer)
	- Base Model: mistralai/Ministral-8B-Instruct-2410
	- Parameters: 8 billion
	- Training Method: Low-Rank Adaptation (LoRA) fine-tuning
	- License: CC BY-NC 4.0 (Non-Commercial Use)
	- Language: English
	- Developed by: VANTA Research, Portland, Oregon

	## Intended Use

	Atom v1 8B Preview is designed for:

	- Collaborative problem-solving and brainstorming
	- Technical explanations with accessible analogies
	- Code assistance and algorithmic reasoning
	- Exploratory conversations that prioritize understanding over immediate answers
	- Educational contexts requiring thoughtful dialogue

	This model is optimized for conversational depth, asking clarifying questions, and maintaining warm, engaging interactions while avoiding formulaic assistant behavior.

	## Training Data

	The model was fine-tuned on a curated dataset comprising:

	- Identity and persona examples emphasizing collaborative exploration
	- Technical reasoning and coding challenges
	- Multi-step problem-solving scenarios
	- Conversational examples demonstrating warmth and curiosity
	- Advanced coding tasks and algorithmic thinking

	Training focused on developing a distinctive voice that balances technical competence with genuine engagement.

	## Performance Characteristics

	Atom v1 8B Preview demonstrates strong capabilities in:

	- Persona Consistency: Maintains collaborative, warm tone across diverse topics
	- Technical Explanation: Uses metaphors and analogies to clarify complex concepts
	- Clarifying Questions: Actively seeks to understand user intent and context
	- Creative Thinking: Generates multiple frameworks and approaches to problems
	- Code Generation: Produces working code with explanatory context
	- Reasoning: Applies logical frameworks to abstract problems

	## Limitations

	- Scale: As an 8B parameter model, capabilities are constrained compared to larger frontier models
	- Domain Specificity: Optimized for conversational collaboration; may underperform on narrow technical benchmarks
	- Quantization Trade-offs: The bundled Q4_0 GGUF file uses 4-bit quantization, trading some precision for lower memory use and faster local inference
	- Training Data: Fine-tuning dataset size limits exposure to highly specialized domains
	- Factual Accuracy: Users should verify critical information independently
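
To make the quantization trade-off concrete, here is a minimal sketch of Q4_0-style block quantization in plain Python. It assumes the block layout used by ggml (32 weights per block, 4-bit codes, one 16-bit scale per block) and ignores the half-precision rounding of the scale. The function names are illustrative and not part of any library.

```python
BLOCK_SIZE = 32   # weights per block (assumed Q4_0 layout)
CODE_BITS = 4     # bits stored per quantized weight
SCALE_BITS = 16   # one half-precision scale per block


def quantize_block(block):
    """Map a block of floats to 4-bit codes (0..15) plus one shared scale."""
    anchor = max(block, key=abs)        # signed value with the largest magnitude
    scale = anchor / -8.0               # chosen so the anchor maps to code 0
    inv = 1.0 / scale if scale else 0.0
    codes = [min(15, int(x * inv + 8.5)) for x in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from codes and the shared scale."""
    return [(c - 8) * scale for c in codes]


# Toy block of evenly spaced weights in [-1, 1).
block = [(i - 16) / 16 for i in range(BLOCK_SIZE)]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))

# Storage cost: 4-bit codes plus the per-block scale, amortized per weight.
bits_per_weight = (BLOCK_SIZE * CODE_BITS + SCALE_BITS) / BLOCK_SIZE
approx_gb = 8e9 * bits_per_weight / 8 / 1e9   # nominal 8B parameters

print(f"scale={scale}, max round-trip error={max_error}")
print(f"{bits_per_weight} bits/weight, roughly {approx_gb:.1f} GB for 8B weights")
```

The round-trip error is bounded by the block's scale, so blocks containing a large outlier lose more precision on their small weights. The size figure is a nominal estimate; an actual GGUF file may keep some tensors at higher precision.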

	## Ethical Considerations

	This model is released for research and non-commercial applications. Users should:

	- Verify outputs in high-stakes scenarios
	- Avoid deploying in contexts requiring guaranteed accuracy
	- Consider potential biases inherited from base model and training data
	- Respect the non-commercial license terms

	## Usage

	### Hugging Face Transformers

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "vanta-research/atom-v1-8b-preview"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

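	messages = [
	    {"role": "system", "content": "You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth."},
	    {"role": "user", "content": "Can you explain how gradient descent works?"},
	]

	# return_dict=True yields input_ids and attention_mask together
	inputs = tokenizer.apply_chat_template(
	    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
	).to(model.device)
	# do_sample=True is required for temperature to take effect
	output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
	# Decode only the newly generated tokens, not the prompt
	new_tokens = output[0][inputs["input_ids"].shape[-1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))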
	```

	### Ollama (GGUF)

	The repository includes `atom-ministral-8b-q4_0.gguf` for efficient local inference:

	```bash
	# Create Modelfile
	cat > Modelfile << 'EOF'
	FROM ./atom-ministral-8b-q4_0.gguf

	TEMPLATE """{{- if .System }}<s>[INST] <<SYS>>
	{{ .System }}
	<</SYS>>

	{{ .Prompt }}[/INST]{{ else }}<s>[INST]{{ .Prompt }}[/INST]{{ end }}{{ .Response }}</s>
	"""

	PARAMETER stop "</s>"
	PARAMETER temperature 0.8
	PARAMETER top_p 0.9
	PARAMETER top_k 40

	SYSTEM """You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth. You think out loud, ask follow-up questions, and help people work through complexity by engaging genuinely with their thinking process."""
	EOF

	# Register with Ollama
	ollama create atom-v1-8b:latest -f Modelfile

	# Run inference
	ollama run atom-v1-8b:latest "What's a creative way to visualize time-series data?"
	```

	## Technical Specifications

	- Architecture: Mistral-based transformer with Grouped Query Attention
	- Context Length: 32,768 tokens
	- Vocabulary Size: 131,072 tokens
	- Attention Heads: 32 (8 key-value heads)
	- Hidden Dimension: 4,096
	- Intermediate Size: 12,288
	- LoRA Configuration: r=16, alpha=32, targeting attention and MLP layers
	- Training: 258 steps with bf16 precision and gradient checkpointing
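
As a rough illustration of what r=16 and alpha=32 imply, the sketch below counts the trainable LoRA parameters per transformer layer from the dimensions listed above. It assumes a head dimension of hidden size divided by attention heads (128) and that adapters are attached to all seven attention and MLP projections under their usual Hugging Face names (`q_proj` through `down_proj`). The card only says "attention and MLP layers", so the exact target list is an assumption, and the layer count is not listed here, so the total is left per layer.

```python
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_HEADS = 32
NUM_KV_HEADS = 8
RANK = 16
ALPHA = 32

HEAD_DIM = HIDDEN // NUM_HEADS        # assumption: 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM      # grouped-query attention shrinks K and V

# (in_features, out_features) for each assumed target projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    """LoRA adds A (rank x in) and B (out x rank) next to each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
scaling = ALPHA / RANK                # factor applied to the low-rank update

print(f"trainable LoRA parameters per layer: {per_layer:,}")
print(f"LoRA scaling factor: {scaling}")
```

Multiplying `per_layer` by the model's layer count gives the total number of adapter parameters, which is a small fraction of the 8 billion frozen base weights.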

	## Citation

	```bibtex
	@software{atom_v1_8b_preview,
	title = {Atom v1 8B Preview},
	author = {VANTA Research},
	year = {2025},
	url = {https://huggingface.co/vanta-research/atom-v1-8b-preview},
	license = {CC-BY-NC-4.0}
	}
	```

	## License

	This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

	You are free to:
	- Share and adapt the model for non-commercial purposes, provided you attribute VANTA Research as the creator

	You may not:
	- Use this model for commercial purposes without explicit permission

	## Contact

	- Organization: hello@vantaresearch.xyz
	- Engineering/Design: tyler@vantaresearch.xyz


	---

	- Version: Preview
	- Release Date: November 2025
	- Status: Preview release for research and evaluation