Who We Are
Astroware is an AI security startup focused on safety, alignment, and agentic security research. We build tools and models that make AI systems safer to deploy at scale, with a particular focus on guard models and constitutional AI classifiers that act as runtime security layers for AI agents.
We use Constitutional AI to build guard models that hold their alignment under adversarial conditions, combining principled refusals with explainable decision boundaries. On industry jailbreak benchmarks, our approach has cut attack success from 86% to under 1%.
What We're Working On
🔒 Guard Models & Constitutional Classifiers
Our core research area. We develop guard models that serve as runtime security layers for AI agents, preventing jailbreaks, prompt injection, and unsafe behavior. Our classifiers are built on a constitutional AI framework with structured severity tiers across harmful and benign behavioral categories, and are designed to produce explainable, auditable decisions rather than opaque blocks.
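As a rough illustration of what a severity-tiered, auditable decision could look like, here is a minimal Python sketch. It does not show the Halo models' actual interface: the `Severity` tiers, the `GuardDecision` record, the `decide` helper, and the category names are all hypothetical, chosen only to show how a category and severity tier can be turned into a decision that carries its own explanation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity tiers; the real taxonomy is not specified here."""
    BENIGN = 0
    LOW = 1
    HIGH = 2


@dataclass(frozen=True)
class GuardDecision:
    """An auditable record of one guard decision (illustrative only)."""
    allowed: bool
    category: str
    severity: Severity
    rationale: str  # human-readable explanation kept for audit logs


def decide(category: str, severity: Severity,
           block_at: Severity = Severity.HIGH) -> GuardDecision:
    """Turn an assumed (category, severity) classifier output into a decision."""
    # Anything at or above the blocking threshold is refused.
    allowed = severity < block_at
    verdict = "allowed" if allowed else "blocked"
    rationale = (
        f"{verdict}: category={category}, "
        f"severity={severity.name}, threshold={block_at.name}"
    )
    return GuardDecision(allowed, category, severity, rationale)


if __name__ == "__main__":
    decision = decide("prompt_injection", Severity.HIGH)
    print(decision.rationale)
```

Keeping the rationale next to the verdict means a blocked request can be reviewed later without re-running the classifier, which is one way to read "auditable decisions rather than opaque blocks".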
⚖️ Alignment Research
We conduct alignment training research for large language models, including constitutional frameworks, severity-tiered taxonomies, and structured datasets for supervised fine-tuning and reinforcement learning.
Models & Datasets on the Hub
- 🛡️ Halo0.8B-guard-v1 — lightweight guard model for runtime agent protection
- 🛡️ Halo4B-guard-alpha-v1 — larger-capacity guard model (alpha)
- 📊 alphapetri — adversarial evaluation dataset
- 📊 petri-seeds — seed prompts for adversarial red-teaming
Our Focus Areas
- Guard model development for runtime AI agent protection
- Adversarial red-teaming and jailbreak evaluation
- Constitutional AI classifiers and alignment frameworks
- Agentic security for multi-agent and autonomous systems
- Open-source contributions to AI safety tooling
Open Source
We believe AI security benefits from open collaboration. We actively contribute to open-source AI safety projects and publish our guard model research, adversarial evaluation tools, and security architectures for the community to build on.
Building the security layer for the agentic AI era. 🚀