Travis Muhlestein PRO
TravisMuhlestein
AI & ML interests
Product & AI CTO at GoDaddy focused on AI infrastructure, orchestration, agent systems, observability, and enterprise-scale AI deployment
Recent Activity
posted an update 1 day ago
One of the less-discussed applications of AI is data governance at scale.
At GoDaddy, we manage thousands of datasets, with hundreds requiring elevated governance and certification. The traditional process—gathering evidence across multiple systems, validating controls, and preparing reviews—was becoming increasingly difficult to scale.
We built TrustTier, an AI governance agent designed to support the certification lifecycle.
The interesting challenge wasn't automation. It was judgment.
The system reasons across three states:
- Assigned tier — the classification currently approved in systems of record
- Intended tier — the classification requested by the data owner
- Qualified tier — the classification supported by available evidence
That distinction matters because governance isn't simply about retrieving information. It's about determining whether the evidence justifies a decision and clearly explaining why.
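To make the three states concrete, here is a minimal sketch of how they might be compared. It is an illustration under stated assumptions, not TrustTier's actual logic: the `Tier` scale, the `Decision` type, the `evaluate` function, and the outcome names are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical tier scale; a higher value means stricter governance.
    BRONZE = 1
    SILVER = 2
    GOLD = 3


@dataclass
class Decision:
    outcome: str  # hypothetical outcome labels: "certify", "hold", "flag"
    reason: str   # human-readable explanation of the outcome


def evaluate(assigned: Tier, intended: Tier, qualified: Tier) -> Decision:
    """Compare the three states and explain the result."""
    # Evidence no longer supports what the systems of record already approve.
    if qualified < assigned:
        return Decision(
            "flag",
            f"Evidence supports only {qualified.name}, "
            f"below the assigned {assigned.name} tier.",
        )
    # Evidence supports the current tier but not the one the owner requested.
    if qualified < intended:
        return Decision(
            "hold",
            f"Evidence supports {qualified.name}, "
            f"short of the requested {intended.name} tier.",
        )
    # Evidence meets or exceeds the requested tier.
    return Decision(
        "certify",
        f"Evidence supports the requested {intended.name} tier.",
    )


decision = evaluate(assigned=Tier.SILVER, intended=Tier.GOLD, qualified=Tier.SILVER)
print(decision.outcome, "-", decision.reason)
```

Returning the reason alongside the outcome keeps the explanation attached to the decision, which is the point of separating the qualified tier from the other two.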
The same certification logic can then be reused across verification, review, and audit workflows.
Curious how others are approaching explainability, governance, and false-positive management in AI-assisted compliance systems.
🔗 https://www.godaddy.com/resources/news/from-manual-audits-to-intelligent-certification
posted an update 12 days ago
A question we kept running into while operating AI agents in production: How do you write a unit test for something that never returns the same answer twice?
At GoDaddy, we built a system called Veritas to help detect prompt regressions and model migration drift before changes reach production.
The core idea is simple:
Exact-match testing breaks down for LLMs.
What matters is whether the agent preserved the same meaning and intent.
We ended up using embeddings + cosine similarity as the primary evaluation signal.
Rather than asking: "Did the model generate the same response?"
We ask: "Did the model mean the same thing?"
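A minimal sketch of a similarity-based regression check follows. It is not Veritas itself: the `embed` function is a bag-of-words stand-in for a real embedding model (which would be needed to capture meaning rather than word overlap), and `same_meaning` and its `0.8` threshold are hypothetical choices for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared tokens, divided by the product of the norms.
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_meaning(baseline: str, candidate: str, threshold: float = 0.8) -> bool:
    # Pass when the candidate response stays close to the recorded baseline.
    return cosine_similarity(embed(baseline), embed(candidate)) >= threshold


baseline = "your domain renews on march 1"
print(same_meaning(baseline, "your domain renews on march 1"))
print(same_meaning("refund approved", "refund denied"))
```

In a test suite, the baseline would be a response recorded before the prompt or model change, and the threshold would be tuned per agent rather than fixed.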
One of the more interesting findings was how often seemingly harmless prompt edits changed downstream behavior in ways that were difficult for human reviewers to catch.
Prompts aren't documentation.
Prompts are code.
Curious what others are using today for regression testing:
• LLM-as-judge?
• Embedding similarity?
• Human review?
• Custom eval frameworks?
https://www.godaddy.com/resources/news/veritas-catching-silent-ai-regressions-before-they-ship
Would love to compare approaches.
posted an update 19 days ago
One thing that stands out to me about efforts like DNS-AID and ANS is a simple architectural principle:
We didn't reinvent the internet. We extended it.
A lot of discussion around agent ecosystems focuses on models, orchestration frameworks, and capabilities. But once agents begin operating across platforms and organizational boundaries, a different set of questions emerges:
• How are agents discovered?
• How is identity established?
• How is trust verified?
• How are capabilities described in a machine-readable way?
DNS-AID and ANS approach different parts of the same problem space using infrastructure that already exists and operates at internet scale.
• Identity standards help establish trust
• Discovery standards help locate capabilities and services
One thing I'm increasingly convinced of: we have model cards, but we still lack broadly adopted capability manifests for agents.
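For discussion, here is one rough shape a capability manifest could take. Everything in it is hypothetical: the `AgentManifest` and `Capability` types, their field names, and the example values are illustrative only and are not drawn from DNS-AID, ANS, or any published standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Capability:
    # Hypothetical fields describing one thing an agent can do.
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)


@dataclass
class AgentManifest:
    # Hypothetical machine-readable description of an agent.
    agent_id: str
    endpoint: str
    capabilities: list[Capability] = field(default_factory=list)

    def supports(self, capability_name: str) -> bool:
        # True when the agent declares a capability with this name.
        return any(c.name == capability_name for c in self.capabilities)

    def to_json(self) -> str:
        # Serialize to JSON so other systems can parse the manifest.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


manifest = AgentManifest(
    agent_id="support-agent.example.com",
    endpoint="https://support-agent.example.com/invoke",
    capabilities=[
        Capability(
            name="domain.lookup",
            description="Look up registration status for a domain name.",
            input_schema={"domain": "string"},
        )
    ],
)
print(manifest.to_json())
print(manifest.supports("domain.lookup"))
```

The useful property is that a caller can check `supports(...)` and read the input schema before invoking the agent, much as a model card is read before using a model.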
As agent ecosystems evolve, open standards around discovery, identity, trust, and capability description may become just as important as the models themselves.
Curious what builders here think: what's the biggest missing standard today for truly interoperable agent systems?