I run an independent AI agent verification platform. Just benchmarked 11 frontier models on hallucination for under $5 total. Two runs each, maximum spread was 4 points. Evaluation doesn't have to cost thousands. tabverified.ai
Rod Miller
RodTAB
AI & ML interests
None yet
Recent Activity
commented on an article about 20 hours ago
AI evals are becoming the new compute bottleneck
upvoted an article 6 days ago
AI evals are becoming the new compute bottleneck
commented on an article 6 days ago
AI evals are becoming the new compute bottleneck
Organizations
None yet