AI Harness
A production-grade, model-agnostic CLI harness for agentic AI workflows.
╭─────────────────────────────────────╮
│  ⚡ AI Harness v0.1.0               │
│  model-agnostic CLI agent runtime   │
╰─────────────────────────────────────╯
What is this?
A terminal-first agent runtime. Not a toy chatbot. It supports:
- Multiple LLM providers: OpenAI, Anthropic, Gemini, OpenRouter, any OpenAI-compatible endpoint
- Typed tool calling: Zod-validated inputs/outputs, permissions, retries, timeouts
- Modular skills: Attachable instruction packs per task
- Structured runtime: Planner/executor/evaluator roles, budgets, loop detection
- Beautiful CLI output: Streaming, spinners, panels, event timeline, metrics
- Observability: Token usage, cost tracking, latency, success rates
- Evaluation: Schema checks, rubric scoring, remediation loops
- Artifact handling: Files, patches, logs, export to Markdown/JSON
- Safety & permissions: Read/write/exec/network/dangerous levels with policy modes
Quick Start
# Install dependencies
pnpm install
# Build
pnpm build
# Interactive chat
pnpm chat
# Autonomous task
node dist/cli/index.js run "refactor the auth module to use JWT"
# List providers/models
node dist/cli/index.js providers
# List tools
node dist/cli/index.js tools
# List skills
node dist/cli/index.js skills
Configuration
Set provider API keys via environment variables:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AI..."
export OPENROUTER_API_KEY="sk-or-..."
Override defaults with CLI flags:
harness chat --provider openai --model gpt-4o --skills coding research --verbose
harness run "build a REST API" --provider anthropic --model claude-sonnet-4-20250514 --budget-tokens 100000
Commands
| Command | Description |
|---|---|
| `harness chat` | Interactive multi-turn chat |
| `harness run <goal>` | Autonomous task execution |
| `harness providers` | List providers and models |
| `harness tools` | List available tools |
| `harness skills` | List available skills |
| `harness config` | Show configuration |
Architecture
src/
  core/
    events/         - Event types, EventBus
    provider/       - ProviderAdapter interface, message types
    runtime/        - Session state, orchestration loop
    tools/          - ToolRegistry, ToolDef, permissions
    skills/         - SkillRegistry, SkillModule
    evaluators/     - Evaluation checks, EvalReport
    artifacts/      - ArtifactStore, export
    policy/         - PolicyEngine, permission enforcement
    observability/  - MetricsCollector, MetricEntry
  providers/
    openai/         - OpenAI adapter
    anthropic/      - Anthropic adapter
    gemini/         - Google Gemini adapter
    openrouter/     - OpenRouter + OpenAI-compatible adapter
  tools/
    fs/             - read_file, write_file, list_directory
    shell/          - shell_exec
    web/            - web_fetch
  skills/
    coding/         - Software engineering instructions
    research/       - Research & analysis instructions
    docs/           - Technical writing instructions
  cli/
    index.ts        - Commander entry point
    commands/       - chat, run, providers, tools, skills, config
    renderers/      - EventRenderer, Spinner, box drawing, metrics
    state/          - Provider resolver, runtime factory
Key Design Decisions
Event-driven architecture
Everything flows through EventBus. Rendering, logging, metrics collection, and export all subscribe to the same event stream. This means you can add a new consumer (e.g., a web dashboard) without touching core logic.
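The fan-out described above can be sketched as a small typed publish/subscribe bus. This is a minimal illustration, not the project's actual EventBus: the event names (`tool.start`, `tool.end`), the `HarnessEvent` type, and the `subscribe`/`emit` method names are all assumptions made for the example.

```typescript
// Hypothetical event shapes; the real harness defines its own event types.
type HarnessEvent =
  | { type: "tool.start"; tool: string }
  | { type: "tool.end"; tool: string; ms: number };

type Listener = (event: HarnessEvent) => void;

class EventBus {
  private listeners: Listener[] = [];

  // Register a consumer; the returned function unsubscribes it again.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver one event to every current subscriber, in subscription order.
  emit(event: HarnessEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

const bus = new EventBus();

// Consumer 1: a renderer that turns events into log lines.
const lines: string[] = [];
bus.subscribe((e) => {
  lines.push(e.type === "tool.start" ? `> ${e.tool}` : `< ${e.tool} (${e.ms}ms)`);
});

// Consumer 2: a metrics collector, added without changing the emitting code.
let totalMs = 0;
const stopMetrics = bus.subscribe((e) => {
  if (e.type === "tool.end") totalMs += e.ms;
});

bus.emit({ type: "tool.start", tool: "read_file" });
bus.emit({ type: "tool.end", tool: "read_file", ms: 12 });
stopMetrics(); // the renderer keeps receiving events, the collector does not
bus.emit({ type: "tool.end", tool: "read_file", ms: 30 });
```

The emitting side never names its consumers, so a further subscriber such as a dashboard would only need another `subscribe` call.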
Provider normalization
All providers implement ProviderAdapter with invoke() and stream(). Message format, tool calling conventions, and response parsing are handled per-provider so the runtime never sees vendor-specific shapes.
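As a rough sketch of what per-provider response parsing might look like, the functions below map two vendor-style responses onto one normalized shape. The `NormalizedReply` type and the function names are hypothetical, and the vendor interfaces are heavily simplified subsets of the OpenAI-style and Anthropic-style payloads. A real adapter would presumably do this inside `invoke()` and `stream()`.

```typescript
// Normalized shape the runtime would consume (hypothetical, for illustration).
interface NormalizedReply {
  text: string;
  toolCalls: { id: string; name: string; args: unknown }[];
}

// Simplified subset of an OpenAI-style chat completion response.
interface OpenAIStyleResponse {
  choices: {
    message: {
      content: string | null;
      tool_calls?: { id: string; function: { name: string; arguments: string } }[];
    };
  }[];
}

// Simplified subset of an Anthropic-style messages response.
interface AnthropicStyleResponse {
  content: (
    | { type: "text"; text: string }
    | { type: "tool_use"; id: string; name: string; input: unknown }
  )[];
}

function fromOpenAI(res: OpenAIStyleResponse): NormalizedReply {
  const message = res.choices[0].message;
  return {
    text: message.content ?? "",
    // OpenAI-style tool arguments arrive as a JSON string and need parsing.
    toolCalls: (message.tool_calls ?? []).map((call) => ({
      id: call.id,
      name: call.function.name,
      args: JSON.parse(call.function.arguments),
    })),
  };
}

function fromAnthropic(res: AnthropicStyleResponse): NormalizedReply {
  const reply: NormalizedReply = { text: "", toolCalls: [] };
  // Anthropic-style responses interleave text and tool_use blocks in one array.
  for (const block of res.content) {
    if (block.type === "text") reply.text += block.text;
    else reply.toolCalls.push({ id: block.id, name: block.name, args: block.input });
  }
  return reply;
}

// The same logical tool call, expressed in each vendor's shape.
const viaOpenAI = fromOpenAI({
  choices: [
    {
      message: {
        content: null,
        tool_calls: [{ id: "c1", function: { name: "read_file", arguments: '{"path":"a.txt"}' } }],
      },
    },
  ],
});
const viaAnthropic = fromAnthropic({
  content: [{ type: "tool_use", id: "c1", name: "read_file", input: { path: "a.txt" } }],
});
```

Both calls yield the same `NormalizedReply`, which is the property that lets the runtime stay unaware of vendor formats.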
Typed tools with Zod
Every tool declares its input/output schemas with Zod. The runtime validates inputs before execution and can generate JSON Schema for model function-calling automatically.
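The validate-before-execute step can be illustrated as follows. To keep the sketch dependency-free, a small `Schema` interface stands in for a Zod schema; its `safeParse` method is only loosely modelled on Zod's method of the same name. The `ToolDef` fields, `runTool`, and the `describe_path` tool are hypothetical and may not match the harness's real types.

```typescript
// Stand-in for a Zod schema: anything exposing a safeParse-style method.
type ParseResult<T> = { success: true; data: T } | { success: false; error: string };
interface Schema<T> {
  safeParse(input: unknown): ParseResult<T>;
}

// Hypothetical tool definition; the harness's real ToolDef may differ.
interface ToolDef<I, O> {
  name: string;
  input: Schema<I>;
  execute(input: I): O;
}

type RunResult<O> = { ok: true; output: O } | { ok: false; error: string };

// Validate first, execute only on success, and return a structured result either way.
function runTool<I, O>(tool: ToolDef<I, O>, rawInput: unknown): RunResult<O> {
  const parsed = tool.input.safeParse(rawInput);
  if (parsed.success === false) {
    return { ok: false, error: `${tool.name}: ${parsed.error}` };
  }
  return { ok: true, output: tool.execute(parsed.data) };
}

// Hand-written schema for { path: string }; with Zod this would be a z.object(...).
const pathInput: Schema<{ path: string }> = {
  safeParse(input) {
    const candidate = input as { path?: unknown } | null;
    if (typeof candidate === "object" && candidate !== null && typeof candidate.path === "string") {
      return { success: true, data: { path: candidate.path } };
    }
    return { success: false, error: "expected { path: string }" };
  },
};

const describePath: ToolDef<{ path: string }, string> = {
  name: "describe_path",
  input: pathInput,
  execute: ({ path }) => `would read ${path}`,
};

const good = runTool(describePath, { path: "README.md" });
const bad = runTool(describePath, { path: 42 }); // execute() is never called here
```

Because `execute` only ever receives parsed data, its parameter type and the runtime check cannot drift apart.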
Policy enforcement
The PolicyEngine checks permission levels against the current policy mode before executing any tool. Denied tools return structured error messages to the model so it can adapt.
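A permission gate along these lines could look like the sketch below. The permission levels come from the feature list; the policy mode names (`readonly`, `standard`, `unrestricted`), what each mode allows, and the shape of the denial result are assumptions made for illustration.

```typescript
// Permission levels named in the feature list.
type Permission = "read" | "write" | "exec" | "network" | "dangerous";

// Hypothetical policy modes and what each one allows; the real modes may differ.
type PolicyMode = "readonly" | "standard" | "unrestricted";
const ALLOWED: Record<PolicyMode, Permission[]> = {
  readonly: ["read"],
  standard: ["read", "write", "network"],
  unrestricted: ["read", "write", "exec", "network", "dangerous"],
};

interface Tool {
  name: string;
  permission: Permission;
  run(): string;
}

// Structured result so a denial can be handed back to the model as data.
type ToolResult =
  | { status: "ok"; output: string }
  | { status: "denied"; tool: string; required: Permission; mode: PolicyMode; message: string };

function executeWithPolicy(tool: Tool, mode: PolicyMode): ToolResult {
  // Check the policy before the tool body is ever invoked.
  if (ALLOWED[mode].indexOf(tool.permission) === -1) {
    return {
      status: "denied",
      tool: tool.name,
      required: tool.permission,
      mode,
      message: `Tool "${tool.name}" needs "${tool.permission}" permission, which mode "${mode}" does not grant.`,
    };
  }
  return { status: "ok", output: tool.run() };
}

// Count real executions to show that a denied call has no side effects.
let shellRuns = 0;
const shell: Tool = {
  name: "shell_exec",
  permission: "exec",
  run: () => {
    shellRuns += 1;
    return "done";
  },
};

const denied = executeWithPolicy(shell, "standard");
const allowed = executeWithPolicy(shell, "unrestricted");
```

Returning the denial as a value rather than throwing means it can be serialized into the conversation like any other tool result.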
Evaluation loop
After task completion, the Evaluator runs all registered checks. Failed checks can trigger remediation (retry with error context), preventing premature success declarations.
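The check-then-remediate cycle might be sketched like this. `Check`, `runWithRemediation`, the `is_json` check, and the fake agent are hypothetical names for the example. The real Evaluator and its EvalReport are likely richer than the plain failure list used here.

```typescript
interface CheckResult {
  check: string;
  passed: boolean;
  detail?: string;
}
type Check = (output: string) => CheckResult;

// Run every registered check and keep only the failures.
function evaluate(output: string, checks: Check[]): CheckResult[] {
  return checks.map((check) => check(output)).filter((result) => !result.passed);
}

// Retry the attempt with the previous failures as error context,
// up to maxAttempts, and only report success once all checks pass.
function runWithRemediation(
  attempt: (feedback: string[]) => string,
  checks: Check[],
  maxAttempts: number,
): { output: string; attempts: number; passed: boolean } {
  let feedback: string[] = [];
  let output = "";
  for (let n = 1; n <= maxAttempts; n++) {
    output = attempt(feedback);
    const failures = evaluate(output, checks);
    if (failures.length === 0) return { output, attempts: n, passed: true };
    feedback = failures.map((f) => `${f.check}: ${f.detail ?? "failed"}`);
  }
  return { output, attempts: maxAttempts, passed: false };
}

// A simple schema-style check: the output must be valid JSON.
const isJson: Check = (out) => {
  try {
    JSON.parse(out);
    return { check: "is_json", passed: true };
  } catch {
    return { check: "is_json", passed: false, detail: "output is not valid JSON" };
  }
};

// Fake agent: fails the first time, fixes its output once it receives feedback.
const seenFeedback: string[][] = [];
const fakeAgent = (feedback: string[]): string => {
  seenFeedback.push(feedback);
  return feedback.length === 0 ? "all done!" : '{"summary":"done"}';
};

const result = runWithRemediation(fakeAgent, [isJson], 3);
```

Here the first attempt claims completion in prose, fails `is_json`, and is retried with that failure as context; the run is only marked as passed after the second attempt.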
Extending
See EXTENSION_GUIDE.md for detailed instructions on adding:
- New providers
- New tools
- New skills
- New evaluator checks
- Custom renderers
License
MIT
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = 'stevenkhan/ai-harness'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.