AI Harness

A production-grade, model-agnostic CLI harness for agentic AI workflows.

╭─────────────────────────────────────╮
│  ⚡ AI Harness  v0.1.0              │
│  model-agnostic CLI agent runtime   │
╰─────────────────────────────────────╯

What is this?

A terminal-first agent runtime, not a toy chatbot. It supports:

Multiple LLM providers — OpenAI, Anthropic, Gemini, OpenRouter, any OpenAI-compatible endpoint
Typed tool calling — Zod-validated inputs/outputs, permissions, retries, timeouts
Modular skills — Attachable instruction packs per task
Structured runtime — Planner/executor/evaluator roles, budgets, loop detection
Beautiful CLI output — Streaming, spinners, panels, event timeline, metrics
Observability — Token usage, cost tracking, latency, success rates
Evaluation — Schema checks, rubric scoring, remediation loops
Artifact handling — Files, patches, logs, export to Markdown/JSON
Safety & permissions — Read/write/exec/network/dangerous levels with policy modes

Quick Start

# Install dependencies
pnpm install

# Build
pnpm build

# Interactive chat
pnpm chat

# Autonomous task
node dist/cli/index.js run "refactor the auth module to use JWT"

# List providers/models
node dist/cli/index.js providers

# List tools
node dist/cli/index.js tools

# List skills
node dist/cli/index.js skills

Configuration

Set provider API keys via environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AI..."
export OPENROUTER_API_KEY="sk-or-..."

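Override defaults with CLI flags. These examples use the `harness` command; if it is not on your PATH, the `node dist/cli/index.js` form from Quick Start should take the same arguments: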

harness chat --provider openai --model gpt-4o --skills coding research --verbose
harness run "build a REST API" --provider anthropic --model claude-sonnet-4-20250514 --budget-tokens 100000

Commands

| Command | Description |
| --- | --- |
| `harness chat` | Interactive multi-turn chat |
| `harness run <goal>` | Autonomous task execution |
| `harness providers` | List providers and models |
| `harness tools` | List available tools |
| `harness skills` | List available skills |
| `harness config` | Show configuration |

Architecture

src/
  core/
    events/        — Event types, EventBus
    provider/      — ProviderAdapter interface, message types
    runtime/       — Session state, orchestration loop
    tools/         — ToolRegistry, ToolDef, permissions
    skills/        — SkillRegistry, SkillModule
    evaluators/    — Evaluation checks, EvalReport
    artifacts/     — ArtifactStore, export
    policy/        — PolicyEngine, permission enforcement
    observability/ — MetricsCollector, MetricEntry
  providers/
    openai/        — OpenAI adapter
    anthropic/     — Anthropic adapter
    gemini/        — Google Gemini adapter
    openrouter/    — OpenRouter + OpenAI-compatible adapter
  tools/
    fs/            — read_file, write_file, list_directory
    shell/         — shell_exec
    web/           — web_fetch
  skills/
    coding/        — Software engineering instructions
    research/      — Research & analysis instructions
    docs/          — Technical writing instructions
  cli/
    index.ts       — Commander entry point
    commands/      — chat, run, providers, tools, skills, config
    renderers/     — EventRenderer, Spinner, box drawing, metrics
    state/         — Provider resolver, runtime factory

Key Design Decisions

Event-driven architecture

Everything flows through EventBus. Rendering, logging, metrics collection, and export all subscribe to the same event stream. This means you can add a new consumer (e.g., a web dashboard) without touching core logic.
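
The idea can be illustrated with a minimal publish/subscribe sketch. This is not the project's actual EventBus: the `HarnessEvent` shape, the `subscribe`/`emit` method names, and both consumers are assumptions made up for illustration.

```typescript
// Hypothetical event shape; the real event types live in src/core/events/.
type HarnessEvent =
  | { type: "tool_call"; tool: string }
  | { type: "token_usage"; tokens: number };

type Listener = (event: HarnessEvent) => void;

// Minimal publish/subscribe bus: every consumer sees the same stream.
class EventBus {
  private listeners: Listener[] = [];

  // Registers a listener and returns a function that removes it again.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Delivers the event synchronously to every currently registered listener.
  emit(event: HarnessEvent): void {
    this.listeners.slice().forEach((listener) => listener(event));
  }
}

const bus = new EventBus();

// Consumer 1: a metrics collector that only cares about token usage.
const metrics = { totalTokens: 0 };
bus.subscribe((event) => {
  if (event.type === "token_usage") metrics.totalTokens += event.tokens;
});

// Consumer 2: a timeline added later, without touching the bus or consumer 1.
const timeline: string[] = [];
const unsubscribe = bus.subscribe((event) => {
  timeline.push(event.type);
});

bus.emit({ type: "tool_call", tool: "read_file" });
bus.emit({ type: "token_usage", tokens: 120 });
unsubscribe(); // the timeline stops listening; metrics keeps counting
bus.emit({ type: "token_usage", tokens: 30 });
```

Because each consumer only holds a subscription, removing or adding one leaves the others and the emitting code unchanged.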

Provider normalization

All providers implement ProviderAdapter with invoke() and stream(). Message format, tool calling conventions, and response parsing are handled per-provider so the runtime never sees vendor-specific shapes.
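
A rough sketch of the response-parsing half of that normalization follows. It covers only how two vendor response bodies could be mapped onto one shape; `invoke()` and `stream()` are left out, and `NormalizedResponse`, `fromOpenAI`, and `fromAnthropic` are hypothetical names, not the project's API. The vendor interfaces are trimmed-down views of the OpenAI Chat Completions and Anthropic Messages response bodies.

```typescript
// Vendor-neutral result the runtime would consume (assumed shape).
interface NormalizedResponse {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Trimmed-down view of an OpenAI Chat Completions response body.
interface OpenAIStyleResponse {
  choices: { message: { content: string | null } }[];
  usage: { prompt_tokens: number; completion_tokens: number };
}

// Trimmed-down view of an Anthropic Messages response body.
interface AnthropicStyleResponse {
  content: { type: string; text?: string }[];
  usage: { input_tokens: number; output_tokens: number };
}

function fromOpenAI(raw: OpenAIStyleResponse): NormalizedResponse {
  const first = raw.choices[0];
  return {
    text: first && first.message.content ? first.message.content : "",
    inputTokens: raw.usage.prompt_tokens,
    outputTokens: raw.usage.completion_tokens,
  };
}

function fromAnthropic(raw: AnthropicStyleResponse): NormalizedResponse {
  // Concatenate text blocks and skip non-text blocks such as tool_use.
  const text = raw.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  return {
    text,
    inputTokens: raw.usage.input_tokens,
    outputTokens: raw.usage.output_tokens,
  };
}

// Two differently shaped payloads collapse into the same normalized value.
const openaiResult = fromOpenAI({
  choices: [{ message: { content: "hi" } }],
  usage: { prompt_tokens: 3, completion_tokens: 1 },
});
const anthropicResult = fromAnthropic({
  content: [{ type: "text", text: "hi" }],
  usage: { input_tokens: 3, output_tokens: 1 },
});
```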

Typed tools with Zod

Every tool declares its input/output schemas with Zod. The runtime validates inputs before execution and can generate JSON Schema for model function-calling automatically.

Policy enforcement

The PolicyEngine checks permission levels against the current policy mode before executing any tool. Denied tools return structured error messages to the model so it can adapt.
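
As a sketch of that check, assuming a simple allow-list per mode: the permission levels come from the feature list above, but the mode names (`read-only`, `standard`, `unrestricted`), the allow-lists, the error shape, and the permission assigned to each tool are all hypothetical.

```typescript
type PermissionLevel = "read" | "write" | "exec" | "network" | "dangerous";

// Hypothetical policy modes; the README names the levels but not the modes.
type PolicyMode = "read-only" | "standard" | "unrestricted";

// Assumed allow-list: which permission levels each mode lets through.
const ALLOWED: Record<PolicyMode, PermissionLevel[]> = {
  "read-only": ["read"],
  standard: ["read", "write", "network"],
  unrestricted: ["read", "write", "exec", "network", "dangerous"],
};

interface SketchTool {
  name: string;
  permission: PermissionLevel;
  run(): string;
}

// A denial is returned as data rather than thrown, so it can be sent to the model.
type ToolResult =
  | { ok: true; output: string }
  | {
      ok: false;
      error: { code: "permission_denied"; tool: string; required: PermissionLevel; mode: PolicyMode };
    };

function executeWithPolicy(tool: SketchTool, mode: PolicyMode): ToolResult {
  if (ALLOWED[mode].indexOf(tool.permission) === -1) {
    return {
      ok: false,
      error: { code: "permission_denied", tool: tool.name, required: tool.permission, mode },
    };
  }
  return { ok: true, output: tool.run() };
}

// Track whether the denied tool body ever runs (it should not).
const callLog: string[] = [];
const shellExec: SketchTool = {
  name: "shell_exec",
  permission: "exec", // assumed level for this tool
  run: () => {
    callLog.push("shell_exec");
    return "ran";
  },
};
const readFile: SketchTool = {
  name: "read_file",
  permission: "read", // assumed level for this tool
  run: () => {
    callLog.push("read_file");
    return "file contents";
  },
};

const denied = executeWithPolicy(shellExec, "read-only");
const allowed = executeWithPolicy(readFile, "read-only");
```

Returning the denial as a structured value keeps the decision in one place and gives the model enough detail (tool, required level, active mode) to pick a different approach.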

Evaluation loop

After task completion, the Evaluator runs all registered checks. Failed checks can trigger remediation (retry with error context), preventing premature success declarations.
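
The loop might look roughly like the following sketch. It is synchronous for brevity, and `Check`, `CheckResult`, `runWithRemediation`, and the fake task are illustrative assumptions rather than the project's Evaluator or EvalReport types.

```typescript
interface CheckResult {
  name: string;
  passed: boolean;
  message?: string;
}

type Check = (output: string) => CheckResult;

// Example schema-style check: the output must parse as a JSON object.
const isJsonObject: Check = (output) => {
  try {
    const parsed: unknown = JSON.parse(output);
    return typeof parsed === "object" && parsed !== null
      ? { name: "json_object", passed: true }
      : { name: "json_object", passed: false, message: "expected a JSON object" };
  } catch {
    return { name: "json_object", passed: false, message: "output is not valid JSON" };
  }
};

// Run the task, evaluate it, and retry with the failure messages as context.
function runWithRemediation(
  attempt: (errorContext: string[]) => string,
  checks: Check[],
  maxAttempts: number,
): { output: string; attempts: number; passed: boolean } {
  let errorContext: string[] = [];
  let output = "";
  for (let i = 1; i <= maxAttempts; i++) {
    output = attempt(errorContext);
    const failures = checks.map((check) => check(output)).filter((r) => !r.passed);
    if (failures.length === 0) return { output, attempts: i, passed: true };
    errorContext = failures.map((f) => `${f.name}: ${f.message ?? "failed"}`);
  }
  // Budget exhausted: report failure instead of declaring success.
  return { output, attempts: maxAttempts, passed: false };
}

// Stand-in for a model call: it only produces JSON once it has seen an error.
const seenContexts: string[][] = [];
const fakeTask = (errorContext: string[]): string => {
  seenContexts.push(errorContext);
  return errorContext.length === 0 ? "all done!" : '{"status":"done"}';
};

const evalOutcome = runWithRemediation(fakeTask, [isJsonObject], 3);
```

Here the first attempt fails the check, the failure message is fed into the second attempt, and the run is only reported as passed once every check succeeds.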

Extending

See EXTENSION_GUIDE.md for detailed instructions on adding:

New providers
New tools
New skills
New evaluator checks
Custom renderers

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'stevenkhan/ai-harness'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
