AI Harness

A production-grade, model-agnostic CLI harness for agentic AI workflows.

╭─────────────────────────────────────╮
│  ⚡ AI Harness  v0.1.0              │
│  model-agnostic CLI agent runtime   │
╰─────────────────────────────────────╯

What is this?

A terminal-first agent runtime rather than a toy chatbot. It supports:

  • Multiple LLM providers: OpenAI, Anthropic, Gemini, OpenRouter, and any OpenAI-compatible endpoint
  • Typed tool calling: Zod-validated inputs/outputs, permissions, retries, timeouts
  • Modular skills: attachable instruction packs per task
  • Structured runtime: planner/executor/evaluator roles, budgets, loop detection
  • Beautiful CLI output: streaming, spinners, panels, event timeline, metrics
  • Observability: token usage, cost tracking, latency, success rates
  • Evaluation: schema checks, rubric scoring, remediation loops
  • Artifact handling: files, patches, logs, export to Markdown/JSON
  • Safety & permissions: read/write/exec/network/dangerous levels with policy modes

Quick Start

# Install dependencies
pnpm install

# Build
pnpm build

# Interactive chat
pnpm chat

# Autonomous task
node dist/cli/index.js run "refactor the auth module to use JWT"

# List providers/models
node dist/cli/index.js providers

# List tools
node dist/cli/index.js tools

# List skills
node dist/cli/index.js skills

Configuration

Set provider API keys via environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AI..."
export OPENROUTER_API_KEY="sk-or-..."

Override defaults with CLI flags:

harness chat --provider openai --model gpt-4o --skills coding research --verbose
harness run "build a REST API" --provider anthropic --model claude-sonnet-4-20250514 --budget-tokens 100000

Commands

Command               Description
harness chat          Interactive multi-turn chat
harness run <goal>    Autonomous task execution
harness providers     List providers and models
harness tools         List available tools
harness skills        List available skills
harness config        Show configuration

Architecture

src/
  core/
    events/        - Event types, EventBus
    provider/      - ProviderAdapter interface, message types
    runtime/       - Session state, orchestration loop
    tools/         - ToolRegistry, ToolDef, permissions
    skills/        - SkillRegistry, SkillModule
    evaluators/    - Evaluation checks, EvalReport
    artifacts/     - ArtifactStore, export
    policy/        - PolicyEngine, permission enforcement
    observability/ - MetricsCollector, MetricEntry
  providers/
    openai/        - OpenAI adapter
    anthropic/     - Anthropic adapter
    gemini/        - Google Gemini adapter
    openrouter/    - OpenRouter + OpenAI-compatible adapter
  tools/
    fs/            - read_file, write_file, list_directory
    shell/         - shell_exec
    web/           - web_fetch
  skills/
    coding/        - Software engineering instructions
    research/      - Research & analysis instructions
    docs/          - Technical writing instructions
  cli/
    index.ts       - Commander entry point
    commands/      - chat, run, providers, tools, skills, config
    renderers/     - EventRenderer, Spinner, box drawing, metrics
    state/         - Provider resolver, runtime factory

Key Design Decisions

Event-driven architecture

Everything flows through EventBus. Rendering, logging, metrics collection, and export all subscribe to the same event stream. This means you can add a new consumer (e.g., a web dashboard) without touching core logic.
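
To make the single-stream idea concrete, here is a minimal sketch of a bus with two independent consumers. The event shapes and the subscribe/emit method names are assumptions for illustration; the project's actual EventBus in src/core/events/ may look different.

```typescript
// Hypothetical event shapes; the real types live in src/core/events/.
type HarnessEvent =
  | { type: "tool_call"; tool: string }
  | { type: "token_usage"; tokens: number };

type Listener = (event: HarnessEvent) => void;

class EventBus {
  private listeners: Listener[] = [];

  // Register a consumer and return a function that unsubscribes it.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver one event to every current subscriber.
  emit(event: HarnessEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}

const bus = new EventBus();

// Consumer 1: a renderer-like subscriber that formats events as text lines.
const rendered: string[] = [];
bus.subscribe((e) => {
  rendered.push(e.type === "tool_call" ? `tool: ${e.tool}` : `tokens: ${e.tokens}`);
});

// Consumer 2: a metrics-like subscriber, added without touching the emitter.
let totalTokens = 0;
bus.subscribe((e) => {
  if (e.type === "token_usage") totalTokens += e.tokens;
});

bus.emit({ type: "tool_call", tool: "read_file" });
bus.emit({ type: "token_usage", tokens: 120 });
bus.emit({ type: "token_usage", tokens: 80 });

console.log(rendered, totalTokens);
```

A web dashboard would be a third subscribe() call; the code that emits events stays unchanged.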

Provider normalization

All providers implement ProviderAdapter with invoke() and stream(). Message format, tool calling conventions, and response parsing are handled per-provider so the runtime never sees vendor-specific shapes.
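
A rough sketch of what such an adapter boundary could look like follows. Only the names ProviderAdapter, invoke() and stream() come from this README; the method signatures, the Message and NormalizedResponse shapes, and the FakeVendorResponse payload are all assumptions made up for the example.

```typescript
// Hypothetical normalized shapes; the real ones live in src/core/provider/.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface NormalizedResponse {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Assumed signatures: the README only names invoke() and stream().
interface ProviderAdapter {
  invoke(messages: Message[]): Promise<NormalizedResponse>;
  stream(messages: Message[]): AsyncIterable<string>;
}

// A made-up vendor payload, standing in for a real provider's response shape.
interface FakeVendorResponse {
  choices: { message: { content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number };
}

// Vendor-specific parsing stays inside the adapter module.
function normalize(raw: FakeVendorResponse): NormalizedResponse {
  return {
    text: raw.choices.map((c) => c.message.content).join(""),
    inputTokens: raw.usage.prompt_tokens,
    outputTokens: raw.usage.completion_tokens,
  };
}

class FakeAdapter implements ProviderAdapter {
  // Echoes the last message instead of calling a network API.
  private call(messages: Message[]): FakeVendorResponse {
    const last = messages[messages.length - 1];
    return {
      choices: [{ message: { content: `echo: ${last.content}` } }],
      usage: { prompt_tokens: messages.length, completion_tokens: 1 },
    };
  }

  async invoke(messages: Message[]): Promise<NormalizedResponse> {
    return normalize(this.call(messages));
  }

  async *stream(messages: Message[]): AsyncIterable<string> {
    // Yield the text word by word to imitate incremental delivery.
    for (const word of normalize(this.call(messages)).text.split(" ")) {
      yield word;
    }
  }
}

// The caller depends only on ProviderAdapter, never on the vendor payload.
const adapter: ProviderAdapter = new FakeAdapter();
adapter.invoke([{ role: "user", content: "ping" }]).then((r) => console.log(r.text));
```

Because the runtime is typed against ProviderAdapter alone, swapping FakeAdapter for another implementation would not require changes on the calling side.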

Typed tools with Zod

Every tool declares its input/output schemas with Zod. The runtime validates inputs before execution and can generate JSON Schema for model function-calling automatically.
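
The validate-before-execute step might be sketched as below. To keep the example dependency-free, it uses a small hand-written schema object whose safeParse() result is modeled on Zod's; in the project itself a Zod schema such as z.object({ path: z.string().min(1) }) would take its place. The ToolDef fields and the runTool helper are hypothetical, and JSON Schema generation is not shown.

```typescript
// Result shape modeled on Zod's safeParse(); simplified for this sketch.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };

interface Schema<T> {
  safeParse(input: unknown): ParseResult<T>;
}

// Hypothetical tool definition; the real ToolDef lives in src/core/tools/.
interface ToolDef<I, O> {
  name: string;
  input: Schema<I>;
  execute(input: I): O;
}

// Hand-written stand-in for a Zod object schema with one string field.
const readFileInput: Schema<{ path: string }> = {
  safeParse(input) {
    if (typeof input === "object" && input !== null) {
      const path = (input as Record<string, unknown>).path;
      if (typeof path === "string" && path.length > 0) {
        return { success: true, data: { path } };
      }
    }
    return { success: false, error: { message: "path must be a non-empty string" } };
  },
};

const readFileTool: ToolDef<{ path: string }, string> = {
  name: "read_file",
  input: readFileInput,
  // Placeholder body; a real tool would read from disk.
  execute: ({ path }) => `contents of ${path}`,
};

type ToolOutcome<O> = { ok: true; output: O } | { ok: false; error: string };

// Validate raw model-supplied arguments before the tool body ever runs.
function runTool<I, O>(tool: ToolDef<I, O>, rawArgs: unknown): ToolOutcome<O> {
  const parsed = tool.input.safeParse(rawArgs);
  if (parsed.success === false) {
    return { ok: false, error: `${tool.name}: ${parsed.error.message}` };
  }
  return { ok: true, output: tool.execute(parsed.data) };
}

console.log(runTool(readFileTool, { path: "README.md" }));
console.log(runTool(readFileTool, { path: 42 }));
```

The second call never reaches execute(): the invalid argument is rejected at the schema boundary and comes back as an error value.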

Policy enforcement

The PolicyEngine checks permission levels against the current policy mode before executing any tool. Denied tools return structured error messages to the model so it can adapt.
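
As an illustration, a permission check of this kind could be sketched as follows. The five permission levels are the ones listed in the feature overview; the policy mode names, the mode-to-permission mapping, the check() method, and the error shape are all assumptions, not the project's actual API.

```typescript
// Permission levels named in the feature list of this README.
type Permission = "read" | "write" | "exec" | "network" | "dangerous";

// Hypothetical policy modes; the project's actual mode names may differ.
type PolicyMode = "readonly" | "standard" | "unrestricted";

// Assumed mapping from each mode to the permission levels it allows.
const ALLOWED: Record<PolicyMode, Permission[]> = {
  readonly: ["read"],
  standard: ["read", "write", "network"],
  unrestricted: ["read", "write", "exec", "network", "dangerous"],
};

interface ToolCall {
  tool: string;
  permission: Permission;
}

interface DeniedError {
  code: "permission_denied";
  tool: string;
  required: Permission;
  mode: PolicyMode;
}

type Decision = { allowed: true } | { allowed: false; error: DeniedError };

class PolicyEngine {
  private mode: PolicyMode;

  constructor(mode: PolicyMode) {
    this.mode = mode;
  }

  // Compare the tool's required permission with what the current mode allows.
  check(call: ToolCall): Decision {
    if (ALLOWED[this.mode].indexOf(call.permission) !== -1) {
      return { allowed: true };
    }
    return {
      allowed: false,
      error: {
        code: "permission_denied",
        tool: call.tool,
        required: call.permission,
        mode: this.mode,
      },
    };
  }
}

// Run the tool only if allowed; otherwise hand the model a structured error.
function executeWithPolicy(engine: PolicyEngine, call: ToolCall, run: () => string): string {
  const decision = engine.check(call);
  if (decision.allowed === false) {
    return JSON.stringify(decision.error);
  }
  return run();
}

const engine = new PolicyEngine("standard");
console.log(executeWithPolicy(engine, { tool: "read_file", permission: "read" }, () => "file contents"));
console.log(executeWithPolicy(engine, { tool: "shell_exec", permission: "exec" }, () => "command output"));
```

Returning the denial as data rather than throwing lets the model see which permission was missing and choose a different tool.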

Evaluation loop

After task completion, the Evaluator runs all registered checks. Failed checks can trigger remediation (retry with error context), preventing premature success declarations.
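
The retry-with-error-context idea can be sketched as a small loop. The Check and EvalReport shapes, the runWithRemediation helper, and the fake agent below are hypothetical stand-ins chosen for the example; the project's Evaluator may be structured differently.

```typescript
// Hypothetical shapes; the real ones live in src/core/evaluators/.
interface CheckResult {
  name: string;
  passed: boolean;
  message?: string;
}

type Check = (output: string) => CheckResult;

interface EvalReport {
  passed: boolean;
  failures: CheckResult[];
}

// Run every registered check and collect the ones that failed.
function evaluate(output: string, checks: Check[]): EvalReport {
  const failures = checks.map((check) => check(output)).filter((r) => !r.passed);
  return { passed: failures.length === 0, failures };
}

// Re-run the attempt with the previous failures as context until checks pass
// or the attempt budget is used up.
function runWithRemediation(
  attempt: (errorContext: string[]) => string,
  checks: Check[],
  maxAttempts: number,
): { output: string; report: EvalReport; attempts: number } {
  let errorContext: string[] = [];
  let output = "";
  let report: EvalReport = { passed: false, failures: [] };
  for (let n = 1; n <= maxAttempts; n++) {
    output = attempt(errorContext);
    report = evaluate(output, checks);
    if (report.passed) return { output, report, attempts: n };
    errorContext = report.failures.map((f) => `${f.name}: ${f.message ?? "failed"}`);
  }
  return { output, report, attempts: maxAttempts };
}

// Example schema-style check: the final output must parse as JSON.
const isJson: Check = (output) => {
  try {
    JSON.parse(output);
    return { name: "valid_json", passed: true };
  } catch (err) {
    return { name: "valid_json", passed: false, message: "output is not valid JSON" };
  }
};

// Stand-in for the agent: it only produces JSON once it has seen an error.
const fakeAgent = (errorContext: string[]): string =>
  errorContext.length === 0 ? "done!" : '{"status":"done"}';

const result = runWithRemediation(fakeAgent, [isJson], 3);
console.log(result.attempts, result.report.passed, result.output);
```

The attempt cap matters: without it, a task that can never satisfy a check would retry indefinitely.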

Extending

See EXTENSION_GUIDE.md for detailed instructions on adding:

  • New providers
  • New tools
  • New skills
  • New evaluator checks
  • Custom renderers

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'stevenkhan/ai-harness'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
