Hermes-Bonsai Karpathy Self-Improving Agent Loop

Stage 2 checkpoint for the Hermes/Bonsai Karpathy auto-research loop.

Last updated: 2026-04-05

This release is inspired by Andrej Karpathy's framing of self-improving training loops and auto-research. It contains the model artifact that worked, plus a concise model card explaining how it was produced and how to run it.

Overview

  • Base model: Qwen3-8B-Base
  • Training method: supervised fine-tuning via the Hermes/Karpathy loop
  • Stage: Stage 2, the checkpoint that worked
  • Known limitation: Stage 3 exposed a learned-helplessness pattern on some tasks; that behavior is documented in the GitHub methodology repo
  • License: Apache-2.0 for this release; the underlying base model license also applies to the inherited Qwen3-8B-Base components

What went into this checkpoint

  • The loop-produced training curriculum and trace distillation pipeline
  • 140 verified raw passes used as positive reinforcement for curriculum rebalancing and trace selection
    • These are Bonsai's own unedited outputs that passed teacher evaluation
  • 10 domains covered across the build
  • Validation signal from a mixed-domain batch
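The selection step above can be sketched in a few lines. This is a minimal illustration only; the real pipeline lives in the GitHub methodology repo, and every name here (`Trace`, `teacher_score`, `PASS_THRESHOLD`) is a hypothetical stand-in, not taken from the actual codebase.

```python
# Illustrative sketch of raw-pass trace selection. All names are
# hypothetical; the real logic is in the GitHub methodology repo.
from dataclasses import dataclass

PASS_THRESHOLD = 0.8  # assumed teacher-evaluation cutoff, not the real value


@dataclass
class Trace:
    domain: str           # e.g. "memory_integration"
    text: str             # Bonsai's own unedited output
    teacher_score: float  # score assigned by the teacher evaluator


def select_raw_passes(traces):
    """Keep only unedited outputs that passed teacher evaluation."""
    return [t for t in traces if t.teacher_score >= PASS_THRESHOLD]


def passes_per_domain(passes):
    """Count verified passes per domain to guide curriculum rebalancing."""
    counts = {}
    for t in passes:
        counts[t.domain] = counts.get(t.domain, 0) + 1
    return counts
```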

Domains covered

  • memory_integration
  • refusal_redirect
  • self_correction
  • agent_routing
  • devops
  • logic_puzzle
  • code_debugging
  • math
  • architecture
  • research_synthesis

Strongest domains

Best performance was concentrated in:

  • memory_integration
  • refusal_redirect
  • self_correction

Validation metrics

  • Mixed-domain batch: 13/50 raw passes
  • Raw pass rate: 26%
  • This checkpoint is the stage 2 model that produced those verified passes
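The reported rate follows directly from the batch counts:

```python
# Raw pass rate from the mixed-domain validation batch reported above.
raw_passes = 13
batch_size = 50
pass_rate = raw_passes / batch_size
print(f"{pass_rate:.0%}")  # 26%
```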

What's novel

Trained via a graduation protocol with teacher-guided validation, raw-pass reinforcement, and frontier failure analysis. The interesting contribution is the loop methodology; see GitHub for the full curriculum and training workflow.

GitHub methodology

The training loop, curriculum design, graduation protocol, and detailed methodology live here:

https://github.com/aurous37-lang/Hermes-Bonsai-Self-Improving-Agent-Loop

Files in this Hugging Face repo

  • bonsai-8b-stage2-post-curriculum-q8.gguf, the shipped stage 2 checkpoint
  • README.md, this model card
  • LICENSE, the Apache-2.0 license

How to use

Recommended working config from the stable local run:

  • --ctx-size 40960
  • --n-gpu-layers 37

llama.cpp

./llama-cli -m bonsai-8b-stage2-post-curriculum-q8.gguf \
  --ctx-size 40960 \
  --n-gpu-layers 37 \
  -p "Explain the CAP theorem for a backend engineer."

llama-server

./llama-server -m bonsai-8b-stage2-post-curriculum-q8.gguf \
  --ctx-size 40960 \
  --n-gpu-layers 37 \
  --host 0.0.0.0 --port 8080

Then point your client at the local OpenAI-compatible endpoint exposed by llama-server.
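As one way to do that, here is a minimal stdlib-only client sketch against llama-server's OpenAI-compatible `/v1/chat/completions` route. It assumes llama-server is running locally as shown above; the `model` field value is a placeholder (llama-server serves whatever model it was launched with).

```python
# Minimal client sketch for the llama-server OpenAI-compatible API.
# Assumes llama-server is running on localhost:8080 as launched above.
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/chat/completions"


def build_request(prompt, max_tokens=256):
    """Build an OpenAI-style chat completion request for llama-server."""
    payload = {
        "model": "bonsai-8b-stage2-post-curriculum-q8",  # placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_request("Explain the CAP theorem for a backend engineer.")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```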

Notes

  • This is a release checkpoint, not the full training corpus.
  • The GitHub repo contains the code and documentation needed to reproduce the loop.
  • The Hugging Face repo contains the model artifact that ships from that loop.