---
title: agent-learn - FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---

# 🧠 agent-learn

**FORGE Persistent Learning Layer**

**Owns:** persistent Q-table, reward scoring pipeline, RLHF data store, skill candidate review. Replaces NEXUS's critical `/tmp` Q-table, which resets on every restart.

## What it does

1. **Q-table** — agents ask "what's the best action for my current state?" and get an epsilon-greedy response
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur often enough to become FORGE skills
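The epsilon-greedy policy in step 1 can be sketched as follows — a minimal illustration of the idea, not the service's actual implementation:

```python
import random

def epsilon_greedy(q_values, actions, epsilon=0.15):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.choice(actions)                        # explore
    return max(actions, key=lambda a: q_values.get(a, 0.0))  # exploit (unseen => 0.0)

# With epsilon=0.0 the choice is purely greedy:
best = epsilon_greedy({"claude-haiku-4-5": 0.8, "local_cpu": 0.2},
                      ["claude-haiku-4-5", "local_cpu"], epsilon=0.0)
```

With the default `EPSILON` of 0.15, roughly one call in seven returns a random action so the table keeps learning about alternatives.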

## REST API

```
GET   /api/q?agent=&state={}   Get all Q-values for agent+state
POST  /api/q/best              Best action (epsilon-greedy): {agent, state, actions[]}
POST  /api/q/update            Q-value update: {agent, state, action, reward, next_state?}
POST  /api/q/hint              Manual nudge: {agent, state, action, nudge}
GET   /api/q/stats             Q-table stats

POST  /api/score               Score a single trace event → reward
POST  /api/sync                Trigger trace pull + reward scoring now

GET   /api/rlhf                List RLHF entries
POST  /api/rlhf                Add labeled completion
PATCH /api/rlhf/{id}           Update label/reward

GET   /api/candidates          List skill candidates (status=pending)
PATCH /api/candidates/{id}     Update candidate (status: promoted|rejected)

GET   /api/stats               Full learning stats
GET   /api/reward-trend        Hourly avg reward trend
```
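As an illustration, a labeled completion added via `POST /api/rlhf` could look like the body below. The `prompt` and `completion` field names are assumptions for the sketch; only the label/reward semantics come from the listing above.

```python
import json

# Hypothetical POST /api/rlhf body: a prompt/completion pair plus a human verdict
rlhf_entry = {
    "prompt": "Summarize the deployment logs",   # illustrative field name
    "completion": "Deployment succeeded ...",    # illustrative field name
    "label": "approve",                          # approve | reject
    "reward": 1.0,
}
print(json.dumps(rlhf_entry))
```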

## MCP

```
GET  /mcp/sse   SSE transport
POST /mcp       JSON-RPC 2.0
```

Tools: `learn_q_get`, `learn_q_best`, `learn_q_update`, `learn_q_hint`,
`learn_stats`, `learn_rlhf_add`, `learn_score_trace`,
`learn_candidate_add`, `learn_sync`
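A JSON-RPC 2.0 call to one of these tools over `POST /mcp` might look like the envelope below. The `tools/call` method follows the standard MCP convention; the argument names are assumed to mirror the REST payloads.

```python
import json

# JSON-RPC 2.0 envelope invoking the learn_q_best tool via POST /mcp
rpc_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "learn_q_best",
        "arguments": {                     # assumed to mirror POST /api/q/best
            "agent": "nexus",
            "state": {"event": "model_selection"},
            "actions": ["local_cpu", "hf_api"],
        },
    },
}
print(json.dumps(rpc_request))
```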

## Secrets

| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: `https://chris4k-agent-trace.hf.space`) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval in seconds (default: 120) |
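`LEARN_RATE`, `DISCOUNT`, and `EPSILON` are the α, γ, and ε of standard tabular Q-learning. A minimal sketch of the update rule those parameters drive (not the service's actual code):

```python
def q_update(q, state, action, reward, next_q=None, alpha=0.1, gamma=0.9):
    """Tabular Q-learning: Q(s,a) += alpha * (reward + gamma * max Q(s',.) - Q(s,a))."""
    old = q.get((state, action), 0.0)
    future = max(next_q.values()) if next_q else 0.0  # no next_state => treat as terminal
    q[(state, action)] = old + alpha * (reward + gamma * future - old)
    return q[(state, action)]

q = {}
q_update(q, "model_selection", "local_cpu", 0.8)
# first update from Q=0: 0 + 0.1 * (0.8 + 0.9*0 - 0) = 0.08
```

A higher `LEARN_RATE` makes new rewards dominate old estimates faster; a higher `DISCOUNT` weights the value of the `next_state` more heavily.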

## NEXUS integration (replacing the /tmp Q-table)

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

# Before routing: ask LEARN for the best model
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]
}, timeout=3)
best_model = resp.json()["action"]

# After inference: update the Q-value with the observed reward
requests.post(f"{LEARN_URL}/api/q/update", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "action": best_model,
    "reward": 0.8
}, timeout=3)
```
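In production you likely want this call to degrade gracefully when agent-learn is unreachable. One possible wrapper — the fallback policy is an assumption, not part of the service:

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

def pick_model(actions, url=LEARN_URL, fallback="local_cpu"):
    """Ask agent-learn for the best action; fall back to a safe default on any failure."""
    try:
        resp = requests.post(f"{url}/api/q/best", json={
            "agent": "nexus",
            "state": {"agent": "nexus", "event": "model_selection"},
            "actions": actions,
        }, timeout=3)
        resp.raise_for_status()
        return resp.json()["action"]
    except (requests.RequestException, KeyError, ValueError):
        return fallback  # routing keeps working even if LEARN is down

# pick_model(["local_cpu", "hf_api"]) returns a learned choice, or "local_cpu" on error
```

This keeps NEXUS routing available during LEARN restarts at the cost of losing exploration for those requests.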

Built by Chris4K — ki-fusion-labs.de