---
title: agent-learn — FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---
# 🧠 agent-learn

**FORGE Persistent Learning Layer**
Owns: Q-table (persistent), reward scoring pipeline, RLHF data store, skill candidate review. Replaces the critical NEXUS /tmp Q-table that resets on every restart.
## What it does
- Q-table — agents ask "what's the best action for my current state?" → epsilon-greedy response
- Reward pipeline — pulls unscored traces from agent-trace, scores them, writes rewards back
- RLHF store — labeled approve/reject completions for future fine-tuning
- Skill candidates — patterns detected by agents that recur enough to become FORGE skills
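As a rough illustration of the first bullet, an epsilon-greedy lookup can be sketched as below. The flat `(state, action)` dictionary and the `best_action` helper are hypothetical and only mirror the behavior described; they are not the Space's actual internals.

```python
import random

def best_action(q_table, state_key, actions, epsilon=0.15):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick
    the action with the highest known Q-value (unseen actions score 0)."""
    if random.random() < epsilon:
        return random.choice(actions)  # explore
    # exploit: best known Q-value for this state
    return max(actions, key=lambda a: q_table.get((state_key, a), 0.0))

q = {("nexus:model_selection", "claude-haiku-4-5"): 0.7,
     ("nexus:model_selection", "local_cpu"): 0.2}
best = best_action(q, "nexus:model_selection",
                   ["claude-haiku-4-5", "local_cpu"], epsilon=0.0)
```

With `epsilon=0.0` the pick is deterministic; the service's `EPSILON` secret (default 0.15) controls how often it explores instead.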
## REST API

| Endpoint | Description |
|---|---|
| `GET /api/q?agent=&state={}` | Get all Q-values for agent+state |
| `POST /api/q/best` | Best action (epsilon-greedy): `{agent, state, actions[]}` |
| `POST /api/q/update` | Q-value update: `{agent, state, action, reward, next_state?}` |
| `POST /api/q/hint` | Manual nudge: `{agent, state, action, nudge}` |
| `GET /api/q/stats` | Q-table stats |
| `POST /api/score` | Score a single trace event → reward |
| `POST /api/sync` | Trigger trace pull + reward scoring now |
| `GET /api/rlhf` | List RLHF entries |
| `POST /api/rlhf` | Add labeled completion |
| `PATCH /api/rlhf/{id}` | Update label/reward |
| `GET /api/candidates` | List skill candidates (status=pending) |
| `PATCH /api/candidates/{id}` | Update candidate (status: `promoted` or `rejected`) |
| `GET /api/stats` | Full learning stats |
| `GET /api/reward-trend` | Hourly avg reward trend |
## MCP

| Endpoint | Transport |
|---|---|
| `GET /mcp/sse` | SSE transport |
| `POST /mcp` | JSON-RPC 2.0 |

Tools: `learn_q_get`, `learn_q_best`, `learn_q_update`, `learn_q_hint`, `learn_stats`, `learn_rlhf_add`, `learn_score_trace`, `learn_candidate_add`, `learn_sync`
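A minimal JSON-RPC 2.0 envelope for invoking one of these tools over `POST /mcp` might look like the sketch below. The `tools/call` method and `params` shape follow the general MCP convention; the exact argument names `learn_q_best` accepts are an assumption based on the REST parameters above.

```python
import json

def mcp_call(tool, arguments, req_id=1):
    """Build a JSON-RPC 2.0 request body for an MCP tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = mcp_call("learn_q_best", {
    "agent": "nexus",
    "state": {"event": "model_selection"},
    "actions": ["claude-haiku-4-5", "local_cpu"],
})
body = json.dumps(req)
# requests.post(f"{LEARN_URL}/mcp", data=body, timeout=3)
```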
## Secrets

| Key | Description |
|---|---|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: `https://chris4k-agent-trace.hf.space`) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval in seconds (default: 120) |
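`LEARN_RATE` (α) and `DISCOUNT` (γ) are the knobs of the standard tabular Q-learning update, `Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a))`. The sketch below shows that rule with the defaults above; whether the service implements exactly this variant is an assumption.

```python
ALPHA, GAMMA = 0.1, 0.9  # LEARN_RATE and DISCOUNT defaults

def q_update(q, state, action, reward, next_state=None):
    """Standard tabular Q-learning update on a {state: {action: value}} dict."""
    old = q.get(state, {}).get(action, 0.0)
    # Bootstrapped future value: best Q over next-state actions (0 if terminal)
    future = max(q.get(next_state, {}).values(), default=0.0) if next_state else 0.0
    q.setdefault(state, {})[action] = old + ALPHA * (reward + GAMMA * future - old)

q = {"s2": {"b": 1.0}}
q_update(q, "s1", "a", 0.5, next_state="s2")
# 0 + 0.1 * (0.5 + 0.9 * 1.0 - 0) = 0.14
```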
## NEXUS integration (replacing /tmp Q-table)

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

# Before routing: ask LEARN for best model
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]
}, timeout=3)
best_model = resp.json()["action"]

# After inference: update Q-value
requests.post(f"{LEARN_URL}/api/q/update", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "action": best_model,
    "reward": 0.8
}, timeout=3)
```
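Since routing should never block on the learning layer, callers may want to wrap the lookup with a fallback. This is a defensive sketch, not part of the Space; the injectable `post` parameter exists only to make the fallback behavior easy to exercise.

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

def pick_model(actions, default, post=requests.post):
    """Ask agent-learn for the best model; fall back to `default` on any
    network or payload failure so routing degrades gracefully."""
    try:
        resp = post(f"{LEARN_URL}/api/q/best", json={
            "agent": "nexus",
            "state": {"agent": "nexus", "event": "model_selection"},
            "actions": actions,
        }, timeout=3)
        resp.raise_for_status()
        return resp.json()["action"]
    except (requests.RequestException, KeyError, ValueError):
        return default
```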
Built by Chris4K — ki-fusion-labs.de