---
title: agent-learn — FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---

# 🧠 agent-learn

### FORGE Persistent Learning Layer

Owns: the persistent Q-table, the reward scoring pipeline, the RLHF data store, and skill candidate review.

Replaces NEXUS's `/tmp` Q-table, which resets on every restart.

## What it does

1. **Q-table** — agents ask "what's the best action for my current state?" and get an epsilon-greedy answer
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, and writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur often enough to become FORGE skills

## REST API

```
GET   /api/q?agent=&state={}   Get all Q-values for agent+state
POST  /api/q/best              Best action (epsilon-greedy): {agent, state, actions[]}
POST  /api/q/update            Q-value update: {agent, state, action, reward, next_state?}
POST  /api/q/hint              Manual nudge: {agent, state, action, nudge}
GET   /api/q/stats             Q-table stats
POST  /api/score               Score a single trace event → reward
POST  /api/sync                Trigger trace pull + reward scoring now
GET   /api/rlhf                List RLHF entries
POST  /api/rlhf                Add labeled completion
PATCH /api/rlhf/{id}           Update label/reward
GET   /api/candidates          List skill candidates (status=pending)
PATCH /api/candidates/{id}     Update candidate (status: promoted|rejected)
GET   /api/stats               Full learning stats
GET   /api/reward-trend        Hourly average reward trend
```

## MCP

```
GET  /mcp/sse   SSE transport
POST /mcp       JSON-RPC 2.0

Tools: learn_q_get, learn_q_best, learn_q_update, learn_q_hint, learn_stats,
       learn_rlhf_add, learn_score_trace, learn_candidate_add, learn_sync
```

## Secrets

| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: https://chris4k-agent-trace.hf.space) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval in seconds (default: 120) |

## NEXUS integration (replacing the /tmp Q-table)

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

# Before routing: ask LEARN for the best model
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"],
}, timeout=3)
best_model = resp.json()["action"]

# After inference: update the Q-value with the observed reward
requests.post(f"{LEARN_URL}/api/q/update", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "action": best_model,
    "reward": 0.8,
}, timeout=3)
```

Built by [Chris4K](https://huggingface.co/Chris4K) — ki-fusion-labs.de
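For intuition, `/api/q/best` and `/api/q/update` presumably behave like the standard tabular Q-learning loop with the configured `LEARN_RATE` (α), `DISCOUNT` (γ), and `EPSILON` defaults. A minimal local sketch of that behavior — an assumption for illustration, not the service's actual code:

```python
import random

# Defaults match the Space's secrets: LEARN_RATE, DISCOUNT, EPSILON.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.15

q = {}  # (state_key, action) -> Q-value

def q_best(state_key, actions):
    # Epsilon-greedy: explore with probability EPSILON, otherwise exploit.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state_key, a), 0.0))

def q_update(state_key, action, reward, next_state_key=None, next_actions=()):
    # Standard tabular rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    future = max((q.get((next_state_key, a), 0.0) for a in next_actions), default=0.0)
    old = q.get((state_key, action), 0.0)
    q[(state_key, action)] = old + ALPHA * (reward + GAMMA * future - old)
```

With no next-state information, a first reward of 1.0 moves Q(s, a) from 0.0 to 0.1, a second to 0.19 — the value converges toward the observed reward at rate α.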
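The `POST /mcp` endpoint speaks JSON-RPC 2.0, and MCP tool invocations use the `tools/call` method. A sketch of building such an envelope for `learn_q_best` — the argument schema here is an assumption, mirrored from the REST `/api/q/best` body:

```python
import json

def mcp_call(tool, arguments, req_id=1):
    # JSON-RPC 2.0 envelope for an MCP tools/call request.
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Hypothetical arguments, assumed to match the REST /api/q/best payload.
req = mcp_call("learn_q_best", {
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["hf_api", "local_cpu"],
})
payload = json.dumps(req)  # POST this body to /mcp (not executed here)
```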