---
title: agent-learn — FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---
# 🧠 agent-learn

### FORGE Persistent Learning Layer

Owns: Q-table (persistent), reward scoring pipeline, RLHF data store, skill candidate review.

Replaces the NEXUS `/tmp` Q-table, which resets on every restart.

## What it does

1. **Q-table** — agents ask "what's the best action for my current state?" → epsilon-greedy response
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur often enough to become FORGE skills
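The epsilon-greedy lookup in step 1 can be sketched locally. This is an illustrative model only — the service's actual state keying, storage, and tie-breaking are assumptions:

```python
import json
import random

def best_action(q_table, agent, state, actions, epsilon=0.15):
    """Epsilon-greedy: explore a random action with probability epsilon,
    otherwise exploit the highest-Q known action for this (agent, state).
    States are keyed by sorted JSON here — an assumption for illustration."""
    key = (agent, json.dumps(state, sort_keys=True))
    if random.random() < epsilon:
        return random.choice(actions)
    q = q_table.get(key, {})
    return max(actions, key=lambda a: q.get(a, 0.0))

# Toy table: one state with two scored actions
q_table = {
    ("nexus", json.dumps({"event": "model_selection"}, sort_keys=True)):
        {"claude-haiku-4-5": 0.9, "local_cpu": 0.2},
}
# epsilon=0.0 forces pure exploitation, so the pick is deterministic
action = best_action(q_table, "nexus", {"event": "model_selection"},
                     ["claude-haiku-4-5", "local_cpu"], epsilon=0.0)
```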
## REST API

```
GET    /api/q?agent=&state={}     Get all Q-values for agent+state
POST   /api/q/best                Best action (epsilon-greedy): {agent, state, actions[]}
POST   /api/q/update              Q-value update: {agent, state, action, reward, next_state?}
POST   /api/q/hint                Manual nudge: {agent, state, action, nudge}
GET    /api/q/stats               Q-table stats
POST   /api/score                 Score a single trace event → reward
POST   /api/sync                  Trigger trace pull + reward scoring now
GET    /api/rlhf                  List RLHF entries
POST   /api/rlhf                  Add labeled completion
PATCH  /api/rlhf/{id}             Update label/reward
GET    /api/candidates            List skill candidates (status=pending)
PATCH  /api/candidates/{id}       Update candidate (status: promoted|rejected)
GET    /api/stats                 Full learning stats
GET    /api/reward-trend          Hourly avg reward trend
```
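As a worked example of the request shapes above, a minimal body builder for `/api/q/update` — field names come from the listing, and `next_state` is only included when supplied:

```python
def update_body(agent, state, action, reward, next_state=None):
    """JSON body for POST /api/q/update; next_state is optional per the listing."""
    body = {"agent": agent, "state": state, "action": action, "reward": reward}
    if next_state is not None:
        body["next_state"] = next_state
    return body

# Terminal-style update: no next_state, so the field is omitted entirely
body = update_body("nexus", {"event": "model_selection"}, "local_cpu", 0.8)
```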
## MCP

```
GET  /mcp/sse    SSE transport
POST /mcp        JSON-RPC 2.0

Tools: learn_q_get, learn_q_best, learn_q_update, learn_q_hint,
       learn_stats, learn_rlhf_add, learn_score_trace,
       learn_candidate_add, learn_sync
```
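A tool call over `POST /mcp` uses the standard JSON-RPC 2.0 envelope with MCP's `tools/call` method; the exact argument schema of `learn_q_best` is an assumption here, mirroring the REST `/api/q/best` body:

```python
import json

# JSON-RPC 2.0 envelope for an MCP tool call. The "arguments" schema is
# assumed to mirror POST /api/q/best ({agent, state, actions[]}).
req = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "learn_q_best",
        "arguments": {
            "agent": "nexus",
            "state": {"event": "model_selection"},
            "actions": ["hf_api", "local_cpu"],
        },
    },
}
payload = json.dumps(req)  # send this as the POST /mcp body
```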
## Secrets

| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: https://chris4k-agent-trace.hf.space) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval in seconds (default: 120) |
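`LEARN_RATE` (α) and `DISCOUNT` (γ) plug into the textbook Q-learning update; the service's internal arithmetic is assumed to follow the standard rule. A worked step with the defaults above:

```python
ALPHA, GAMMA = 0.1, 0.9  # LEARN_RATE and DISCOUNT defaults from the table

def q_update(q, reward, max_next_q=0.0, alpha=ALPHA, gamma=GAMMA):
    """Textbook Q-learning: Q <- Q + alpha * (reward + gamma * max_a' Q(s',a') - Q)."""
    return q + alpha * (reward + gamma * max_next_q - q)

# One step from Q=0.5 with reward 0.8 and no next_state:
# 0.5 + 0.1 * (0.8 + 0 - 0.5) = 0.53
new_q = q_update(0.5, 0.8)
```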
## NEXUS integration (replacing /tmp Q-table)

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

# Before routing: ask LEARN for the best model
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]
}, timeout=3)
best_model = resp.json()["action"]

# After inference: update the Q-value with the observed reward
requests.post(f"{LEARN_URL}/api/q/update", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "action": best_model,
    "reward": 0.8
}, timeout=3)
```
Built by [Chris4K](https://huggingface.co/Chris4K) — ki-fusion-labs.de