---
title: agent-learn — FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---
# 🧠 agent-learn
### FORGE Persistent Learning Layer
Owns: Q-table (persistent), reward scoring pipeline, RLHF data store, skill candidate review.
Replaces NEXUS's critical `/tmp` Q-table, which resets on every restart.
## What it does
1. **Q-table** — agents ask "what's the best action for my current state?" → epsilon-greedy response
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur enough to become FORGE skills
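The epsilon-greedy lookup in step 1 can be sketched as follows. This is an illustration of the policy, not the service's actual implementation:

```python
import random

def epsilon_greedy(q_values: dict, actions: list, epsilon: float = 0.15) -> str:
    """Pick a random action with probability epsilon, else the best-known one.

    q_values maps action -> learned Q-value; unseen actions default to 0.0.
    """
    if random.random() < epsilon:
        return random.choice(actions)  # explore
    return max(actions, key=lambda a: q_values.get(a, 0.0))  # exploit

# With epsilon=0 the choice is deterministic: always exploit the best Q-value.
print(epsilon_greedy({"hf_api": 0.8, "local_cpu": 0.2},
                     ["hf_api", "local_cpu"], epsilon=0.0))  # → hf_api
```

`POST /api/q/best` returns the chosen action; the `EPSILON` secret (below) controls the exploration rate.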
## REST API
```
GET /api/q?agent=&state={} Get all Q-values for agent+state
POST /api/q/best Best action (epsilon-greedy): {agent, state, actions[]}
POST /api/q/update Q-value update: {agent, state, action, reward, next_state?}
POST /api/q/hint Manual nudge: {agent, state, action, nudge}
GET /api/q/stats Q-table stats
POST /api/score Score a single trace event → reward
POST /api/sync Trigger trace pull + reward scoring now
GET /api/rlhf List RLHF entries
POST /api/rlhf Add labeled completion
PATCH /api/rlhf/{id} Update label/reward
GET /api/candidates List skill candidates (status=pending)
PATCH /api/candidates/{id} Update candidate (status: promoted|rejected)
GET /api/stats Full learning stats
GET /api/reward-trend Hourly avg reward trend
```
## MCP
```
GET /mcp/sse SSE transport
POST /mcp JSON-RPC 2.0
Tools: learn_q_get, learn_q_best, learn_q_update, learn_q_hint,
learn_stats, learn_rlhf_add, learn_score_trace,
learn_candidate_add, learn_sync
```
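An MCP tool call is a JSON-RPC 2.0 `tools/call` request POSTed to `/mcp`. A sketch of the body for `learn_q_best` — the argument names mirror the REST `/api/q/best` payload, but the exact tool schema is an assumption:

```python
import json

# JSON-RPC 2.0 envelope for calling the learn_q_best MCP tool.
rpc = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "learn_q_best",
        "arguments": {
            "agent": "nexus",
            "state": {"event": "model_selection"},
            "actions": ["hf_api", "local_cpu"],
        },
    },
}
print(json.dumps(rpc))  # POST this body to /mcp with Content-Type: application/json
```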
## Secrets
| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: https://chris4k-agent-trace.hf.space) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval seconds (default: 120) |
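`LEARN_RATE` (α), `DISCOUNT` (γ), and `EPSILON` map onto the standard tabular Q-learning update. A minimal sketch of the math a `POST /api/q/update` could apply server-side (illustrative; the service's internal bookkeeping may differ):

```python
def q_update(q: dict, state_key: str, action: str, reward: float,
             next_best: float = 0.0, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step: Q ← Q + α · (r + γ · max_a' Q(s', a') − Q).

    next_best is max_a' Q(s', a'); it is 0.0 when no next_state is supplied.
    """
    old = q.get((state_key, action), 0.0)
    q[(state_key, action)] = old + alpha * (reward + gamma * next_best - old)
    return q[(state_key, action)]

q = {}
q_update(q, "model_selection", "hf_api", reward=0.8)  # first update: α · r ≈ 0.08
```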
## NEXUS integration (replacing /tmp Q-table)
```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"
# Before routing: ask LEARN for best model
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
"agent": "nexus",
"state": {"agent": "nexus", "event": "model_selection"},
"actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]
}, timeout=3)
best_model = resp.json()["action"]
# After inference: update Q-value
requests.post(f"{LEARN_URL}/api/q/update", json={
"agent": "nexus",
"state": {"agent": "nexus", "event": "model_selection"},
"action": best_model,
"reward": 0.8
}, timeout=3)
```
Built by [Chris4K](https://huggingface.co/Chris4K) — ki-fusion-labs.de