---
title: agent-learn FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---
# 🧠 agent-learn
### FORGE Persistent Learning Layer
Owns: Q-table (persistent), reward scoring pipeline, RLHF data store, skill candidate review.
Replaces NEXUS's critical /tmp Q-table, which resets on every restart.
## What it does
1. **Q-table** — agents ask "what's the best action for my current state?" → epsilon-greedy response
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur enough to become FORGE skills
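Under the hood this is standard tabular Q-learning with epsilon-greedy action selection. A minimal sketch of both (illustrative only — `q_table` and the function names are not the service's internals; the constants use the defaults from the Secrets table below):

```python
import random

# Illustrative sketch of epsilon-greedy selection and the Q-learning update.
LEARN_RATE = 0.1   # alpha (default)
DISCOUNT = 0.9     # gamma (default)
EPSILON = 0.15     # exploration rate (default)

q_table = {}  # (agent, state, action) -> Q-value

def best_action(agent, state, actions, epsilon=0.0):
    """With probability epsilon explore; otherwise exploit the best known action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((agent, state, a), 0.0))

def q_update(agent, state, action, reward, next_state=None, next_actions=()):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    q = q_table.get((agent, state, action), 0.0)
    future = max((q_table.get((agent, next_state, a), 0.0) for a in next_actions),
                 default=0.0)
    q_table[(agent, state, action)] = q + LEARN_RATE * (reward + DISCOUNT * future - q)
    return q_table[(agent, state, action)]
```

With no `next_state`, the update reduces to a simple exponentially weighted average of observed rewards, which is what a single `/api/q/update` call without `next_state` amounts to.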
## REST API
```
GET /api/q?agent=&state={} Get all Q-values for agent+state
POST /api/q/best Best action (epsilon-greedy): {agent, state, actions[]}
POST /api/q/update Q-value update: {agent, state, action, reward, next_state?}
POST /api/q/hint Manual nudge: {agent, state, action, nudge}
GET /api/q/stats Q-table stats
POST /api/score Score a single trace event → reward
POST /api/sync Trigger trace pull + reward scoring now
GET /api/rlhf List RLHF entries
POST /api/rlhf Add labeled completion
PATCH /api/rlhf/{id} Update label/reward
GET /api/candidates List skill candidates (status=pending)
PATCH /api/candidates/{id} Update candidate (status: promoted|rejected)
GET /api/stats Full learning stats
GET /api/reward-trend Hourly avg reward trend
```
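For example, adding a labeled completion to the RLHF store via `POST /api/rlhf` might look like the sketch below. The field names (`prompt`, `completion`, `label`, `reward`) and the bearer-token auth scheme for `LEARN_KEY` are assumptions, not a spec:

```python
LEARN_URL = "https://chris4k-agent-learn.hf.space"

def rlhf_payload(prompt, completion, label, reward=None):
    """Body for POST /api/rlhf. Field names here are assumptions."""
    body = {"prompt": prompt, "completion": completion, "label": label}
    if reward is not None:
        body["reward"] = reward
    return body

def add_rlhf(prompt, completion, label, key=None):
    import requests  # deferred import; only needed for the network call
    headers = {"Authorization": f"Bearer {key}"} if key else {}
    resp = requests.post(f"{LEARN_URL}/api/rlhf",
                         json=rlhf_payload(prompt, completion, label),
                         headers=headers, timeout=5)
    resp.raise_for_status()
    return resp.json()
```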
## MCP
```
GET /mcp/sse SSE transport
POST /mcp JSON-RPC 2.0
Tools: learn_q_get, learn_q_best, learn_q_update, learn_q_hint,
learn_stats, learn_rlhf_add, learn_score_trace,
learn_candidate_add, learn_sync
```
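A `POST /mcp` call uses standard MCP JSON-RPC 2.0 framing. Sketch of a `tools/call` request for `learn_q_best` (the argument schema is an assumption based on the matching REST endpoint above):

```python
import json

def mcp_call(tool, arguments, req_id=1):
    """Build a JSON-RPC 2.0 request body for POST /mcp (MCP tools/call shape)."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = mcp_call("learn_q_best", {
    "agent": "nexus",
    "state": {"event": "model_selection"},
    "actions": ["hf_api", "local_cpu"],
})
print(json.dumps(req))
```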
## Secrets
| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: https://chris4k-agent-trace.hf.space) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval seconds (default: 120) |
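The non-secret keys map directly onto the learner's hyperparameters. A sketch of how the defaults above could be read at startup (how the service actually parses them is an assumption):

```python
import os

# Defaults match the table above.
LEARN_RATE = float(os.environ.get("LEARN_RATE", "0.1"))      # Q-learning alpha
DISCOUNT = float(os.environ.get("DISCOUNT", "0.9"))          # Q-learning gamma
EPSILON = float(os.environ.get("EPSILON", "0.15"))           # exploration rate
SYNC_INTERVAL = int(os.environ.get("SYNC_INTERVAL", "120"))  # trace pull, seconds
```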
## NEXUS integration (replacing /tmp Q-table)
```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

# Before routing: ask LEARN for the best model (epsilon-greedy)
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]
}, timeout=3)
best_model = resp.json()["action"]

# After inference: report the observed reward back as a Q-value update
requests.post(f"{LEARN_URL}/api/q/update", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "action": best_model,
    "reward": 0.8
}, timeout=3)
```
Built by [Chris4K](https://huggingface.co/Chris4K) — ki-fusion-labs.de