---
title: agent-learn - FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---

# 🧠 agent-learn
### FORGE Persistent Learning Layer

Owns the persistent Q-table, the reward scoring pipeline, the RLHF data store, and skill candidate review.
Replaces the NEXUS /tmp Q-table, which resets on every restart.

## What it does

1. **Q-table** — agents ask "what's the best action for my current state?" → epsilon-greedy response
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur enough to become FORGE skills
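
The epsilon-greedy response in step 1 can be sketched as follows. This is a hypothetical illustration of the selection behavior, not the server's actual code; `q_values` maps action names to learned Q-values:

```python
import random

def epsilon_greedy(q_values: dict, actions: list, epsilon: float = 0.15) -> str:
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest known Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    # Unseen actions default to 0.0 so they can still compete.
    return max(actions, key=lambda a: q_values.get(a, 0.0))

# With epsilon=0 the choice is purely greedy:
print(epsilon_greedy({"hf_api": 0.7, "local_cpu": 0.2},
                     ["hf_api", "local_cpu"], epsilon=0.0))
# → hf_api
```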

## REST API

```
GET  /api/q?agent=&state={}    Get all Q-values for agent+state
POST /api/q/best               Best action (epsilon-greedy): {agent, state, actions[]}
POST /api/q/update             Q-value update: {agent, state, action, reward, next_state?}
POST /api/q/hint               Manual nudge: {agent, state, action, nudge}
GET  /api/q/stats              Q-table stats

POST /api/score                Score a single trace event → reward
POST /api/sync                 Trigger trace pull + reward scoring now

GET  /api/rlhf                 List RLHF entries
POST /api/rlhf                 Add labeled completion
PATCH /api/rlhf/{id}           Update label/reward

GET  /api/candidates           List skill candidates (status=pending)
PATCH /api/candidates/{id}     Update candidate (status: promoted|rejected)

GET  /api/stats                Full learning stats
GET  /api/reward-trend         Hourly avg reward trend
```
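
A sketch of adding a labeled completion via `POST /api/rlhf`. The field names (`prompt`, `completion`, `label`) are assumed from the endpoint summary above, not from a published schema:

```python
def rlhf_entry(prompt: str, completion: str, label: str) -> dict:
    """Build a hypothetical RLHF payload; labels follow the
    approve/reject convention described under 'What it does'."""
    assert label in ("approve", "reject")
    return {"prompt": prompt, "completion": completion, "label": label}

entry = rlhf_entry("Summarize the deploy log", "Deploy failed at step 3.", "approve")
# requests.post(f"{LEARN_URL}/api/rlhf", json=entry, timeout=3)
```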

## MCP

```
GET /mcp/sse   SSE transport
POST /mcp      JSON-RPC 2.0

Tools: learn_q_get, learn_q_best, learn_q_update, learn_q_hint,
       learn_stats, learn_rlhf_add, learn_score_trace,
       learn_candidate_add, learn_sync
```
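
Tools are invoked through the standard MCP `tools/call` method over `POST /mcp`. A sketch of a `learn_q_best` request; the argument names are assumed to mirror the REST `/api/q/best` payload:

```python
import json

# Standard JSON-RPC 2.0 envelope; the "arguments" shape is an assumption.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "learn_q_best",
        "arguments": {
            "agent": "nexus",
            "state": {"agent": "nexus", "event": "model_selection"},
            "actions": ["hf_api", "local_cpu"],
        },
    },
}
body = json.dumps(payload)  # send as the POST /mcp request body
```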

## Secrets

| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: https://chris4k-agent-trace.hf.space) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval seconds (default: 120) |
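
`LEARN_RATE` (α) and `DISCOUNT` (γ) plug into the standard Q-learning update rule. A minimal sketch with the table's defaults, assuming the service applies the textbook update:

```python
def q_update(q: float, reward: float, max_next_q: float = 0.0,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Q <- Q + alpha * (reward + gamma * max_a' Q(s', a') - Q)."""
    return q + alpha * (reward + gamma * max_next_q - q)

# Defaults match LEARN_RATE=0.1 and DISCOUNT=0.9:
print(q_update(0.5, 1.0))  # 0.5 + 0.1 * (1.0 - 0.5) = 0.55
```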

## NEXUS integration (replacing /tmp Q-table)

```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"
STATE = {"agent": "nexus", "event": "model_selection"}
ACTIONS = ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]

# Before routing: ask LEARN for the best model, falling back if LEARN is down
try:
    resp = requests.post(f"{LEARN_URL}/api/q/best", json={
        "agent": "nexus",
        "state": STATE,
        "actions": ACTIONS,
    }, timeout=3)
    best_model = resp.json()["action"]
except requests.RequestException:
    best_model = ACTIONS[0]  # degrade gracefully to a default model

# After inference: update the Q-value with the observed reward
try:
    requests.post(f"{LEARN_URL}/api/q/update", json={
        "agent": "nexus",
        "state": STATE,
        "action": best_model,
        "reward": 0.8,
    }, timeout=3)
except requests.RequestException:
    pass  # a missed update is not fatal; the next sync will catch up
```

Built by [Chris4K](https://huggingface.co/Chris4K) — ki-fusion-labs.de