---
title: agent-learn FORGE Learning Layer
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: Persistent Q-table, reward scoring, and RLHF store for FORGE
---
# 🧠 agent-learn
### FORGE Persistent Learning Layer
Owns: Q-table (persistent), reward scoring pipeline, RLHF data store, skill candidate review.
Replaces NEXUS's critical /tmp Q-table, which resets on every restart.
## What it does
1. **Q-table** — agents ask "what's the best action for my current state?" → epsilon-greedy response
2. **Reward pipeline** — pulls unscored traces from agent-trace, scores them, writes rewards back
3. **RLHF store** — labeled approve/reject completions for future fine-tuning
4. **Skill candidates** — patterns detected by agents that recur enough to become FORGE skills
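Under the hood this is standard tabular Q-learning with epsilon-greedy action selection. A minimal sketch of both (illustrative only — `q_table` and the function names are not the service's internals; the constants use the defaults from the Secrets table below):

```python
import random

# Illustrative sketch of epsilon-greedy selection and the Q-learning update.
LEARN_RATE = 0.1   # alpha (default)
DISCOUNT = 0.9     # gamma (default)
EPSILON = 0.15     # exploration rate (default)

q_table = {}  # (agent, state, action) -> Q-value

def best_action(agent, state, actions, epsilon=0.0):
    """With probability epsilon explore; otherwise exploit the best known action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((agent, state, a), 0.0))

def q_update(agent, state, action, reward, next_state=None, next_actions=()):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    q = q_table.get((agent, state, action), 0.0)
    future = max((q_table.get((agent, next_state, a), 0.0) for a in next_actions),
                 default=0.0)
    q_table[(agent, state, action)] = q + LEARN_RATE * (reward + DISCOUNT * future - q)
    return q_table[(agent, state, action)]
```

With no `next_state`, the update reduces to a simple exponentially weighted average of observed rewards, which is what a single `/api/q/update` call without `next_state` amounts to.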
## REST API
```
GET /api/q?agent=&state={} Get all Q-values for agent+state
POST /api/q/best Best action (epsilon-greedy): {agent, state, actions[]}
POST /api/q/update Q-value update: {agent, state, action, reward, next_state?}
POST /api/q/hint Manual nudge: {agent, state, action, nudge}
GET /api/q/stats Q-table stats
POST /api/score Score a single trace event → reward
POST /api/sync Trigger trace pull + reward scoring now
GET /api/rlhf List RLHF entries
POST /api/rlhf Add labeled completion
PATCH /api/rlhf/{id} Update label/reward
GET /api/candidates List skill candidates (status=pending)
PATCH /api/candidates/{id} Update candidate (status: promoted|rejected)
GET /api/stats Full learning stats
GET /api/reward-trend Hourly avg reward trend
```
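For example, adding a labeled completion to the RLHF store via `POST /api/rlhf` might look like the sketch below. The field names (`prompt`, `completion`, `label`, `reward`) and the bearer-token auth scheme for `LEARN_KEY` are assumptions, not a spec:

```python
LEARN_URL = "https://chris4k-agent-learn.hf.space"

def rlhf_payload(prompt, completion, label, reward=None):
    """Body for POST /api/rlhf. Field names here are assumptions."""
    body = {"prompt": prompt, "completion": completion, "label": label}
    if reward is not None:
        body["reward"] = reward
    return body

def add_rlhf(prompt, completion, label, key=None):
    import requests  # deferred import; only needed for the network call
    headers = {"Authorization": f"Bearer {key}"} if key else {}
    resp = requests.post(f"{LEARN_URL}/api/rlhf",
                         json=rlhf_payload(prompt, completion, label),
                         headers=headers, timeout=5)
    resp.raise_for_status()
    return resp.json()
```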
## MCP
```
GET /mcp/sse SSE transport
POST /mcp JSON-RPC 2.0
Tools: learn_q_get, learn_q_best, learn_q_update, learn_q_hint,
learn_stats, learn_rlhf_add, learn_score_trace,
learn_candidate_add, learn_sync
```
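A `POST /mcp` call uses standard MCP JSON-RPC 2.0 framing. Sketch of a `tools/call` request for `learn_q_best` (the argument schema is an assumption based on the matching REST endpoint above):

```python
import json

def mcp_call(tool, arguments, req_id=1):
    """Build a JSON-RPC 2.0 request body for POST /mcp (MCP tools/call shape)."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = mcp_call("learn_q_best", {
    "agent": "nexus",
    "state": {"event": "model_selection"},
    "actions": ["hf_api", "local_cpu"],
})
print(json.dumps(req))
```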
## Secrets
| Key | Description |
|-----|-------------|
| `LEARN_KEY` | Optional write auth key |
| `TRACE_URL` | agent-trace URL (default: https://chris4k-agent-trace.hf.space) |
| `TRACE_KEY` | agent-trace auth key (if set) |
| `LEARN_RATE` | Q-learning α (default: 0.1) |
| `DISCOUNT` | Q-learning γ (default: 0.9) |
| `EPSILON` | Exploration rate (default: 0.15) |
| `SYNC_INTERVAL` | Trace pull interval seconds (default: 120) |
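The non-secret keys map directly onto the learner's hyperparameters. A sketch of how the defaults above could be read at startup (how the service actually parses them is an assumption):

```python
import os

# Defaults match the table above.
LEARN_RATE = float(os.environ.get("LEARN_RATE", "0.1"))      # Q-learning alpha
DISCOUNT = float(os.environ.get("DISCOUNT", "0.9"))          # Q-learning gamma
EPSILON = float(os.environ.get("EPSILON", "0.15"))           # exploration rate
SYNC_INTERVAL = int(os.environ.get("SYNC_INTERVAL", "120"))  # trace pull, seconds
```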
## NEXUS integration (replacing /tmp Q-table)
```python
import requests

LEARN_URL = "https://chris4k-agent-learn.hf.space"

# Before routing: ask LEARN for the best model (epsilon-greedy)
resp = requests.post(f"{LEARN_URL}/api/q/best", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "actions": ["qwen/qwen3.5-35b-a3b", "claude-haiku-4-5", "hf_api", "local_cpu"]
}, timeout=3)
best_model = resp.json()["action"]

# After inference: report the observed reward back as a Q-value update
requests.post(f"{LEARN_URL}/api/q/update", json={
    "agent": "nexus",
    "state": {"agent": "nexus", "event": "model_selection"},
    "action": best_model,
    "reward": 0.8
}, timeout=3)
```
Built by [Chris4K](https://huggingface.co/Chris4K) — ki-fusion-labs.de