feat: auto-derive dataset repo name from HF_TOKEN when AUTO_CREATE_DATASET=true
When AUTO_CREATE_DATASET=true and OPENCLAW_DATASET_REPO is not set,
HuggingClaw now uses HfApi.whoami() to get the username from HF_TOKEN
and derives the repo name as "username/HuggingClaw-data".
- Add auto-derive logic in sync_hf.py
- Update README with Data Persistence section (Manual vs Auto mode)
- Update .env.example with detailed documentation for both modes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .env.example +10 -4
- README.md +28 -8
- scripts/sync_hf.py +16 -2
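The resolution order described in the commit message (an explicit `OPENCLAW_DATASET_REPO` first, then a name derived from the token's username only when auto mode is on) can be sketched as a pure function. This is a minimal sketch, not the code in `sync_hf.py`. The helper name `resolve_dataset_repo` and the injected `whoami` callable are hypothetical: `whoami` stands in for `HfApi(token=...).whoami()`, so the logic can be exercised without network access.

```python
from typing import Callable, Optional

REPO_SUFFIX = "HuggingClaw-data"  # repo name used by this commit


def resolve_dataset_repo(
    repo_id: str,
    auto_create: bool,
    token: Optional[str],
    whoami: Callable[[], dict],
) -> str:
    """Hypothetical helper: return the Dataset repo id to sync to, or "" if none.

    `whoami` stands in for HfApi(token=token).whoami(), whose result
    carries the account name under the "name" key.
    """
    # An explicitly configured OPENCLAW_DATASET_REPO always wins.
    if repo_id:
        return repo_id
    # Derivation is opt-in: it needs auto mode and a token.
    if not (auto_create and token):
        return ""
    try:
        username = whoami()["name"]
    except Exception as exc:  # network, auth, or unexpected payload errors
        print(f"[SYNC] WARNING: Could not derive username from HF_TOKEN: {exc}")
        return ""
    return f"{username}/{REPO_SUFFIX}"
```

Injecting `whoami` keeps the Hub call out of the decision logic, so each branch (explicit repo, manual mode, failed lookup) can be checked offline.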
.env.example
CHANGED
@@ -42,16 +42,22 @@ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 
 # Target Dataset repository for data backup.
 # Format: your-username/repo-name
-# If the repo doesn't exist, HuggingClaw auto-creates it as PRIVATE.
 # Example: tao-shen/HuggingClaw-data
 #
-#
+# Manual mode (default): create the repo yourself, then set this variable.
+# Auto mode (AUTO_CREATE_DATASET=true): if not set, HuggingClaw derives
+# it from your HF_TOKEN username → "your-username/HuggingClaw-data".
+#
+# [REQUIRED in manual mode, OPTIONAL in auto mode]
 #
 OPENCLAW_DATASET_REPO=your-username/HuggingClaw-data
 
 # Whether to auto-create the Dataset repo if it doesn't exist.
-#
-#
+# When true: HuggingClaw creates a PRIVATE dataset repo on first startup.
+# If OPENCLAW_DATASET_REPO is not set, the repo name is auto-derived
+# from your HF_TOKEN username (e.g. "your-username/HuggingClaw-data").
+# When false (default): you must create the repo manually on HuggingFace
+# and set OPENCLAW_DATASET_REPO yourself.
 #
 # [OPTIONAL] Default: false
 #
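`AUTO_CREATE_DATASET` is an opt-in flag: `sync_hf.py` lowercases the value and treats only `true`, `1`, or `yes` as on, so an unset variable, `false`, `0`, `no`, or a typo all leave auto mode off. A minimal sketch of that rule as a reusable helper; the name `env_flag` and its `env` parameter are hypothetical, added so the parsing can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Values sync_hf.py treats as "on"; everything else means "off".
TRUTHY = ("true", "1", "yes")


def env_flag(name: str, default: str = "false",
             env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: parse an opt-in boolean the way sync_hf.py parses AUTO_CREATE_DATASET."""
    source = os.environ if env is None else env
    # Lowercasing makes "TRUE" and "Yes" work; like the original expression,
    # surrounding whitespace is not stripped, so " true" evaluates to False.
    return source.get(name, default).lower() in TRUTHY
```

Defaulting to `"false"` rather than raising on unknown values is what makes the flag fail closed: nothing is created unless the user clearly opts in.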
README.md
CHANGED
@@ -80,25 +80,45 @@ Go to **Settings → Repository secrets** and configure:
 |--------|:------:|-------------|---------|
 | `OPENCLAW_PASSWORD` | Recommended | Password for the Control UI (default: `huggingclaw`) | `my-secret-password` |
 | `HF_TOKEN` | **Required** | HF Access Token with write permission ([create one](https://huggingface.co/settings/tokens)) | `hf_AbCdEfGhIjKlMnOpQrStUvWxYz` |
-| `OPENCLAW_DATASET_REPO` |
+| `OPENCLAW_DATASET_REPO` | See below | Dataset repo for backup – format: `username/repo-name`. Required in manual mode; optional in auto mode (see [Data Persistence](#data-persistence)) | `tao-shen/HuggingClaw-data` |
 | `OPENAI_API_KEY` | Recommended | OpenAI (or any [OpenAI-compatible](https://openclawdoc.com/docs/reference/environment-variables)) API key | `sk-proj-xxxxxxxxxxxx` |
 | `OPENROUTER_API_KEY` | Optional | [OpenRouter](https://openrouter.ai) API key (200+ models, free tier available) | `sk-or-v1-xxxxxxxxxxxx` |
 | `ANTHROPIC_API_KEY` | Optional | Anthropic Claude API key | `sk-ant-xxxxxxxxxxxx` |
 | `GOOGLE_API_KEY` | Optional | Google / Gemini API key | `AIzaSyXxXxXxXxXx` |
-| `OPENCLAW_DEFAULT_MODEL` | Optional | Default model for new conversations | `
+| `OPENCLAW_DEFAULT_MODEL` | Optional | Default model for new conversations | `openai/gpt-oss-20b:free` |
+
+### Data Persistence
+
+HuggingClaw syncs `~/.openclaw` (conversations, settings, credentials) to a private HuggingFace Dataset repo so data survives restarts. There are two ways to set this up:
+
+**Option A – Manual mode (default, recommended)**
+
+1. Go to [huggingface.co/new-dataset](https://huggingface.co/new-dataset) and create a **private** Dataset repo (e.g. `your-name/HuggingClaw-data`)
+2. Set `OPENCLAW_DATASET_REPO` = `your-name/HuggingClaw-data` in your Space secrets
+3. Set `HF_TOKEN` with write permission
+4. Done – HuggingClaw will sync to this repo every 60 seconds
+
+**Option B – Auto mode**
+
+1. Set `AUTO_CREATE_DATASET` = `true` in your Space secrets
+2. Set `HF_TOKEN` with write permission
+3. (Optional) Set `OPENCLAW_DATASET_REPO` if you want a custom repo name
+4. On first startup, HuggingClaw automatically creates a **private** Dataset repo. If `OPENCLAW_DATASET_REPO` is not set, it derives the name from your HF token username: `your-username/HuggingClaw-data`
+
+> **Security note:** `AUTO_CREATE_DATASET` defaults to `false` – the system will not create repos on your behalf unless you explicitly opt in.
 
 ### Environment Variables
 
-
+Fine-tune persistence and performance. Set these as **Repository Secrets** in HF Spaces, or in `.env` for local Docker.
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `AUTO_CREATE_DATASET` | `false` | **Auto-create the Dataset repo**
-| `SYNC_INTERVAL` | `60` | **Backup interval in seconds.** How often HuggingClaw syncs
-| `NODE_MEMORY_LIMIT` | `512` | **Node.js heap memory limit in MB.** HF free tier provides 16 GB RAM;
-| `TZ` | `UTC` | **Timezone** for log timestamps
+| `AUTO_CREATE_DATASET` | `false` | **Auto-create the Dataset repo.** Default is `false` for security. Set to `true` to let HuggingClaw automatically create a **private** Dataset repo on first startup (and auto-derive the repo name from your `HF_TOKEN` if `OPENCLAW_DATASET_REPO` is not set). Accepted values: `true`, `1`, `yes` / `false`, `0`, `no`. |
+| `SYNC_INTERVAL` | `60` | **Backup interval in seconds.** How often HuggingClaw syncs `~/.openclaw` to the Dataset repo. Lower = safer but more API calls. Recommended: `60`–`300`. |
+| `NODE_MEMORY_LIMIT` | `512` | **Node.js heap memory limit in MB.** HF free tier provides 16 GB RAM; 512 MB is enough for most cases. Increase for complex agent workflows. |
+| `TZ` | `UTC` | **Timezone** for log timestamps. Example: `Asia/Shanghai`, `America/New_York`. |
 
-> For the full list
+> For the full list (including `OPENAI_BASE_URL`, `OLLAMA_HOST`, proxy settings, etc.), see [`.env.example`](.env.example).
 
 ### 3. Open the Control UI
 
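Both modes end with a private Dataset repo under the token owner's account. Creating that repo from code instead of through huggingface.co/new-dataset can be sketched with `huggingface_hub`'s `HfApi.create_repo`. This is an illustration, not HuggingClaw's own creation code, which this commit does not show. The helper name `ensure_private_dataset` is hypothetical, and `api` is duck-typed so the sketch can be exercised with a stand-in object.

```python
from typing import Any


def ensure_private_dataset(api: Any, repo_id: str) -> str:
    """Hypothetical helper: create `repo_id` as a private Dataset repo unless it exists.

    `api` is expected to behave like huggingface_hub.HfApi; any object with a
    compatible create_repo method works, which keeps the sketch testable offline.
    """
    # Enforce the documented "username/repo-name" shape before calling the Hub.
    owner, _, name = repo_id.partition("/")
    if not owner or not name or "/" in name:
        raise ValueError(f"expected 'username/repo-name', got {repo_id!r}")
    url = api.create_repo(
        repo_id=repo_id,
        repo_type="dataset",  # a Dataset repo, not a model or Space repo
        private=True,         # backups hold conversations and credentials
        exist_ok=True,        # no error if the repo was created on an earlier start
    )
    return str(url)
```

In a Space you would pass a real client, e.g. `ensure_private_dataset(HfApi(token=os.environ["HF_TOKEN"]), "your-name/HuggingClaw-data")`. Passing `exist_ok=True` makes the call safe to repeat on every restart.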
scripts/sync_hf.py
CHANGED
@@ -49,7 +49,6 @@ class TeeLogger:
 
 # ── Configuration ────────────────────────────────────────────────────────────
 
-HF_REPO_ID = os.environ.get("OPENCLAW_DATASET_REPO", "")
 HF_TOKEN = os.environ.get("HF_TOKEN")
 OPENCLAW_HOME = Path.home() / ".openclaw"
 APP_DIR = Path("/app/openclaw")
@@ -79,6 +78,19 @@ SPACE_ID = os.environ.get("SPACE_ID", "") # e.g. "tao-shen/HuggingClaw"
 SYNC_INTERVAL = int(os.environ.get("SYNC_INTERVAL", "60"))
 AUTO_CREATE_DATASET = os.environ.get("AUTO_CREATE_DATASET", "false").lower() in ("true", "1", "yes")
 
+# Dataset repo: user-specified, or auto-derived from HF_TOKEN username
+HF_REPO_ID = os.environ.get("OPENCLAW_DATASET_REPO", "")
+if not HF_REPO_ID and AUTO_CREATE_DATASET and HF_TOKEN:
+    try:
+        _api = HfApi(token=HF_TOKEN)
+        _username = _api.whoami()["name"]
+        HF_REPO_ID = f"{_username}/HuggingClaw-data"
+        print(f"[SYNC] OPENCLAW_DATASET_REPO not set → auto-derived: {HF_REPO_ID}")
+        del _api, _username
+    except Exception as e:
+        print(f"[SYNC] WARNING: Could not derive username from HF_TOKEN: {e}")
+        HF_REPO_ID = ""
+
 # Setup logging
 log_dir = OPENCLAW_HOME / "workspace"
 log_dir.mkdir(parents=True, exist_ok=True)
@@ -99,7 +111,9 @@ class OpenClawFullSync:
             print("[SYNC] WARNING: HF_TOKEN not set. Persistence disabled.")
             return
         if not HF_REPO_ID:
-            print("[SYNC] INFO: OPENCLAW_DATASET_REPO not set
+            print("[SYNC] INFO: OPENCLAW_DATASET_REPO not set and AUTO_CREATE_DATASET is disabled.")
+            print("[SYNC] → Set OPENCLAW_DATASET_REPO, or set AUTO_CREATE_DATASET=true to auto-create.")
+            print("[SYNC] Persistence disabled.")
             return
 
         self.enabled = True
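The last hunk decides whether syncing is enabled at all. Its two checks can be condensed into a pure function that returns the decision together with the log lines it would print. `persistence_status` is a hypothetical name; the messages are copied from the diff, and `repo_id` is assumed to be the value left after the auto-derive step.

```python
from typing import List, Optional, Tuple


def persistence_status(hf_token: Optional[str], repo_id: str) -> Tuple[bool, List[str]]:
    """Hypothetical condensation of the OpenClawFullSync enable checks.

    Returns (enabled, log_lines). `repo_id` is "" when neither manual
    configuration nor auto-derivation produced a repo.
    """
    if not hf_token:
        return False, ["[SYNC] WARNING: HF_TOKEN not set. Persistence disabled."]
    if not repo_id:
        return False, [
            "[SYNC] INFO: OPENCLAW_DATASET_REPO not set and AUTO_CREATE_DATASET is disabled.",
            "[SYNC] → Set OPENCLAW_DATASET_REPO, or set AUTO_CREATE_DATASET=true to auto-create.",
            "[SYNC] Persistence disabled.",
        ]
    return True, []
```

One case these messages do not distinguish: with `AUTO_CREATE_DATASET=true`, a failed `whoami()` call also leaves `HF_REPO_ID` empty, so the INFO line reports auto mode as disabled even though it is on. The earlier `Could not derive username` warning is what points to the real cause.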