tao-shen Claude Opus 4.6 committed on
Commit 092c6d8 · 1 Parent(s): 11d18c7

feat: auto-derive dataset repo name from HF_TOKEN when AUTO_CREATE_DATASET=true

When AUTO_CREATE_DATASET=true and OPENCLAW_DATASET_REPO is not set,
HuggingClaw now uses HfApi.whoami() to get the username from HF_TOKEN
and derives the repo name as "username/HuggingClaw-data".

- Add auto-derive logic in sync_hf.py
- Update README with Data Persistence section (Manual vs Auto mode)
- Update .env.example with detailed documentation for both modes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (3)
  1. .env.example +10 -4
  2. README.md +28 -8
  3. scripts/sync_hf.py +16 -2
.env.example CHANGED
@@ -42,16 +42,22 @@ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 
 # Target Dataset repository for data backup.
 # Format: your-username/repo-name
-# If the repo doesn't exist, HuggingClaw auto-creates it as PRIVATE.
 # Example: tao-shen/HuggingClaw-data
 #
-# [REQUIRED]
+# Manual mode (default): create the repo yourself, then set this variable.
+# Auto mode (AUTO_CREATE_DATASET=true): if not set, HuggingClaw derives
+# it from your HF_TOKEN username → "your-username/HuggingClaw-data".
+#
+# [REQUIRED in manual mode, OPTIONAL in auto mode]
 #
 OPENCLAW_DATASET_REPO=your-username/HuggingClaw-data
 
 # Whether to auto-create the Dataset repo if it doesn't exist.
-# Set to true to let HuggingClaw create it automatically on first startup.
-# Default is false for security - you must create the repo manually first.
+# When true: HuggingClaw creates a PRIVATE dataset repo on first startup.
+# If OPENCLAW_DATASET_REPO is not set, the repo name is auto-derived
+# from your HF_TOKEN username (e.g. "your-username/HuggingClaw-data").
+# When false (default): you must create the repo manually on HuggingFace
+# and set OPENCLAW_DATASET_REPO yourself.
 #
 # [OPTIONAL] Default: false
 #
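As a minimal sketch of how the `AUTO_CREATE_DATASET` flag is interpreted: the expression below is the one used in `scripts/sync_hf.py` in this commit, wrapped in a hypothetical helper named `env_flag` (the helper and its mapping argument are assumptions added so the check can be tried without touching the real environment). It suggests that the match is case-insensitive but exact, so values outside `true`, `1`, `yes` are treated as disabled.

```python
from typing import Mapping


def env_flag(env: Mapping[str, str], name: str, default: str = "false") -> bool:
    """Hypothetical helper mirroring the AUTO_CREATE_DATASET check in sync_hf.py."""
    # Case-insensitive, exact match; surrounding whitespace is NOT stripped.
    return env.get(name, default).lower() in ("true", "1", "yes")


# Try a few raw values, including ones that are silently treated as disabled.
for raw in ("true", "YES", "1", "false", "on", " true "):
    print(repr(raw), env_flag({"AUTO_CREATE_DATASET": raw}, "AUTO_CREATE_DATASET"))
```

Under this reading, a value such as `on` or one padded with spaces would leave auto mode off, so the repo would not be created or derived.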
README.md CHANGED
@@ -80,25 +80,45 @@ Go to **Settings → Repository secrets** and configure:
 |--------|:------:|-------------|---------|
 | `OPENCLAW_PASSWORD` | Recommended | Password for the Control UI (default: `huggingclaw`) | `my-secret-password` |
 | `HF_TOKEN` | **Required** | HF Access Token with write permission ([create one](https://huggingface.co/settings/tokens)) | `hf_AbCdEfGhIjKlMnOpQrStUvWxYz` |
-| `OPENCLAW_DATASET_REPO` | **Required** | Dataset repo for backup - format: `username/repo-name` | `tao-shen/HuggingClaw-data` |
+| `OPENCLAW_DATASET_REPO` | See below | Dataset repo for backup - format: `username/repo-name`. Required in manual mode; optional in auto mode (see [Data Persistence](#data-persistence)) | `tao-shen/HuggingClaw-data` |
 | `OPENAI_API_KEY` | Recommended | OpenAI (or any [OpenAI-compatible](https://openclawdoc.com/docs/reference/environment-variables)) API key | `sk-proj-xxxxxxxxxxxx` |
 | `OPENROUTER_API_KEY` | Optional | [OpenRouter](https://openrouter.ai) API key (200+ models, free tier available) | `sk-or-v1-xxxxxxxxxxxx` |
 | `ANTHROPIC_API_KEY` | Optional | Anthropic Claude API key | `sk-ant-xxxxxxxxxxxx` |
 | `GOOGLE_API_KEY` | Optional | Google / Gemini API key | `AIzaSyXxXxXxXxXx` |
-| `OPENCLAW_DEFAULT_MODEL` | Optional | Default model for new conversations | `openrouter/openai/gpt-oss-20b:free` |
+| `OPENCLAW_DEFAULT_MODEL` | Optional | Default model for new conversations | `openai/gpt-oss-20b:free` |
+
+### Data Persistence
+
+HuggingClaw syncs `~/.openclaw` (conversations, settings, credentials) to a private HuggingFace Dataset repo so data survives restarts. There are two ways to set this up:
+
+**Option A - Manual mode (default, recommended)**
+
+1. Go to [huggingface.co/new-dataset](https://huggingface.co/new-dataset) and create a **private** Dataset repo (e.g. `your-name/HuggingClaw-data`)
+2. Set `OPENCLAW_DATASET_REPO` = `your-name/HuggingClaw-data` in your Space secrets
+3. Set `HF_TOKEN` with write permission
+4. Done - HuggingClaw will sync to this repo every 60 seconds
+
+**Option B - Auto mode**
+
+1. Set `AUTO_CREATE_DATASET` = `true` in your Space secrets
+2. Set `HF_TOKEN` with write permission
+3. (Optional) Set `OPENCLAW_DATASET_REPO` if you want a custom repo name
+4. On first startup, HuggingClaw automatically creates a **private** Dataset repo. If `OPENCLAW_DATASET_REPO` is not set, it derives the name from your HF token username: `your-username/HuggingClaw-data`
+
+> **Security note:** `AUTO_CREATE_DATASET` defaults to `false` - the system will not create repos on your behalf unless you explicitly opt in.
 
 ### Environment Variables
 
-In addition to the secrets above, HuggingClaw provides environment variables to fine-tune persistence and performance. Set these the same way - as **Repository Secrets** in HF Spaces, or in your `.env` file for local Docker.
+Fine-tune persistence and performance. Set these as **Repository Secrets** in HF Spaces, or in `.env` for local Docker.
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `AUTO_CREATE_DATASET` | `false` | **Auto-create the Dataset repo** if it doesn't exist. Default is `false` for security - you must [create the repo manually](https://huggingface.co/new-dataset) first. Set to `true` to let HuggingClaw automatically create a **private** Dataset repo (using the name from `OPENCLAW_DATASET_REPO`) on first startup. Accepted values: `true`, `1`, `yes` (enabled) / `false`, `0`, `no` (disabled). |
-| `SYNC_INTERVAL` | `60` | **Backup interval in seconds.** How often HuggingClaw syncs the `~/.openclaw` directory (conversations, settings, credentials) to the HuggingFace Dataset repo. Lower values mean less data loss on restart but more API calls. Recommended: `60`–`300`. |
-| `NODE_MEMORY_LIMIT` | `512` | **Node.js heap memory limit in MB.** HF free tier provides 16 GB RAM; the default 512 MB is enough for most cases. Increase if you run complex agent workflows or handle very large conversations. |
-| `TZ` | `UTC` | **Timezone** for log timestamps and scheduled tasks. Example: `Asia/Shanghai`, `America/New_York`. |
+| `AUTO_CREATE_DATASET` | `false` | **Auto-create the Dataset repo.** Default is `false` for security. Set to `true` to let HuggingClaw automatically create a **private** Dataset repo on first startup (and auto-derive the repo name from your `HF_TOKEN` if `OPENCLAW_DATASET_REPO` is not set). Accepted values: `true`, `1`, `yes` / `false`, `0`, `no`. |
+| `SYNC_INTERVAL` | `60` | **Backup interval in seconds.** How often HuggingClaw syncs `~/.openclaw` to the Dataset repo. Lower = safer but more API calls. Recommended: `60`–`300`. |
+| `NODE_MEMORY_LIMIT` | `512` | **Node.js heap memory limit in MB.** HF free tier provides 16 GB RAM; 512 MB is enough for most cases. Increase for complex agent workflows. |
+| `TZ` | `UTC` | **Timezone** for log timestamps. Example: `Asia/Shanghai`, `America/New_York`. |
 
-> For the full list of environment variables (including `OPENAI_BASE_URL`, `OLLAMA_HOST`, proxy settings, and more), see [`.env.example`](.env.example).
+> For the full list (including `OPENAI_BASE_URL`, `OLLAMA_HOST`, proxy settings, etc.), see [`.env.example`](.env.example).
 
 ### 3. Open the Control UI
 
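The README states that auto mode creates a **private** Dataset repo on first startup, but the creation code itself is not part of this diff. The following is only a sketch of how that step might look, assuming it goes through `HfApi.create_repo` from `huggingface_hub` with `repo_type="dataset"`, `private=True`, and `exist_ok=True`. The function name `ensure_private_dataset` and the `RecordingApi` stand-in are hypothetical; the stand-in exists only so the sketch runs offline.

```python
class RecordingApi:
    """Hypothetical offline stand-in for huggingface_hub.HfApi (records calls only)."""

    def __init__(self) -> None:
        self.calls = []

    def create_repo(self, **kwargs) -> None:
        self.calls.append(kwargs)


def ensure_private_dataset(api, repo_id: str) -> None:
    """Create the dataset repo as private; do nothing if it already exists (assumed behavior)."""
    api.create_repo(
        repo_id=repo_id,
        repo_type="dataset",  # a Dataset repo, not a model or Space
        private=True,         # the README promises a private repo
        exist_ok=True,        # makes repeated startups safe
    )


api = RecordingApi()
ensure_private_dataset(api, "your-username/HuggingClaw-data")
print(api.calls[0])
```

With a real `HfApi(token=...)` instance passed in place of `RecordingApi`, the same call would hit the Hub, so it should only be run with a token that has write permission.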
scripts/sync_hf.py CHANGED
@@ -49,7 +49,6 @@ class TeeLogger:
 
 # ── Configuration ───────────────────────────────────────────────────────────
 
-HF_REPO_ID = os.environ.get("OPENCLAW_DATASET_REPO", "")
 HF_TOKEN = os.environ.get("HF_TOKEN")
 OPENCLAW_HOME = Path.home() / ".openclaw"
 APP_DIR = Path("/app/openclaw")
@@ -79,6 +78,19 @@ SPACE_ID = os.environ.get("SPACE_ID", "") # e.g. "tao-shen/HuggingClaw"
 SYNC_INTERVAL = int(os.environ.get("SYNC_INTERVAL", "60"))
 AUTO_CREATE_DATASET = os.environ.get("AUTO_CREATE_DATASET", "false").lower() in ("true", "1", "yes")
 
+# Dataset repo: user-specified, or auto-derived from HF_TOKEN username
+HF_REPO_ID = os.environ.get("OPENCLAW_DATASET_REPO", "")
+if not HF_REPO_ID and AUTO_CREATE_DATASET and HF_TOKEN:
+    try:
+        _api = HfApi(token=HF_TOKEN)
+        _username = _api.whoami()["name"]
+        HF_REPO_ID = f"{_username}/HuggingClaw-data"
+        print(f"[SYNC] OPENCLAW_DATASET_REPO not set - auto-derived: {HF_REPO_ID}")
+        del _api, _username
+    except Exception as e:
+        print(f"[SYNC] WARNING: Could not derive username from HF_TOKEN: {e}")
+        HF_REPO_ID = ""
+
 # Setup logging
 log_dir = OPENCLAW_HOME / "workspace"
 log_dir.mkdir(parents=True, exist_ok=True)
@@ -99,7 +111,9 @@ class OpenClawFullSync:
             print("[SYNC] WARNING: HF_TOKEN not set. Persistence disabled.")
             return
         if not HF_REPO_ID:
-            print("[SYNC] INFO: OPENCLAW_DATASET_REPO not set. Persistence disabled.")
+            print("[SYNC] INFO: OPENCLAW_DATASET_REPO not set and AUTO_CREATE_DATASET is disabled.")
+            print("[SYNC] → Set OPENCLAW_DATASET_REPO, or set AUTO_CREATE_DATASET=true to auto-create.")
+            print("[SYNC] Persistence disabled.")
             return
 
         self.enabled = True
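
The module-level derivation in this commit runs at import time and needs a live token, which makes it awkward to exercise in isolation. As a rough sketch, the same decision can be expressed as a pure function. The name `resolve_repo_id` and the injected `whoami` callable are assumptions made for illustration; in the real script the lookup is `HfApi(token=HF_TOKEN).whoami()["name"]`.

```python
from typing import Callable, Optional


def resolve_repo_id(
    explicit_repo: str,
    auto_create: bool,
    token: Optional[str],
    whoami: Callable[[str], dict],
) -> str:
    """Return the dataset repo ID, or "" when persistence should stay disabled."""
    if explicit_repo:
        return explicit_repo  # OPENCLAW_DATASET_REPO always wins
    if not (auto_create and token):
        return ""  # manual mode without a repo, or no token to derive from
    try:
        username = whoami(token)["name"]
    except Exception as exc:
        print(f"[SYNC] WARNING: Could not derive username from HF_TOKEN: {exc}")
        return ""
    return f"{username}/HuggingClaw-data"


def fake_whoami(token: str) -> dict:
    """Hypothetical stub standing in for the Hub lookup."""
    return {"name": "alice"}


print(resolve_repo_id("", True, "hf_example", fake_whoami))
```

Note that a failed lookup returns an empty string, so the caller would fall through to the same "Persistence disabled" branch as manual mode without a repo.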