# PicoClaw Dataset Sync Daemon
**Archon v2.1 - Branch Support Enabled**
Synchronizes the local workspace with a remote dataset repository (with branch selection) for the NullClaw ecosystem.
## 🎯 Features
- **Branch Selection**: Supports a custom branch (default: `main`)
- **Auto-Retry**: 3 attempts with exponential backoff
- **Health Checks**: Disk space monitoring (at least 1 GB free required)
- **Graceful Shutdown**: SIGTERM/SIGINT handling
- **Concurrent Protection**: State file locking
- **Security Hardening**: Prevents accidental pushes to upstream
- **Structured Logging**: Timestamped, leveled log lines written to a file
- **Backup Management**: Automatic cleanup of backups older than 7 days
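
The auto-retry behaviour can be illustrated with a small helper. This is a minimal sketch, not the code in `sync_dataset.sh`: the function name `retry_with_backoff` and the `BASE_DELAY` variable (with its 2-second value) are assumptions made for the example; only `MAX_RETRIES=3` comes from this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# MAX_RETRIES matches the README default; BASE_DELAY is an assumed name and value.
MAX_RETRIES="${MAX_RETRIES:-3}"
BASE_DELAY="${BASE_DELAY:-2}"   # seconds to wait after the first failure (assumption)

retry_with_backoff() {
    local attempt=1
    local delay="$BASE_DELAY"
    while true; do
        # Run the command passed as arguments; stop on the first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$MAX_RETRIES" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))        # double the wait before the next attempt
        attempt=$((attempt + 1))
    done
}
```

With these assumed values, a call such as `retry_with_backoff git pull` would wait 2 s and then 4 s between its three attempts.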
## 🚀 Quick Deploy to a Hugging Face Space
### Environment Variables (HF Space Settings → Variables)
```bash
# Required
DATASET_REPO=https://github.com/personalbotai/picoclaw-memory.git
DATASET_BRANCH=acron-memory # or main, develop, etc.
GITHUB_TOKEN=ghp_xxxxxxxxxxxx # For private repos or to avoid rate limits
# Optional
SYNC_INTERVAL=300 # Seconds (default: 300 = 5 minutes)
PICOCLAW_HOME=/data # Path in the HF Space (default: ~/.picoclaw)
```
### File Structure in the HF Space
```
/
├── sync_dataset.sh # Main daemon (executable)
├── app.py # Flask monitoring UI (optional)
├── requirements.txt # Python dependencies
├── README.md # Documentation
└── .gitignore # Git ignore rules
```
### Starting the Daemon
```bash
# Make executable
chmod +x sync_dataset.sh
# Run in background (HF Space startup)
nohup ./sync_dataset.sh > /dev/null 2>&1 &
```
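
A daemon started this way is normally stopped with `kill <pid>`, which sends SIGTERM. The sketch below shows one common way a Bash loop can react to SIGTERM/SIGINT and finish cleanly. It is an illustration under assumptions, not the actual contents of `sync_dataset.sh`: `SHUTDOWN_REQUESTED`, `request_shutdown`, `install_signal_handlers`, and `interruptible_sleep` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical.
SHUTDOWN_REQUESTED=0

# Signal handler: only record the request; the main loop decides when to exit.
request_shutdown() {
    SHUTDOWN_REQUESTED=1
}

install_signal_handlers() {
    trap request_shutdown TERM INT
}

# Sleep in one-second steps so a signal is acted on within about a second
# instead of after the full interval.
interruptible_sleep() {
    local remaining="$1"
    while [ "$remaining" -gt 0 ] && [ "$SHUTDOWN_REQUESTED" -eq 0 ]; do
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Example main loop (commented out so the sketch has no side effects):
# install_signal_handlers
# while [ "$SHUTDOWN_REQUESTED" -eq 0 ]; do
#     run_one_sync_cycle            # hypothetical function
#     interruptible_sleep "${SYNC_INTERVAL:-300}"
# done
```

Setting a flag in the handler, rather than exiting inside it, lets an in-progress sync cycle complete before the process stops.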
### Monitoring
- Log file: `~/.picoclaw/sync.log`
- State file: `~/.picoclaw/sync.state`
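
To illustrate how a file in this directory can keep two daemon instances from running at once, here is a minimal lock sketch. It assumes a separate lock file (`sync.lock`, a hypothetical name) rather than the state file itself, and the functions `acquire_lock` and `release_lock` are invented for the example; the real script may lock differently.

```bash
#!/usr/bin/env bash
# Sketch only: LOCK_FILE and both function names are hypothetical.
PICOCLAW_HOME="${PICOCLAW_HOME:-$HOME/.picoclaw}"
LOCK_FILE="${LOCK_FILE:-$PICOCLAW_HOME/sync.lock}"

acquire_lock() {
    mkdir -p "$(dirname "$LOCK_FILE")" || return 1
    # With noclobber set, '>' refuses to overwrite an existing file,
    # so only one process can create the lock.
    if ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null; then
        return 0
    fi
    echo "another instance holds $LOCK_FILE (pid $(cat "$LOCK_FILE" 2>/dev/null))" >&2
    return 1
}

release_lock() {
    rm -f "$LOCK_FILE"
}
```

A stale lock left behind by a crashed process would need manual removal in this simple form; checking whether the recorded PID is still alive is a common refinement.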
## 🔧 Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `DATASET_REPO` | `https://github.com/personalbotai/picoclaw-memory.git` | Git repository URL |
| `DATASET_BRANCH` | `main` | Branch to sync |
| `SYNC_INTERVAL` | `300` | Sync interval (seconds) |
| `MAX_RETRIES` | `3` | Max retry attempts |
| `BACKUP_RETENTION_DAYS` | `7` | Backup cleanup retention |
| `MIN_DISK_FREE_MB` | `1024` | Minimum free disk space (MB) |
| `PICOCLAW_HOME` | `~/.picoclaw` | Base directory |
| `GITHUB_TOKEN` | (empty) | GitHub token for authentication |
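
The defaults in the table map naturally onto Bash parameter expansion. The following is a sketch of how such defaults could be applied and sanity-checked. The function names `load_config` and `validate_config` are hypothetical, and the validation rule (numeric settings must be positive integers) is an assumption, not documented behaviour.

```bash
#!/usr/bin/env bash
# Sketch: apply the defaults from the table when a variable is unset or empty.
load_config() {
    DATASET_REPO="${DATASET_REPO:-https://github.com/personalbotai/picoclaw-memory.git}"
    DATASET_BRANCH="${DATASET_BRANCH:-main}"
    SYNC_INTERVAL="${SYNC_INTERVAL:-300}"
    MAX_RETRIES="${MAX_RETRIES:-3}"
    BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-7}"
    MIN_DISK_FREE_MB="${MIN_DISK_FREE_MB:-1024}"
    PICOCLAW_HOME="${PICOCLAW_HOME:-$HOME/.picoclaw}"
    GITHUB_TOKEN="${GITHUB_TOKEN:-}"
}

# Hypothetical validation step: numeric settings must be positive integers.
validate_config() {
    local name value
    for name in SYNC_INTERVAL MAX_RETRIES BACKUP_RETENTION_DAYS MIN_DISK_FREE_MB; do
        eval "value=\${$name}"      # read the variable whose name is in $name
        case "$value" in
            ''|*[!0-9]*)
                echo "invalid $name: '$value'" >&2
                return 1
                ;;
        esac
        if [ "$value" -le 0 ]; then
            echo "invalid $name: '$value'" >&2
            return 1
        fi
    done
}
```

Calling `load_config` followed by `validate_config` at startup would reject a value such as `SYNC_INTERVAL=abc` before the first sync cycle.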
## 🧪 Testing
```bash
# Syntax check only (parses the script without executing it)
bash -n sync_dataset.sh
# Test execution (1 cycle only)
DATASET_BRANCH=acron-memory \
PICOCLAW_HOME=/tmp/picoclaw-test \
./sync_dataset.sh
```
## 📊 Log Format
```
[2025-12-28 05:46:00] [INFO] === PicoClaw Dataset Sync Daemon v2.1 ===
[2025-12-28 05:46:00] [INFO] Branch: acron-memory
[2025-12-28 05:46:02] [INFO] Initial sync completed
[2025-12-28 05:51:00] [INFO] Sync cycle completed
```
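
Lines in this format can be produced by a short logging helper. The sketch below is an assumption about how that might look, not the script's actual code: `log` and `LOG_FILE` are hypothetical names, and the log path follows the default documented in this README.

```bash
#!/usr/bin/env bash
# Sketch: LOG_FILE and log() are hypothetical names.
PICOCLAW_HOME="${PICOCLAW_HOME:-$HOME/.picoclaw}"
LOG_FILE="${LOG_FILE:-$PICOCLAW_HOME/sync.log}"

# Usage: log LEVEL message...
log() {
    local level="$1"
    shift
    local line
    line="[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*"
    # Print to stdout and append the same line to the log file.
    printf '%s\n' "$line"
    mkdir -p "$(dirname "$LOG_FILE")" && printf '%s\n' "$line" >> "$LOG_FILE"
}
```

For example, `log INFO "Sync cycle completed"` emits a line shaped like the last entry in the sample above.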
## ⚠️ Known Limitations
1. **Single-threaded**: Git operations run sequentially
2. **No metrics endpoint**: Prometheus metrics are not exposed (optional, if needed)
3. **No email alerts**: Notifications are not sent (optional, if needed)
## πŸ› οΈ Development
### Build & Test
```bash
# Lint
shellcheck sync_dataset.sh
# Test with branch switching
DATASET_BRANCH=main ./sync_dataset.sh
```
### Branch Support
The script supports automatic branch switching:
- Clones with `--branch $DATASET_BRANCH`
- Checks out the target branch if the current branch differs
- Pushes to the same branch
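
The three steps above can be sketched with plain `git` commands. This is an illustrative outline under assumptions, not the code in `sync_dataset.sh`: the helper names `ensure_branch` and `push_branch` and their argument order are hypothetical, and authentication with `GITHUB_TOKEN` is left out.

```bash
#!/usr/bin/env bash
# Sketch only: ensure_branch and push_branch are hypothetical helpers.

# Usage: ensure_branch REPO_URL BRANCH DEST_DIR
ensure_branch() {
    local repo_url="$1"
    local branch="$2"
    local dest="$3"

    if [ ! -d "$dest/.git" ]; then
        # Fresh clone, checking out the target branch directly.
        git clone --branch "$branch" "$repo_url" "$dest" || return 1
        return 0
    fi

    # Existing clone: switch only if it is on a different branch.
    local current
    current="$(git -C "$dest" rev-parse --abbrev-ref HEAD)" || return 1
    if [ "$current" != "$branch" ]; then
        git -C "$dest" fetch origin "$branch" || return 1
        git -C "$dest" checkout "$branch" || return 1
    fi
}

# Usage: push_branch BRANCH DEST_DIR
push_branch() {
    local branch="$1"
    local dest="$2"
    # Push the current HEAD only to the branch that was synced.
    git -C "$dest" push origin "HEAD:$branch"
}
```

A call such as `ensure_branch "$DATASET_REPO" "$DATASET_BRANCH" "$PICOCLAW_HOME/workspace"` would cover the first two steps; the `workspace` subdirectory is an invented path for the example.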
## 📄 License
Archon Standard - Build for Eternity
---
**Archon v2.1 | NullClaw Runtime**