---
title: HuggingFlow
emoji: 🦌
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
license: mit
secrets:
  - name: LLM_MODEL
    description: "Model in provider/model-name format, e.g. openai/gpt-4o, anthropic/claude-sonnet-4-5, google/gemini-2.5-flash"
  - name: LLM_API_KEY
    description: API key for the chosen LLM provider.
  - name: HF_TOKEN
    description: "Hugging Face token (write access). Enables thread backup/restore to a private HF Dataset."
  - name: SERPER_API_KEY
    description: "Serper API key for real Google Search results (recommended). Free tier: 2,500 queries/month."
  - name: AUTH_JWT_SECRET
    description: "JWT signing secret. Keeps sessions alive across restarts. Generate: openssl rand -base64 32"
  - name: CLOUDFLARE_WORKERS_TOKEN
    description: "Cloudflare API token. Auto-creates an outbound proxy Worker and a keep-awake cron Worker."
---

<div align="center">

# 🦌 HuggingFlow

**[DeerFlow](https://github.com/bytedance/deer-flow) research agent, one-click deploy on Hugging Face Spaces**

[![HF Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/somratpro/HuggingFlow)
[![GitHub](https://img.shields.io/badge/GitHub-somratpro%2FHuggingFlow-181717?logo=github)](https://github.com/somratpro/HuggingFlow)
[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
[![Docker](https://img.shields.io/badge/docker-single--container-2496ED?logo=docker)](Dockerfile)

*Self-hosted deep-research AI · multi-provider LLM · streaming SSE · dataset backup*

</div>

---

## Table of Contents

- [What is HuggingFlow?](#what-is-huggingflow)
- [Features](#features)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
  - [Required Secrets](#required-secrets)
  - [Optional Variables](#optional-variables)
- [LLM Providers](#llm-providers)
- [Search Tools](#search-tools)
- [Cloudflare Proxy](#cloudflare-proxy)
- [Data Backup](#data-backup)
- [Stay Alive (Keep-Awake)](#stay-alive-keep-awake)
- [Architecture](#architecture)
- [Local Development](#local-development)
- [Troubleshooting](#troubleshooting)
- [More Projects](#more-projects)
- [Contributing](#contributing)
- [License](#license)

---

## What is HuggingFlow?

HuggingFlow wraps [DeerFlow](https://github.com/bytedance/deer-flow) (ByteDance's open-source deep-research agent) into a single Docker container that runs natively on [Hugging Face Spaces](https://huggingface.co/spaces).

**Zero infra.** Duplicate the Space, add your API keys, and your own private research agent is live.

DeerFlow conducts multi-step research: it queries search engines, fetches web pages, synthesises findings across sources, and produces structured reports, all driven by the LLM you choose.

---

## Features

- 🚀 **One-click deploy**: duplicate the HF Space, add secrets, done
- 🧠 **Multi-provider LLM**: OpenAI, Anthropic, Google Gemini, DeepSeek, Groq, Mistral, xAI, OpenRouter, Qwen, Moonshot, or any OpenAI-compatible endpoint
- 🔍 **Pluggable search**: Serper (Google), Tavily, or DuckDuckGo (no key needed)
- 💾 **Dataset backup**: threads auto-sync to a private HF Dataset and are restored on restart
- 🌐 **Cloudflare outbound proxy**: route backend traffic through a Cloudflare Worker (works around the IP blocks some APIs apply to HF Spaces)
- ⏰ **Keep-Awake cron**: a Cloudflare Worker pings `/health` on a schedule to prevent cold starts
- 📊 **Live dashboard**: status page at `/` with service health, model, search, backup and keep-awake tiles
- 🔒 **Auth built-in**: DeerFlow v2 JWT auth; create the admin account at `/setup` on first boot
- ⚡ **Pre-built images**: no source compilation; pulls official GHCR images for builds of about 5 minutes
- 📡 **Streaming SSE**: real-time agent output streamed to the browser

---

## Quick Start

### Step 1: Duplicate this Space

[![Duplicate this Space](https://huggingface.co/datasets/huggingface/badges/resolve/main/duplicate-this-space-xl.svg)](https://huggingface.co/spaces/somratpro/HuggingFlow?duplicate=true)

### Step 2: Add required secrets

In your new Space, open **Settings → Variables and Secrets** and add at minimum:

| Secret | Description |
|--------|-------------|
| `LLM_MODEL` | Model in `provider/model-name` format, e.g. `openai/gpt-4o` |
| `LLM_API_KEY` | API key for the chosen provider |

> [!TIP]
> Add `HF_TOKEN` (a token with write access to your account) to enable thread backup. Without it, all research threads are lost on restart.

### Step 3: Wait for build

The first build pulls pre-built GHCR images and takes about 5 minutes. Subsequent restarts skip the rebuild.

### Step 4: Create your admin account

Visit `https://<your-space>.hf.space/setup` and create a username and password.

### Step 5: Start researching

Open `/workspace` and you're live 🎉

---

## Configuration

### Required Secrets

| Secret | Description |
|--------|-------------|
| `LLM_MODEL` | Model in `provider/model-name` format; see [LLM Providers](#llm-providers) |
| `LLM_API_KEY` | API key for the chosen provider |

### Optional Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SERPER_API_KEY` | - | Google Search via Serper; strongly recommended over DuckDuckGo |
| `TAVILY_API_KEY` | - | Alternative web search (used if Serper is not set) |
| `JINA_API_KEY` | - | Better web page fetching via Jina AI |
| `AUTH_JWT_SECRET` | auto-generated | JWT signing secret; set this to keep sessions alive across restarts |
| `HF_TOKEN` | - | Your HF token; enables dataset backup/restore |
| `BACKUP_DATASET_NAME` | `huggingflow-backup` | HF dataset repo name for backups (created automatically) |
| `CUSTOM_BASE_URL` | - | OpenAI-compatible API base URL for any custom/self-hosted provider |
| `SYNC_INTERVAL` | `600` | Seconds between HF Dataset backup syncs |
| `BACKEND_READY_TIMEOUT` | `120` | Seconds to wait for backend startup |
| `FRONTEND_READY_TIMEOUT` | `120` | Seconds to wait for frontend startup |
| `CLOUDFLARE_WORKERS_TOKEN` | - | Cloudflare API token; enables outbound proxy + keep-awake cron |
| `CLOUDFLARE_PROXY_URL` | - | Existing Cloudflare Worker URL (skip auto-setup) |
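
If `openssl` is not at hand, a value for `AUTH_JWT_SECRET` can also be generated with the Python standard library. This is a minimal sketch of an equivalent to `openssl rand -base64 32`; the function name `generate_jwt_secret` is hypothetical and not part of HuggingFlow.

```python
import base64
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of OS-level randomness, base64-encoded.

    Hypothetical helper: mirrors `openssl rand -base64 32` for the default size.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into the AUTH_JWT_SECRET secret of your Space.
    print(generate_jwt_secret())
```

Setting the secret explicitly matters because the auto-generated default changes on each start, which invalidates existing sessions.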

---

## LLM Providers

Set `LLM_MODEL` to `provider/model-name`:

| Provider | Example `LLM_MODEL` | Notes |
|----------|---------------------|-------|
| **OpenAI** | `openai/gpt-4o` | Default provider |
| **Anthropic** | `anthropic/claude-sonnet-4-5` | Extended thinking supported |
| **Google Gemini** | `google/gemini-2.5-flash` | Extended thinking supported |
| **DeepSeek** | `deepseek/deepseek-chat` | Extended thinking supported |
| **Groq** | `groq/llama-3.3-70b-versatile` | Fast inference |
| **Mistral** | `mistral/mistral-large-latest` | |
| **xAI / Grok** | `xai/grok-3-beta` | |
| **OpenRouter** | `openrouter/anthropic/claude-3-5-sonnet` | Access 200+ models |
| **Qwen / Alibaba** | `qwen/qwen-max` | DashScope compatible |
| **Moonshot / Kimi** | `moonshot/moonshot-v1-128k` | |
| **Custom OpenAI-compat** | `openai/your-model` + `CUSTOM_BASE_URL` | Any self-hosted endpoint |

> **Tip:** Models with extended thinking (Anthropic, Gemini, DeepSeek) produce higher-quality research plans but use more tokens.
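
The OpenRouter row shows that the model part can itself contain slashes, so the provider is presumably everything before the first `/`. The sketch below illustrates that reading of the format; `ModelSpec` and `parse_llm_model` are hypothetical names, and HuggingFlow's own parsing may differ.

```python
from typing import NamedTuple


class ModelSpec(NamedTuple):
    provider: str
    model: str


def parse_llm_model(value: str) -> ModelSpec:
    """Split 'provider/model-name' on the first slash only (assumed behaviour)."""
    provider, sep, model = value.strip().partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider/model-name, got {value!r}")
    return ModelSpec(provider, model)


if __name__ == "__main__":
    print(parse_llm_model("openai/gpt-4o"))
    # The model name keeps its inner slash for OpenRouter-style values.
    print(parse_llm_model("openrouter/anthropic/claude-3-5-sonnet"))
```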

---

## Search Tools

DeerFlow uses web search as its primary information source. Configure in priority order:

| Tool | Key | Quality | Cost |
|------|-----|---------|------|
| **Serper** | `SERPER_API_KEY` | ⭐⭐⭐ (real Google) | ~$0.001/query |
| **Tavily** | `TAVILY_API_KEY` | ⭐⭐ | free tier available |
| **DuckDuckGo** | none needed | ⭐ | free, rate-limited |

Serper is strongly recommended for research quality. Sign up at [serper.dev](https://serper.dev); the free tier includes 2,500 queries/month.
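
The priority order in the table can be sketched as a simple fallback chain over environment variables. This is an illustration under the assumption that the first configured key wins; `pick_search_tool` is a hypothetical name, not a HuggingFlow function.

```python
import os
from typing import Mapping


def pick_search_tool(env: Mapping[str, str] = os.environ) -> str:
    """Return the search tool implied by the configured keys (assumed order)."""
    if env.get("SERPER_API_KEY"):
        return "serper"
    if env.get("TAVILY_API_KEY"):
        return "tavily"
    # No key configured: fall back to the keyless, rate-limited option.
    return "duckduckgo"


if __name__ == "__main__":
    print(pick_search_tool())
```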

---

## Cloudflare Proxy

HF Spaces share egress IPs, and some APIs block them. The Cloudflare outbound proxy routes backend HTTP requests through a Cloudflare Worker, giving you a clean egress IP.

**Setup:**

1. Get a Cloudflare API token with **Workers Edit** permission
2. Set `CLOUDFLARE_WORKERS_TOKEN` in your Space secrets
3. On next start, `cloudflare-proxy-setup.py` auto-creates the Worker and sets `CLOUDFLARE_PROXY_URL`

Or manually provide `CLOUDFLARE_PROXY_URL` if you have an existing Worker.

---

## Data Backup

By default, threads are stored in SQLite inside the container and are **lost on restart**.

Enable persistent backup with HF Datasets:

1. Set `HF_TOKEN` to a token with **Write** access to your profile
2. Optionally set `BACKUP_DATASET_NAME` (default: `huggingflow-backup`)
3. The dataset is created automatically (private) on first sync

**What's backed up:** SQLite database (threads, messages, uploads index), workspace files.

**Sync schedule:** every `SYNC_INTERVAL` seconds (default `600`, i.e. 10 minutes), on graceful shutdown, and on startup (restore).
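
How the three backup settings combine can be sketched as follows. The defaults come from the configuration table; the names `BackupConfig` and `resolve_backup_config` are hypothetical, and the assumption is that backup is enabled only when `HF_TOKEN` is set.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BackupConfig:
    enabled: bool
    dataset_name: str
    sync_interval_s: int


def resolve_backup_config(env: Mapping[str, str] = os.environ) -> BackupConfig:
    """Apply the documented defaults; backup is off without HF_TOKEN (assumed)."""
    token = env.get("HF_TOKEN", "").strip()
    name = env.get("BACKUP_DATASET_NAME", "").strip() or "huggingflow-backup"
    interval = int(env.get("SYNC_INTERVAL", "") or 600)
    return BackupConfig(enabled=bool(token), dataset_name=name, sync_interval_s=interval)


if __name__ == "__main__":
    print(resolve_backup_config())
```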

---

## Stay Alive (Keep-Awake)

Free HF Spaces go to sleep after a period of inactivity (Hugging Face documents 48 hours for the default free hardware). A Cloudflare Worker cron keeps the Space awake:

1. Set `CLOUDFLARE_WORKERS_TOKEN` (same token as proxy setup)
2. `cloudflare-keepalive-setup.py` creates a Worker that pings `/health` every 10 minutes
3. Status shown in the dashboard **Keep Awake** tile

Check `KEEPALIVE_STATUS_FILE` (`/tmp/huggingflow-cloudflare-keepalive-status.json`) for current state.

---

## Architecture

```
Browser
  │
  ▼  :7860
health-server.js  ──── /          → status dashboard (HTML)
  │               ──── /health    → JSON health check
  │               ──── /status    → JSON full status
  │               ──── /*         → proxy to nginx
  │
  ▼  :7861
nginx
  │  /api/langgraph/*  → rewrite → /api/*  → backend :8001
  │  /api/*            →                   → backend :8001
  │  /health           →                   → backend :8001/health
  │  /docs /redoc      →                   → backend :8001
  │  /*                →                   → frontend :3000
  │
  ├─▶ :8001  FastAPI (uvicorn)  - DeerFlow gateway, agents, auth, SQLite
  └─▶ :3000  Next.js            - DeerFlow UI (server-side rendered)
```
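
The nginx routing rules in the diagram can be read as an ordered list of prefix matches. The sketch below models them in Python purely for illustration; `route` is a hypothetical function, and the real nginx configuration may match paths differently in edge cases.

```python
from typing import Tuple

BACKEND = "backend:8001"
FRONTEND = "frontend:3000"
LANGGRAPH_PREFIX = "/api/langgraph/"


def route(path: str) -> Tuple[str, str]:
    """Return (upstream, upstream_path), checking rules in diagram order."""
    if path.startswith(LANGGRAPH_PREFIX):
        # Rewrite /api/langgraph/* to /api/* before passing it to the backend.
        return BACKEND, "/api/" + path[len(LANGGRAPH_PREFIX):]
    if path.startswith("/api/"):
        return BACKEND, path
    if path == "/health":
        return BACKEND, "/health"
    if path.startswith("/docs") or path.startswith("/redoc"):
        return BACKEND, path
    # Everything else is served by the Next.js frontend.
    return FRONTEND, path


if __name__ == "__main__":
    for p in ("/api/langgraph/threads", "/api/threads", "/health", "/workspace"):
        print(p, "->", route(p))
```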

**Port map:**

| Port | Service | Exposed |
|------|---------|---------|
| 7860 | health-server.js | ✅ public (HF Spaces) |
| 7861 | nginx | internal only |
| 8001 | FastAPI backend | internal only |
| 3000 | Next.js frontend | internal only |

**Images used:**

- `ghcr.io/bytedance/deer-flow-backend:latest`: pre-built Python backend + `.venv`
- `ghcr.io/bytedance/deer-flow-frontend:latest`: pre-built Next.js + `node_modules`
- No source compilation: build time is ~5 min instead of 30+ min

---

## Local Development

```bash
git clone https://github.com/somratpro/HuggingFlow
cd HuggingFlow

# Build
docker build -t huggingflow .

# Run (set your own keys)
docker run -p 7860:7860 \
  -e LLM_MODEL=openai/gpt-4o \
  -e LLM_API_KEY=sk-... \
  -e SERPER_API_KEY=... \
  huggingflow
```

Open `http://localhost:7860` for the dashboard, `http://localhost:7860/setup` to create your admin account, then `http://localhost:7860/workspace`.

**Useful routes:**

| Route | Description |
|-------|-------------|
| `/` | Status dashboard |
| `/workspace` | DeerFlow research UI |
| `/setup` | Admin account creation (first boot only) |
| `/health` | Health check (JSON, no auth) |
| `/docs` | Swagger API reference |
| `/redoc` | ReDoc API reference |

---

## Troubleshooting

**"Application error" on `/workspace` or `/setup`**
> The pre-built frontend requires `DEER_FLOW_TRUSTED_ORIGINS` to be set explicitly. `start.sh` handles this automatically. If you see this error in a custom setup, ensure the env var is set before starting Next.js.

**Build takes 30+ minutes / OOMKilled**
> Ensure Docker has ≥4 GB RAM. HuggingFlow uses pre-built images specifically to avoid compilation. If you're rebuilding from source, add `NODE_OPTIONS=--max-old-space-size=3072`.

**DuckDuckGo returning no results**
> DuckDuckGo rate-limits aggressively from shared IPs. Set `SERPER_API_KEY` or `TAVILY_API_KEY`.

**Threads lost after restart**
> Set `HF_TOKEN` (and optionally `BACKUP_DATASET_NAME`, which defaults to `huggingflow-backup`) to enable dataset sync. Without `HF_TOKEN`, storage is ephemeral.

**Space goes to sleep**
> Set `CLOUDFLARE_WORKERS_TOKEN` to enable the keep-awake cron. Alternatively, upgrade to a paid HF Space tier.

**Backend health shows `not_authenticated`**
> Normal. DeerFlow v2 protects all `/api/*` routes. The public health endpoint is `/health` (no auth). nginx routes `/health` → `backend:8001/health`.

---

## More Projects

Similar projects by [@somratpro](https://github.com/somratpro), all free, one-click deploy on HF Spaces:

| Project | What it runs | HF Space | GitHub |
|---------|-------------|----------|--------|
| **HuggingClip** | Paperclip: AI agent orchestration | [Space](https://huggingface.co/spaces/somratpro/HuggingClip) | [Repo](https://github.com/somratpro/HuggingClip) |
| **HuggingClaw** | OpenClaw: Claude Code in the browser | [Space](https://huggingface.co/spaces/somratpro/HuggingClaw) | [Repo](https://github.com/somratpro/HuggingClaw) |
| **HuggingMes** | Hermes: self-hosted agent gateway | [Space](https://huggingface.co/spaces/somratpro/HuggingMes) | [Repo](https://github.com/somratpro/HuggingMes) |
| **Hugging8n** | n8n: workflow & automation platform | [Space](https://huggingface.co/spaces/somratpro/Hugging8n) | [Repo](https://github.com/somratpro/Hugging8n) |
| **HuggingPost** | Postiz: social media scheduler | [Space](https://huggingface.co/spaces/somratpro/HuggingPost) | [Repo](https://github.com/somratpro/HuggingPost) |

---

## ❤️ Support

If HuggingFlow saves you time, consider buying me a coffee to keep the projects alive!

**USDT (TRC-20 / TRON network only)**

```
TELx8TJz1W1h7n6SgpgGNNGZXpJCEUZrdB
```

> [!WARNING]
> Send **USDT on TRC-20 network only**. Sending other tokens or using a different network will result in permanent loss.

---

## Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

```
Fork → branch → commit → PR
```

---

## License

MIT; see [LICENSE](LICENSE).

DeerFlow is © ByteDance, licensed under MIT.

---

<div align="center">
  <sub>Built with ❀️ by <a href="https://github.com/somratpro">somratpro</a> · Powered by <a href="https://github.com/bytedance/deer-flow">DeerFlow</a></sub>
</div>