File size: 13,992 Bytes
5e21013
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
---
title: Bee Intelligence Engine
emoji: 🐝
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
short_description: The Intelligence Engine β€” domain LoRA adapters
---

# Bee β€” The Intelligence Engine

**Trust-critical AI for regulated and mission-critical systems.**
Built by [CUI Labs](https://www.cuilabs.io) on the XIIS platform.

Last verified: 2026-05-05.

---

## What's actually running today

| Surface | State | Source-of-truth |
|---|---|---|
| Bee Cell inference (production) | Live on **Modal** serverless (`bee-cell-prod`) β€” replaces the legacy HF Space `cuilabs-bee.hf.space`. Frontend talks to it via `BEE_API_URL` env on Vercel. | [infra/modal/bee_app.py](infra/modal/bee_app.py) |
| Web app | `bee.cuilabs.io` on Vercel | [apps/web](apps/web) |
| Mobile app | React Native CLI 0.85.2 (no Expo, no EAS) β€” Stage 0 release scaffolding. Backend pointer in Settings. | [apps/mobile/README.md](apps/mobile/README.md) |
| Desktop app | Tauri 2.10 shell pointing at `bee.cuilabs.io`. Source scaffold landed 2026-04-30; signed releases gated on cert/Apple-Dev enrollment. | [apps/desktop/README.md](apps/desktop/README.md) |
| Bee Security Eval Harness | 52 cases / 10 categories. Latest baseline on Bee Cell base: **12.5 / 100** (gates Stage 1 APK). | [eval/bee_security_harness/README.md](eval/bee_security_harness/README.md) |
| Stage 0 safety wrapper | Runtime preamble + refusal substrate around every chat completion. | [bee/safety_wrapper.py](bee/safety_wrapper.py) |
| Cybersec adapter training | Stage 0.5 Comb run on **Vertex AI L4** (one-time exception β€” Comb usually rides Kaggle). | [workers/vertex-train/README.md](workers/vertex-train/README.md) |
| Cell + Cell+ training | Kaggle T4Γ—2 GPU pool, push-only dispatcher (commit `3edb643`). | [workers/kaggle-online-train/README.md](workers/kaggle-online-train/README.md) |
| Cron pipeline | 15 Vercel cron routes β€” kaggle-dispatch, kaggle-tpu-dispatch, eval-run, cve-ingest, kev-ingest, distillation, online-training, evolution-cycle, community-pull, github-trending, hf-dispatch, heartbeat, memory-extract, interactions-export, research-correct. | [apps/web/src/app/api/cron/](apps/web/src/app/api/cron/) |

---

## Benchmarks

Reproducible eval on the base model (no LoRA adapter applied). Run via `python -m bee.eval_harness` β€” every task and pass criterion is in [bee/eval_harness.py](bee/eval_harness.py), every output is captured in `data/eval_reports/*.json`.

```
  Model:    HuggingFaceTB/SmolLM2-360M-Instruct (361.8M params)
  Device:   MPS (Apple Silicon, fp16)
  Date:     2026-04-29
  Wall:     25.9s for all 5 benchmarks
  ─────────────────────────────────────────────────────
  coding         100%  (10/10)   avg latency  2033 ms
  reasoning       40%  (4/10)    avg latency   146 ms
  instruct        50%  (5/10)    avg latency   167 ms
  grounded        80%  (4/5)     avg latency   116 ms
  domain         100%  (5/5)     avg latency   381 ms
  ─────────────────────────────────────────────────────
  OVERALL         74%
```

**How to read these numbers:**
- `coding 100%` is a **shape check** (function name + `return` keyword present), not a correctness test. A real correctness benchmark would score lower.
- `reasoning 40%` and `instruct 50%` are honest signal β€” at 360M base, multi-step math and exact-format compliance are hard.
- A few `instruct` / `grounded` failures are pattern-match strictness in the harness (e.g. answer is right but contains an extra word). The raw output for every task is in [data/eval_reports/2026-04-29_smollm2-360m_mps.json](data/eval_reports/2026-04-29_smollm2-360m_mps.json) so you can audit.

Reproduce locally:

```bash
python -m bee.eval_harness --model HuggingFaceTB/SmolLM2-360M-Instruct --device mps \
  --output data/eval_reports/my_run.json
```

Per-domain LoRA adapters at [`cuilabs/bee-cell`](https://huggingface.co/cuilabs/bee-cell) are evaluated separately on domain-specific tasks; numbers land in this README only after a training run produces them.

### Bee Security Eval Harness β€” first real baseline

Bee's security capability is measured against an in-house gate, not a generic benchmark. Source-of-truth for the cases is [eval/bee_security_harness/cases/*.yaml](eval/bee_security_harness/cases/) (52 cases across 10 categories: insecure-code generation, prompt injection, agent tool abuse, tenant isolation, authz/authn failures, cloud IAM, dependency CVEs, secret leakage, unsafe cyber responses, hallucinated security claims).

```
  Surface:   Bee Cell base (no cybersec adapter applied)
  Backend:   Modal bee-cell-prod
  Date:      2026-05-03
  Score:     12.5 / 100   (release gate is >= 80 with zero blocking failures)
```

12.5 is the honest pre-adapter floor and is the reason Stage 0.5 cybersec adapter training is currently running on Vertex L4. The Stage 1 APK release is gated on a re-run of this harness against the post-adapter Modal endpoint. Run logic and case-loader: [apps/web/src/app/api/cron/eval-run/route.ts](apps/web/src/app/api/cron/eval-run/route.ts), summary table `eval_runs`, per-case results `eval_run_results`.

---

## Quick Start

```bash
# 1. Create environment
python3 -m venv .venv
source .venv/bin/activate
pip install torch transformers accelerate peft datasets trl \
  sentencepiece protobuf numpy fastapi uvicorn pydantic httpx \
  python-dotenv qiskit sentence-transformers faiss-cpu websockets

# 2. Copy environment config
cp .env.example .env
# Edit .env with your API keys (optional β€” Bee works without them)

# 3. Run the eval harness (verifies install + reproduces the numbers above)
python -m bee.eval_harness --device mps

# 4. Start the server
python -m bee.server

# 5. Start the full daemon (server + evolution + distillation)
python -m bee
```

---

## API (OpenAI-compatible)

```bash
# Chat
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'

# Health
curl http://localhost:8000/health

# Router stats
curl http://localhost:8000/v1/router/stats

# Switch domain
curl -X POST http://localhost:8000/v1/domain/switch \
  -H "Content-Type: application/json" \
  -d '{"domain":"cybersecurity"}'
```

Tier-1 domains (10): `general`, `programming`, `ai`, `cybersecurity`, `quantum`, `fintech`, `blockchain`, `infrastructure`, `research`, `business`. Source: [bee/domains.py](bee/domains.py).

---

## Architecture

```
bee/
  server.py            FastAPI server, OpenAI-compatible API, adaptive routing
  safety_wrapper.py    Stage 0 runtime safety preamble + refusal substrate
  adaptive_router.py   Difficulty estimation, self-verification, context memory
  distillation.py      Teacher-student distillation (Claude/GPT-4 -> Bee)
  evolution.py         Autonomous algorithm evolution
  invention_engine.py  Invents novel attention, compression, SSM modules
  self_coding.py       Code generation + sandboxed execution
  self_heal.py         Training health monitoring, auto-recovery
  community.py         Share inventions between Bee instances (HuggingFace Hub)
  quantum_reasoning.py Quantum-enhanced decision making (IBM Quantum / local sim)
  quantum_ibm.py       IBM Quantum Platform integration (156-qubit Heron r2)
  quantum_sim.py       Local quantum statevector simulation
  retrieval.py         RAG pipeline (FAISS + sentence-transformers)
  lora_adapter.py      Domain LoRA adapter management
  nn_compression.py    VQ-VAE hierarchical neural compression
  memory.py            Hierarchical compressive memory
  moe.py               Sparse mixture of experts
  state_space.py       Selective state space model
  daemon.py            Autonomous daemon (background evolution, distillation)
  ignition.py          Full BeeAGI architecture activation (research-only,
                       BEE_IGNITE=0 in production)
  benchmark.py         10-test benchmark suite
  eval_harness.py      General-capability harness (the SmolLM2 numbers above)
  config.py            Model configuration
  modeling_bee.py      Custom BeeForCausalLM

apps/web/              Next.js customer web app deployed to Vercel
apps/mobile/           React Native CLI 0.85.2 native iOS+Android
apps/desktop/          Tauri 2.10 native shell (macOS/Windows/Linux)
sdks/python/           Official Python client (bee-sdk)

eval/bee_security_harness/
                       52-case security gate (10 categories, regex grader DSL)

infra/modal/           Production inference deployment (bee-cell-prod)
infra/hf-space/        Deprecated; retained for community model-card hosting
infra/db/              Postgres migrations (eval_runs, training_runs, etc.)
infra/supabase/        Supabase project config

workers/
  kaggle-online-train/ T4Γ—2 GPU runner β€” cell, cell+, comb (when forced)
  kaggle-tpu-train/    TPU v6e-8 runner β€” every-step debug logging
  vertex-train/        L4 / A100 β€” reserved for tiers Kaggle can't host
                       (Hive, Swarm, Enclave, Ignite)
  colab-online-train/  Manual paste-test workflow on Colab T4
  lightning-train/     Inactive β€” manual launcher, not wired to a cron

packages/              auth, billing, core, db, email, pqc, qnsp-client,
                       rag, telemetry, training, ui β€” TypeScript workspace
scripts/               Distillation, deploys, dataset prep, ops
docs/                  Architecture, API reference, runbooks
```

## Repository Layout

The approved source of truth for the monorepo layout lives in `docs/architecture/repository.md`.

Current migration truth:

- `apps/web` is the canonical frontend path.
- `apps/mobile` is the canonical mobile app path (React Native CLI, no Expo).
- `apps/desktop` is the canonical desktop app path (Tauri 2.10).
- `bee/` remains rooted at the repository top level and is the canonical backend package.
- `infra/modal/bee_app.py` is the production inference entrypoint. The root `Dockerfile` is retained for parity with the historical HF Space image and for ad-hoc Docker runs.

## Deployment Topology

- GitHub hosts the monorepo source of truth.
- Vercel serves the web app from `apps/web` at `https://bee.cuilabs.io`.
- Namecheap manages DNS for `bee.cuilabs.io` and (eventually) `api.bee.cuilabs.io`.
- **Modal** serves the backend inference API as `bee-cell-prod`. The frontend points at it via the `BEE_API_URL` env on Vercel; default URL pattern is `https://cuilabs--bee-cell-prod-fastapi-app.modal.run` ([infra/modal/bee_app.py](infra/modal/bee_app.py)).
- The legacy Hugging Face Space (`cuilabs-bee.hf.space`) is deprecated. It is no longer the production backend; HF org artifacts are retained for community model-card and dataset hosting only ([infra/hf-space/README.md](infra/hf-space/README.md)).
- Large datasets, checkpoints, and adapters live on Hugging Face Hub (`cuilabs/bee-cell`, `cuilabs/bee-cell-plus`, `cuilabs/bee-comb`, `cuilabs/bee-interactions`), not in the frontend deployment payload.

## How It Works

1. **Adaptive Router** β€” Routes easy queries locally (free), hard queries to teacher API
2. **Self-Verification** β€” Scores every output, re-generates if quality is low
3. **Context Memory** β€” Compresses past conversations for infinite memory
4. **Teacher Distillation** β€” Uses Claude/GPT-4 to generate expert training data
5. **LoRA Training** β€” Domain-specific adapters trained on free Colab/Kaggle GPUs
6. **Evolution** β€” Autonomously invents better algorithms
7. **Community** β€” Shares validated inventions between all Bee instances
8. **Quantum** β€” IBM Quantum hardware or local simulation for decision optimization

**Design goal**, not a measured steady-state: route easy queries locally (free), expensive ones to a teacher model, capture every teacher response as training data, and shrink the teacher-call ratio over time as Bee's domain adapters improve. Actual local-vs-teacher split and cost-per-query are emitted live by `/v1/router/stats` β€” that endpoint is the source of truth, not this README.

## Hardware

| Tier | Base model | Params | RAM (fp16) | Throughput |
|---|---|---|---|---|
| `cell` (default) | SmolLM2-360M-Instruct | 361.8M | ~0.7 GB | **89 tok/s** on Apple Silicon MPS (fp16, greedy) |
| `cell-plus`, `comb`, `comb-team`, `hive` | see [bee/tiers.py](bee/tiers.py) | 1.7B–32B | scales with tier | not yet benchmarked locally |

The `89 tok/s` number is from [data/eval_reports/2026-04-29_throughput_mps.json](data/eval_reports/2026-04-29_throughput_mps.json) β€” 5 prompts Γ— ~100 tokens each, measured today. Larger tiers' throughput numbers will land in this table once a real measurement is taken on the target hardware; we don't quote estimates.

Runs on: macOS (MPS), Linux (CUDA), any CPU (slow). Production traffic is served by Modal's L4-class containers ([infra/modal/bee_app.py](infra/modal/bee_app.py)) with a persistent `bee-cache` volume so cold starts don't re-pull SmolLM2-360M.

## Environment Variables

See `.env.example` for all options. Key ones:

```bash
BEE_DEVICE=mps                    # auto, mps, cuda, cpu
BEE_MODEL_PATH=HuggingFaceTB/SmolLM2-360M-Instruct
BEE_TEACHER_API_KEY=              # Anthropic or OpenAI key (optional)
IBM_QUANTUM_API_KEY=              # IBM Quantum (optional)
BEE_API_URL=                      # Set on Vercel + mobile + SDK to point
                                  # at the Modal production backend.
                                  # Default in code is the legacy HF Space
                                  # for backward-compat only.
BEE_IGNITE=0                      # Keep 0 for production. The Ignite
                                  # research-AGI substrate is gated by
                                  # this flag; see bee/ignition.py.
```

## License

MIT