---
title: Bee Intelligence Engine
emoji: 🐝
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
short_description: The Intelligence Engine – domain LoRA adapters
---
# Bee – The Intelligence Engine
**Trust-critical AI for regulated and mission-critical systems.**
Built by [CUI Labs](https://www.cuilabs.io) on the XIIS platform.
Last verified: 2026-05-05.
---
## What's actually running today
| Surface | State | Source-of-truth |
|---|---|---|
| Bee Cell inference (production) | Live on **Modal** serverless (`bee-cell-prod`) – replaces the legacy HF Space `cuilabs-bee.hf.space`. Frontend talks to it via `BEE_API_URL` env on Vercel. | [infra/modal/bee_app.py](infra/modal/bee_app.py) |
| Web app | `bee.cuilabs.io` on Vercel | [apps/web](apps/web) |
| Mobile app | React Native CLI 0.85.2 (no Expo, no EAS) – Stage 0 release scaffolding. Backend pointer in Settings. | [apps/mobile/README.md](apps/mobile/README.md) |
| Desktop app | Tauri 2.10 shell pointing at `bee.cuilabs.io`. Source scaffold landed 2026-04-30; signed releases gated on cert/Apple-Dev enrollment. | [apps/desktop/README.md](apps/desktop/README.md) |
| Bee Security Eval Harness | 52 cases / 10 categories. Latest baseline on Bee Cell base: **12.5 / 100** (gates Stage 1 APK). | [eval/bee_security_harness/README.md](eval/bee_security_harness/README.md) |
| Stage 0 safety wrapper | Runtime preamble + refusal substrate around every chat completion. | [bee/safety_wrapper.py](bee/safety_wrapper.py) |
| Cybersec adapter training | Stage 0.5 Comb run on **Vertex AI L4** (one-time exception – Comb usually rides Kaggle). | [workers/vertex-train/README.md](workers/vertex-train/README.md) |
| Cell + Cell+ training | Kaggle T4×2 GPU pool, push-only dispatcher (commit `3edb643`). | [workers/kaggle-online-train/README.md](workers/kaggle-online-train/README.md) |
| Cron pipeline | 15 Vercel cron routes – kaggle-dispatch, kaggle-tpu-dispatch, eval-run, cve-ingest, kev-ingest, distillation, online-training, evolution-cycle, community-pull, github-trending, hf-dispatch, heartbeat, memory-extract, interactions-export, research-correct. | [apps/web/src/app/api/cron/](apps/web/src/app/api/cron/) |
---
## Benchmarks
Reproducible eval on the base model (no LoRA adapter applied). Run via `python -m bee.eval_harness` – every task and pass criterion is in [bee/eval_harness.py](bee/eval_harness.py), every output is captured in `data/eval_reports/*.json`.
```
Model: HuggingFaceTB/SmolLM2-360M-Instruct (361.8M params)
Device: MPS (Apple Silicon, fp16)
Date: 2026-04-29
Wall: 25.9s for all 5 benchmarks
─────────────────────────────────────────────────────
coding 100% (10/10) avg latency 2033 ms
reasoning 40% (4/10) avg latency 146 ms
instruct 50% (5/10) avg latency 167 ms
grounded 80% (4/5) avg latency 116 ms
domain 100% (5/5) avg latency 381 ms
─────────────────────────────────────────────────────
OVERALL 74%
```
**How to read these numbers:**
- `coding 100%` is a **shape check** (function name + `return` keyword present), not a correctness test. A real correctness benchmark would score lower.
- `reasoning 40%` and `instruct 50%` are an honest signal – at a 360M-parameter base, multi-step math and exact-format compliance are hard.
- A few `instruct` / `grounded` failures come down to pattern-match strictness in the harness (e.g. the answer is right but contains an extra word). The raw output for every task is in [data/eval_reports/2026-04-29_smollm2-360m_mps.json](data/eval_reports/2026-04-29_smollm2-360m_mps.json) so you can audit it; a loader sketch follows this list.
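To audit failures programmatically, a minimal sketch – the `tasks` / `passed` / `output` keys are assumptions about the report schema, so check them against the file first:
```python
# Hypothetical audit script; the report schema (tasks/category/passed/output)
# is assumed here, not confirmed -- adjust the keys after inspecting the JSON.
import json

with open("data/eval_reports/2026-04-29_smollm2-360m_mps.json") as f:
    report = json.load(f)

# Print every failed task with its raw model output for manual review.
for task in report.get("tasks", []):
    if not task.get("passed"):
        print(f"[{task.get('category')}] {task.get('prompt')!r}")
        print(f"  raw output: {task.get('output')!r}")
```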
Reproduce locally:
```bash
python -m bee.eval_harness --model HuggingFaceTB/SmolLM2-360M-Instruct --device mps \
--output data/eval_reports/my_run.json
```
Per-domain LoRA adapters at [`cuilabs/bee-cell`](https://huggingface.co/cuilabs/bee-cell) are evaluated separately on domain-specific tasks; numbers land in this README only after a training run produces them.
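For local experiments, attaching one of those adapters is a standard peft load. A minimal sketch, assuming the adapter weights sit at the repo root of `cuilabs/bee-cell` (per-domain subfolders, if the repo uses them, would need a `subfolder=` argument):
```python
# Sketch only: load the frozen base model, then attach the published LoRA deltas.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "HuggingFaceTB/SmolLM2-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

model = PeftModel.from_pretrained(base, "cuilabs/bee-cell")  # adapter location is an assumption
```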
### Bee Security Eval Harness – first real baseline
Bee's security capability is measured against an in-house gate, not a generic benchmark. Source-of-truth for the cases is [eval/bee_security_harness/cases/*.yaml](eval/bee_security_harness/cases/) (52 cases across 10 categories: insecure-code generation, prompt injection, agent tool abuse, tenant isolation, authz/authn failures, cloud IAM, dependency CVEs, secret leakage, unsafe cyber responses, hallucinated security claims).
```
Surface: Bee Cell base (no cybersec adapter applied)
Backend: Modal bee-cell-prod
Date: 2026-05-03
Score: 12.5 / 100 (release gate is >= 80 with zero blocking failures)
```
12.5 is the honest pre-adapter floor, and it is the reason Stage 0.5 cybersec adapter training is currently running on Vertex L4. The Stage 1 APK release is gated on a re-run of this harness against the post-adapter Modal endpoint. Run logic and case loader live in [apps/web/src/app/api/cron/eval-run/route.ts](apps/web/src/app/api/cron/eval-run/route.ts); summaries land in the `eval_runs` table and per-case results in `eval_run_results`.
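For intuition about the grader, here is an illustrative regex-graded case in the spirit of that DSL – the dict layout is invented for this sketch, not copied from the YAML schema in the cases directory:
```python
# Illustrative only: not the harness's actual case schema or grading code.
import re

case = {
    "id": "secret-leakage-demo",
    "fail_patterns": [r"AKIA[0-9A-Z]{16}", r"os\.environ\["],  # blocking if matched
    "pass_patterns": [r"(can't|cannot|won't)\s+help"],         # refusal expected
}

def grade(case: dict, output: str) -> bool:
    if any(re.search(p, output) for p in case["fail_patterns"]):
        return False  # blocking failure
    return any(re.search(p, output, re.IGNORECASE) for p in case["pass_patterns"])

print(grade(case, "I can't help with exfiltrating credentials."))  # True
```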
---
## Quick Start
```bash
# 1. Create environment
python3 -m venv .venv
source .venv/bin/activate
pip install torch transformers accelerate peft datasets trl \
sentencepiece protobuf numpy fastapi uvicorn pydantic httpx \
python-dotenv qiskit sentence-transformers faiss-cpu websockets
# 2. Copy environment config
cp .env.example .env
# Edit .env with your API keys (optional – Bee works without them)
# 3. Run the eval harness (verifies install + reproduces the numbers above)
python -m bee.eval_harness --device mps
# 4. Start the server
python -m bee.server
# 5. Start the full daemon (server + evolution + distillation)
python -m bee
```
---
## API (OpenAI-compatible)
```bash
# Chat
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'
# Health
curl http://localhost:8000/health
# Router stats
curl http://localhost:8000/v1/router/stats
# Switch domain
curl -X POST http://localhost:8000/v1/domain/switch \
-H "Content-Type: application/json" \
-d '{"domain":"cybersecurity"}'
```
Tier-1 domains (10): `general`, `programming`, `ai`, `cybersecurity`, `quantum`, `fintech`, `blockchain`, `infrastructure`, `research`, `business`. Source: [bee/domains.py](bee/domains.py).
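Because the surface is OpenAI-compatible, any OpenAI client can target it by overriding `base_url`. A minimal sketch using the `openai` Python package (not in the Quick Start dependency list; whether the server honors the `model` field is an assumption):
```python
# Point a stock OpenAI client at the local Bee server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # no key needed locally
resp = client.chat.completions.create(
    model="bee",  # placeholder name; the server may ignore this field
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
)
print(resp.choices[0].message.content)
```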
---
## Architecture
```
bee/
server.py FastAPI server, OpenAI-compatible API, adaptive routing
safety_wrapper.py Stage 0 runtime safety preamble + refusal substrate
adaptive_router.py Difficulty estimation, self-verification, context memory
distillation.py Teacher-student distillation (Claude/GPT-4 -> Bee)
evolution.py Autonomous algorithm evolution
invention_engine.py Invents novel attention, compression, SSM modules
self_coding.py Code generation + sandboxed execution
self_heal.py Training health monitoring, auto-recovery
community.py Share inventions between Bee instances (HuggingFace Hub)
quantum_reasoning.py Quantum-enhanced decision making (IBM Quantum / local sim)
quantum_ibm.py IBM Quantum Platform integration (156-qubit Heron r2)
quantum_sim.py Local quantum statevector simulation
retrieval.py RAG pipeline (FAISS + sentence-transformers)
lora_adapter.py Domain LoRA adapter management
nn_compression.py VQ-VAE hierarchical neural compression
memory.py Hierarchical compressive memory
moe.py Sparse mixture of experts
state_space.py Selective state space model
daemon.py Autonomous daemon (background evolution, distillation)
ignition.py Full BeeAGI architecture activation (research-only,
BEE_IGNITE=0 in production)
benchmark.py 10-test benchmark suite
eval_harness.py General-capability harness (the SmolLM2 numbers above)
config.py Model configuration
modeling_bee.py Custom BeeForCausalLM
apps/web/ Next.js customer web app deployed to Vercel
apps/mobile/ React Native CLI 0.85.2 native iOS+Android
apps/desktop/ Tauri 2.10 native shell (macOS/Windows/Linux)
sdks/python/ Official Python client (bee-sdk)
eval/bee_security_harness/
52-case security gate (10 categories, regex grader DSL)
infra/modal/ Production inference deployment (bee-cell-prod)
infra/hf-space/ Deprecated; retained for community model-card hosting
infra/db/ Postgres migrations (eval_runs, training_runs, etc.)
infra/supabase/ Supabase project config
workers/
kaggle-online-train/ T4×2 GPU runner – cell, cell+, comb (when forced)
kaggle-tpu-train/ TPU v6e-8 runner – every-step debug logging
vertex-train/ L4 / A100 – reserved for tiers Kaggle can't host
(Hive, Swarm, Enclave, Ignite)
colab-online-train/ Manual paste-test workflow on Colab T4
lightning-train/ Inactive – manual launcher, not wired to a cron
packages/ auth, billing, core, db, email, pqc, qnsp-client,
rag, telemetry, training, ui – TypeScript workspace
scripts/ Distillation, deploys, dataset prep, ops
docs/ Architecture, API reference, runbooks
```
## Repository Layout
The approved source of truth for the monorepo layout lives in `docs/architecture/repository.md`.
Current migration truth:
- `apps/web` is the canonical frontend path.
- `apps/mobile` is the canonical mobile app path (React Native CLI, no Expo).
- `apps/desktop` is the canonical desktop app path (Tauri 2.10).
- `bee/` remains rooted at the repository top level and is the canonical backend package.
- `infra/modal/bee_app.py` is the production inference entrypoint. The root `Dockerfile` is retained for parity with the historical HF Space image and for ad-hoc Docker runs.
## Deployment Topology
- GitHub hosts the monorepo source of truth.
- Vercel serves the web app from `apps/web` at `https://bee.cuilabs.io`.
- Namecheap manages DNS for `bee.cuilabs.io` and (eventually) `api.bee.cuilabs.io`.
- **Modal** serves the backend inference API as `bee-cell-prod`. The frontend points at it via the `BEE_API_URL` env on Vercel; default URL pattern is `https://cuilabs--bee-cell-prod-fastapi-app.modal.run` ([infra/modal/bee_app.py](infra/modal/bee_app.py)). A probe sketch follows this list.
- The legacy Hugging Face Space (`cuilabs-bee.hf.space`) is deprecated. It is no longer the production backend; HF org artifacts are retained for community model-card and dataset hosting only ([infra/hf-space/README.md](infra/hf-space/README.md)).
- Large datasets, checkpoints, and adapters live on Hugging Face Hub (`cuilabs/bee-cell`, `cuilabs/bee-cell-plus`, `cuilabs/bee-comb`, `cuilabs/bee-interactions`), not in the frontend deployment payload.
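A quick way to confirm which backend a deployment points at is to probe it the way the frontend would – this assumes the `/health` route from the API section is also exposed on the production endpoint:
```python
# Probe whichever backend BEE_API_URL selects (falling back to the Modal URL).
import os
import httpx

base = os.environ.get("BEE_API_URL",
                      "https://cuilabs--bee-cell-prod-fastapi-app.modal.run")
print(httpx.get(f"{base}/health", timeout=10).json())
```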
## How It Works
1. **Adaptive Router** – routes easy queries locally (free), hard queries to the teacher API
2. **Self-Verification** – scores every output, re-generates if quality is low
3. **Context Memory** – compresses past conversations for effectively unbounded memory
4. **Teacher Distillation** – uses Claude/GPT-4 to generate expert training data
5. **LoRA Training** – domain-specific adapters trained on free Colab/Kaggle GPUs
6. **Evolution** – autonomously invents better algorithms
7. **Community** – shares validated inventions between all Bee instances
8. **Quantum** – IBM Quantum hardware or local simulation for decision optimization
**Design goal**, not a measured steady-state: route easy queries locally (free), expensive ones to a teacher model, capture every teacher response as training data, and shrink the teacher-call ratio over time as Bee's domain adapters improve. Actual local-vs-teacher split and cost-per-query are emitted live by `/v1/router/stats` – that endpoint is the source of truth, not this README.
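A deliberately simplified sketch of that loop – the function names are illustrative, not the actual `bee/adaptive_router.py` API:
```python
# Toy threshold router: easy queries stay local, hard ones go to the teacher,
# and every teacher answer is banked as a future distillation pair.
distill_buffer: list[tuple[str, str]] = []

def route(query: str, estimate_difficulty, local_model, teacher_api,
          threshold: float = 0.6) -> str:
    if estimate_difficulty(query) < threshold:
        return local_model(query)            # free, on-device
    answer = teacher_api(query)              # paid teacher call
    distill_buffer.append((query, answer))   # captured as training data
    return answer
```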
## Hardware
| Tier | Base model | Params | RAM (fp16) | Throughput |
|---|---|---|---|---|
| `cell` (default) | SmolLM2-360M-Instruct | 361.8M | ~0.7 GB | **89 tok/s** on Apple Silicon MPS (fp16, greedy) |
| `cell-plus`, `comb`, `comb-team`, `hive` | see [bee/tiers.py](bee/tiers.py) | 1.7B–32B | scales with tier | not yet benchmarked locally |
The `89 tok/s` number is from [data/eval_reports/2026-04-29_throughput_mps.json](data/eval_reports/2026-04-29_throughput_mps.json) – 5 prompts × ~100 tokens each, measured on 2026-04-29. Larger tiers' throughput numbers will land in this table once a real measurement is taken on the target hardware; we don't quote estimates.
Runs on: macOS (MPS), Linux (CUDA), any CPU (slow). Production traffic is served by Modal's L4-class containers ([infra/modal/bee_app.py](infra/modal/bee_app.py)) with a persistent `bee-cache` volume so cold starts don't re-pull SmolLM2-360M.
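For reference, a minimal tok/s measurement in the same spirit as that report – the prompt and token budget here are placeholders, not the recorded setup:
```python
# Rough greedy-decode throughput on Apple Silicon; results vary by machine.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("mps")

inputs = tok("Explain LoRA in one paragraph.", return_tensors="pt").to("mps")
t0 = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=100, do_sample=False)  # greedy, like the report
dt = time.perf_counter() - t0
print(f"{(out.shape[1] - inputs['input_ids'].shape[1]) / dt:.1f} tok/s")
```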
## Environment Variables
See `.env.example` for all options. Key ones:
```bash
BEE_DEVICE=mps # auto, mps, cuda, cpu
BEE_MODEL_PATH=HuggingFaceTB/SmolLM2-360M-Instruct
BEE_TEACHER_API_KEY= # Anthropic or OpenAI key (optional)
IBM_QUANTUM_API_KEY= # IBM Quantum (optional)
BEE_API_URL= # Set on Vercel + mobile + SDK to point
# at the Modal production backend.
# Default in code is the legacy HF Space
# for backward-compat only.
BEE_IGNITE=0 # Keep 0 for production. The Ignite
# research-AGI substrate is gated by
# this flag; see bee/ignition.py.
```
## License
Apache-2.0