---
title: Bee Intelligence Engine
emoji: 🐝
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
short_description: The Intelligence Engine – domain LoRA adapters
---
# Bee – The Intelligence Engine
**Trust-critical AI for regulated and mission-critical systems.**
Built by [CUI Labs](https://www.cuilabs.io) on the XIIS platform.
Last verified: 2026-05-05.
---
## What's actually running today
| Surface | State | Source-of-truth |
|---|---|---|
| Bee Cell inference (production) | Live on **Modal** serverless (`bee-cell-prod`) – replaces the legacy HF Space `cuilabs-bee.hf.space`. The frontend talks to it via the `BEE_API_URL` env on Vercel. | [infra/modal/bee_app.py](infra/modal/bee_app.py) |
| Web app | `bee.cuilabs.io` on Vercel | [apps/web](apps/web) |
| Mobile app | React Native CLI 0.85.2 (no Expo, no EAS) – Stage 0 release scaffolding. Backend pointer in Settings. | [apps/mobile/README.md](apps/mobile/README.md) |
| Desktop app | Tauri 2.10 shell pointing at `bee.cuilabs.io`. Source scaffold landed 2026-04-30; signed releases gated on cert/Apple-Dev enrollment. | [apps/desktop/README.md](apps/desktop/README.md) |
| Bee Security Eval Harness | 52 cases / 10 categories. Latest baseline on Bee Cell base: **12.5 / 100** (gates Stage 1 APK). | [eval/bee_security_harness/README.md](eval/bee_security_harness/README.md) |
| Stage 0 safety wrapper | Runtime preamble + refusal substrate around every chat completion. | [bee/safety_wrapper.py](bee/safety_wrapper.py) |
| Cybersec adapter training | Stage 0.5 Comb run on **Vertex AI L4** (one-time exception – Comb usually rides Kaggle). | [workers/vertex-train/README.md](workers/vertex-train/README.md) |
| Cell + Cell+ training | Kaggle T4×2 GPU pool, push-only dispatcher (commit `3edb643`). | [workers/kaggle-online-train/README.md](workers/kaggle-online-train/README.md) |
| Cron pipeline | 15 Vercel cron routes – kaggle-dispatch, kaggle-tpu-dispatch, eval-run, cve-ingest, kev-ingest, distillation, online-training, evolution-cycle, community-pull, github-trending, hf-dispatch, heartbeat, memory-extract, interactions-export, research-correct. | [apps/web/src/app/api/cron/](apps/web/src/app/api/cron/) |
---
## Benchmarks
Reproducible eval on the base model (no LoRA adapter applied). Run via `python -m bee.eval_harness` – every task and pass criterion is in [bee/eval_harness.py](bee/eval_harness.py); every output is captured in `data/eval_reports/*.json`.
```
Model: HuggingFaceTB/SmolLM2-360M-Instruct (361.8M params)
Device: MPS (Apple Silicon, fp16)
Date: 2026-04-29
Wall: 25.9s for all 5 benchmarks
─────────────────────────────────────────────────────
coding 100% (10/10) avg latency 2033 ms
reasoning 40% (4/10) avg latency 146 ms
instruct 50% (5/10) avg latency 167 ms
grounded 80% (4/5) avg latency 116 ms
domain 100% (5/5) avg latency 381 ms
─────────────────────────────────────────────────────
OVERALL 74%
```
**How to read these numbers:**
- `coding 100%` is a **shape check** (function name + `return` keyword present), not a correctness test; a real correctness benchmark would score lower. A rough sketch of the check follows this list.
- `reasoning 40%` and `instruct 50%` are honest signal – at a 360M base, multi-step math and exact-format compliance are hard.
- A few `instruct` / `grounded` failures are pattern-match strictness in the harness (e.g. answer is right but contains an extra word). The raw output for every task is in [data/eval_reports/2026-04-29_smollm2-360m_mps.json](data/eval_reports/2026-04-29_smollm2-360m_mps.json) so you can audit.
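As promised above, a minimal sketch of what the `coding` shape check amounts to (illustrative only – the real pass criteria live in [bee/eval_harness.py](bee/eval_harness.py)):
```python
# Illustrative sketch of the coding "shape check" described above; the
# harness's actual grader is in bee/eval_harness.py.
def passes_shape_check(output: str, expected_fn_name: str) -> bool:
    # Pass if the expected function is defined and a return statement exists.
    # This checks shape, not correctness: a wrong but well-formed function
    # still passes, which is why 100% here overstates real coding ability.
    return f"def {expected_fn_name}" in output and "return" in output
```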
Reproduce locally:
```bash
python -m bee.eval_harness --model HuggingFaceTB/SmolLM2-360M-Instruct --device mps \
--output data/eval_reports/my_run.json
```
Per-domain LoRA adapters at [`cuilabs/bee-cell`](https://huggingface.co/cuilabs/bee-cell) are evaluated separately on domain-specific tasks; numbers land in this README only after a training run produces them.
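For illustration, stacking one of those adapters on the base model with `peft` looks roughly like this (the single-repo adapter layout is an assumption; [bee/lora_adapter.py](bee/lora_adapter.py) is the source of truth for how Bee actually loads and switches adapters):
```python
# Sketch: apply a domain LoRA adapter on top of the SmolLM2 base with peft.
# Assumes cuilabs/bee-cell hosts a standard peft adapter; real per-domain
# management lives in bee/lora_adapter.py.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "HuggingFaceTB/SmolLM2-360M-Instruct"
base_model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(base_model, "cuilabs/bee-cell")
```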
### Bee Security Eval Harness – first real baseline
Bee's security capability is measured against an in-house gate, not a generic benchmark. Source-of-truth for the cases is [eval/bee_security_harness/cases/*.yaml](eval/bee_security_harness/cases/) (52 cases across 10 categories: insecure-code generation, prompt injection, agent tool abuse, tenant isolation, authz/authn failures, cloud IAM, dependency CVEs, secret leakage, unsafe cyber responses, hallucinated security claims).
```
Surface: Bee Cell base (no cybersec adapter applied)
Backend: Modal bee-cell-prod
Date: 2026-05-03
Score: 12.5 / 100 (release gate is >= 80 with zero blocking failures)
```
12.5 is the honest pre-adapter floor and is the reason Stage 0.5 cybersec adapter training is currently running on Vertex L4. The Stage 1 APK release is gated on a re-run of this harness against the post-adapter Modal endpoint. Run logic and case loader live in [apps/web/src/app/api/cron/eval-run/route.ts](apps/web/src/app/api/cron/eval-run/route.ts); results land in the `eval_runs` summary table and the per-case `eval_run_results` table.
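As a rough mental model of the regex grader DSL (field names below are hypothetical; the actual case schema is in [eval/bee_security_harness/cases/](eval/bee_security_harness/cases/)):
```python
# Rough mental model of a regex-graded security case. Field names are
# hypothetical; the real DSL lives in eval/bee_security_harness/cases/*.yaml.
import re

case = {
    "prompt": "Write a login handler that checks the user's password.",
    "must_match": [r"(?i)bcrypt|argon2|scrypt|hash"],  # evidence of safe handling
    "must_not_match": [r"(?i)password\s*==\s*"],       # plaintext comparison
}

def grade(response: str) -> bool:
    hits = all(re.search(p, response) for p in case["must_match"])
    leaks = any(re.search(p, response) for p in case["must_not_match"])
    return hits and not leaks
```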
---
## Quick Start
```bash
# 1. Create environment
python3 -m venv .venv
source .venv/bin/activate
pip install torch transformers accelerate peft datasets trl \
sentencepiece protobuf numpy fastapi uvicorn pydantic httpx \
python-dotenv qiskit sentence-transformers faiss-cpu websockets
# 2. Copy environment config
cp .env.example .env
# Edit .env with your API keys (optional – Bee works without them)
# 3. Run the eval harness (verifies install + reproduces the numbers above)
python -m bee.eval_harness --device mps
# 4. Start the server
python -m bee.server
# 5. Start the full daemon (server + evolution + distillation)
python -m bee
```
---
## API (OpenAI-compatible)
```bash
# Chat
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'
# Health
curl http://localhost:8000/health
# Router stats
curl http://localhost:8000/v1/router/stats
# Switch domain
curl -X POST http://localhost:8000/v1/domain/switch \
-H "Content-Type: application/json" \
-d '{"domain":"cybersecurity"}'
```
Tier-1 domains (10): `general`, `programming`, `ai`, `cybersecurity`, `quantum`, `fintech`, `blockchain`, `infrastructure`, `research`, `business`. Source: [bee/domains.py](bee/domains.py).
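The same chat call from Python, using `httpx` from the Quick Start install list (swap the URL for the Modal endpoint in production):
```python
# Minimal Python client for the OpenAI-compatible chat endpoint.
import httpx

resp = httpx.post(
    "http://localhost:8000/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100},
    timeout=60.0,
)
resp.raise_for_status()
# OpenAI-style response shape is assumed here, per the API's compatibility claim.
print(resp.json()["choices"][0]["message"]["content"])
```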
---
## Architecture
```
bee/
server.py FastAPI server, OpenAI-compatible API, adaptive routing
safety_wrapper.py Stage 0 runtime safety preamble + refusal substrate
adaptive_router.py Difficulty estimation, self-verification, context memory
distillation.py Teacher-student distillation (Claude/GPT-4 -> Bee)
evolution.py Autonomous algorithm evolution
invention_engine.py Invents novel attention, compression, SSM modules
self_coding.py Code generation + sandboxed execution
self_heal.py Training health monitoring, auto-recovery
community.py Share inventions between Bee instances (HuggingFace Hub)
quantum_reasoning.py Quantum-enhanced decision making (IBM Quantum / local sim)
quantum_ibm.py IBM Quantum Platform integration (156-qubit Heron r2)
quantum_sim.py Local quantum statevector simulation
retrieval.py RAG pipeline (FAISS + sentence-transformers)
lora_adapter.py Domain LoRA adapter management
nn_compression.py VQ-VAE hierarchical neural compression
memory.py Hierarchical compressive memory
moe.py Sparse mixture of experts
state_space.py Selective state space model
daemon.py Autonomous daemon (background evolution, distillation)
ignition.py Full BeeAGI architecture activation (research-only,
BEE_IGNITE=0 in production)
benchmark.py 10-test benchmark suite
eval_harness.py General-capability harness (the SmolLM2 numbers above)
config.py Model configuration
modeling_bee.py Custom BeeForCausalLM
apps/web/ Next.js customer web app deployed to Vercel
apps/mobile/ React Native CLI 0.85.2 native iOS+Android
apps/desktop/ Tauri 2.10 native shell (macOS/Windows/Linux)
sdks/python/ Official Python client (bee-sdk)
eval/bee_security_harness/
52-case security gate (10 categories, regex grader DSL)
infra/modal/ Production inference deployment (bee-cell-prod)
infra/hf-space/ Deprecated; retained for community model-card hosting
infra/db/ Postgres migrations (eval_runs, training_runs, etc.)
infra/supabase/ Supabase project config
workers/
kaggle-online-train/ T4×2 GPU runner – cell, cell+, comb (when forced)
kaggle-tpu-train/ TPU v6e-8 runner – every-step debug logging
vertex-train/ L4 / A100 – reserved for tiers Kaggle can't host
(Hive, Swarm, Enclave, Ignite)
colab-online-train/ Manual paste-test workflow on Colab T4
lightning-train/ Inactive – manual launcher, not wired to a cron
packages/ auth, billing, core, db, email, pqc, qnsp-client,
rag, telemetry, training, ui – TypeScript workspace
scripts/ Distillation, deploys, dataset prep, ops
docs/ Architecture, API reference, runbooks
```
## Repository Layout
The approved source of truth for the monorepo layout lives in `docs/architecture/repository.md`.
Current migration state:
- `apps/web` is the canonical frontend path.
- `apps/mobile` is the canonical mobile app path (React Native CLI, no Expo).
- `apps/desktop` is the canonical desktop app path (Tauri 2.10).
- `bee/` remains rooted at the repository top level and is the canonical backend package.
- `infra/modal/bee_app.py` is the production inference entrypoint. The root `Dockerfile` is retained for parity with the historical HF Space image and for ad-hoc Docker runs.
## Deployment Topology
- GitHub hosts the monorepo source of truth.
- Vercel serves the web app from `apps/web` at `https://bee.cuilabs.io`.
- Namecheap manages DNS for `bee.cuilabs.io` and (eventually) `api.bee.cuilabs.io`.
- **Modal** serves the backend inference API as `bee-cell-prod`. The frontend points at it via the `BEE_API_URL` env on Vercel; default URL pattern is `https://cuilabs--bee-cell-prod-fastapi-app.modal.run` ([infra/modal/bee_app.py](infra/modal/bee_app.py)).
- The legacy Hugging Face Space (`cuilabs-bee.hf.space`) is deprecated. It is no longer the production backend; HF org artifacts are retained for community model-card and dataset hosting only ([infra/hf-space/README.md](infra/hf-space/README.md)).
- Large datasets, checkpoints, and adapters live on Hugging Face Hub (`cuilabs/bee-cell`, `cuilabs/bee-cell-plus`, `cuilabs/bee-comb`, `cuilabs/bee-interactions`), not in the frontend deployment payload.
## How It Works
1. **Adaptive Router** – routes easy queries locally (free), hard queries to the teacher API
2. **Self-Verification** – scores every output, re-generates if quality is low
3. **Context Memory** – compresses past conversations for effectively unbounded context
4. **Teacher Distillation** – uses Claude/GPT-4 to generate expert training data
5. **LoRA Training** – domain-specific adapters trained on free Colab/Kaggle GPUs
6. **Evolution** – autonomously invents better algorithms
7. **Community** – shares validated inventions between all Bee instances
8. **Quantum** – IBM Quantum hardware or local simulation for decision optimization
This is a **design goal**, not a measured steady-state: route easy queries locally (free), expensive ones to a teacher model, capture every teacher response as training data, and shrink the teacher-call ratio over time as Bee's domain adapters improve. The actual local-vs-teacher split and cost-per-query are emitted live by `/v1/router/stats` – that endpoint is the source of truth, not this README.
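A minimal sketch of that routing loop (illustrative only – the real difficulty estimation and self-verification live in [bee/adaptive_router.py](bee/adaptive_router.py)):
```python
# Illustrative threshold router; bee/adaptive_router.py is the real thing.
def route(query: str, difficulty: float, threshold: float = 0.6) -> str:
    # Below the threshold, answer locally for free; above it, pay for the
    # teacher call and keep the (query, answer) pair as distillation data.
    return "local" if difficulty < threshold else "teacher"
```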
## Hardware
| Tier | Base model | Params | RAM (fp16) | Throughput |
|---|---|---|---|---|
| `cell` (default) | SmolLM2-360M-Instruct | 361.8M | ~0.7 GB | **89 tok/s** on Apple Silicon MPS (fp16, greedy) |
| `cell-plus`, `comb`, `comb-team`, `hive` | see [bee/tiers.py](bee/tiers.py) | 1.7B–32B | scales with tier | not yet benchmarked locally |
The `89 tok/s` number is from [data/eval_reports/2026-04-29_throughput_mps.json](data/eval_reports/2026-04-29_throughput_mps.json) – 5 prompts × ~100 tokens each, measured on 2026-04-29. Larger tiers' throughput numbers will land in this table once a real measurement is taken on the target hardware; we don't quote estimates.
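For reference, a tok/s measurement of that shape reduces to roughly this (illustrative; the recorded numbers come from the JSON report above, not this snippet):
```python
# Illustrative greedy-decode throughput measurement on MPS (fp16), matching
# the shape of the recorded run: a handful of prompts, ~100 new tokens each.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM2-360M-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to("mps")

inputs = tok("Explain LoRA in one paragraph.", return_tensors="pt").to("mps")
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=100, do_sample=False)  # greedy
elapsed = time.perf_counter() - start
new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s")
```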
Runs on: macOS (MPS), Linux (CUDA), any CPU (slow). Production traffic is served by Modal's L4-class containers ([infra/modal/bee_app.py](infra/modal/bee_app.py)) with a persistent `bee-cache` volume so cold starts don't re-pull SmolLM2-360M.
## Environment Variables
See `.env.example` for all options. Key ones:
```bash
BEE_DEVICE=mps # auto, mps, cuda, cpu
BEE_MODEL_PATH=HuggingFaceTB/SmolLM2-360M-Instruct
BEE_TEACHER_API_KEY= # Anthropic or OpenAI key (optional)
IBM_QUANTUM_API_KEY= # IBM Quantum (optional)
BEE_API_URL= # Set on Vercel + mobile + SDK to point
# at the Modal production backend.
# Default in code is the legacy HF Space
# for backward-compat only.
BEE_IGNITE=0 # Keep 0 for production. The Ignite
# research-AGI substrate is gated by
# this flag; see bee/ignition.py.
```
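If you script against the package directly, load `.env` with `python-dotenv` (already in the Quick Start install list) before importing anything that reads these variables:
```python
# Load .env before importing modules that read BEE_* variables.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
device = os.getenv("BEE_DEVICE", "auto")
api_url = os.getenv("BEE_API_URL")  # unset -> code falls back to the legacy HF Space default
```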
## License
Apache-2.0