| --- |
| title: Bee Intelligence Engine |
| emoji: π |
| colorFrom: yellow |
| colorTo: gray |
| sdk: docker |
| app_port: 7860 |
| pinned: true |
| license: apache-2.0 |
| short_description: The Intelligence Engine β domain LoRA adapters |
| --- |
| |
| # Bee β The Intelligence Engine |
|
|
| **Trust-critical AI for regulated and mission-critical systems.** |
| Built by [CUI Labs](https://www.cuilabs.io) on the XIIS platform. |
|
|
| Last verified: 2026-05-05. |
|
|
| --- |
|
|
| ## What's actually running today |
|
|
| | Surface | State | Source-of-truth | |
| |---|---|---| |
| | Bee Cell inference (production) | Live on **Modal** serverless (`bee-cell-prod`) β replaces the legacy HF Space `cuilabs-bee.hf.space`. Frontend talks to it via `BEE_API_URL` env on Vercel. | [infra/modal/bee_app.py](infra/modal/bee_app.py) | |
| | Web app | `bee.cuilabs.io` on Vercel | [apps/web](apps/web) | |
| | Mobile app | React Native CLI 0.85.2 (no Expo, no EAS) β Stage 0 release scaffolding. Backend pointer in Settings. | [apps/mobile/README.md](apps/mobile/README.md) | |
| | Desktop app | Tauri 2.10 shell pointing at `bee.cuilabs.io`. Source scaffold landed 2026-04-30; signed releases gated on cert/Apple-Dev enrollment. | [apps/desktop/README.md](apps/desktop/README.md) | |
| | Bee Security Eval Harness | 52 cases / 10 categories. Latest baseline on Bee Cell base: **12.5 / 100** (gates Stage 1 APK). | [eval/bee_security_harness/README.md](eval/bee_security_harness/README.md) | |
| | Stage 0 safety wrapper | Runtime preamble + refusal substrate around every chat completion. | [bee/safety_wrapper.py](bee/safety_wrapper.py) | |
| | Cybersec adapter training | Stage 0.5 Comb run on **Vertex AI L4** (one-time exception β Comb usually rides Kaggle). | [workers/vertex-train/README.md](workers/vertex-train/README.md) | |
| | Cell + Cell+ training | Kaggle T4Γ2 GPU pool, push-only dispatcher (commit `3edb643`). | [workers/kaggle-online-train/README.md](workers/kaggle-online-train/README.md) | |
| | Cron pipeline | 15 Vercel cron routes β kaggle-dispatch, kaggle-tpu-dispatch, eval-run, cve-ingest, kev-ingest, distillation, online-training, evolution-cycle, community-pull, github-trending, hf-dispatch, heartbeat, memory-extract, interactions-export, research-correct. | [apps/web/src/app/api/cron/](apps/web/src/app/api/cron/) | |
|
|
| --- |
|
|
| ## Benchmarks |
|
|
| Reproducible eval on the base model (no LoRA adapter applied). Run via `python -m bee.eval_harness` β every task and pass criterion is in [bee/eval_harness.py](bee/eval_harness.py), every output is captured in `data/eval_reports/*.json`. |
|
|
| ``` |
| Model: HuggingFaceTB/SmolLM2-360M-Instruct (361.8M params) |
| Device: MPS (Apple Silicon, fp16) |
| Date: 2026-04-29 |
| Wall: 25.9s for all 5 benchmarks |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| coding 100% (10/10) avg latency 2033 ms |
| reasoning 40% (4/10) avg latency 146 ms |
| instruct 50% (5/10) avg latency 167 ms |
| grounded 80% (4/5) avg latency 116 ms |
| domain 100% (5/5) avg latency 381 ms |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| OVERALL 74% |
| ``` |
|
|
| **How to read these numbers:** |
| - `coding 100%` is a **shape check** (function name + `return` keyword present), not a correctness test. A real correctness benchmark would score lower. |
| - `reasoning 40%` and `instruct 50%` are honest signal β at 360M base, multi-step math and exact-format compliance are hard. |
| - A few `instruct` / `grounded` failures are pattern-match strictness in the harness (e.g. answer is right but contains an extra word). The raw output for every task is in [data/eval_reports/2026-04-29_smollm2-360m_mps.json](data/eval_reports/2026-04-29_smollm2-360m_mps.json) so you can audit. |
|
|
| Reproduce locally: |
|
|
| ```bash |
| python -m bee.eval_harness --model HuggingFaceTB/SmolLM2-360M-Instruct --device mps \ |
| --output data/eval_reports/my_run.json |
| ``` |
|
|
| Per-domain LoRA adapters at [`cuilabs/bee-cell`](https://huggingface.co/cuilabs/bee-cell) are evaluated separately on domain-specific tasks; numbers land in this README only after a training run produces them. |
|
|
| ### Bee Security Eval Harness β first real baseline |
|
|
| Bee's security capability is measured against an in-house gate, not a generic benchmark. Source-of-truth for the cases is [eval/bee_security_harness/cases/*.yaml](eval/bee_security_harness/cases/) (52 cases across 10 categories: insecure-code generation, prompt injection, agent tool abuse, tenant isolation, authz/authn failures, cloud IAM, dependency CVEs, secret leakage, unsafe cyber responses, hallucinated security claims). |
|
|
| ``` |
| Surface: Bee Cell base (no cybersec adapter applied) |
| Backend: Modal bee-cell-prod |
| Date: 2026-05-03 |
| Score: 12.5 / 100 (release gate is >= 80 with zero blocking failures) |
| ``` |
|
|
| 12.5 is the honest pre-adapter floor and is the reason Stage 0.5 cybersec adapter training is currently running on Vertex L4. The Stage 1 APK release is gated on a re-run of this harness against the post-adapter Modal endpoint. Run logic and case-loader: [apps/web/src/app/api/cron/eval-run/route.ts](apps/web/src/app/api/cron/eval-run/route.ts), summary table `eval_runs`, per-case results `eval_run_results`. |
|
|
| --- |
|
|
| ## Quick Start |
|
|
| ```bash |
| # 1. Create environment |
| python3 -m venv .venv |
| source .venv/bin/activate |
| pip install torch transformers accelerate peft datasets trl \ |
| sentencepiece protobuf numpy fastapi uvicorn pydantic httpx \ |
| python-dotenv qiskit sentence-transformers faiss-cpu websockets |
| |
| # 2. Copy environment config |
| cp .env.example .env |
| # Edit .env with your API keys (optional β Bee works without them) |
| |
| # 3. Run the eval harness (verifies install + reproduces the numbers above) |
| python -m bee.eval_harness --device mps |
| |
| # 4. Start the server |
| python -m bee.server |
| |
| # 5. Start the full daemon (server + evolution + distillation) |
| python -m bee |
| ``` |
|
|
| --- |
|
|
| ## API (OpenAI-compatible) |
|
|
| ```bash |
| # Chat |
| curl -X POST http://localhost:8000/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":100}' |
| |
| # Health |
| curl http://localhost:8000/health |
| |
| # Router stats |
| curl http://localhost:8000/v1/router/stats |
| |
| # Switch domain |
| curl -X POST http://localhost:8000/v1/domain/switch \ |
| -H "Content-Type: application/json" \ |
| -d '{"domain":"cybersecurity"}' |
| ``` |
|
|
| Tier-1 domains (10): `general`, `programming`, `ai`, `cybersecurity`, `quantum`, `fintech`, `blockchain`, `infrastructure`, `research`, `business`. Source: [bee/domains.py](bee/domains.py). |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| bee/ |
| server.py FastAPI server, OpenAI-compatible API, adaptive routing |
| safety_wrapper.py Stage 0 runtime safety preamble + refusal substrate |
| adaptive_router.py Difficulty estimation, self-verification, context memory |
| distillation.py Teacher-student distillation (Claude/GPT-4 -> Bee) |
| evolution.py Autonomous algorithm evolution |
| invention_engine.py Invents novel attention, compression, SSM modules |
| self_coding.py Code generation + sandboxed execution |
| self_heal.py Training health monitoring, auto-recovery |
| community.py Share inventions between Bee instances (HuggingFace Hub) |
| quantum_reasoning.py Quantum-enhanced decision making (IBM Quantum / local sim) |
| quantum_ibm.py IBM Quantum Platform integration (156-qubit Heron r2) |
| quantum_sim.py Local quantum statevector simulation |
| retrieval.py RAG pipeline (FAISS + sentence-transformers) |
| lora_adapter.py Domain LoRA adapter management |
| nn_compression.py VQ-VAE hierarchical neural compression |
| memory.py Hierarchical compressive memory |
| moe.py Sparse mixture of experts |
| state_space.py Selective state space model |
| daemon.py Autonomous daemon (background evolution, distillation) |
| ignition.py Full BeeAGI architecture activation (research-only, |
| BEE_IGNITE=0 in production) |
| benchmark.py 10-test benchmark suite |
| eval_harness.py General-capability harness (the SmolLM2 numbers above) |
| config.py Model configuration |
| modeling_bee.py Custom BeeForCausalLM |
| |
| apps/web/ Next.js customer web app deployed to Vercel |
| apps/mobile/ React Native CLI 0.85.2 native iOS+Android |
| apps/desktop/ Tauri 2.10 native shell (macOS/Windows/Linux) |
| sdks/python/ Official Python client (bee-sdk) |
| |
| eval/bee_security_harness/ |
| 52-case security gate (10 categories, regex grader DSL) |
| |
| infra/modal/ Production inference deployment (bee-cell-prod) |
| infra/hf-space/ Deprecated; retained for community model-card hosting |
| infra/db/ Postgres migrations (eval_runs, training_runs, etc.) |
| infra/supabase/ Supabase project config |
| |
| workers/ |
| kaggle-online-train/ T4Γ2 GPU runner β cell, cell+, comb (when forced) |
| kaggle-tpu-train/ TPU v6e-8 runner β every-step debug logging |
| vertex-train/ L4 / A100 β reserved for tiers Kaggle can't host |
| (Hive, Swarm, Enclave, Ignite) |
| colab-online-train/ Manual paste-test workflow on Colab T4 |
| lightning-train/ Inactive β manual launcher, not wired to a cron |
| |
| packages/ auth, billing, core, db, email, pqc, qnsp-client, |
| rag, telemetry, training, ui β TypeScript workspace |
| scripts/ Distillation, deploys, dataset prep, ops |
| docs/ Architecture, API reference, runbooks |
| ``` |
|
|
| ## Repository Layout |
|
|
| The approved source of truth for the monorepo layout lives in `docs/architecture/repository.md`. |
|
|
| Current migration truth: |
|
|
| - `apps/web` is the canonical frontend path. |
| - `apps/mobile` is the canonical mobile app path (React Native CLI, no Expo). |
| - `apps/desktop` is the canonical desktop app path (Tauri 2.10). |
| - `bee/` remains rooted at the repository top level and is the canonical backend package. |
| - `infra/modal/bee_app.py` is the production inference entrypoint. The root `Dockerfile` is retained for parity with the historical HF Space image and for ad-hoc Docker runs. |
|
|
| ## Deployment Topology |
|
|
| - GitHub hosts the monorepo source of truth. |
| - Vercel serves the web app from `apps/web` at `https://bee.cuilabs.io`. |
| - Namecheap manages DNS for `bee.cuilabs.io` and (eventually) `api.bee.cuilabs.io`. |
| - **Modal** serves the backend inference API as `bee-cell-prod`. The frontend points at it via the `BEE_API_URL` env on Vercel; default URL pattern is `https://cuilabs--bee-cell-prod-fastapi-app.modal.run` ([infra/modal/bee_app.py](infra/modal/bee_app.py)). |
| - The legacy Hugging Face Space (`cuilabs-bee.hf.space`) is deprecated. It is no longer the production backend; HF org artifacts are retained for community model-card and dataset hosting only ([infra/hf-space/README.md](infra/hf-space/README.md)). |
| - Large datasets, checkpoints, and adapters live on Hugging Face Hub (`cuilabs/bee-cell`, `cuilabs/bee-cell-plus`, `cuilabs/bee-comb`, `cuilabs/bee-interactions`), not in the frontend deployment payload. |
|
|
| ## How It Works |
|
|
| 1. **Adaptive Router** β Routes easy queries locally (free), hard queries to teacher API |
| 2. **Self-Verification** β Scores every output, re-generates if quality is low |
| 3. **Context Memory** β Compresses past conversations for infinite memory |
| 4. **Teacher Distillation** β Uses Claude/GPT-4 to generate expert training data |
| 5. **LoRA Training** β Domain-specific adapters trained on free Colab/Kaggle GPUs |
| 6. **Evolution** β Autonomously invents better algorithms |
| 7. **Community** β Shares validated inventions between all Bee instances |
| 8. **Quantum** β IBM Quantum hardware or local simulation for decision optimization |
|
|
| **Design goal**, not a measured steady-state: route easy queries locally (free), expensive ones to a teacher model, capture every teacher response as training data, and shrink the teacher-call ratio over time as Bee's domain adapters improve. Actual local-vs-teacher split and cost-per-query are emitted live by `/v1/router/stats` β that endpoint is the source of truth, not this README. |
|
|
| ## Hardware |
|
|
| | Tier | Base model | Params | RAM (fp16) | Throughput | |
| |---|---|---|---|---| |
| | `cell` (default) | SmolLM2-360M-Instruct | 361.8M | ~0.7 GB | **89 tok/s** on Apple Silicon MPS (fp16, greedy) | |
| | `cell-plus`, `comb`, `comb-team`, `hive` | see [bee/tiers.py](bee/tiers.py) | 1.7Bβ32B | scales with tier | not yet benchmarked locally | |
|
|
| The `89 tok/s` number is from [data/eval_reports/2026-04-29_throughput_mps.json](data/eval_reports/2026-04-29_throughput_mps.json) β 5 prompts Γ ~100 tokens each, measured today. Larger tiers' throughput numbers will land in this table once a real measurement is taken on the target hardware; we don't quote estimates. |
|
|
| Runs on: macOS (MPS), Linux (CUDA), any CPU (slow). Production traffic is served by Modal's L4-class containers ([infra/modal/bee_app.py](infra/modal/bee_app.py)) with a persistent `bee-cache` volume so cold starts don't re-pull SmolLM2-360M. |
|
|
| ## Environment Variables |
|
|
| See `.env.example` for all options. Key ones: |
|
|
| ```bash |
| BEE_DEVICE=mps # auto, mps, cuda, cpu |
| BEE_MODEL_PATH=HuggingFaceTB/SmolLM2-360M-Instruct |
| BEE_TEACHER_API_KEY= # Anthropic or OpenAI key (optional) |
| IBM_QUANTUM_API_KEY= # IBM Quantum (optional) |
| BEE_API_URL= # Set on Vercel + mobile + SDK to point |
| # at the Modal production backend. |
| # Default in code is the legacy HF Space |
| # for backward-compat only. |
| BEE_IGNITE=0 # Keep 0 for production. The Ignite |
| # research-AGI substrate is gated by |
| # this flag; see bee/ignition.py. |
| ``` |
|
|
| ## License |
|
|
| MIT |
|
|