# AGENTS Guidelines for BitTransformerLM
## Repository Scope and Purpose
- **BitTransformerLM** models raw binary streams using reversible transformer blocks and safety telemetry. The project is the canonical implementation under WCNegentropy.
- Core capabilities include bit-native modeling, telemetry metrics (negentropy, LZ complexity, symbiosis), progressive scaling, compression, context extension, diffusion mode (linear/cosine/exp noise schedules with parity correction), dashboard control, distributed training, and quantization.
- Phase 1 optimizations provide configurable batch sizing, gradient accumulation, mixed precision, memory-mapped dataset streaming, scheduled compression ramps, selective `torch.compile`, and an EMA-smoothed safety gate with burn-in.
## Environment Setup
- Requires **Python 3.10+**.
- Install dependencies:
  - CPU: `pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt`
  - Optional GPU: `pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118`
- The package name is `bit-transformer`; project metadata lives in `pyproject.toml`.
## Repository Layout
- `bit_transformer/` – core package (`model`, `compression`, `telemetry`, `safety`, `dashboard_app`, `quantization`, etc.).
- `tests/` – pytest suite and historical `TEST_RESULTS.md`.
- Scripts: `example.py`, `unified_workflow.py`, `full_bits_train.py`, `build_full_bits.py`, `mcp_server.py`, `wikitext_*` utilities. The legacy `progressive_scaleup.py` is retained for reference but superseded by `integration_schedule.py`.
- Docs and specs: `README.md`, `state_of_the_repo_audit.md`, licensing files in `LICENSE/`.
## Development Practices
- Follow snake_case for functions and CamelCase for classes.
- Keep functions under ~300 lines and minimize deeply nested control flow.
- Avoid reintroducing the deprecated dashboard `/exec` endpoint or other insecure code paths.
- Use the `/status` endpoint for model introspection; all routes return JSON and surface errors with stack traces.
- Keep compression, decompression, and halting logic consistent with the current implementation.
- Use the `cpu_autocast()` helper for BF16 mixed precision on CPU instead of calling `torch.amp.autocast` directly (see the first sketch after this list).
- Adaptive training now expands depth, width, or context only when validation loss plateaus and automatically decays the base learning rate by √2 after each expansion with a 100-step warm-up (see the second sketch after this list).
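
A minimal usage sketch for the mixed-precision helper, assuming `cpu_autocast` is exported from the package root and that `model` is an already-constructed BitTransformerLM; adjust the import path if the helper lives in a submodule:

```python
import torch

from bit_transformer import cpu_autocast  # import path assumed; may live in a submodule

bits = torch.randint(0, 2, (1, 64))  # toy bit sequence
with cpu_autocast():                 # BF16 autocast on CPU, per the guideline above
    out = model(bits)                # `model` is assumed to be constructed elsewhere
```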
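
The post-expansion decay amounts to dividing the base learning rate by √2 per expansion, ramped back up over the 100-step warm-up. A hedged sketch of that arithmetic; the function name and signature are illustrative, not the actual scheduler API:

```python
import math

def lr_after_expansion(base_lr: float, num_expansions: int,
                       step_since_expansion: int, warmup_steps: int = 100) -> float:
    """Illustrative only: base LR decayed by sqrt(2) per expansion, ramped linearly during warm-up."""
    decayed = base_lr / (math.sqrt(2) ** num_expansions)
    warmup = min(1.0, step_since_expansion / warmup_steps)
    return decayed * warmup

# e.g. a base LR of 1e-3 becomes ~7.07e-4 after one expansion and 5e-4 after two.
```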
## Workflow & Commands
- Run the example: `python example.py`.
- Adaptive scaling now lives in `integration_schedule.py`; `progressive_scaleup.py` is deprecated.
- Unified workflow (optionally with dashboard or diffusion): `python unified_workflow.py --dashboard` or `python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32`.
- Increase `--diffusion-steps` for higher fidelity (8–16) and add `--diffusion-curriculum` to linearly decay noise over epochs.
- Disable checkpointing or reversible blocks when speed is prioritized over memory: `python unified_workflow.py --no-checkpoint --no-reversible`.
- Enable 4-bit quantization-aware training: `python unified_workflow.py --qat`.
- Skip full attention logging during chunked attention for memory savings by constructing the model with `full_attn_logging=False` (see the first sketch after this list).
- Start the MCP server: `python mcp_server.py`, then launch the dashboard: `MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app`.
- `/metrics` and `/model_config` endpoints expose telemetry streams and hyperparameters (see the second sketch after this list).
- `/save_checkpoint` and `/download_checkpoint` sync weights with Hugging Face (token defaults to `HF_TOKEN`).
- Container build: `docker build -t bittransformerlm .` and run with ports `5000` (dashboard) and `7000` (MCP) exposed.
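
A hedged construction sketch for disabling full attention logging; the hyperparameter names shown here are illustrative assumptions rather than the exact `BitTransformerLM` signature, and only `full_attn_logging=False` is the point:

```python
from bit_transformer.model import BitTransformerLM  # import path assumed

# Hyperparameter names below are illustrative; check the actual constructor.
model = BitTransformerLM(
    d_model=128,
    nhead=8,
    num_layers=4,
    full_attn_logging=False,  # skip logging full attention maps during chunked attention
)
```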
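
A small sketch for reading the JSON endpoints from a script, assuming the dashboard is reachable on its default port 5000 (adjust host and port to your deployment):

```python
import requests  # third-party dependency; `pip install requests` if missing

BASE = "http://127.0.0.1:5000"  # dashboard address assumed

status = requests.get(f"{BASE}/status").json()        # model introspection
metrics = requests.get(f"{BASE}/metrics").json()      # telemetry streams (K, C, S, ...)
config = requests.get(f"{BASE}/model_config").json()  # current hyperparameters
print(status, metrics, config)
```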
## Telemetry Metrics
| Metric | Meaning | Range |
|--------|---------|-------|
| **K** | Negentropy – deviation from random noise | 0–1 (1 = ordered) |
| **C** | LZ Complexity – compressibility proxy | 0–1 (higher = more changes) |
| **S** | Symbiosis – agreement with reference distribution | 0–1 (1 = aligned) |
ACT halting exports `halt_probs` in telemetry, showing how many layers actually executed. For robust sampling under safety constraints, call `safe_sample_with_retry(model, bits)`, which retries with diffusion mode and exponential backoff.
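
A minimal usage sketch, assuming `safe_sample_with_retry` is importable from the top-level package (adjust the import if it lives under `bit_transformer.safety`) and that `model` is already constructed:

```python
import torch

from bit_transformer import safe_sample_with_retry  # import path assumed

bits = torch.randint(0, 2, (1, 128))          # example bit-stream prompt
output = safe_sample_with_retry(model, bits)  # retries with diffusion mode + exponential backoff
```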
`TelemetrySynthesizer.cluster_sequences` can be used to select representative training samples before invoking `collapse_submodel`. The distillation helper deepens the model and widens it once (`width_scale = 1.5`) if telemetry floors are missed, and `save_distilled_model` emits a `metrics.json` summary beside the weights.
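
A hedged sketch of that distillation flow; the constructor arguments, keyword names, and return values here are illustrative assumptions, so check the actual signatures before use:

```python
# Illustrative pipeline only; real signatures may differ.
synth = TelemetrySynthesizer()                         # constructor arguments assumed
clusters = synth.cluster_sequences(train_bits)         # pick representative training sequences
student = collapse_submodel(clusters, teacher=model)   # distil into a smaller model (kwargs assumed)
save_distilled_model(student, "distilled/")            # also writes metrics.json beside the weights
```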
## Testing
- Run unit tests after any change: `pytest -q`.
- Use `watcher.py` for auto-reload and test runs during local development if desired.
- During training, call `model.train()` and keep dropout probabilities around `0.1–0.2`.
- Before running tests, inference, or pushing weights, switch to `model.eval()` and set all dropout probabilities to `0` to avoid flaky results (see the first sketch after this list).
- The dashboard will warn if telemetry metrics drift by more than 0.2 over the last 10 steps; adjust via `ModelManager(drift_window, drift_threshold)` as needed (see the second sketch after this list).
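
A small sketch of the eval-mode switch described above, using plain PyTorch to zero out dropout; the loop over modules is a generic approach, not a project-specific helper:

```python
import torch

model.eval()  # deterministic behaviour for tests, inference, and weight pushes
for module in model.modules():
    if isinstance(module, torch.nn.Dropout):
        module.p = 0.0  # force dropout off, per the guideline above
```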
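
And a short sketch of tuning the drift warning, assuming `ModelManager` accepts the keyword arguments named in the note above and that the import path is as shown:

```python
from bit_transformer.dashboard_app import ModelManager  # import path assumed

manager = ModelManager(drift_window=10, drift_threshold=0.2)  # defaults implied by the guideline above
```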
## Licensing
- Project governed by documents in `LICENSE/` (AGPLv3, commercial terms, disclaimers, etc.). Ensure compliance before contributing or distributing.

These guidelines keep the repository consistent with the project roadmap and previous audits. Maintain security, style, and testing discipline to keep BitTransformerLM production-ready.