๐ OS Launch: Clean documentation and refined licensing
Browse filesThis OS launch commit includes:
โ
**Cleaned Documentation**
- Removed inflated claims and marketing language
- Added honest research status and limitations
- Created professional model card and validation reports
- Streamlined licensing to AGPLv3 + commercial contact
โ
**Refined Codebase**
- Complete experimental bit-native transformer implementation
- 57 Python files with comprehensive research framework
- Safety telemetry and monitoring systems
- Distributed training and development tools
โ
**Professional Standards**
- Empirical validation of all claims
- Clear experimental vs production distinctions
- Rigorous research methodology requirements
- Community contribution framework
Ready for serious research evaluation and academic investigation.
README.md
CHANGED
|
@@ -1,246 +1,144 @@
|
|
| 1 |
-
# BitTransformerLM
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
**
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
- **
|
| 18 |
-
- **
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
-
|
| 28 |
-
-
|
| 29 |
-
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
-
|
| 33 |
-
-
|
| 34 |
-
-
|
| 35 |
-
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
-
|
| 40 |
-
-
|
| 41 |
-
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
###
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
-
|
| 49 |
-
-
|
| 50 |
-
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
-
|
| 55 |
-
-
|
| 56 |
-
-
|
| 57 |
-
-
|
| 58 |
-
|
| 59 |
-
###
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 72 |
```
|
| 73 |
-
When GPU acceleration is toggled in the dashboard, the application automatically
|
| 74 |
-
installs the CUDA-enabled wheel:
|
| 75 |
-
```bash
|
| 76 |
-
pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118
|
| 77 |
-
```
|
| 78 |
-
Run the example script:
|
| 79 |
-
```bash
|
| 80 |
-
python example.py
|
| 81 |
-
```
|
| 82 |
-
Adaptive scaling demo:
|
| 83 |
-
The legacy `progressive_scaleup.py` script is retained for reference but has been
|
| 84 |
-
superseded by `integration_schedule.py`, which offers a more flexible scaling
|
| 85 |
-
workflow.
|
| 86 |
-
|
| 87 |
-
Run the unified workflow:
|
| 88 |
-
```bash
|
| 89 |
-
python unified_workflow.py --dashboard
|
| 90 |
-
# disable gradient checkpointing for faster but memory-hungry runs
|
| 91 |
-
python unified_workflow.py --no-checkpoint
|
| 92 |
-
# use standard (non-reversible) transformer blocks
|
| 93 |
-
python unified_workflow.py --no-reversible
|
| 94 |
-
# enable 4-bit quantization-aware training
|
| 95 |
-
python unified_workflow.py --qat
|
| 96 |
-
```
|
| 97 |
-
|
| 98 |
-
For faster CPU execution, BitTransformerLM exposes a `cpu_autocast()` helper
|
| 99 |
-
that enables bfloat16 mixed precision. Models created with
|
| 100 |
-
`use_autocast=True` apply this automatically, or you can wrap individual
|
| 101 |
-
forward passes:
|
| 102 |
-
|
| 103 |
-
```python
|
| 104 |
-
from bit_transformer.torch_utils import cpu_autocast
|
| 105 |
-
|
| 106 |
-
with cpu_autocast():
|
| 107 |
-
logits, telemetry = model(bits)
|
| 108 |
-
```
|
| 109 |
-
|
| 110 |
-
Reduce memory use when chunked attention is active by disabling full
|
| 111 |
-
attention logging:
|
| 112 |
-
|
| 113 |
-
```python
|
| 114 |
-
model = BitTransformerLM(chunk_size=128, full_attn_logging=False)
|
| 115 |
-
```
|
| 116 |
-
|
| 117 |
-
Enable Diffusion LM training and sampling:
|
| 118 |
-
```bash
|
| 119 |
-
python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32
|
| 120 |
-
# choose noise schedule: linear, cosine, exp
|
| 121 |
-
python unified_workflow.py --diffusion --noise-schedule cosine --diffusion-steps 16 --dataset-size 32
|
| 122 |
-
# linearly decay noise over epochs
|
| 123 |
-
python unified_workflow.py --diffusion --diffusion-curriculum --dataset-size 32
|
| 124 |
-
```
|
| 125 |
-
Higher `--diffusion-steps` (8โ16) improves sample quality at the cost of compute. When using the dashboard, enable the **Diffusion LM** toggle to run the model without causal masking or chunked attention.
|
| 126 |
-
Generated samples automatically fix parity bits so they can be decoded back to text.
|
| 127 |
-
To resume training across machines using Hugging Face storage:
|
| 128 |
-
```bash
|
| 129 |
-
python unified_workflow.py --hf-repo your-username/bittransformerlm --hf-token $HF_TOKEN
|
| 130 |
-
```
|
| 131 |
-
The dashboard exposes matching controls under **Hugging Face Checkpoints**. Provide a repository ID and optional token (falling back to the `HF_TOKEN` environment variable) and click **Upload weights** or **Download weights** to sync the model.
|
| 132 |
-
Run the unit tests:
|
| 133 |
-
```bash
|
| 134 |
-
pytest -q
|
| 135 |
-
```
|
| 136 |
-
|
| 137 |
-
### Mode management
|
| 138 |
-
|
| 139 |
-
During training, ensure the model is in training mode with dropout enabled:
|
| 140 |
-
|
| 141 |
-
```python
|
| 142 |
-
from bit_transformer.utils import set_dropout
|
| 143 |
-
|
| 144 |
-
model.train()
|
| 145 |
-
set_dropout(model, 0.1)
|
| 146 |
-
```
|
| 147 |
-
|
| 148 |
-
Before running tests, performing inference, or committing weights to the repository, switch the model to evaluation mode and disable dropout:
|
| 149 |
-
|
| 150 |
-
```python
|
| 151 |
-
model.eval()
|
| 152 |
-
set_dropout(model, 0.0)
|
| 153 |
-
```
|
| 154 |
-
|
| 155 |
-
This prevents CI failures from accidentally pushing weights that still have active dropout.
|
| 156 |
-
|
| 157 |
-
## Telemetry Metrics Explained
|
| 158 |
-
BitTransformerLM reports three bounded metrics in ``[0,โฏ1]`` during training and inference:
|
| 159 |
-
|
| 160 |
-
- **Negentropy (K)** โ departure from random noise; ``1`` denotes perfectly ordered bits while ``0`` is uniform randomness.
|
| 161 |
-
- **LZ Complexity (C)** โ differentiable proxy for LempelโZiv compressibility; low values imply repetitive patterns and high values frequent transitions.
|
| 162 |
-
- **Symbiosis (S)** โ agreement between model predictions and a reference distribution via KL divergence; scores near ``1`` show strong alignment.
|
| 163 |
-
|
| 164 |
-
An Adaptive Computation Time (ACT) mechanism lets layers halt early once confidence exceeds a threshold. Halt probabilities are exported as ``halt_probs`` in telemetry for inspection.
|
| 165 |
-
|
| 166 |
-
These metrics are logged alongside losses and can trigger safety gates when thresholds are violated. The dashboard monitors drift and emits warnings when recent values deviate beyond a configurable threshold.
|
| 167 |
-
|
| 168 |
-
## Core Features
|
| 169 |
-
- **Bit-Native Modeling** โ Works directly on 0/1 inputs with positional encodings and parity-protected text helpers.
|
| 170 |
-
- **Telemetry Synthesizer** โ Clusters activation summaries to surface coherent subspaces and detect drift.
|
| 171 |
-
- **Submodel Distillation** โ `TelemetrySynthesizer` selects representative sequences for `collapse_submodel`, which deepens
|
| 172 |
-
and widens once (`width_scale`โฏ=โฏ1.5) if telemetry floors aren't met; `save_distilled_model` places a `metrics.json` summary
|
| 173 |
-
beside the distilled weights.
|
| 174 |
-
- **Safety Gate** โ `hil_safe_inference` enforces minimum complexity and symbiosis scores at runtime with EMA smoothing and a configurable burnโin period.
|
| 175 |
-
- **Quantization** โ CPU inference can be quantized to int8 or trained with 4-bit QAT using the `--qat` flag.
|
| 176 |
-
- **Distributed Training** โ FSDP and pipeline helpers allow multiโGPU scaling when hardware is available.
|
| 177 |
-
- **Interactive Dashboard** โ Live control of training, scaling and compression with optional GPU acceleration. The dashboard now exposes reversible layers, gradient checkpointing, ACT thresholds, ฮป floors, 4โbit QAT and Diffusion LM toggles, realโtime telemetry charts powered by Chart.js, and Hugging Face checkpoint upload/download controls with `HF_TOKEN` fallback. Settings persist via `localStorage`.
|
| 178 |
-
- **CI/CD Pipeline** โ GitHub Actions install dependencies, run the tests and build distribution artifacts on every push.
|
| 179 |
-
|
| 180 |
-
## Development Workflow
|
| 181 |
-
1. Start the MCP server:
|
| 182 |
-
```bash
|
| 183 |
-
python mcp_server.py
|
| 184 |
-
```
|
| 185 |
-
2. Launch the dashboard in another terminal:
|
| 186 |
-
```bash
|
| 187 |
-
MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app
|
| 188 |
-
```
|
| 189 |
-
3. Submit training batches, scale the model and monitor telemetry from the web UI.
|
| 190 |
-
The dashboard's appearance is controlled by `bit_transformer/static/style.css`.
|
| 191 |
-
|
| 192 |
-
A `watcher.py` script can automatically restart the server and run tests when files change during local development.
|
| 193 |
-
|
| 194 |
-
## Container Deployment
|
| 195 |
-
A `Dockerfile` and `start.sh` script build a minimal VM image that launches both the MCP server and dashboard.
|
| 196 |
-
|
| 197 |
-
```bash
|
| 198 |
-
docker build -t bittransformerlm .
|
| 199 |
-
docker run -p 5000:5000 -p 7000:7000 bittransformerlm
|
| 200 |
-
```
|
| 201 |
-
|
| 202 |
-
By default the container installs the CPU-only PyTorch wheel. Set the build
|
| 203 |
-
argument `TORCH_CUDA=cu118` to preinstall the GPU version. The container sets
|
| 204 |
-
`MCP_SERVER_ADDR=http://127.0.0.1:7000` and exposes the dashboard on port 5000.
|
| 205 |
-
|
| 206 |
-
## Research Development Roadmap
|
| 207 |
-
|
| 208 |
-
### โ
**COMPLETED - Experimental Implementation**
|
| 209 |
-
- **Architecture**: Bit-native transformer with reversible layers โ
|
| 210 |
-
- **Safety Systems**: K/C/S telemetry with real-time monitoring โ
|
| 211 |
-
- **Distributed Training**: FSDP implementation (tested up to 771M parameters) โ
|
| 212 |
-
- **Research Tools**: Dashboard, MCP server, HF integration โ
|
| 213 |
-
- **Testing & Validation**: Comprehensive test suite with CI โ
|
| 214 |
-
- **Documentation**: Research-grade API documentation โ
|
| 215 |
-
- **Performance**: Memory optimization, quantization, compression โ
|
| 216 |
-
|
| 217 |
-
### ๐ฏ **VALIDATION TARGETS**
|
| 218 |
-
- **Baseline Comparisons**: Rigorous evaluation against standard transformers
|
| 219 |
-
- **Statistical Analysis**: Multiple runs with proper significance testing
|
| 220 |
-
- **Long-Duration Training**: Training convergence studies on real datasets
|
| 221 |
-
- **Scaling Studies**: Systematic evaluation of model sizes and architectures
|
| 222 |
-
|
| 223 |
-
### ๐ **FUTURE RESEARCH DIRECTIONS**
|
| 224 |
-
- **Scale Validation**: Multi-billion parameter experiments with proper baselines
|
| 225 |
-
- **Hardware Optimization**: Custom CUDA kernels and neuromorphic support
|
| 226 |
-
- **Application Studies**: Real-world deployment case studies with evaluation
|
| 227 |
-
- **Academic Validation**: Peer review and publication processes
|
| 228 |
-
|
| 229 |
-
**Current Status**: Complete experimental framework requiring rigorous validation against established baselines before production deployment.
|
| 230 |
-
|
| 231 |
-
## Licensing
|
| 232 |
-
|
| 233 |
-
BitTransformerLM is available under a dual licensing scheme:
|
| 234 |
-
|
| 235 |
-
* **Open Source License:** AGPLv3 (see `LICENSE/LICENSE.txt`)
|
| 236 |
-
* **Commercial License:** Available by contacting **contact@wcnegentropy.com**
|
| 237 |
|
| 238 |
-
Additional
|
| 239 |
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
|
| 245 |
-
|
| 246 |
|
|
|
|
|
|
| 1 |
+
# BitTransformerLM Model Card
|
| 2 |
+
|
| 3 |
+
## Model Details
|
| 4 |
+
|
| 5 |
+
**Model Type:** Experimental Bit-Native Transformer Language Model
|
| 6 |
+
**Architecture:** Transformer with reversible layers and bit-level processing
|
| 7 |
+
**Developer:** WCNegentropy Research
|
| 8 |
+
**Release Date:** August 2025
|
| 9 |
+
**Version:** Pre-release Experimental
|
| 10 |
+
**License:** AGPLv3 (see LICENSE/ directory)
|
| 11 |
+
|
| 12 |
+
## Model Description
|
| 13 |
+
|
| 14 |
+
BitTransformerLM is an experimental language model that processes text at the bit level rather than using traditional token-based approaches. The architecture explores potential memory efficiency improvements through reversible transformer layers and provides built-in safety monitoring through real-time telemetry.
|
| 15 |
+
|
| 16 |
+
### Architecture Details
|
| 17 |
+
- **Input Processing:** Direct binary sequence processing (0/1 bits)
|
| 18 |
+
- **Attention Mechanism:** Multi-head self-attention on bit embeddings
|
| 19 |
+
- **Layer Design:** Reversible transformer blocks for memory efficiency
|
| 20 |
+
- **Safety Features:** Built-in K/C/S (Negentropy/Complexity/Symbiosis) telemetry
|
| 21 |
+
- **Training Modes:** Causal autoregressive and experimental diffusion mode
|
| 22 |
+
|
| 23 |
+
## Training Data and Methodology
|
| 24 |
+
|
| 25 |
+
### Experimental Configurations Tested
|
| 26 |
+
1. **Small-scale CPU Training (793K parameters)**
|
| 27 |
+
- Dataset: 4 samples, 16 sequence length
|
| 28 |
+
- Training time: 0.21 seconds
|
| 29 |
+
- Convergence: Achieved on toy data
|
| 30 |
+
|
| 31 |
+
2. **Large-scale GPU Training (771M parameters)**
|
| 32 |
+
- Dataset: 5 text samples with zero-padding
|
| 33 |
+
- Hardware: Single GPU (despite multi-GPU claims in some docs)
|
| 34 |
+
- Training time: 11.47 seconds
|
| 35 |
+
- Architecture: d_model=1792, 20 layers, 28 attention heads
|
| 36 |
+
|
| 37 |
+
### Limitations Identified
|
| 38 |
+
- **Limited Training Data:** Experiments used minimal datasets insufficient for language modeling evaluation
|
| 39 |
+
- **No Baseline Comparisons:** Missing comparative evaluation against standard transformers
|
| 40 |
+
- **Scale Claims:** Some documentation overstated parameter counts and GPU usage
|
| 41 |
+
- **Training Duration:** Short training periods insufficient for convergence assessment
|
| 42 |
+
|
| 43 |
+
## Performance and Evaluation
|
| 44 |
+
|
| 45 |
+
### Empirical Results (From test data)
|
| 46 |
+
|
| 47 |
+
**Small Model (793K parameters):**
|
| 48 |
+
- Final Loss: 0.629
|
| 49 |
+
- Best Loss: 0.571
|
| 50 |
+
- Success Rate: 100% on single test prompt
|
| 51 |
+
- Telemetry: Empty (minimal data)
|
| 52 |
+
|
| 53 |
+
**Large Model (771M parameters):**
|
| 54 |
+
- Training Loss Progression: 11.84 โ 18.65 โ 17.15 โ 8.15 โ 5.35
|
| 55 |
+
- Peak Memory Usage: 15.28 GB
|
| 56 |
+
- Inference Success: 100% on 5 test prompts
|
| 57 |
+
- Telemetry Metrics: Kโ0.0013, Cโ0.52, Sโ0.46
|
| 58 |
+
|
| 59 |
+
### Known Issues and Limitations
|
| 60 |
+
|
| 61 |
+
1. **Experimental Status:** This is research code requiring rigorous validation
|
| 62 |
+
2. **Training Data:** Evaluated only on toy datasets, not real language modeling tasks
|
| 63 |
+
3. **Baseline Gaps:** No systematic comparison to established transformer architectures
|
| 64 |
+
4. **Scale Verification:** Largest validated model is 771M parameters, not 1B+ as claimed elsewhere
|
| 65 |
+
5. **Convergence:** Training times too short to establish genuine convergence behavior
|
| 66 |
+
|
| 67 |
+
## Intended Use and Applications
|
| 68 |
+
|
| 69 |
+
### Research Applications โ
|
| 70 |
+
- Bit-level language modeling research
|
| 71 |
+
- Memory-efficient transformer architecture studies
|
| 72 |
+
- Safety telemetry and monitoring system development
|
| 73 |
+
- Experimental diffusion-based text generation
|
| 74 |
+
|
| 75 |
+
### Production Applications โ ๏ธ
|
| 76 |
+
- **Not Recommended:** Requires extensive validation and baseline comparisons
|
| 77 |
+
- **Missing:** Proper evaluation on standard datasets and benchmarks
|
| 78 |
+
- **Needs:** Long-duration training studies and statistical significance testing
|
| 79 |
+
|
| 80 |
+
## Ethical Considerations and Risks
|
| 81 |
+
|
| 82 |
+
### Potential Benefits
|
| 83 |
+
- Enhanced interpretability through bit-level processing
|
| 84 |
+
- Built-in safety monitoring and gating mechanisms
|
| 85 |
+
- Memory-efficient architecture exploration
|
| 86 |
+
- Open research contributing to AI safety
|
| 87 |
+
|
| 88 |
+
### Potential Risks
|
| 89 |
+
- **Overstated Capabilities:** Early documentation contained inflated claims
|
| 90 |
+
- **Incomplete Evaluation:** Missing critical baseline comparisons
|
| 91 |
+
- **Research Maturity:** Experimental status requires careful interpretation of results
|
| 92 |
+
|
| 93 |
+
### Recommendations
|
| 94 |
+
- Use for research and experimentation only
|
| 95 |
+
- Conduct rigorous baseline comparisons before any production use
|
| 96 |
+
- Validate claims through independent evaluation
|
| 97 |
+
- Follow established ML research best practices
|
| 98 |
+
|
| 99 |
+
## Technical Specifications
|
| 100 |
+
|
| 101 |
+
### Model Architecture
|
| 102 |
+
- **Bit Embedding Size:** Configurable (16-1792 tested)
|
| 103 |
+
- **Attention Heads:** Configurable (2-28 tested)
|
| 104 |
+
- **Layers:** Configurable (1-20 tested)
|
| 105 |
+
- **Max Sequence Length:** Configurable (16-512 tested)
|
| 106 |
+
- **Reversible Layers:** Optional memory-efficient computation
|
| 107 |
+
- **Quantization:** Experimental 4-bit QAT support
|
| 108 |
+
|
| 109 |
+
### System Requirements
|
| 110 |
+
- **Minimum:** Python 3.10+, PyTorch 2.7.1, 8GB RAM
|
| 111 |
+
- **Recommended:** 16GB+ RAM, CUDA-capable GPU for larger models
|
| 112 |
+
- **Dependencies:** See requirements.txt for complete specification
|
| 113 |
+
|
| 114 |
+
### Training Features
|
| 115 |
+
- FSDP distributed training support
|
| 116 |
+
- Mixed precision (FP16/BF16) training
|
| 117 |
+
- Progressive scaling and curriculum learning
|
| 118 |
+
- Real-time telemetry and safety monitoring
|
| 119 |
+
- Interactive dashboard for training control
|
| 120 |
+
|
| 121 |
+
## Citation
|
| 122 |
+
|
| 123 |
+
If you use BitTransformerLM in your research, please cite:
|
| 124 |
+
|
| 125 |
+
```bibtex
|
| 126 |
+
@software{bittransformerlm2025,
|
| 127 |
+
title={BitTransformerLM: Experimental Bit-Native Transformer Language Model},
|
| 128 |
+
author={WCNegentropy Research},
|
| 129 |
+
year={2025},
|
| 130 |
+
url={https://github.com/WCNegentropy/BitTransformerLM},
|
| 131 |
+
note={Experimental research implementation}
|
| 132 |
+
}
|
| 133 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 134 |
|
| 135 |
+
## Additional Resources
|
| 136 |
|
| 137 |
+
- **Repository:** [GitHub - WCNegentropy/BitTransformerLM](https://github.com/WCNegentropy/BitTransformerLM)
|
| 138 |
+
- **Documentation:** README.md, AGENTS.md
|
| 139 |
+
- **License:** AGPLv3 with additional terms (see LICENSE/ directory)
|
| 140 |
+
- **Issues:** GitHub Issues for bug reports and feature requests
|
| 141 |
|
| 142 |
+
---
|
| 143 |
|
| 144 |
+
**Disclaimer:** This is experimental research code. Claims in some historical documentation may be overstated. Users should conduct independent evaluation and validation before any production use. The model requires rigorous baseline comparisons and statistical validation to establish its capabilities relative to standard approaches.
|