Remove virtual_machine.md - cleanup for OS launch
virtual_machine.md (deleted, +0 -39)
@@ -1,39 +0,0 @@
-Steps to build a self‑contained VM environment
-Identify and lock dependencies. Use the repository’s requirements.txt and the README’s instructions (pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt
-GitHub
-) to install a CPU‑only PyTorch build and other packages. The VM should include system libraries required by matplotlib and scikit‑learn.
-Create a container/VM build file. A Dockerfile or VM provisioning script can start from a lightweight base image (e.g., Ubuntu 22.04). It should:
-Install Python 3.11 and system build tools.
-Copy the repository into /opt/bit_transformer.
-Install Python dependencies (optionally using a virtual environment).
-Expose ports for the dashboard (e.g., 5000) and MCP server (e.g., 7000).
-Set environment variables such as MCP_SERVER_ADDR to http://127.0.0.1:7000 so the dashboard automatically forwards requests to the local MCP server when both are running in the same VM.
-Configure entrypoints. Add a shell script that starts the MCP server and dashboard concurrently. For example:
-#!/bin/bash
-# start MCP server in background
-python mcp_server.py &
-# wait for server to start
-sleep 2
-# start dashboard on port 5000
-python -m bit_transformer.dashboard_app
-Alternatively, use a process supervisor (e.g., supervisord) to keep both processes running. The script can also call watcher.py in development mode.
-Persist state inside the VM. Use a volume or directory (e.g., /var/lib/bit_transformer) to store model snapshots and telemetry logs. The ModelManager writes weights and metrics to snapshots/ by default
-GitHub
-, so mount this directory to a persistent disk if needed.
-Include optional hardware support. Since the model is CPU‑only by default but can use GPU when available, the VM build should install CUDA libraries only if GPU support is desired. For a CPU‑only VM, skip these packages.
-Testing and health checks. Add a health‑check endpoint that calls the MCP server’s /lambdas or /infer endpoint to verify that the model responds. Also ensure that the dashboard’s HTML and JS files are included.
-Potential Codex tasks to implement
-The following Codex prompts can guide the creation of a self‑contained VM environment:
-
-Write a Dockerfile that builds a container with Python 3.11, installs dependencies from requirements.txt using the CPU‑only PyTorch wheel, copies the repo, and exposes ports 5000 (dashboard) and 7000 (MCP server). Use environment variables to set MCP_SERVER_ADDR.
-Prompt example: “Create a Dockerfile for the current repository. It should use an Ubuntu base image, install Python and pip, copy the project files, install requirements with the CPU‑only PyTorch wheel, set MCP_SERVER_ADDR=http://127.0.0.1:7000, expose ports 5000 and 7000, and set a CMD to run both the MCP server and dashboard.”
-Add an entrypoint script named start.sh that launches the MCP server (python mcp_server.py) in the background and then the dashboard (python -m bit_transformer.dashboard_app), with appropriate sleep to allow server startup.
-Prompt example: “Add a start.sh script to the repo that starts mcp_server.py in the background and then runs python -m bit_transformer.dashboard_app. Make it executable and update the Dockerfile to use this script as the default CMD.”
-Extend the dashboard to allow configurable ports. Currently the dashboard uses Flask’s default port; exposing a parameter (e.g., PORT environment variable) would simplify deployment.
-Prompt example: “Modify bit_transformer/dashboard_app.py so that the run_dashboard function accepts optional host and port parameters and defaults to environment variables HOST and PORT.”
-Automate model initialization on startup. The VM could load a default model or create one based on environment variables.
-Prompt example: “Update ModelManager.__init__ to read a JSON config file from /config/model_params.json if present and initialize the model automatically at startup.”
-Add a health‑check endpoint to the MCP server to verify that the server is running.
-Prompt example: “Add an endpoint /health to mcp_server.py that returns {"status":"ok"}. Update the Docker healthcheck to call this endpoint.”
-## Status: Complete
-All tasks in this guide are now implemented. The Dockerfile builds a CPU-only container and runs `start.sh` which launches both the MCP server and dashboard. The dashboard accepts HOST and PORT variables, the ModelManager can read `/config/model_params.json` on startup, and the MCP server exposes `/health` for Docker health checks.
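
The deleted guide's start script waits a fixed `sleep 2` before launching the dashboard, and separately says the MCP server exposes `/health`. One way to combine the two is to poll that endpoint until it answers instead of sleeping. The following is a minimal sketch using only the Python standard library. The function name `wait_for_health`, the timeout values, and the assumption that a healthy server replies with HTTP 200 are illustrative and not taken from the repository.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str, path: str = "/health",
                    timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll base_url + path until it answers HTTP 200 or timeout_s elapses.

    Hypothetical helper: the guide only states that /health exists; treating
    a 200 status as "ready" is an assumption of this sketch.
    """
    url = base_url.rstrip("/") + path
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s + 1.0) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, non-2xx reply, or a half-started server:
            # treat all of these as "not ready yet" and retry.
            pass
        time.sleep(interval_s)
    return False
```

A start script could call `wait_for_health("http://127.0.0.1:7000")`, matching the `MCP_SERVER_ADDR` value given in the guide, and launch the dashboard only when it returns `True`.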