---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- gcode
- cnc
- plotter
- polargraph
- stable-diffusion
- text-to-gcode
- diffusion
base_model: runwayml/stable-diffusion-v1-5
datasets:
- twarner/dcode-imagenet-sketch
---
# dcode: Text-to-Gcode Diffusion Model
An end-to-end diffusion model that converts **text prompts directly into G-code** for CNC machines, plotters, and polargraph drawing robots.
## Overview
dcode builds on Stable Diffusion v1.5 by adding a custom G-code decoder head. The CLIP text encoder and UNet stay frozen; only the CNN projector and transformer decoder are trained. Given a text description (e.g., "a sketch of a horse"), it outputs machine-executable G-code.
| Component | Description |
|-----------|-------------|
| Base Model | [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) |
| Decoder | 200M param transformer (12 layers, 1024 hidden, 16 heads) |
| Tokenizer | Custom BPE tokenizer for G-code |
| Training Data | [dcode-imagenet-sketch](https://huggingface.co/datasets/twarner/dcode-imagenet-sketch) |
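
As a rough sanity check on the 200M figure in the table, the sketch below counts only the large weight matrices in a decoder stack. It assumes each layer has self-attention, cross-attention, and a feed-forward block with 4× expansion; the card does not state these details, so `decoder_param_estimate` and its defaults are illustrative rather than a description of the actual decoder.

```python
def decoder_param_estimate(layers: int, hidden: int, ffn_mult: int = 4,
                           cross_attention: bool = True) -> int:
    """Rough weight count for a stack of transformer decoder layers.

    Counts only the large matrices: Q, K, V, and output projections for each
    attention block, plus the two feed-forward matrices. Biases, layer norms,
    and embeddings are ignored.
    """
    attention = 4 * hidden * hidden              # Q, K, V, output projections
    attention_blocks = 2 if cross_attention else 1
    feed_forward = 2 * hidden * (ffn_mult * hidden)
    return layers * (attention_blocks * attention + feed_forward)


estimate = decoder_param_estimate(layers=12, hidden=1024)
print(f"{estimate / 1e6:.1f}M parameters")  # 201.3M parameters
```

The head count (16) does not affect this total; it only sets the per-head width to 1024 / 16 = 64.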
## Architecture
```
Text Prompt
    ↓
[CLIP Text Encoder] ← frozen
    ↓
[UNet Diffusion] ← frozen
    ↓
Latent (4×64×64)
    ↓
[CNN Projector] ← trained
    ↓
[Transformer Decoder] ← trained
    ↓
G-code Tokens
    ↓
G-code Text
```
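The latent shape in the diagram follows from the Stable Diffusion v1.5 VAE, which uses 4 latent channels and downsamples each spatial dimension by 8. A minimal sketch of that arithmetic, assuming the default 512×512 generation size (`sd15_latent_shape` is a helper name introduced here, not part of the repo):

```python
def sd15_latent_shape(height: int = 512, width: int = 512) -> tuple[int, int, int]:
    """Latent shape (channels, h, w) for Stable Diffusion v1.5.

    The SD v1.5 VAE has 4 latent channels and downsamples each spatial
    dimension by a factor of 8.
    """
    if height % 8 or width % 8:
        raise ValueError("height and width must be multiples of 8")
    return (4, height // 8, width // 8)


print(sd15_latent_shape())  # (4, 64, 64)
```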
## Usage
### With Diffusers
```python
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub import hf_hub_download
from transformers import PreTrainedTokenizerFast
# Load components
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")
# Download decoder weights
weights = hf_hub_download("twarner/dcode-sd-gcode-v3", "pytorch_model.bin")
tokenizer_path = hf_hub_download("twarner/dcode-sd-gcode-v3", "gcode_tokenizer/tokenizer.json")
# Load custom gcode tokenizer
gcode_tokenizer = PreTrainedTokenizerFast(tokenizer_file=tokenizer_path)
# Generate latent from text
with torch.no_grad():
    latent = pipe("a sketch of a horse", output_type="latent").images
# ... decode with GcodeDecoderV3 (see repo for full inference code)
```
### Interactive Demo
Try the model live: **[huggingface.co/spaces/twarner/dcode](https://huggingface.co/spaces/twarner/dcode)**
## Training
- **Dataset**: 50,000 ImageNet-Sketch images → 200,000 G-code files
- **Hardware**: 8× NVIDIA H100 80GB
- **Epochs**: 50
- **Batch Size**: 256 effective (32 × 8 GPUs)
- **Learning Rate**: 1e-4 with cosine schedule
- **Regularization**: Label smoothing (0.1), weight decay (0.05)
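
The batch-size and learning-rate settings above can be sketched as plain arithmetic. This is a minimal illustration, not the training script: it assumes every one of the 200,000 G-code files is one training sample, that there is no warmup, and that the cosine schedule decays to zero. None of those details are stated in this card.

```python
import math

PEAK_LR = 1e-4          # from the training list above
PER_GPU_BATCH = 32
NUM_GPUS = 8
EPOCHS = 50
NUM_SAMPLES = 200_000   # assumption: every G-code file is one training sample


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


effective_batch = PER_GPU_BATCH * NUM_GPUS                   # 256
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)   # 782
total_steps = steps_per_epoch * EPOCHS                       # 39100

# Learning rate at the start, midpoint, and end of training.
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```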
## G-code Output
The model generates G-code compatible with:
- Polargraph/drawbot machines
- Pen plotters
- Any G-code compatible CNC
Example output:
```gcode
G21 ; mm
G90 ; absolute
M280 P0 S90 ; pen up
G28 ; home
G0 X-200.00 Y100.00 F1000
M280 P0 S40 ; pen down
G1 X-180.00 Y120.00 F500
G1 X-160.00 Y115.00 F500
...
```
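To preview or validate generated output, the G-code can be converted back into pen strokes. The parser below is a minimal sketch that handles only the commands in the example above (`G0`/`G1` moves, `G28`, and `M280` servo commands). It assumes absolute coordinates, that `G28` returns to the origin, and that servo angle 40 means pen down; `gcode_to_strokes` and `SAMPLE` are names introduced here for illustration.

```python
import re

PEN_DOWN_ANGLE = 40  # servo angle for pen down, per the example above

WORD = re.compile(r"([A-Z])(-?\d+(?:\.\d+)?)")

SAMPLE = """\
G21 ; mm
G90 ; absolute
M280 P0 S90 ; pen up
G28 ; home
G0 X-200.00 Y100.00 F1000
M280 P0 S40 ; pen down
G1 X-180.00 Y120.00 F500
G1 X-160.00 Y115.00 F500
"""


def gcode_to_strokes(gcode: str) -> list[list[tuple[float, float]]]:
    """Collect pen-down polylines from G0/G1 moves and M280 servo commands."""
    strokes = []
    current = []
    pen_down = False
    position = (0.0, 0.0)
    for raw in gcode.splitlines():
        line = raw.split(";", 1)[0].strip().upper()  # drop comments
        if not line:
            continue
        words = {letter: float(value) for letter, value in WORD.findall(line)}
        command = line.split()[0]
        if command == "M280" and "S" in words:
            now_down = words["S"] == PEN_DOWN_ANGLE
            if now_down and not pen_down:
                current = [position]      # stroke starts where the pen lands
            elif pen_down and not now_down and len(current) > 1:
                strokes.append(current)   # pen lifted: close the stroke
            pen_down = now_down
        elif command in ("G0", "G1"):
            position = (words.get("X", position[0]), words.get("Y", position[1]))
            if pen_down:
                current.append(position)
        elif command == "G28":
            position = (0.0, 0.0)         # assumption: home is the origin
    if pen_down and len(current) > 1:
        strokes.append(current)           # file ended with the pen down
    return strokes


print(gcode_to_strokes(SAMPLE))
```

Each stroke is a list of `(x, y)` points in millimetres, which can be passed to any plotting library for a quick visual check before sending the file to a machine.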
## Machine Specs
Default work area (configurable):
- Width: 841mm
- Height: 1189mm (A0 paper)
- Pen servo: 40° down, 90° up
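
Before running generated G-code, it can help to check that every coordinate lands on the sheet. The sketch below uses the default work area listed above and assumes the origin sits at the center of the sheet, which is a guess based on the negative X values in the example output; `WorkArea` and `out_of_bounds` are hypothetical helpers, and the origin offsets should be adjusted to match your machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkArea:
    width_mm: float = 841.0     # default width from the list above
    height_mm: float = 1189.0   # default height (A0 paper)
    # Assumption: (0, 0) sits at the center of the sheet.
    origin_x_mm: float = 841.0 / 2
    origin_y_mm: float = 1189.0 / 2

    def contains(self, x: float, y: float) -> bool:
        """True if machine coordinate (x, y) lies on the sheet."""
        return (0.0 <= x + self.origin_x_mm <= self.width_mm
                and 0.0 <= y + self.origin_y_mm <= self.height_mm)


def out_of_bounds(points, area=WorkArea()):
    """Return the points that fall outside the work area."""
    return [(x, y) for x, y in points if not area.contains(x, y)]


points = [(-200.0, 100.0), (-180.0, 120.0), (500.0, 0.0)]
print(out_of_bounds(points))  # [(500.0, 0.0)]
```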
## Project
Full project documentation, hardware build guide, and source code:
**[teddywarner.org/Projects/Polargraph/#dcode](https://teddywarner.org/Projects/Polargraph/#dcode)**
**GitHub**: [github.com/Twarner491/dcode](https://github.com/Twarner491/dcode)
## Citation
```bibtex
@misc{dcode2024,
author = {Teddy Warner},
title = {dcode: Text-to-Gcode Diffusion Model},
year = {2026},
url = {https://teddywarner.org/Projects/Polargraph/#dcode}
}
```
## License
MIT License