---
license: apache-2.0
language:
- en
tags:
- diffusion
- text-generation
- non-autoregressive
- token-embedding
- cybersecurity
- DiT
- VQ-GAN
pipeline_tag: text-generation
---

# TexITex: Parallel Text Generation via Token Embedding Diffusion in 2D Image Space

> **Can we generate entire sentences in parallel by treating token embeddings as a 2D image?**

TexITex (Token-Image-Token) is a research proof of concept that encodes token embeddings as 2D latent images and generates every token of a sequence at once with image diffusion, instead of decoding autoregressively one token at a time.

📄 **[Read the full paper (PDF)](paper.pdf)**

💻 **[GitHub: code + experiments](https://github.com/PurpleS3Cf0X/TexITex)**

---

## How It Works

```
Text → token embeddings → VQ-GAN encode → (16,16,16) latent image
                                                   ↓
                                    DiT diffusion (200 DDIM steps)
                                                   ↓
Text ← nearest-neighbour lookup ← VQ-GAN decode ← generated latent
```

64 tokens are arranged in a **16×16 grid** of 2×2 patches (8×8 patches, one token per patch). The VQ-GAN compresses each patch to a 16-channel latent. The DiT then generates the full latent image in a fixed 200 DDIM steps, so the number of denoising steps does not grow with sequence length.

![Pipeline](fig_pipeline.png)

---

## Results (Phase 4-A, Epoch 200)

![Results Dashboard](fig_results_dashboard.png)

| Metric | Value |
|--------|-------|
| VQ-GAN roundtrip accuracy | **89.8%** |
| Composite score (best sample) | **0.372** |
| Composite score (mean, n = 64) | 0.104 |
| Bigram coherence (best sample) | **0.831** |
| Real-word ratio (mean) | 0.683 |
| Median perplexity | 197 |

### Top Generated Outputs

![Top 5 Samples](fig_top5_samples.png)

**Best sample** (composite = 0.372, bigram = 0.831):

> *"a simulated adversary engagement. Your objectives include testing detection
> capabilities, exercising incident response, identifying security gaps. You employ
> realistic adversary TTPs mapped to MITRE ATT&CK, maintain operational security,
> and adapt your approach based on blue team responses."*

---

## Architecture

### 34-Channel DiT Input

![Channel Layout](fig_channels.png)

| Channels | Input | Contents |
|----------|-------|----------|
| 0 | Position | 0→1 gradient in reading order |
| 1 | Boundary | 1.0 at 2×2 patch edges; prevents token bleed |
| 2–17 | Self-conditioning | x0 prediction from the previous DDIM step (iterative refinement) |
| 18–33 | Noisy latent | Current x_t (produced by forward diffusion during training; the current sampler state during inference) |

### Key Components

| Component | Parameters | Role |
|-----------|-----------:|------|
| VQ-GAN (`tokence_big_long`) | 17.6M | Encode/decode token embeddings ↔ latent image |
| DiT (depth = 12, dim = 512, heads = 8) | 57.8M | Denoise the latent image |
| LSTM SequencePredictor | 239.7K | Sequence-order auxiliary loss (weight = 0.5) |
| **Total (DiT + LSTM)** | **58.0M** | |

The total counts the DiT and LSTM only; including the VQ-GAN, the full pipeline has about 75.6M parameters.

### Denoising Process

![Denoising](fig_denoising.png)

---

## Critical Findings

1. **The LSTM sequence loss is mandatory.** Reducing its weight from 0.5 to 0.2 caused complete collapse.
2. **Self-conditioning enables refinement.** It gave the biggest quality jump of any phase.
3. **The token-boundary channel prevents bleed.** It produced the clearest visual improvement in latent space.
4. **The best checkpoint is epoch 200, not 300.** Training longer degraded results.
5. **200 DDIM steps is the sweet spot.** Sampling with 300 or more steps falls off a mode-collapse cliff.

---

## Training

- **Hardware**: Apple Mac mini M4, 64 GB unified memory (MPS backend)
- **Base LM**: Qwen/Qwen2.5-1.5B (embedding table only; not fine-tuned)
- **Corpus**: Cybersecurity domain (red-team TTPs + blue-team playbooks, 50K sequences)
- **Training time**: ~2 h VQ-GAN + ~22 h DiT (300 epochs)

---

## Citation

```bibtex
@misc{cj2026texitex,
  title  = {TexITex: Parallel Text Generation via Token Embedding Diffusion in 2D Image Space},
  author = {Jean Paul, C J},
  year   = {2026},
  url    = {https://github.com/PurpleS3Cf0X/TexITex}
}
```

---

*Author: Jean Paul C J (Unaffiliated)*
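The 34-channel DiT input described under Architecture can be assembled roughly as follows. This is a minimal NumPy sketch, not the project's code. It assumes channels-first arrays of shape (C, 16, 16), one token per 2×2 patch in row-major reading order, a position value that is constant within each patch, and a boundary mask that marks the top row and left column of every patch. The model card does not pin these details down, so `build_dit_input`, `position_channel`, and `boundary_channel` are hypothetical names and the boundary pattern in particular is a guess.

```python
import numpy as np

GRID = 16        # latent height/width (from the model card)
PATCH = 2        # each token occupies one 2x2 patch (assumed)
LATENT_CH = 16   # VQ-GAN latent channels (from the model card)


def position_channel(grid: int = GRID, patch: int = PATCH) -> np.ndarray:
    """0->1 gradient over tokens in row-major reading order (assumed per patch)."""
    per_side = grid // patch                 # 8 patches per row
    n_tokens = per_side * per_side           # 64 tokens
    ramp = np.linspace(0.0, 1.0, n_tokens).reshape(per_side, per_side)
    # Repeat each token's value across its 2x2 patch.
    return np.kron(ramp, np.ones((patch, patch)))


def boundary_channel(grid: int = GRID, patch: int = PATCH) -> np.ndarray:
    """1.0 on the top row and left column of every patch (hypothetical choice)."""
    idx = np.arange(grid)
    on_edge = idx % patch == 0
    return np.logical_or(on_edge[:, None], on_edge[None, :]).astype(np.float64)


def build_dit_input(x_t: np.ndarray, x0_prev: np.ndarray) -> np.ndarray:
    """Stack [position, boundary, self-cond x0, noisy x_t] into (34, 16, 16)."""
    assert x_t.shape == x0_prev.shape == (LATENT_CH, GRID, GRID)
    pos = position_channel()[None]           # channel 0
    bnd = boundary_channel()[None]           # channel 1
    return np.concatenate([pos, bnd, x0_prev, x_t], axis=0)  # 2-17, 18-33


rng = np.random.default_rng(0)
x_t = rng.standard_normal((LATENT_CH, GRID, GRID))
# First step: no previous prediction yet; zeros are a common
# self-conditioning convention (assumed here).
x0_prev = np.zeros_like(x_t)
inp = build_dit_input(x_t, x0_prev)
print(inp.shape)                             # (34, 16, 16)
```

Keeping the position and boundary maps as fixed input channels, rather than learned embeddings, lets a plain image DiT see where each token sits in the sequence and where one token's patch ends.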
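The fixed-length sampling stage (200 DDIM steps with self-conditioning) can be pictured as a deterministic DDIM loop (η = 0) in which the model predicts x0 and each prediction is fed back as the self-conditioning input for the next step. This sketch assumes a linear β schedule over 1,000 training timesteps, 200 evenly spaced sampling timesteps, and an x0-predicting model; `ddim_sample` and the `model(x_t, x0_prev, t)` signature are hypothetical, and the toy model just returns a fixed latent so the loop can be checked end to end.

```python
import numpy as np


def ddim_sample(model, shape, n_steps=200, n_train_steps=1000, seed=0):
    """Deterministic DDIM sampling (eta = 0) with x0 self-conditioning.

    Assumptions (not from the model card): linear beta schedule, 1,000
    training timesteps, evenly spaced sampling timesteps, x0-predicting model.
    """
    betas = np.linspace(1e-4, 0.02, n_train_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    # Sampling timesteps from most to least noisy, e.g. 999, 994, ..., 0.
    timesteps = np.linspace(n_train_steps - 1, 0, n_steps).round().astype(int)

    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal(shape)      # start from pure Gaussian noise
    x0_prev = np.zeros(shape)             # no previous prediction on step one
    for i, t in enumerate(timesteps):
        x0_hat = model(x_t, x0_prev, t)   # model sees x_t plus its last x0 guess
        a_t = alpha_bar[t]
        # Noise implied by the current sample and the x0 prediction.
        eps_hat = (x_t - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        # Next (less noisy) level; the final step lands on alpha_bar = 1.
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < n_steps else 1.0
        x_t = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
        x0_prev = x0_hat                  # self-conditioning input for next step
    return x_t


# Toy stand-in for the DiT: always predicts the same clean latent.
target = np.full((16, 16, 16), 0.5)
toy_model = lambda x_t, x0_prev, t: target
out = ddim_sample(toy_model, target.shape)
print(np.allclose(out, target))           # True
```

Because the loop always runs `n_steps` passes over the whole latent, its cost is the same for any sequence that fits in the 16×16 grid. In a sketch like this, `n_steps` is the knob that the reported 200-step sweet spot and the collapse at 300+ steps would correspond to.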
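The last stage of the pipeline, the nearest-neighbour lookup, maps each decoded embedding back to a token id. Below is a minimal sketch using cosine similarity against the base LM's token-embedding table. The model card does not say which distance is used, so cosine (rather than L2) is an assumption, and `nearest_tokens` is a hypothetical helper. With Qwen/Qwen2.5-1.5B the table would be that model's embedding matrix; here a small random table stands in for it.

```python
import numpy as np


def nearest_tokens(decoded: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Return the id of the most cosine-similar row of `table` for each vector.

    decoded: (n_tokens, dim) embeddings recovered by the VQ-GAN decoder.
    table:   (vocab_size, dim) token-embedding table of the base LM.
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    d = decoded / (np.linalg.norm(decoded, axis=1, keepdims=True) + eps)
    e = table / (np.linalg.norm(table, axis=1, keepdims=True) + eps)
    return np.argmax(d @ e.T, axis=1)        # (n_tokens,) token ids


rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 64))      # toy vocabulary of 1,000 tokens
true_ids = rng.integers(0, 1000, size=64)    # 64 tokens, as in the 16x16 grid
noisy = table[true_ids] + 0.05 * rng.standard_normal((64, 64))
accuracy = (nearest_tokens(noisy, table) == true_ids).mean()
print(accuracy)
```

Any decoding error that pushes a vector closer to another row than to its own flips the token, which is why small reconstruction errors in the VQ-GAN can still show up as wrong words in the output text.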