---
license: apache-2.0
language:
- en
tags:
- diffusion
- text-generation
- non-autoregressive
- token-embedding
- cybersecurity
- DiT
- VQ-GAN
pipeline_tag: text-generation
---
# TexITex: Parallel Text Generation via Token Embedding Diffusion in 2D Image Space
> **Can we generate entire sentences in parallel by treating token embeddings as a 2D image?**
TexITex (Token-Image-Token) is a research proof-of-concept that encodes token embeddings
as 2D latent images and generates them all at once using image diffusion, with no
step-by-step autoregressive decoding.
📄 **[Read the full paper (PDF)](paper.pdf)**
💻 **[GitHub: code + experiments](https://github.com/PurpleS3Cf0X/TexITex)**
---
## How It Works
```
Text → token embeddings → VQ-GAN encode → (16,16,16) latent image
                                                  ↓
                                   DiT diffusion (200 DDIM steps)
                                                  ↓
Text ← nearest-neighbour lookup ← VQ-GAN decode ← generated latent
```
64 tokens are arranged in a **16×16 grid** of 2×2 patches. The VQ-GAN compresses
each patch to a 16-channel latent. The DiT generates the full latent image in a
fixed 200 steps regardless of sequence length.
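
The layout arithmetic can be sketched as follows. This is only an illustration, assuming each token occupies one 2×2 patch and that patches are placed in row-major reading order (8 patches per row, so 64 patches cover the 16×16 grid). Neither detail is confirmed by the text above, and `token_to_cells` is a hypothetical helper name.

```python
GRID = 16                         # latent height and width
PATCH = 2                         # assumed patch side length per token
PATCHES_PER_ROW = GRID // PATCH   # 8 patches per row, 64 patches in total


def token_to_cells(token_idx: int) -> list[tuple[int, int]]:
    """Return the (row, col) grid cells covered by one token's 2x2 patch.

    ASSUMPTION: tokens fill the grid patch by patch in row-major order.
    """
    if not 0 <= token_idx < PATCHES_PER_ROW ** 2:
        raise ValueError("token index out of range")
    patch_row, patch_col = divmod(token_idx, PATCHES_PER_ROW)
    top, left = patch_row * PATCH, patch_col * PATCH
    return [(top + dr, left + dc) for dr in range(PATCH) for dc in range(PATCH)]
```

Under these assumptions, token 0 covers the top-left 2×2 block and token 63 covers the bottom-right one.
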
![Pipeline](fig_pipeline.png)
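
The last stage of the pipeline, the nearest-neighbour lookup, maps each decoded embedding vector back to a token id. Here is a minimal sketch, assuming squared Euclidean distance (the metric is not stated here, and cosine similarity would be an equally plausible choice). It uses a tiny made-up embedding table in place of the base LM's table, and `nearest_token` and `decode_tokens` are hypothetical names.

```python
def nearest_token(vector, table):
    """Return the index of the table row closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(table)), key=lambda i: sq_dist(vector, table[i]))


def decode_tokens(decoded_embeddings, table):
    """Map every decoded embedding to the id of its nearest table entry."""
    return [nearest_token(v, table) for v in decoded_embeddings]


# Toy 3-token, 2-dimensional embedding table (hypothetical values).
toy_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Slightly perturbed embeddings, as an imperfect VQ-GAN decode might produce.
noisy = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.02]]
token_ids = decode_tokens(noisy, toy_table)
```
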
---
## Results (Phase 4-A, Epoch 200)
![Results Dashboard](fig_results_dashboard.png)
| Metric | Value |
|--------|-------|
| VQ-GAN roundtrip accuracy | **89.8%** |
| Composite score (best sample) | **0.372** |
| Composite score (mean, n=64) | 0.104 |
| Bigram coherence (best sample) | **0.831** |
| Real-word ratio (mean) | 0.683 |
| Median perplexity | 197 |
### Top Generated Outputs
![Top 5 Samples](fig_top5_samples.png)
**Best sample** (composite = 0.372, bigram = 0.831):
> *"a simulated adversary engagement. Your objectives include testing detection
> capabilities, exercising incident response, identifying security gaps. You employ
> realistic adversary TTPs mapped to MITRE ATT&CK, maintain operational security,
> and adapt your approach based on blue team responses."*
---
## Architecture
### 34-Channel DiT Input
![Channel Layout](fig_channels.png)
| Channels | Role |
|----------|------|
| ch 0: position | 0→1 gradient in reading order |
| ch 1: boundary | 1.0 at 2×2 patch edges; prevents token bleed |
| ch 2–17: self-cond | Previous DDIM step's x0 prediction (iterative refinement) |
| ch 18–33: noisy latent | Current x_t from forward diffusion |
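
The channel layout in the table can be sketched in plain Python as a list of 34 grids of size 16×16. Several details are assumptions made for illustration and are not taken from the repository: the position gradient is computed per cell in row-major order, the boundary channel flags the top row and left column of each 2×2 patch, and `build_dit_input` is a hypothetical name.

```python
H = W = 16          # latent spatial size
LATENT_CH = 16      # channels in the VQ-GAN latent


def position_channel():
    """ch 0: gradient from 0.0 to 1.0 over cells in row-major reading order."""
    n = H * W
    return [[(r * W + c) / (n - 1) for c in range(W)] for r in range(H)]


def boundary_channel(patch=2):
    """ch 1: 1.0 on patch edges.

    ASSUMPTION: an "edge" is a cell in the top row or left column of a patch.
    """
    return [[1.0 if (r % patch == 0 or c % patch == 0) else 0.0
             for c in range(W)] for r in range(H)]


def build_dit_input(self_cond, noisy_latent):
    """Stack position, boundary, self-cond (ch 2-17), and x_t (ch 18-33)."""
    if len(self_cond) != LATENT_CH or len(noisy_latent) != LATENT_CH:
        raise ValueError("expected 16-channel latents")
    return [position_channel(), boundary_channel()] + list(self_cond) + list(noisy_latent)


# Usage with all-zero latents, as on a first step with no prior x0 estimate.
zeros = [[[0.0] * W for _ in range(H)] for _ in range(LATENT_CH)]
dit_input = build_dit_input(zeros, zeros)
```

The channel count works out as 1 + 1 + 16 + 16 = 34, matching the heading above.
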
### Key Components
| Component | Parameters | Role |
|-----------|-----------|------|
| VQ-GAN (tokence_big_long) | 17.6M | Encode/decode token embeddings ↔ latent image |
| DiT (depth=12, dim=512, heads=8) | 57.8M | Denoise the latent image |
| LSTM SequencePredictor | 239.7K | Sequence-order auxiliary loss (weight=0.5) |
| **Total (DiT + SequencePredictor, excluding VQ-GAN)** | **58.0M** | |
### Denoising Process
![Denoising](fig_denoising.png)
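
A deterministic DDIM loop with self-conditioning can be sketched as below. This is a toy illustration, not the project's sampler. The cosine noise schedule and the zero-initialised self-conditioning input are assumptions, and `toy_denoiser` is a hypothetical stand-in for the DiT that ignores `x_t` and simply refines its previous x0 estimate towards a fixed target.

```python
import math
import random


def alpha_bar(step, total):
    """Assumed cosine schedule: 1.0 at step 0, close to 0.0 at step `total`."""
    return math.cos(0.5 * math.pi * step / total) ** 2


def toy_denoiser(x_t, self_cond, step):
    """Hypothetical x0 predictor: moves the previous estimate halfway to a target."""
    target = [1.0, -1.0, 0.5]
    return [0.5 * sc + 0.5 * tg for sc, tg in zip(self_cond, target)]


def ddim_sample(denoiser, dim, steps=200, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # start from pure noise
    x0_prev = [0.0] * dim                           # assumed initial self-cond
    for step in range(steps, 0, -1):
        a_t, a_prev = alpha_bar(step, steps), alpha_bar(step - 1, steps)
        x0 = denoiser(x, x0_prev, step)             # self-cond = previous x0
        # Noise implied by the current x_t and the predicted x0.
        eps = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
               for xi, x0i in zip(x, x0)]
        # Deterministic DDIM update (eta = 0) to the previous timestep.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
        x0_prev = x0
    return x


sample = ddim_sample(toy_denoiser, dim=3)
```
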
---
## Critical Findings
1. **LSTM sequence loss is mandatory**: reducing its weight from 0.5 to 0.2 causes complete collapse
2. **Self-conditioning enables refinement**: biggest quality jump of all phases
3. **Token boundary channel prevents bleed**: clearest visual improvement in latent space
4. **Best checkpoint = epoch 200** (not 300; overtraining is real)
5. **DDIM sweet spot = 200 steps**: mode-collapse cliff at ≥300 steps
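
Finding 1 concerns the weight on the auxiliary sequence loss. Here is a minimal sketch of how such a weight typically enters a training objective, assuming a plain weighted sum. The exact combination used in training is not spelled out here, and `combined_loss` and its argument names are hypothetical.

```python
def combined_loss(diffusion_loss: float, sequence_loss: float,
                  seq_weight: float = 0.5) -> float:
    """Weighted sum of the denoising loss and the auxiliary sequence loss.

    ASSUMPTION: the two terms are simply added; 0.5 is the reported weight.
    """
    return diffusion_loss + seq_weight * sequence_loss
```
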
---
## Training
- **Hardware**: Apple Mac Mini M4, 64GB unified memory (MPS backend)
- **Base LM**: Qwen/Qwen2.5-1.5B (embedding table only, not fine-tuned)
- **Corpus**: Cybersecurity domain (red-team TTPs + blue-team playbooks, 50K sequences)
- **Training time**: ~2h VQ-GAN + ~22h DiT (300 epochs)
---
## Citation
```bibtex
@misc{cj2026texitex,
title = {TexITex: Parallel Text Generation via Token Embedding Diffusion in 2D Image Space},
author = {Jean Paul, C J},
year = {2026},
url = {https://github.com/PurpleS3Cf0X/TexITex}
}
```
---
*Author: Jean Paul C J (Unaffiliated)*