	---
	license: apache-2.0
	language:
	- en
	tags:
	- diffusion
	- text-generation
	- non-autoregressive
	- token-embedding
	- cybersecurity
	- DiT
	- VQ-GAN
	pipeline_tag: text-generation
	---

	# TexITex — Parallel Text Generation via Token Embedding Diffusion in 2D Image Space

	> Can we generate entire sentences in parallel by treating token embeddings as a 2D image?

	TexITex (Token-Image-Token) is a research proof of concept that encodes token embeddings
	as 2D latent images and generates all of them at once with image diffusion, instead of
	decoding autoregressively one token at a time.

	📄 [Read the full paper (PDF)](paper.pdf)
	💻 [GitHub — code + experiments](https://github.com/PurpleS3Cf0X/TexITex)

	---

	## How It Works

	```
	Text → token embeddings → VQ-GAN encode → (16,16,16) latent image
	                                                      ↓
	                                       DiT diffusion (200 DDIM steps)
	                                                      ↓
	Text ← nearest-neighbour lookup ← VQ-GAN decode ← generated latent
	```

	The 64 tokens are arranged as 2×2 patches on a 16×16 grid (8×8 patches, one per
	token). The VQ-GAN compresses each patch to a 16-channel latent. The DiT generates
	the full latent image in a fixed 200 steps, regardless of sequence length.
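The layout described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function name `token_cell_map` and the row-major (reading-order) placement of patches are assumptions.

```python
import numpy as np

SEQ_LEN = 64   # tokens per sequence (from the text above)
PATCH = 2      # each token owns a 2x2 patch
GRID = 16      # 16x16 cells in total


def token_cell_map(seq_len=SEQ_LEN, patch=PATCH, grid=GRID):
    """Return a (grid, grid) int array giving the token index that owns each cell."""
    per_row = grid // patch                       # 8 patches per row
    assert per_row * per_row == seq_len, "tokens must fill the grid exactly"
    # Assumption: patches are placed in row-major reading order.
    idx = np.arange(seq_len).reshape(per_row, per_row)
    # Expand every patch index into a patch x patch block of cells.
    return np.kron(idx, np.ones((patch, patch), dtype=int))


cells = token_cell_map()
print(cells.shape)     # (16, 16)
print(cells[:2, :4])   # tokens 0 and 1 each cover a 2x2 block
```

Each of the 64 token indices appears in exactly four cells, which is why 64 tokens fill the 16×16 grid.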

	![Pipeline](fig_pipeline.png)

	---

	## Results (Phase 4-A, Epoch 200)

	![Results Dashboard](fig_results_dashboard.png)

	| Metric | Value |
	|--------|-------|
	| VQ-GAN roundtrip accuracy | 89.8% |
	| Composite score, best sample | 0.372 |
	| Composite score, mean (n=64) | 0.104 |
	| Bigram coherence, best sample | 0.831 |
	| Real-word ratio, mean | 0.683 |
	| Median perplexity | 197 |
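As background for the perplexity row: perplexity is conventionally the exponential of the mean negative log-likelihood per token, and the table reports its median over generated samples. The sketch below shows only that arithmetic. The language model used to score the samples is not stated here, so the log-probabilities are hypothetical toy values and the function names are illustrative.

```python
import math
import statistics


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def median_perplexity(samples):
    """Median of per-sample perplexities."""
    return statistics.median(perplexity(lp) for lp in samples)


# Hypothetical per-token log-probabilities for three generated samples.
toy_samples = [
    [math.log(0.5)] * 4,     # every token has probability 1/2
    [math.log(0.25)] * 3,    # every token has probability 1/4
    [math.log(0.125)] * 2,   # every token has probability 1/8
]
median_ppl = median_perplexity(toy_samples)
print(round(median_ppl, 6))
```

The median is less sensitive than the mean to a few badly degenerate samples, which may be why it is the statistic reported.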

	### Top Generated Outputs

	![Top 5 Samples](fig_top5_samples.png)

	Best sample (composite = 0.372, bigram = 0.831):
	> *"a simulated adversary engagement. Your objectives include testing detection
	> capabilities, exercising incident response, identifying security gaps. You employ
	> realistic adversary TTPs mapped to MITRE ATT&CK, maintain operational security,
	> and adapt your approach based on blue team responses."*
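Text such as the sample above comes out of the final pipeline step, the nearest-neighbour lookup that maps each decoded embedding back to a token id. A minimal sketch of that step follows. It assumes cosine similarity as the distance (the metric is not specified here) and uses a small random table in place of the real LM embedding table; `nearest_token_ids` is a hypothetical helper name.

```python
import numpy as np


def nearest_token_ids(decoded, embedding_table):
    """Return, for each decoded vector, the id of the most cosine-similar table row."""
    d = decoded / np.linalg.norm(decoded, axis=1, keepdims=True)
    e = embedding_table / np.linalg.norm(embedding_table, axis=1, keepdims=True)
    return np.argmax(d @ e.T, axis=1)


rng = np.random.default_rng(0)
table = rng.normal(size=(1000, 32))            # toy stand-in for the embedding table
true_ids = rng.integers(0, 1000, size=64)      # a 64-token sequence
# Simulate an imperfect VQ-GAN decode by adding a little noise.
noisy = table[true_ids] + 0.05 * rng.normal(size=(64, 32))

recovered = nearest_token_ids(noisy, table)
accuracy = (recovered == true_ids).mean()
print(f"token recovery accuracy: {accuracy:.3f}")
```

Because the lookup snaps every vector to a valid token, small reconstruction errors are absorbed; larger errors surface as wrong tokens instead of invalid output.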

	---

	## Architecture

	### 34-Channel DiT Input

	![Channel Layout](fig_channels.png)

	| Channels | Role |
	|----------|------|
	| ch 0 (position) | 0→1 gradient in reading order |
	| ch 1 (boundary) | 1.0 at 2×2 patch edges; prevents token bleed |
	| ch 2–17 (self-cond) | Previous DDIM step's x0 prediction (iterative refinement) |
	| ch 18–33 (noisy latent) | Current x_t from forward diffusion |
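The channel layout in the table can be assembled as follows. This is a sketch under several assumptions, not the project's implementation: a channels-first `(16, 16, 16)` latent, a row-major position ramp, a boundary mask that marks the first row and first column of every 2×2 patch (one possible reading of "patch edges"), and zeros as the self-conditioning input before any prediction exists. `build_dit_input` is a hypothetical name.

```python
import numpy as np


def build_dit_input(x_t, x0_prev, patch=2):
    """Stack position, boundary, self-conditioning and noisy latent into (34, H, W)."""
    _, h, w = x_t.shape
    # ch 0: ramp from 0 to 1 in row-major reading order (assumed ordering).
    position = np.linspace(0.0, 1.0, h * w).reshape(1, h, w).astype(x_t.dtype)
    # ch 1: 1.0 on the first row/column of each patch (assumed interpretation).
    rows = np.arange(h)[:, None] % patch == 0
    cols = np.arange(w)[None, :] % patch == 0
    boundary = (rows | cols).astype(x_t.dtype)[None]
    # ch 2-17: previous x0 prediction, ch 18-33: current noisy latent x_t.
    return np.concatenate([position, boundary, x0_prev, x_t], axis=0)


rng = np.random.default_rng(0)
x_t = rng.normal(size=(16, 16, 16)).astype(np.float32)
x0_prev = np.zeros_like(x_t)       # no previous prediction at the first step
dit_input = build_dit_input(x_t, x0_prev)
print(dit_input.shape)             # (34, 16, 16)
```

The two fixed channels carry no learned content; they only tell the denoiser where each cell sits in the sequence and where one token's patch ends.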

	### Key Components

	| Component | Parameters | Role |
	|-----------|-----------|------|
	| VQ-GAN (tokence_big_long) | 17.6M | Encode/decode token embeddings ↔ latent image |
	| DiT (depth=12, dim=512, heads=8) | 57.8M | Denoise the latent image |
	| LSTM SequencePredictor | 239.7K | Sequence-order auxiliary loss (weight=0.5) |
	| Total (DiT + LSTM; VQ-GAN not included) | 58.0M | |

	### Denoising Process

	![Denoising](fig_denoising.png)

	---

	## Critical Findings

	1. LSTM sequence loss is mandatory: reducing its weight from 0.5 to 0.2 causes complete collapse.
	2. Self-conditioning enables iterative refinement and gave the biggest quality jump of all phases.
	3. The token boundary channel prevents bleed between tokens and gave the clearest visual improvement in latent space.
	4. The best checkpoint is epoch 200, not 300; training longer overtrains the model.
	5. The DDIM sweet spot is 200 steps; output mode-collapses sharply at ≥300 steps.

	---

	## Training

	- Hardware: Apple Mac Mini M4, 64GB unified memory (MPS backend)
	- Base LM: Qwen/Qwen2.5-1.5B (embedding table only — not fine-tuned)
	- Corpus: Cybersecurity domain (red-team TTPs + blue-team playbooks, 50K sequences)
	- Training time: ~2h VQ-GAN + ~22h DiT (300 epochs)

	---

	## Citation

	```bibtex
	@misc{cj2026texitex,
	title = {TexITex: Parallel Text Generation via Token Embedding Diffusion in 2D Image Space},
	author = {Jean Paul, C J},
	year = {2026},
	url = {https://github.com/PurpleS3Cf0X/TexITex}
	}
	```

	---

	Author: Jean Paul C J (Unaffiliated)