---
## What is cool
The base verifier's weights are frozen, but ordinary left-to-right decoding does not exhaust what it can do at inference time. A learned continuous proposer can search for hidden-state trajectories and token paths that the verifier recognizes as correct, even if the verifier would rarely or never reach them under standard autoregressive rollout.
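The claim that a verifier can accept paths it would rarely generate itself can be illustrated with a toy example (all probabilities here are made up for illustration, not taken from CLSD): greedy left-to-right decoding commits to the locally best token, while scoring whole candidate paths lets a proposer surface a sequence the same verifier assigns strictly higher probability.

```python
import math

# Toy "frozen verifier": fixed next-token distributions over a two-step
# vocabulary. Hypothetical numbers, chosen only to illustrate the point.
P = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}

def greedy_rollout(steps=2):
    """Standard autoregressive decoding: argmax token at each step."""
    seq = ()
    for _ in range(steps):
        seq += (max(P[seq], key=P[seq].get),)
    return seq

def verifier_logprob(seq):
    """Score a full candidate path under the frozen verifier."""
    lp, prefix = 0.0, ()
    for tok in seq:
        lp += math.log(P[prefix][tok])
        prefix += (tok,)
    return lp

candidates = [(p, s) for p in ("a", "b") for s in ("x", "y")]
best = max(candidates, key=verifier_logprob)

print(greedy_rollout())  # ('a', 'x') -- prob 0.6 * 0.5 = 0.30
print(best)              # ('b', 'x') -- prob 0.4 * 0.9 = 0.36
```

The greedy path scores 0.30 under the verifier, while the searched path scores 0.36: the verifier "recognizes" a better sequence it would never reach by picking the locally best token first.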
## Thesis
Autoregressive language models are bottlenecked by sequential, token-by-token generation. CLSD deploys a hybrid causal Diffusion Transformer (DiT) -- a strided 12-layer slice of Qwen3.5-9B -- operating in the continuous embedding space of the same frozen Qwen3.5-9B verifier. Proposer and verifier share the same 4096-dimensional embedding manifold, the same tokenizer, and the same attention geometry: no projection bridges, no dimensional translation loss.