Update README.md
README.md
CHANGED
@@ -100,8 +100,6 @@ The HPC process executes in multiple phases to guarantee globally coherent scali
4. **Phase 4: 24-Beam Hensel Search**: Maintains 24 parallel configuration beams across the tensor, branching candidates evaluated via triality-weighted scoring.
5. **Phase 5: Sub-Block Shor Refinement**: A second, smaller Shor sequential measurement over a 16-node graph corresponding to the 16 sub-blocks within each 256-weight superblock.

## 6. Prerequisites and Build Instructions

Before you can quantize models, you must build the Shor-optimized HPC C engine.

@@ -191,5 +189,111 @@ The quantizer reports a fidelity rating based on total RMSE across all quantized
- **RMSE is higher than standard Q2_K:** This is intentional. The D₆ vesica gate trades total RMSE for computation-aligned error minimization.
- **libhexstate_q2k.so not found:** Make sure to compile the C engine using `make -f makefile.quantize`.

# How the HPC Engine Makes Global Error Decisions

The HPC (Holographic Phase Contraction) engine does not minimize quantization error locally, block by block. Instead, it frames scale selection as a **global quantum-inspired optimization problem**. It maps quantization candidates to a constraint graph and evaluates them with the **Griffiths-Niu Sequential Measurement** protocol, the semiclassical Fourier-transform scheme used to implement the QFT stage of Shor's factoring algorithm.

The steps below walk through how the engine makes global error decisions, with citations to the implementation in `hexstate_quantize.c`.

---

## 1. Candidate Generation and D₆ Vesica Scoring

Before any global optimization occurs, the engine needs candidates. For each block of weights, it generates candidate scales (`d` and `dmin`) around a baseline least-squares optimum.

Instead of scoring these candidates using standard Mean Squared Error (MSE), it uses the **D₆ Vesica Gate**.

### Code Citation (`hexstate_quantize.c`, lines ~1420–1432)

```c
/* Decompose into vesica (DC) and wave (AC) components */
float vesica_err = 0.0f, wave_err = 0.0f;
for (int p = 0; p < half_g; p++) {
    float v = e_cur[p] + e_cur[p + half_g];
    float w_wave = e_cur[p] - e_cur[p + half_g];
    float w_avg = (w_cur[p] + w_cur[p + half_g]) * 0.5f;
    vesica_err += v * v * w_avg;
    wave_err += w_wave * w_wave * w_avg;
}
/* Triality weighting: penalize vesica 4×, wave 1×. */
err += 0.5f * (4.0f * vesica_err + 1.0f * wave_err);
```

**Why this matters:** `vesica_err` accumulates the *sum* of the errors at paired positions `p` and `p + half_g`. This is the component that adds up during matrix multiplication and propagates through the network. `wave_err` accumulates their *difference*, the component in which the two errors offset each other. Weighting `vesica_err` 4× biases the engine toward candidates whose paired errors cancel during inference, even when their total Euclidean distance (MSE) from the original weights is larger.
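
To make the trade-off concrete, here is a minimal, self-contained sketch of the scoring rule in the excerpt above. The function names `vesica_score` and `plain_score` and their signatures are hypothetical, not taken from `hexstate_quantize.c`. The comparison function relies on the identity $\tfrac{1}{2}\left[(a+b)^2 + (a-b)^2\right] = a^2 + b^2$: with both components weighted 1×, the score would reduce to an ordinary weighted squared error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: vesica-style score for one group of n errors.
 * e[i] is the quantization error at position i, w[i] its importance weight.
 * Position p is paired with p + n/2, as in the cited excerpt. */
static float vesica_score(const float *e, const float *w, size_t n)
{
    assert(n % 2 == 0);                       /* pairing needs an even size */
    size_t half = n / 2;
    float vesica_err = 0.0f, wave_err = 0.0f;
    for (size_t p = 0; p < half; p++) {
        float sum   = e[p] + e[p + half];     /* errors that add up     */
        float diff  = e[p] - e[p + half];     /* errors that offset     */
        float w_avg = 0.5f * (w[p] + w[p + half]);
        vesica_err += sum * sum * w_avg;
        wave_err   += diff * diff * w_avg;
    }
    return 0.5f * (4.0f * vesica_err + 1.0f * wave_err);
}

/* Comparison only: the same sum with both components weighted 1x,
 * which equals w_avg * (e[p]^2 + e[p + half]^2) summed over pairs. */
static float plain_score(const float *e, const float *w, size_t n)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    float err = 0.0f;
    for (size_t p = 0; p < half; p++) {
        float w_avg = 0.5f * (w[p] + w[p + half]);
        err += w_avg * (e[p] * e[p] + e[p + half] * e[p + half]);
    }
    return err;
}
```

For the error pairs `{1, 1}` and `{1, -1}` with unit weights, `plain_score` cannot tell the two apart, while `vesica_score` rates the same-sign pair four times worse than the opposite-sign pair.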

## 2. Graph Construction and Boltzmann Amplitudes

The selected candidates and their Vesica errors are grouped into 6 bins (representing the 6 states of a $Z_6$ "quhit" or quantum digit). The engine constructs an `HPCGraph` mapping each block to one or more quhits.

Each candidate's error $E$ is converted into a "Boltzmann amplitude" $\exp(-(E - E_{\min}) / 2T)$, so that the squared amplitude is the Boltzmann weight $\exp(-(E - E_{\min}) / T)$ and represents the likelihood of selecting that state. Candidates that map to the same quhit state add their amplitudes.

### Code Citation (`hexstate_quantize.c`, lines ~1502–1506)

```c
for (int ci = 0; ci < Q4_N_CAND; ci++) {
    int qi = Q4_CAND_TO_QUHIT[ci];
    amp_re[qi] += exp(-(double)(agg_errors[ci] - min_err) /
                      (2.0 * (double)temperature));
}
```

## 3. The Griffiths-Niu Sequential Measurement

This is where the global coordination happens. The function `shor_measure_graph` (lines 1166–1314) performs a sequential measurement from the most significant digit (MSB) to the least significant digit (LSB). It replaces standard Belief Propagation with a single deterministic pass in which each measurement outcome conditions the ones that follow, correlating choices across the whole graph.

For each block $k$ being measured:

### A. Neighbor Contribution (Entanglement)
It evaluates how neighboring blocks influence block $k$ by projecting their current amplitudes across the graph edges.
```c
// hexstate_quantize.c: lines 1217-1221
sr += lr*wr - li*wi;
si += lr*wi + li*wr;
```

### B. Feed-Forward Phase Correction
It applies a phase shift based on the outcomes of all blocks measured before it (in the excerpt below, the sites `j > k`), a signature trait of the semi-classical QFT used in Shor's algorithm. Each earlier outcome is divided by a growing power of 6, starting at 36.
```c
// hexstate_quantize.c: lines 1181-1184
double power = 36.0;
for (int64_t j = k + 1; j < n_sites; j++) {
    theta_k += (double)measured_out[j] / power;
    power *= 6.0;
}
```
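
Written as a formula, the loop above accumulates $\sum_{j>k} m_j / 6^{\,j-k+1}$, where $m_j$ is the outcome already recorded for site $j$. The sketch below isolates that sum in a function so it can be checked by hand. The name `feed_forward_fraction` is hypothetical, and the excerpt does not show how `theta_k` is initialized or scaled afterwards, so this returns only the raw fraction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: base-6 phase fraction for site k, built from the
 * outcomes of the sites j > k, which must already have been measured.
 * Each measured_out[j] is expected to lie in 0..5. */
static double feed_forward_fraction(const int *measured_out,
                                    int64_t k, int64_t n_sites)
{
    assert(k >= 0 && k < n_sites);
    double theta = 0.0;
    double power = 36.0;                 /* 6^2 for the nearest site k+1 */
    for (int64_t j = k + 1; j < n_sites; j++) {
        theta += (double)measured_out[j] / power;
        power *= 6.0;                    /* each further site weighs 6x less */
    }
    return theta;
}
```

With outcomes `{0, 3, 1}` and `k = 0`, the result is $3/36 + 1/216$. Because the weights shrink by a factor of 6 per site, contributions from sites more than about 20 positions away fall below double precision.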

### C. IDFT6 and Constructive Interference
It runs a six-point Inverse Discrete Fourier Transform (IDFT6). Because the neighbor influence ($C_k$) was folded into the amplitudes *before* the IDFT, the IDFT acts as a coherence filter. **It produces constructive interference peaks precisely at the scale candidate that creates the best global configuration.**
```c
// hexstate_quantize.c: lines 1256-1261
double angle = 2.0 * 3.14159265358979323846 * d * v / 6.0;
double er = cos(angle), ei = sin(angle);
sum_re += alpha_re[d]*er - alpha_im[d]*ei;
sum_im += alpha_re[d]*ei + alpha_im[d]*er;
```

### D. Measurement and Back-Action (The "Magic Pointer")
Once a state is selected (`argmax` over the squared amplitudes, i.e. the Born-rule probabilities), the engine calls `shor_collapse_site`.

This function creates the **global error decision**. It doesn't just lock in block $k$'s choice; it propagates the decision to all unmeasured neighbors.
```c
// hexstate_quantize.c: lines 1120-1121
double old_re = pq->edge_re[d], old_im = pq->edge_im[d];
pq->edge_re[d] = old_re * w_re - old_im * w_im;
pq->edge_im[d] = old_re * w_im + old_im * w_re;
```
This back-action multiplies each edge amplitude of an adjacent block by a complex weight, *conditioning* that block's future quantization choice on the choice just made for block $k$. This creates anti-correlated quantization noise across the entire tensor, so that when the matrix is multiplied against an activation vector, the localized errors cancel each other out.

## 4. Beam Search Refinement

Because the measurement is performed left-to-right (MSB to LSB) and each choice is final once made, the sequence can lock in an early choice that later proves suboptimal (greedy failure). The HPC engine mitigates this using a **24-Beam Hensel Search**.

Instead of accepting the single path created by the graph collapse, it maintains 24 parallel quantization paths (`Q4_N_BEAMS`). It evaluates candidate extensions by dividing the Shor probability (from the graph marginals) by the error of the extended path (`ext_err` in the excerpt: the beam's accumulated error plus the candidate's error), advancing the 24 best global configurations simultaneously.

### Code Citation (`hexstate_quantize.c`, lines ~1581–1582)

```c
double ext_err = beams[b].acc_error + cand_errors[blk][c];
extensions[n_ext].score = cand_score[c] / (ext_err + 1e-15);
```
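
To show how this scoring rule drives a beam step, here is a minimal sketch that extends every beam with every candidate for one block and keeps the best-scoring extensions. The `Extension` struct, the `beam_step` function, and the use of a full `qsort` are assumptions for illustration; the cited source may organize the selection differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one beam extended by one candidate. */
typedef struct {
    int    parent;     /* index of the beam being extended      */
    int    cand;       /* candidate chosen for the current block */
    double acc_error;  /* accumulated error after the extension  */
    double score;      /* cand_score / (acc_error + epsilon)     */
} Extension;

/* Sort comparator: highest score first. */
static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const Extension *)a)->score;
    double sb = ((const Extension *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* One beam step for a single block. `out` must have room for
 * n_beams * n_cand entries; the best `keep` entries end up first.
 * Returns how many entries were kept. */
static size_t beam_step(const double *beam_acc_error, size_t n_beams,
                        const double *cand_error, const double *cand_score,
                        size_t n_cand, Extension *out, size_t keep)
{
    assert(n_beams > 0 && n_cand > 0 && keep > 0);
    size_t n_ext = 0;
    for (size_t b = 0; b < n_beams; b++) {
        for (size_t c = 0; c < n_cand; c++) {
            double ext_err = beam_acc_error[b] + cand_error[c];
            out[n_ext].parent    = (int)b;
            out[n_ext].cand      = (int)c;
            out[n_ext].acc_error = ext_err;
            out[n_ext].score     = cand_score[c] / (ext_err + 1e-15);
            n_ext++;
        }
    }
    qsort(out, n_ext, sizeof *out, by_score_desc);
    return n_ext < keep ? n_ext : keep;
}
```

Because the denominator is the accumulated error, a beam that is already carrying a large error needs a candidate with a much higher probability to stay in the top set. In the engine, `keep` would correspond to `Q4_N_BEAMS`.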

## Summary

The global error decision is not an iterative smoothing process (like BP). It is a single deterministic pass that evaluates interference between complex amplitudes. By encoding quantization errors into complex amplitudes, passing them through an IDFT6, and forcing neighbors to condition their subsequent scale selections via wave-collapse back-action, the engine selects scales whose errors cancel one another during matrix multiplication.

## 10. License

The quantizer code is part of the HPC project (MIT). Quantized models inherit the license of the base model (e.g., Gemma Terms of Use).