CompressedGemma committed on
Commit
dd6d6ba
·
verified ·
1 Parent(s): 96fce02

Update README.md

Files changed (1)
  1. README.md +106 -2
README.md CHANGED
@@ -100,8 +100,6 @@ The HPC process executes in multiple phases to guarantee globally coherent scali
  4. **Phase 4: 24-Beam Hensel Search**: Maintains 24 parallel configuration beams across the tensor, branching candidates evaluated via triality-weighted scoring.
  5. **Phase 5: Sub-Block Shor Refinement**: A second, smaller Shor sequential measurement over a 16-node graph corresponding to the 16 sub-blocks within each 256-weight superblock.
 
-
-
  ## 6. Prerequisites and Build Instructions
 
  Before you can quantize models, you must build the Shor-optimized HPC C engine.
@@ -191,5 +189,111 @@ The quantizer reports a fidelity rating based on total RMSE across all quantized
  - **RMSE is higher than standard Q2_K:** This is intentional. The D₆ vesica gate trades total RMSE for computation-aligned error minimization.
  - **libhexstate_q2k.so not found:** Make sure to compile the C engine using `make -f makefile.quantize`.
+ # How the HPC Engine Makes Global Error Decisions
+
+ The HPC (Holographic Phase Contraction) engine does not minimize quantization error block by block. Instead, it frames scale selection as a **global, quantum-inspired optimization problem**: quantization candidates are mapped onto a constraint graph and evaluated with the **Griffiths-Niu Sequential Measurement** protocol, the semi-classical Fourier-transform procedure used in Shor's factoring algorithm.
+
+ The sections below walk through how the engine makes global error decisions, step by step, with references to the implementation in `hexstate_quantize.c`.
+
+ ---
+
+ ## 1. Candidate Generation and D₆ Vesica Scoring
+ Before any global optimization occurs, the engine needs candidates. For each block of weights, it generates candidate scales (`d` and `dmin`) around a baseline least-squares optimum.
+
+ Instead of scoring these candidates with standard Mean Squared Error (MSE), it uses the **D₆ Vesica Gate**, which splits the error of each group into a sum ("vesica") component and a difference ("wave") component taken across the two halves of the group.
+
+ ### Code Citation (`hexstate_quantize.c`, lines ~1420–1432)
+ ```c
+ /* Decompose into vesica (DC) and wave (AC) components */
+ float vesica_err = 0.0f, wave_err = 0.0f;
+ for (int p = 0; p < half_g; p++) {
+     float v      = e_cur[p] + e_cur[p + half_g];
+     float w_wave = e_cur[p] - e_cur[p + half_g];
+     float w_avg  = (w_cur[p] + w_cur[p + half_g]) * 0.5f;
+     vesica_err += v * v * w_avg;
+     wave_err   += w_wave * w_wave * w_avg;
+ }
+ /* Triality weighting: penalize vesica 4×, wave 1×. */
+ err += 0.5f * (4.0f * vesica_err + 1.0f * wave_err);
+ ```
+ **Why this matters:** `vesica_err` measures the part of the error in which paired positions err in the same direction, so the errors add up during matrix multiplication and propagate through the network. `wave_err` measures the part in which paired errors have opposite sign and tend to cancel. By weighting `vesica_err` 4× relative to `wave_err`, the engine favors candidates whose errors cancel during inference, even if their plain MSE against the original weights is larger.
+
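To see the effect of the 4:1 weighting in isolation, the sketch below wraps the same decomposition in a standalone function. The name `vesica_weighted_error`, its signature, and the reading of `w` as per-weight importance values are assumptions made for illustration; they are not taken from `hexstate_quantize.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not the engine's code: score an error vector `e`
 * of even length `n` with per-element importance `w` (an assumption),
 * using the same vesica/wave split as the excerpt above. */
float vesica_weighted_error(const float *e, const float *w, size_t n)
{
    assert(n % 2 == 0);                 /* the split needs two equal halves */
    size_t half = n / 2;
    float vesica_err = 0.0f, wave_err = 0.0f;

    for (size_t p = 0; p < half; p++) {
        float v     = e[p] + e[p + half];            /* same-sign part     */
        float wave  = e[p] - e[p + half];            /* opposite-sign part */
        float w_avg = (w[p] + w[p + half]) * 0.5f;
        vesica_err += v * v * w_avg;
        wave_err   += wave * wave * w_avg;
    }
    /* Vesica weighted 4x, wave 1x, as in the excerpt. */
    return 0.5f * (4.0f * vesica_err + 1.0f * wave_err);
}
```

With unit importance, the error vectors `{1, 1}` and `{1, -1}` have the same squared error (2), but the first scores 8 and the second scores 2, so the candidate with opposite-sign errors is preferred.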
+ ## 2. Graph Construction and Boltzmann Amplitudes
+ The selected candidates and their Vesica errors are grouped into 6 bins, one per state of a $Z_6$ "quhit" (quantum digit). The engine constructs an `HPCGraph` that maps each block to one or more quhits.
+
+ The errors are then transformed into "Boltzmann amplitudes", which represent the likelihood of selecting each state. Candidates that map to the same quhit state accumulate into a single amplitude through `+=`.
+
+ ### Code Citation (`hexstate_quantize.c`, lines ~1502–1506)
+ ```c
+ for (int ci = 0; ci < Q4_N_CAND; ci++) {
+     int qi = Q4_CAND_TO_QUHIT[ci];
+     amp_re[qi] += exp(-(double)(agg_errors[ci] - min_err) /
+                       (2.0 * (double)temperature));
+ }
+ ```
+
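The factor of 2 in the exponent can be read as follows. This is a short derivation, writing $E_c$ for `agg_errors[ci]`, $E_{\min}$ for `min_err`, $T$ for `temperature`, and $a_c$ for the term added by one candidate; these symbols are introduced here for illustration only.

$$
a_c = \exp\!\left(-\frac{E_c - E_{\min}}{2T}\right)
\quad\Longrightarrow\quad
a_c^2 = \exp\!\left(-\frac{E_c - E_{\min}}{T}\right)
$$

For a quhit state that receives a single candidate, the squared amplitude is therefore the ordinary Boltzmann weight at temperature $T$. For example, with $T = 0.5$ and $E_c - E_{\min} = 1$, $a_c = e^{-1} \approx 0.368$ and $a_c^2 = e^{-2} \approx 0.135$. The best candidate ($E_c = E_{\min}$) always contributes $a_c = 1$.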
+ ## 3. The Griffiths-Niu Sequential Measurement
+ This is where the global coordination happens. The function `shor_measure_graph` (lines 1166–1314) measures the sites one at a time, from MSB to LSB. It replaces standard Belief Propagation with a single deterministic pass in which every measurement outcome conditions the sites measured after it, correlating choices across the whole graph.
+
+ For each block $k$ being measured:
+
+ ### A. Neighbor Contribution (Entanglement)
+ It evaluates how neighboring blocks influence block $k$ by projecting their current amplitudes across the graph edges. The two lines below accumulate the product of two complex numbers, stored as the real/imaginary pairs `(lr, li)` and `(wr, wi)`, into the running sum `(sr, si)`.
+ ```c
+ // hexstate_quantize.c: lines 1217-1221
+ sr += lr*wr - li*wi;
+ si += lr*wi + li*wr;
+ ```
+
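The accumulation can be sketched as a standalone function. The `cplx` type, the function name `neighbor_sum`, and the flat arrays of amplitudes and edge weights are hypothetical; the engine's actual data layout is not shown in this README.

```c
#include <stddef.h>

/* Hypothetical helper type for illustration only. */
typedef struct { double re, im; } cplx;

/* Sum of amp[i] * weight[i] over n neighbors, using the same
 * real/imaginary update as the excerpt above. */
cplx neighbor_sum(const cplx *amp, const cplx *weight, size_t n)
{
    cplx s = {0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        double lr = amp[i].re,    li = amp[i].im;
        double wr = weight[i].re, wi = weight[i].im;
        s.re += lr * wr - li * wi;   /* real part of the complex product */
        s.im += lr * wi + li * wr;   /* imaginary part                   */
    }
    return s;
}
```

Two neighbors with equal amplitudes and edge weights of `+1` and `-1` sum to zero, while equal weights add up. This is the sense in which contributions can interfere constructively or destructively.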
+ ### B. Feed-Forward Phase Correction
+ It applies a phase shift based on the outcomes of all blocks measured before it (in the excerpt, the sites with index `j > k`), a signature trait of the semi-classical QFT used in Shor's algorithm. Each earlier outcome is divided by a power of 6 that grows with its distance from block $k$, so nearer sites contribute more.
+ ```c
+ // hexstate_quantize.c: lines 1181-1184
+ double power = 36.0;
+ for (int64_t j = k + 1; j < n_sites; j++) {
+     theta_k += (double)measured_out[j] / power;
+     power *= 6.0;
+ }
+ ```
+
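Written as a formula, the loop above computes $\theta_k = \sum_{j=k+1}^{n-1} m_j / 6^{\,j-k+1}$, where $m_j$ stands for `measured_out[j]` and $n$ for `n_sites`. The sketch below packages it as a function. The function name, the `int` element type of `measured_out`, and the assumption that outcomes lie in 0..5 are illustrative and not confirmed by the source.

```c
#include <stdint.h>

/* Sketch of the feed-forward phase accumulation for site k.
 * measured_out[j] is assumed to hold the base-6 outcome (0..5) of site j;
 * only sites with j > k contribute, as in the excerpt. */
double feed_forward_phase(const int *measured_out, int64_t n_sites, int64_t k)
{
    double theta_k = 0.0;
    double power = 36.0;              /* 6^2 for the nearest site j = k + 1 */
    for (int64_t j = k + 1; j < n_sites; j++) {
        theta_k += (double)measured_out[j] / power;
        power *= 6.0;                 /* each further site counts 6x less   */
    }
    return theta_k;
}
```

For outcomes `{0, 3, 2}` and `k = 0`, the result is 3/36 + 2/216 = 5/54. If every outcome is at most 5, the sum stays below 1/6 regardless of how many sites precede block $k$.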
+ ### C. IDFT6 and Constructive Interference
+ It runs a 6-point Inverse Discrete Fourier Transform (IDFT6). Because the neighbor influence ($C_k$) is folded into the amplitudes *before* the IDFT, the IDFT acts as a coherence filter. **It is designed to produce constructive interference peaks at the scale candidate that yields the best global configuration.**
+ ```c
+ // hexstate_quantize.c: lines 1256-1261
+ double angle = 2.0 * 3.14159265358979323846 * d * v / 6.0;
+ double er = cos(angle), ei = sin(angle);
+ sum_re += alpha_re[d]*er - alpha_im[d]*ei;
+ sum_im += alpha_re[d]*ei + alpha_im[d]*er;
+ ```
+
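A minimal, self-contained version of this step is sketched below. It assumes the excerpt sits inside loops over `v` and `d` from 0 to 5, applies no normalization, and replaces the `cos`/`sin` calls with a table of the six 6th roots of unity. The function names `idft6` and `argmax_prob6` are hypothetical.

```c
/* cos(2*pi*t/6) and sin(2*pi*t/6) for t = 0..5 (sqrt(3)/2 = 0.866...). */
static const double ROOT6_RE[6] = { 1.0, 0.5, -0.5, -1.0, -0.5, 0.5 };
static const double ROOT6_IM[6] = { 0.0,  0.8660254037844386,  0.8660254037844386,
                                    0.0, -0.8660254037844386, -0.8660254037844386 };

/* Unnormalized 6-point inverse DFT with the excerpt's sign convention:
 * out[v] = sum over d of alpha[d] * exp(+2*pi*i*d*v/6). */
void idft6(const double alpha_re[6], const double alpha_im[6],
           double out_re[6], double out_im[6])
{
    for (int v = 0; v < 6; v++) {
        double sum_re = 0.0, sum_im = 0.0;
        for (int d = 0; d < 6; d++) {
            int t = (d * v) % 6;      /* the angle only matters modulo 6 */
            double er = ROOT6_RE[t], ei = ROOT6_IM[t];
            sum_re += alpha_re[d] * er - alpha_im[d] * ei;
            sum_im += alpha_re[d] * ei + alpha_im[d] * er;
        }
        out_re[v] = sum_re;
        out_im[v] = sum_im;
    }
}

/* Born-rule argmax: index of the largest squared magnitude. */
int argmax_prob6(const double re[6], const double im[6])
{
    int best = 0;
    double best_p = re[0] * re[0] + im[0] * im[0];
    for (int v = 1; v < 6; v++) {
        double p = re[v] * re[v] + im[v] * im[v];
        if (p > best_p) { best_p = p; best = v; }
    }
    return best;
}
```

If the input amplitudes carry the phase pattern $e^{-2\pi i \cdot 2d/6}$, all six terms line up at `v = 2` and cancel at every other `v`, so the argmax returns 2. This is what a "constructive interference peak" means in this setting.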
+ ### D. Measurement and Back-Action (The "Magic Pointer")
+ Once a state is selected (by taking the `argmax` of the squared amplitudes, i.e. the Born-rule probabilities), the engine calls `shor_collapse_site`.
+
+ This function is where the **global error decision** is made. It does not just lock in block $k$'s choice; it propagates the decision to all unmeasured neighbors by multiplying their edge amplitudes by a complex weight `(w_re, w_im)`.
+ ```c
+ // hexstate_quantize.c: lines 1120-1121
+ double old_re = pq->edge_re[d], old_im = pq->edge_im[d];
+ pq->edge_re[d] = old_re * w_re - old_im * w_im;
+ pq->edge_im[d] = old_re * w_im + old_im * w_re;
+ ```
+ This back-action alters the amplitudes of adjacent blocks, *conditioning* their later quantization choices on the choice just made for block $k$. The intended effect is anti-correlated quantization noise across the tensor, so that when the matrix is multiplied against an activation vector, the localized errors tend to cancel.
+
+ ## 4. Beam Search Refinement
+ Because the measurement is performed in a single left-to-right pass (MSB to LSB), an early choice cannot be revisited, which makes the sequence prone to greedy failure. The HPC engine mitigates this with a **24-Beam Hensel Search**.
+
+ Instead of accepting the single path produced by the graph collapse, it maintains 24 parallel quantization paths (`Q4_N_BEAMS`). It scores each candidate extension by dividing the Shor probability (from the graph marginals) by the beam's accumulated error plus the candidate's error (`ext_err`), and advances the 24 best configurations together.
+
+ ### Code Citation (`hexstate_quantize.c`, lines ~1581–1582)
+ ```c
+ double ext_err = beams[b].acc_error + cand_errors[blk][c];
+ extensions[n_ext].score = cand_score[c] / (ext_err + 1e-15);
+ ```
+
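The scoring rule can be sketched for a single block as follows. The `extension` struct, the function `score_extensions`, and the flat input arrays are hypothetical. The sketch returns only the single best extension and omits the step that keeps the top 24, which this README does not show.

```c
#include <stddef.h>

/* Hypothetical record for one candidate extension of a beam. */
typedef struct {
    int    beam;       /* index of the parent beam                  */
    int    cand;       /* candidate chosen for the next block       */
    double acc_error;  /* accumulated error after the extension     */
    double score;      /* Shor score divided by accumulated error   */
} extension;

/* Score every (beam, candidate) pair for one block and return the index
 * of the best one. `out` must hold n_beams * n_cand entries. */
size_t score_extensions(const double *beam_acc_error, size_t n_beams,
                        const double *cand_error, const double *cand_score,
                        size_t n_cand, extension *out)
{
    size_t n_ext = 0, best = 0;
    for (size_t b = 0; b < n_beams; b++) {
        for (size_t c = 0; c < n_cand; c++) {
            double ext_err = beam_acc_error[b] + cand_error[c];
            out[n_ext].beam = (int)b;
            out[n_ext].cand = (int)c;
            out[n_ext].acc_error = ext_err;
            /* Same rule as the excerpt; 1e-15 guards against division by zero. */
            out[n_ext].score = cand_score[c] / (ext_err + 1e-15);
            if (out[n_ext].score > out[best].score) best = n_ext;
            n_ext++;
        }
    }
    return best;
}
```

With beam errors `{1.0, 3.0}`, candidate errors `{1.0, 0.5}`, and Shor scores `{0.5, 0.4}`, the best extension is beam 0 with candidate 1: its Shor score is lower, but its lower error gives it the higher ratio (0.4 / 1.5 versus 0.5 / 2.0).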
+ ## Summary
+ The global error decision is not an iterative smoothing process (like BP). It is a single deterministic pass that evaluates quantum-style interference: quantization errors are encoded into complex amplitudes, passed through an IDFT6, and neighbors are forced to condition their subsequent scale selections through wave-collapse back-action. The goal is a set of scales whose matrix-multiplication errors cancel one another across the tensor.
+
  ## 10. License
  The quantizer code is part of the HPC project (MIT). Quantized models inherit the license of the base model (e.g., Gemma Terms of Use).
+