oneryalcin committed on
Commit b82c4e6 · verified · 1 Parent(s): f8392aa

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +42 -19
README.md CHANGED
@@ -271,30 +271,53 @@ English-pretrained models on our training signal.
  ---

- ## Production recommendation

- For a real chess search system, the winning architecture is:

  ```
- Stage 1: Static embedding (this model)
- - Encode chess-format corpus (4M params, ~9MB)
- - Sub-millisecond CPU inference
- - Retrieve top-200 candidates via cosine similarity
- - Recall@200 = 93.5%
-
- Stage 2: BM25 over English-bridged corpus
- - python-chess + regex (one-time, $0)
- - Index the English versions of all docs
- - Rerank top-200 candidates to top-10
- - NDCG@10 ≈ 0.55-0.62
- ```
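The removed two-stage design can be sketched in pure Python. Stage 1 takes the cosine top-200 and stage 2 reranks those candidates down to the top-10. This is a minimal illustration under assumptions: document vectors are precomputed lists of floats, and `rerank_score` is a hypothetical callable standing in for BM25 over the English-bridged text. None of these names come from the repository.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical stage-2 scorer, e.g. BM25 of doc i
    n_candidates: int = 200,
    n_final: int = 10,
) -> List[int]:
    # Stage 1: rank all docs by cosine similarity, keep the top n_candidates.
    by_cosine = sorted(range(len(doc_vecs)),
                       key=lambda i: cosine(query_vec, doc_vecs[i]),
                       reverse=True)
    candidates = by_cosine[:n_candidates]
    # Stage 2: reorder only the candidates by the second-stage score.
    return sorted(candidates, key=rerank_score, reverse=True)[:n_final]


# Toy data: 4 docs in 2-D and made-up stage-2 scores.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
stage2 = {0: 0.5, 1: 2.0, 2: 3.0, 3: 0.0}
print(retrieve_then_rerank([1.0, 0.0], docs, stage2.__getitem__, n_candidates=2, n_final=2))
# prints [1, 0]: doc 2 has the best stage-2 score but never makes the candidate pool
```

The toy run shows why stage 1's Recall@200 matters separately. The reranker can only reorder what the first stage retrieves.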

- **Total: <10ms/query, $0 inference cost, no GPU.**

- The cross-encoder is only worth adding if you have GPU available AND you train
- it on a fundamentally different signal (e.g., human-annotated relevance,
- chess-engine strategic descriptions, or much more parameters with chess in
- pretraining).

+ ## Production recommendation (and a surprising honest finding)

+ **The static embedding model is not needed for this task.** A direct comparison:

+ | Approach | NDCG@10 (200 unseen-combo queries × 600 docs) |
+ |---|---|
+ | Static (v4-C2) alone | 0.1202 |
+ | BM25 alone over chess-format docs | 0.0107 |
+ | **BM25 alone over English-bridged docs** | **1.0000** |
+ | Static + BM25 RRF fusion | 0.4940 |
+
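The table's metric and its fusion row can be made concrete with a minimal sketch of NDCG@10 and Reciprocal Rank Fusion (RRF). The sketch makes two assumptions that the README does not state for this eval: binary relevance gains, and the commonly used RRF constant k = 60. The runs below are toy data, not eval output.

```python
import math
from typing import Dict, Iterable, List, Set


def ndcg_at_k(ranked_ids: List[str], relevant: Set[str], k: int = 10) -> float:
    """NDCG@k with binary gains: a relevant doc at 0-based position r adds 1 / log2(r + 2)."""
    dcg = sum(1.0 / math.log2(r + 2)
              for r, doc_id in enumerate(ranked_ids[:k]) if doc_id in relevant)
    # Ideal DCG: all relevant docs (up to k) packed into the top positions.
    idcg = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0


def rrf_fuse(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) per doc, rank starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_run = ["a", "b", "c", "d", "e"]    # perfect: both relevant docs on top
static_run = ["e", "d", "a", "c", "b"]  # weaker ordering of the same docs
relevant = {"a", "b"}
fused = rrf_fuse([bm25_run, static_run])
print(fused)  # ['a', 'e', 'd', 'b', 'c']
print(ndcg_at_k(bm25_run, relevant), ndcg_at_k(fused, relevant))  # 1.0, then a value below 1.0
```

RRF rewards documents that rank high in either run. When one input is already perfect, fusing it with a weaker run can only pull relevant documents down. The table shows the same pattern: the fusion score (0.4940) falls between the two inputs.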
+ **BM25 over deterministically-English-converted documents achieves a PERFECT
+ ranking (1.0000 NDCG@10) on this eval.** No embedding model needed. No training.
+ No GPU.
+
+ Why: our queries are theme tokens (`fork endgame`), and the English-bridged
+ docs explicitly contain those words (`"Short endgame puzzle with fork..."`).
+ This is BM25's natural strength: keyword-overlap detection. The static model
+ labors to learn token-cluster mappings; BM25 just reads the words directly.
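The keyword-overlap argument can be checked with a small from-scratch Okapi BM25 scorer. This is a minimal sketch, not the `rank_bm25` implementation. It uses common defaults (k1 = 1.5, b = 0.75) and a Lucene-style IDF. The chess-format document is a hypothetical FEN-only stand-in, and the English document reuses the wording quoted above.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens, so "Short" -> "short" and "fork..." -> "fork".
    return re.findall(r"\w+", text.lower())


class TinyBM25:
    """Okapi BM25 with a Lucene-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5))."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(self.tfs)
        n_docs = len(self.tfs)
        df = Counter(term for tf in self.tfs for term in tf)  # docs containing each term
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl, total = self.tfs[i], self.lengths[i], 0.0
        for term in tokenize(query):
            f = tf[term]
            if f == 0:
                continue  # a term absent from the doc contributes nothing
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total


docs = [
    "4k3/8/8/8/3N4/8/8/4K3 w - - 0 1",   # hypothetical chess-format doc (FEN only)
    "Short endgame puzzle with fork...",  # English-bridged wording quoted in the README
]
index = TinyBM25(docs)
print([index.score("fork endgame", i) for i in range(len(docs))])  # first score is 0.0
```

The FEN string shares no tokens with `fork endgame`, so its score is exactly zero. The English document matches both terms directly.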
+
+ ### Actual production architecture (the simple answer)
+
+ ```python
+ import re
+ from rank_bm25 import BM25Okapi
+
+ def tokenize(text):
+     # Lowercase and keep word characters only, so "Short" and "fork..." match query terms
+     return re.findall(r"\w+", text.lower())
+
+ # One-time: convert all puzzles to English (use scripts/convert_to_english.py).
+ # `corpus` holds the English-converted docs; `corpus_ids` holds their puzzle IDs (same order).
+ bm25 = BM25Okapi([tokenize(english_doc) for english_doc in corpus])
+
+ # Query
+ query = "fork endgame short"  # or any theme combo / opening name
+ top_ids = bm25.get_top_n(tokenize(query), corpus_ids, n=10)  # returns IDs, not indices
  ```

+ **Total: <10ms/query, $0 cost, no model, no GPU, no training.**
+
+ ### When the static embedding would actually help
+
+ 1. **Natural-language paraphrased queries**: a user types `"two-piece tactical in late game"` instead of `"fork endgame"`. BM25 wouldn't match those words. A static model trained with paraphrase augmentation could match them via learned semantic similarity. **We never tested this.**
+ 2. **Cross-lingual queries**: BM25 needs exact lexical overlap; embeddings can cross language barriers.
+ 3. **Very large corpora**, where BM25 index size becomes an issue and embeddings can be more storage-efficient per doc.
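Item 1 comes down to vocabulary mismatch. BM25 only adds contributions for query terms that occur in a document. A paraphrase that shares no tokens with the English-bridged text therefore scores 0 against every doc. The sketch below is minimal and assumes lowercase word tokenization, which is not necessarily what the eval used.

```python
import re
from typing import List, Set


def terms(text: str) -> Set[str]:
    # Assumed tokenization: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))


def shared_terms(query: str, doc: str) -> List[str]:
    """Query terms that literally occur in the doc; BM25 can only score these."""
    return sorted(terms(query) & terms(doc))


doc = "Short endgame puzzle with fork..."
for query in ["fork endgame", "two-piece tactical in late game"]:
    print(f"{query!r}: {shared_terms(query, doc)}")
# 'fork endgame': ['endgame', 'fork']
# 'two-piece tactical in late game': []
```

`game` does not match `endgame` here because BM25 compares whole tokens, not substrings.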

+ For our actual eval setup (theme-token queries on Lichess puzzles), the static
+ model loses by a wide margin to BM25-over-English-bridged (0.1202 vs. 1.0000
+ NDCG@10). The static training exercise produced valuable methodology insights
+ (especially the LLM-bridge pattern) but was the wrong tool for the actual
+ production problem.

  ---