---
language: en
license: apache-2.0
library_name: transformers
tags:
- sparse-retrieval
- information-retrieval
- bert
- fles1
datasets:
- ms_marco
metrics:
- ndcg
- mrr
---

# FLES-1 v14: Sparse Lexical Encoder (Best Quality)

> **Paper:** [Closed-Loop FLOPS Regulation for Learned Sparse Retrieval](https://mindoval.com/ai-research), Golvis Tavarez, Mindoval, Inc.

## Model Description

FLES-1 transforms text into interpretable sparse vectors using BERT's MLM predictions. Each of the 30,522 dimensions corresponds to a token in BERT's vocabulary, so vectors are readable, debuggable, and compatible with standard inverted indices (Elasticsearch, OpenSearch).

Trained with two novel techniques:

- **L1 FLOPS regularization**: targets the gradient explosion that causes training instability in sparse retrieval models; in our study it reduced training crashes from 10-17 to 0-7 per run
- **Step-interval CLFR**: closed-loop sparsity control that adjusts regularization every ~6,250 steps (one epoch in our setup) based on measured sparsity

## Metrics

### nfcorpus (threshold=0.3)

| Metric | Value |
|--------|-------|
| NDCG@10 | 0.3049 |
| MRR | 0.5182 |
| Recall@100 | 0.2544 |
| Avg NNZ (non-zero dimensions) | 359 |

### Reproducibility

This recipe was run 5 times with different seeds:

| Run (seed) | NDCG@10 |
|------------|---------|
| v14 (original) | 0.305 |
| v17c | 0.299 |
| v31a | 0.299 |
| v32 (seed=7777) | 0.299 |
| v26a (seed=42) | 0.272 |

**Mean: 0.295. Std: 0.013.** v14 is the best of the five runs and sits at the high end of the variance. Expected reproduction: 0.295 ± 0.013.

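
The summary statistics can be re-derived from the table above. The sketch below assumes the reported spread is a sample standard deviation (n − 1 denominator), which is the variant that reproduces 0.013; the variable names are illustrative and not part of any FLES-1 tooling.

```python
import statistics

# NDCG@10 per run, copied from the reproducibility table above.
ndcg_at_10 = {
    "v14 (original)": 0.305,
    "v17c": 0.299,
    "v31a": 0.299,
    "v32 (seed=7777)": 0.299,
    "v26a (seed=42)": 0.272,
}

scores = list(ndcg_at_10.values())
mean = statistics.mean(scores)
# Assumption: sample standard deviation (divides by n - 1).
std = statistics.stdev(scores)

print(f"mean={mean:.3f} std={std:.3f}")  # prints: mean=0.295 std=0.013
```

With the population formula (`statistics.pstdev`) the spread rounds to 0.012 instead, so the choice of estimator matters at this sample size.
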
### Baselines

| Model | NDCG@10 | NNZ | Distillation | Training Data |
|-------|---------|-----|-------------|---------------|
| **FLES-1 v14** | **0.305** | **359** | **None** | **200K MS MARCO** |
| BM25 (Pyserini, stemmed) | 0.325 | n/a | n/a | n/a |
| BM25 (regex, no stemming) | 0.307 | n/a | n/a | n/a |
| SPLADE-Doc (no distillation) | 0.323 | n/a | None | Full MS MARCO |
| SPLADE original (no distillation) | 0.336 | n/a | None | Full MS MARCO |
| SPLADE-cocondenser (distilled) | 0.340 | 125 | Cross-encoder | Full MS MARCO |

FLES-1 v14 is about 6% behind Pyserini BM25 (0.325) and roughly 6-9% behind the non-distilled SPLADE variants (0.323 and 0.336). The paper's contribution is the training methodology (CLFR, L1 FLOPS, lambda-steps tradeoff), not the absolute numbers.

### Cross-Domain (zero-shot)

| Dataset | Domain | NDCG@10 |
|---------|--------|---------|
| nfcorpus | Medical | 0.305 |
| scifact | Scientific claims | 0.557 |
| fiqa | Financial Q&A | 0.212 |
| arguana | Argument retrieval | 0.142 |
| scidocs | Scientific docs | 0.112 |

### Production

| Metric | GPU (A100) | CPU |
|--------|-----------|-----|
| Encoding | 245 docs/sec | 87 docs/sec |
| Query latency | 10 ms avg | 33 ms avg |
| Index size (1K docs) | 0.32 MB | n/a |
| Index size vs. dense 768-d vectors | 9.5x smaller | n/a |

## Training

```
Foundation: fles1-v12b (2 generations from bert-base-uncased)
Data:       200,000 MS MARCO random negatives
Epochs:     2 (12,500 steps)
Loss:       InfoNCE (τ=0.05) + L1 FLOPS (λ_d=0.00003) + anti-collapse
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
Optimizer:  AdamW, lr=2e-5, batch_size=32, 7 negatives per query
Hardware:   1× A100 80GB, ~2 hours
```

## The CLFR Paper

> Full paper coming soon.

This model is the primary result of a 75-run empirical study of training dynamics in sparse retrieval.
The study's main contributions and findings:

- L1 FLOPS regularization (reduces training crashes from 10-17 to 0-7 per run)
- Epoch-level closed-loop sparsity control (1 adjustment per ~6,250 steps outperforms 12,500 per-step adjustments)
- The lambda-steps tradeoff (eff_reg = λ × steps, sweet spot 0.10-0.20)
- The binary contrastive ceiling (0.298 ± 0.007 for InfoNCE with random negatives)
- Checkpoint archaeology (longitudinal weight analysis across 43 training runs)

## Limitations

- Trained on MS MARCO (English web Q&A). Transfer to non-English text or specialized domains requires fine-tuning.
- The average NNZ of 359 is denser than SPLADE-cocondenser (125). For latency-critical deployments, consider [fles1-v12b](../fles1-v12b/) (NNZ=139).
- The 0.305 result is at the high end of variance for this recipe (mean=0.295).
- The model does not use knowledge distillation, so the roughly 10% gap to distilled SPLADE-cocondenser (0.340 vs. 0.305) is structural.

## Usage

```python
from fles1_encoder import FLES1Encoder

# Load model
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v14")

# Encode text to sparse vector
sparse = encoder.encode("What is machine learning?")
# Returns: {'machine': 1.39, 'learning': 1.08, 'machines': 0.63, ...}

# Batch encode
vectors = encoder.encode_batch(["query 1", "query 2"], batch_size=32)

# Encode to term IDs (for inverted index)
ids, weights = encoder.encode_to_ids("What is machine learning?")
```

## License

Apache 2.0

*Golvis Tavarez, Mindoval, Inc.*

*We thank Microsoft Corporation for supporting this research through the Microsoft for Startups program.*

*[https://mindoval.com/ai-research](https://mindoval.com/ai-research)*
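
## Scoring Example (Sketch)

As a supplement to the Usage section, here is a minimal sketch of how term-weight dictionaries like the ones `encode` returns could be compared. It assumes relevance is a dot product over shared terms, which is the usual convention for learned sparse retrieval but is not confirmed by this card. The `sparse_dot` helper and the document weights are hypothetical; only the query weights are taken from the Usage example.

```python
from typing import Dict


def sparse_dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse term-weight vectors (hypothetical helper)."""
    # Iterate over the shorter vector so cost scales with the smaller NNZ.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[term] for term, weight in a.items() if term in b)


# Query weights from the Usage example; document weights are made up.
query_vec = {"machine": 1.39, "learning": 1.08, "machines": 0.63}
doc_vecs = {
    "doc_a": {"machine": 0.9, "learning": 1.2, "model": 0.4},
    "doc_b": {"finance": 1.1, "learning": 0.3},
}

# Rank documents by descending score against the query.
ranked = sorted(
    doc_vecs,
    key=lambda doc_id: sparse_dot(query_vec, doc_vecs[doc_id]),
    reverse=True,
)
print(ranked)
```

In production the same sum would normally be evaluated by an inverted index rather than by looping over every document.
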