---
language: en
license: apache-2.0
library_name: transformers
tags:
- sparse-retrieval
- information-retrieval
- bert
- fles1
datasets:
- ms_marco
metrics:
- ndcg
- mrr
---

# FLES-1 v14: Sparse Lexical Encoder (Best Quality)

> **Paper:** [Closed-Loop FLOPS Regulation for Learned Sparse Retrieval](https://mindoval.com/ai-research), Golvis Tavarez, Mindoval, Inc.

## Model Description

FLES-1 transforms text into interpretable sparse vectors using BERT's MLM predictions. Each of the 30,522 dimensions corresponds to a real vocabulary word: readable, debuggable, and compatible with standard inverted indices (Elasticsearch, OpenSearch).
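
To make the idea of pooling MLM predictions into a sparse vector concrete, here is a minimal sketch in plain Python. It assumes SPLADE-style pooling (`log(1 + relu(logit))`, max over input tokens) and a weight cutoff of 0.3; this card does not state FLES-1's exact activation or how its threshold is applied, so treat both as assumptions. The function name, toy vocabulary, and logits are hypothetical.

```python
import math

def sparse_from_mlm_logits(token_logits, vocab, threshold=0.3):
    """Pool per-token MLM logits into one sparse {term: weight} dict.

    ASSUMPTION: SPLADE-style pooling, weight_j = max_i log(1 + relu(logit_ij)).
    The model card does not confirm that FLES-1 uses exactly this activation.
    """
    weights = {}
    for row in token_logits:              # one vocab-sized row per input token
        for j, logit in enumerate(row):
            w = math.log1p(max(logit, 0.0))
            if w > weights.get(vocab[j], 0.0):
                weights[vocab[j]] = w     # keep the max over input tokens
    # Drop low-weight terms so the vector stays sparse.
    return {t: w for t, w in weights.items() if w >= threshold}

# Toy 4-term vocabulary and two input tokens (hypothetical numbers).
vocab = ["machine", "learning", "banana", "the"]
logits = [
    [3.0, 0.5, -2.0, 0.1],
    [1.0, 2.0, -1.0, 0.2],
]
sparse = sparse_from_mlm_logits(logits, vocab)
print(sparse)  # only "machine" and "learning" survive the cutoff
```

With a real model the rows would have 30,522 entries each, and most of them would fall below the cutoff.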

Trained with two novel techniques:

- **L1 FLOPS regularization**: eliminates the gradient explosion that causes training instability in all published sparse retrieval models
- **Step-interval CLFR**: closed-loop sparsity control that adjusts regularization every ~6,250 steps (one epoch in our setup) based on measured sparsity
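
The closed-loop idea can be illustrated with a simple proportional controller. The update rule below is a hypothetical sketch, not the rule used to train this model: it only assumes that the regularization weight goes up when the measured NNZ is above target and down when it is below. The defaults `target_nnz=400`, `gain=0.1`, and the starting value `3e-5` come from the training configuration in this card; the function name and the NNZ measurements are made up.

```python
def clfr_update(lam, measured_nnz, target_nnz=400.0, gain=0.1):
    """One closed-loop adjustment of the regularization weight.

    HYPOTHETICAL rule: scale lambda by the relative sparsity error.
    Too dense (NNZ above target) raises lambda; too sparse lowers it.
    """
    error = (measured_nnz - target_nnz) / target_nnz
    return max(lam * (1.0 + gain * error), 0.0)

lam = 3e-5
# Hypothetical NNZ measurements taken at each ~6,250-step interval.
for step, nnz in [(6250, 520.0), (12500, 380.0)]:
    lam = clfr_update(lam, nnz)
    print(f"step {step}: measured nnz={nnz:.0f}, new lambda={lam:.3e}")
```

Adjusting once per interval rather than once per step means the controller reacts to a stable sparsity estimate instead of per-batch noise.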

## Metrics

### nfcorpus (threshold=0.3)

| Metric | Value |
|--------|-------|
| NDCG@10 | 0.3049 |
| MRR | 0.5182 |
| Recall@100 | 0.2544 |
| Avg NNZ | 359 |

### Reproducibility

This recipe was run 5 times with different seeds:

| Run (seed) | NDCG@10 |
|------------|---------|
| v14 (original) | 0.305 |
| v17c | 0.299 |
| v31a | 0.299 |
| v32 (seed=7777) | 0.299 |
| v26a (seed=42) | 0.272 |

**Mean: 0.295. Std: 0.013.** v14 is at the high end of variance. Expected reproduction: 0.295 ± 0.013.
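
The summary statistics can be checked directly from the table. This short sketch uses the sample standard deviation (dividing by n - 1), which is the variant that reproduces the reported 0.013.

```python
import statistics

# NDCG@10 per run, copied from the reproducibility table.
ndcg = {"v14": 0.305, "v17c": 0.299, "v31a": 0.299, "v32": 0.299, "v26a": 0.272}
scores = list(ndcg.values())

mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation (n - 1)
print(f"mean={mean:.3f} std={std:.3f}")  # mean=0.295 std=0.013
```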

### Baselines

| Model | NDCG@10 | NNZ | Distillation | Training Data |
|-------|---------|-----|-------------|---------------|
| **FLES-1 v14** | **0.305** | **359** | **None** | **200K MS MARCO** |
| BM25 (Pyserini, stemmed) | 0.325 | - | - | - |
| BM25 (regex, no stemming) | 0.307 | - | - | - |
| SPLADE-Doc (no distillation) | 0.323 | - | None | Full MS MARCO |
| SPLADE original (no distillation) | 0.336 | - | None | Full MS MARCO |
| SPLADE-cocondenser (distilled) | 0.340 | 125 | Cross-encoder | Full MS MARCO |

FLES-1 v14 is 6% behind Pyserini BM25 (0.325) and 6-10% behind non-distilled SPLADE variants. The paper's contribution is the training methodology (CLFR, L1 FLOPS, lambda-steps tradeoff), not the absolute numbers.
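
For readers who want to recompute the gaps, the sketch below derives them from the rounded table values. It assumes each gap is measured relative to the baseline's own score; the card does not say which denominator was used, and dividing by the FLES-1 score instead shifts each figure by up to about a point.

```python
fles = 0.305  # FLES-1 v14 NDCG@10 on nfcorpus

# Baseline NDCG@10 values copied from the table above.
baselines = {
    "BM25 (Pyserini, stemmed)": 0.325,
    "BM25 (regex, no stemming)": 0.307,
    "SPLADE-Doc (no distillation)": 0.323,
    "SPLADE original (no distillation)": 0.336,
    "SPLADE-cocondenser (distilled)": 0.340,
}

# ASSUMPTION: gap expressed as a percentage of the baseline's score.
gaps = {name: (score - fles) / score * 100 for name, score in baselines.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f}% behind")
```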

### Cross-Domain (zero-shot)

| Dataset | Domain | NDCG@10 |
|---------|--------|---------|
| nfcorpus | Medical | 0.305 |
| scifact | Scientific claims | 0.557 |
| fiqa | Financial Q&A | 0.212 |
| arguana | Argument retrieval | 0.142 |
| scidocs | Scientific docs | 0.112 |

### Production

| Metric | GPU (A100) | CPU |
|--------|-----------|-----|
| Encoding | 245 docs/sec | 87 docs/sec |
| Query latency | 10 ms avg | 33 ms avg |
| Index size (1K docs) | 0.32 MB | - |
| vs dense 768d | 9.5x smaller | - |
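
The dense comparison can be approximated with back-of-the-envelope arithmetic. This sketch assumes the dense baseline stores 768 float32 values per document and that sizes are in decimal megabytes; neither detail is stated in the card.

```python
docs = 1_000
dims = 768
bytes_per_float = 4          # ASSUMPTION: float32 dense vectors

dense_mb = docs * dims * bytes_per_float / 1e6   # decimal megabytes
sparse_mb = 0.32                                 # reported sparse index size
ratio = dense_mb / sparse_mb
print(f"dense={dense_mb:.2f} MB, sparse={sparse_mb:.2f} MB, ratio={ratio:.1f}x")
```

Under these assumptions the ratio comes out at roughly 9.6x, in line with the reported 9.5x once the rounding of the 0.32 MB figure is taken into account.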

## Training

```
Foundation: fles1-v12b (2 generations from bert-base-uncased)
Data: 200,000 MS MARCO random negatives
Epochs: 2 (12,500 steps)
Loss: InfoNCE (τ=0.05) + L1 FLOPS (λ_d=0.00003) + anti-collapse
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives per query
Hardware: 1× A100 80GB, ~2 hours
```
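
As a reference for the contrastive term, here is a minimal pure-Python sketch of the standard InfoNCE loss for a single query with one positive and seven negatives, using the temperature 0.05 from the configuration above. The similarity scores are made up, and the sketch leaves out the L1 FLOPS and anti-collapse terms, whose definitions are not given in this card.

```python
import math

def info_nce(pos_score, neg_scores, tau=0.05):
    """InfoNCE for one query: negative log-softmax of the positive candidate."""
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]

# One positive and 7 negatives per query; the scores are hypothetical.
loss = info_nce(0.9, [0.2, 0.1, 0.3, 0.0, 0.25, 0.15, 0.05])
print(f"loss={loss:.6f}")

# When every candidate scores the same, the loss equals log(8).
uniform = info_nce(1.0, [1.0] * 7)
print(f"uniform={uniform:.4f}")
```

A low temperature such as 0.05 sharpens the softmax, so even small score margins between the positive and the negatives drive the loss close to zero.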

## The CLFR Paper

> Full paper coming soon.

This model is the primary result of a 75-run empirical study of training dynamics in sparse retrieval. The study discovered:

- L1 FLOPS regularization (reduces training crashes from 10-17 to 0-7 per run)
- Epoch-level closed-loop sparsity control (1 adjustment per ~6,250 steps outperforms 12,500 per-step adjustments)
- The lambda-steps tradeoff (eff_reg = λ × steps, sweet spot 0.10-0.20)
- The binary contrastive ceiling (0.298 ± 0.007 for InfoNCE with random negatives)
- Checkpoint archaeology (longitudinal weight analysis across 43 training runs)

## Limitations

- Trained on MS MARCO (English web Q&A). Transfer to non-English text or specialized domains requires fine-tuning.
- NNZ=359 is denser than SPLADE (125). For latency-critical deployments, consider [fles1-v12b](../fles1-v12b/) (NNZ=139).
- The 0.305 result is at the high end of variance for this recipe (mean=0.295).
- Does not use knowledge distillation; the gap to SPLADE (10.4%) is structural.

## Usage

```python
from fles1_encoder import FLES1Encoder

# Load model
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v14")

# Encode text to sparse vector
sparse = encoder.encode("What is machine learning?")
# Returns: {'machine': 1.39, 'learning': 1.08, 'machines': 0.63, ...}

# Batch encode
vectors = encoder.encode_batch(["query 1", "query 2"], batch_size=32)

# Encode to term IDs (for inverted index)
ids, weights = encoder.encode_to_ids("What is machine learning?")
```
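
To show how term-weight dictionaries like the one returned by `encode` fit an inverted index, here is a small self-contained sketch. The document vectors are hand-written stand-ins rather than real encoder output, and `build_index` and `search` are hypothetical helpers, not part of `fles1_encoder`. Scoring is a plain dot product over shared terms.

```python
from collections import defaultdict

def build_index(doc_vectors):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index

def search(index, query_vec, k=10):
    """Score documents by the dot product of query and document weights."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hand-written stand-ins for encoder output (hypothetical weights).
docs = {
    "d1": {"machine": 1.2, "learning": 0.9, "model": 0.4},
    "d2": {"banana": 1.5, "fruit": 1.1},
    "d3": {"learning": 0.7, "school": 1.0},
}
query = {"machine": 1.39, "learning": 1.08, "machines": 0.63}

hits = search(build_index(docs), query)
print(hits)  # d1 ranks first, d3 second; d2 shares no terms with the query
```

Only postings for the query's non-zero terms are touched, which is why a lower NNZ translates directly into less work per query.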

## License

Apache 2.0

*Golvis Tavarez, Mindoval, Inc.*

*We thank Microsoft Corporation for supporting this research through the Microsoft for Startups program.*

*[https://mindoval.com/ai-research](https://mindoval.com/ai-research)*