---
language: en
license: apache-2.0
library_name: transformers
tags:
  - sparse-retrieval
  - information-retrieval
  - bert
  - fles1
datasets:
  - ms_marco
metrics:
  - ndcg
  - mrr
---

# FLES-1 v14 — Sparse Lexical Encoder (Best Quality)

> **Paper:** [Closed-Loop FLOPS Regulation for Learned Sparse Retrieval](https://mindoval.com/ai-research) — Golvis Tavarez, Mindoval, Inc.

## Model Description

FLES-1 transforms text into interpretable sparse vectors using BERT's MLM predictions. Each of the 30,522 dimensions corresponds to one token in the BERT WordPiece vocabulary, so the vectors are readable, debuggable, and compatible with standard inverted indices (Elasticsearch, OpenSearch).

Trained with two novel techniques:
- **L1 FLOPS regularization**: reduces the gradient explosions that destabilize sparse retrieval training (crashes drop from 10-17 to 0-7 per run in our study)
- **Step-interval CLFR**: closed-loop sparsity control that adjusts the regularization weight every ~6,250 steps (one epoch in our setup) based on measured sparsity

## Metrics

### nfcorpus (threshold=0.3)

| Metric | Value |
|--------|-------|
| NDCG@10 | 0.3049 |
| MRR | 0.5182 |
| Recall@100 | 0.2544 |
| Avg NNZ | 359 |
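
The relationship between a weight threshold and NNZ can be sketched as follows. This assumes `threshold=0.3` means that term weights below 0.3 are dropped before indexing, which this card does not spell out. The `prune` helper is hypothetical, and only the first three weights come from the usage example later in this card.

```python
def prune(sparse_vec, threshold=0.3):
    """Keep only terms whose weight is at least `threshold` (assumed semantics)."""
    return {term: w for term, w in sparse_vec.items() if w >= threshold}

# Toy vector: 'machine', 'learning', 'machines' appear in the usage example;
# the remaining entries are made up for illustration.
vec = {"machine": 1.39, "learning": 1.08, "machines": 0.63, "ai": 0.21, "what": 0.05}

pruned = prune(vec)
nnz = len(pruned)  # number of non-zero dimensions left after pruning
print(nnz, sorted(pruned))
```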

### Reproducibility

This recipe was run 5 times with different seeds. The first column gives the run name, with the seed where recorded:

| Run | NDCG@10 |
|------|---------|
| v14 (original) | 0.305 |
| v17c | 0.299 |
| v31a | 0.299 |
| v32 (seed=7777) | 0.299 |
| v26a (seed=42) | 0.272 |

**Mean: 0.295. Std: 0.013 (sample).** v14 is the best of the five runs, so a reproduction should expect roughly 0.295 ± 0.013 rather than 0.305.
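
The summary statistics can be rechecked from the table with the standard library. The sketch below uses the sample standard deviation (n - 1), which is the variant that reproduces the reported 0.013.

```python
import statistics

# NDCG@10 values copied from the reproducibility table.
scores = [0.305, 0.299, 0.299, 0.299, 0.272]

mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation (n - 1)

print(f"mean={mean:.3f} std={std:.3f}")  # prints: mean=0.295 std=0.013
```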

### Baselines

| Model | NDCG@10 | NNZ | Distillation | Training Data |
|-------|---------|-----|-------------|---------------|
| **FLES-1 v14** | **0.305** | **359** | **None** | **200K MS MARCO** |
| BM25 (Pyserini, stemmed) | 0.325 | — | — | — |
| BM25 (regex, no stemming) | 0.307 | — | — | — |
| SPLADE-Doc (no distillation) | 0.323 | — | None | Full MS MARCO |
| SPLADE original (no distillation) | 0.336 | — | None | Full MS MARCO |
| SPLADE-cocondenser (distilled) | 0.340 | 125 | Cross-encoder | Full MS MARCO |

FLES-1 v14 is 6% behind Pyserini BM25 (0.325) and 6-9% behind the non-distilled SPLADE variants (0.323 and 0.336). The paper's contribution is the training methodology (CLFR, L1 FLOPS, lambda-steps tradeoff), not the absolute numbers.
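
For reference, the gaps can be recomputed from the table. This sketch assumes each percentage is measured relative to the baseline score; `relative_gap` is an illustrative helper, not part of any released code.

```python
def relative_gap(baseline, ours=0.305):
    """Fraction by which `ours` trails `baseline`."""
    return (baseline - ours) / baseline

# NDCG@10 values copied from the baselines table.
baselines = {
    "BM25 (Pyserini, stemmed)": 0.325,
    "SPLADE-Doc (no distillation)": 0.323,
    "SPLADE original (no distillation)": 0.336,
}

gaps = {name: round(100 * relative_gap(score), 1) for name, score in baselines.items()}
for name, pct in gaps.items():
    print(f"{name}: {pct}% behind")
```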

### Cross-Domain (zero-shot)

| Dataset | Domain | NDCG@10 |
|---------|--------|---------|
| nfcorpus | Medical | 0.305 |
| scifact | Scientific claims | 0.557 |
| fiqa | Financial Q&A | 0.212 |
| arguana | Argument retrieval | 0.142 |
| scidocs | Scientific docs | 0.112 |

### Production

| Metric | GPU (A100) | CPU |
|--------|-----------|-----|
| Encoding | 245 docs/sec | 87 docs/sec |
| Query latency | 10 ms avg | 33 ms avg |
| Index size (1K docs) | 0.32 MB | — |
| vs dense 768d | 9.5x smaller | — |

## Training

```
Foundation: fles1-v12b (2 generations from bert-base-uncased)
Data: 200,000 MS MARCO training examples with random negatives
Epochs: 2 (12,500 steps)
Loss: InfoNCE (τ=0.05) + L1 FLOPS (λ_d=0.00003) + anti-collapse
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives per query
Hardware: 1× A100 80GB, ~2 hours
```
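
The exact CLFR update rule is not given in this card. As a rough illustration of closed-loop sparsity control, the sketch below assumes a simple proportional controller that scales λ by the relative NNZ error. The function name `clfr_update`, the update formula, and the NNZ measurements are assumptions. Only `target_nnz_d=400`, `gain=0.1`, and the initial λ_d come from the recipe above.

```python
def clfr_update(lam, measured_nnz, target_nnz=400.0, gain=0.1):
    """One closed-loop adjustment of the regularization weight (assumed rule).

    Denser than target -> raise lambda; sparser than target -> lower it.
    """
    relative_error = (measured_nnz - target_nnz) / target_nnz
    return max(0.0, lam * (1.0 + gain * relative_error))

lam = 0.00003  # initial lambda_d from the recipe
# Hypothetical NNZ measurements taken at each adjustment point.
for step, nnz in [(6250, 520.0), (12500, 380.0)]:
    lam = clfr_update(lam, nnz)
    print(f"step {step}: measured NNZ={nnz:.0f}, new lambda={lam:.3e}")
```

With one adjustment per ~6,250 steps, the controller reacts to sparsity measured over a whole epoch rather than to per-step noise.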

## The CLFR Paper

> Full paper coming soon.

This model is the primary result of a 75-run empirical study of training dynamics in sparse retrieval. The study's main contributions and findings:

- L1 FLOPS regularization (reduces training crashes from 10-17 to 0-7 per run)
- Epoch-level closed-loop sparsity control (1 adjustment per ~6,250 steps outperforms 12,500 per-step adjustments)
- The lambda-steps tradeoff (eff_reg = λ × steps, sweet spot 0.10-0.20)
- The binary contrastive ceiling (0.298 ± 0.007 for InfoNCE with random negatives)
- Checkpoint archaeology (longitudinal weight analysis across 43 training runs)
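
To make the lambda-steps tradeoff concrete, the sketch below applies eff_reg = λ × steps for a constant λ and inverts it to get the λ range that lands in the reported 0.10-0.20 sweet spot for a given step budget. The helper names are hypothetical. Under CLFR, λ changes during training, so this constant-λ view is only an approximation.

```python
def effective_regularization(lam, steps):
    """eff_reg = lambda * steps, for a constant lambda."""
    return lam * steps

def lambda_range(steps, low=0.10, high=0.20):
    """Constant-lambda values whose eff_reg falls inside [low, high]."""
    return low / steps, high / steps

lam_min, lam_max = lambda_range(12_500)
print(f"for 12,500 steps: lambda between {lam_min:.1e} and {lam_max:.1e}")
```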

## Limitations

- Trained on MS MARCO (English web Q&A). Transfer to non-English text or specialized domains may require fine-tuning.
- NNZ=359 is denser than SPLADE-cocondenser (125). For latency-critical deployments, consider [fles1-v12b](../fles1-v12b/) (NNZ=139).
- The 0.305 result is the best of five runs of this recipe (mean=0.295).
- Does not use knowledge distillation, so the roughly 10% gap to distilled SPLADE-cocondenser (0.340) is structural.

## Usage

```python
from fles1_encoder import FLES1Encoder

# Load model
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v14")

# Encode text to sparse vector
sparse = encoder.encode("What is machine learning?")
# Returns: {'machine': 1.39, 'learning': 1.08, 'machines': 0.63, ...}

# Batch encode
vectors = encoder.encode_batch(["query 1", "query 2"], batch_size=32)

# Encode to term IDs (for inverted index)
ids, weights = encoder.encode_to_ids("What is machine learning?")
```
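
Because each vector is a `{term: weight}` mapping, it can be stored in an ordinary inverted index and scored with a dot product. The following is a minimal pure-Python sketch of that idea and does not call `FLES1Encoder`. The functions `build_index` and `search` and all document weights are hypothetical. A production deployment would use Elasticsearch or OpenSearch instead.

```python
from collections import defaultdict

def build_index(doc_vectors):
    """Map each term to a postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index

def search(index, query_vec, top_k=10):
    """Score documents by the dot product of query and document weights."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical encoder outputs, in the {term: weight} shape shown above.
docs = {
    "d1": {"machine": 1.2, "learning": 0.9, "model": 0.4},
    "d2": {"cooking": 1.1, "recipe": 0.8},
    "d3": {"learning": 0.5, "school": 0.7},
}
index = build_index(docs)
results = search(index, {"machine": 1.39, "learning": 1.08})
print(results)
```

Only documents that share at least one term with the query are touched, which is why lower NNZ translates directly into shorter postings lists and faster queries.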

## License

Apache 2.0

*Golvis Tavarez — Mindoval, Inc.*
*We thank Microsoft Corporation for supporting this research through the Microsoft for Startups program.*
*[https://mindoval.com/ai-research](https://mindoval.com/ai-research)*