| ### Putting it all together | |
| When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed: | |
| <div class="pipeline"> | |
| <div class="df" title="Document Frame">D</div> | |
| <div class="transformer attn" title="SPLADE Indexing Transformer">SPLADE</div> | |
| <div class="df" title="Document Frame">D</div> | |
| <div class="transformer" title="Indexer">Indexer</div> | |
| <div class="artefact" title="SPLADE Index">IDX</div> | |
| </div> | |
| ```python | |
| import pyterrier as pt | |
| import pyt_splade | |
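| # Load the MS MARCO passage corpus through the ir_datasets integration | |
| dataset = pt.get_dataset('irds:msmarco-passage') | |
| splade = pyt_splade.Splade() | |
| # pretokenised=True: index the terms emitted by the encoder instead of re-tokenising the text | |
| indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True) | |
| indexer_pipe = splade.doc_encoder() >> indexer | |
| indexer_pipe.index(dataset.get_corpus_iter()) | |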
| ``` | |
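To make "rewritten document contents" more concrete, here is a minimal pure-Python sketch of the general idea. It is not pyt_splade's actual code. It assumes that a learned sparse encoder gives each vocabulary term a float weight, and that those weights are scaled and rounded into integer pseudo term frequencies that an inverted index can store. The helper name `to_pseudo_tf`, the scale factor of 100, and the example weights are all hypothetical.

```python
def to_pseudo_tf(weights, scale=100):
    """Convert float term weights into integer pseudo term frequencies.

    Hypothetical helper for illustration only. Terms whose scaled weight
    rounds to zero are dropped, because they would contribute nothing.
    """
    toks = {}
    for term, weight in weights.items():
        tf = int(round(weight * scale))
        if tf > 0:
            toks[term] = tf
    return toks


# Made-up weights for a single passage. "feline" stands in for an
# expansion term that need not occur in the original text.
example_weights = {"cat": 1.52, "feline": 0.47, "sat": 0.31, "the": 0.001}
pseudo_tf = to_pseudo_tf(example_weights)
```

Under these assumptions, the indexer sees a bag of weighted terms per document rather than raw text, which is why the indexer in the pipeline is created with `pretokenised=True`.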
| Once you have built an index, you can construct a retrieval pipeline that first encodes the query | |
| and then performs retrieval: | |
| <div class="pipeline"> | |
| <div class="df" title="Query Frame">Q</div> | |
| <div class="transformer attn" title="SPLADE Query Transformer">SPLADE</div> | |
| <div class="df" title="Query Frame">Q</div> | |
| <div class="transformer" title="Term Frequency Transformer">TF Retriever <div class="artefact" title="SPLADE Index">IDX</div></div> | |
| <div class="df" title="Result Frame">R</div> | |
| </div> | |
| ```python | |
| splade_retr = splade.query_encoder() >> pt.terrier.Retriever('./msmarco_psg', wmodel='Tf') | |
| ``` | |
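The choice of `wmodel='Tf'` can be illustrated with a toy, pure-Python sketch. It assumes that the encoded query carries one weight per term and that each matching term contributes its stored frequency multiplied by that weight, so the ranking score is a dot product between the query and document representations. The names `tf_score`, `retrieve`, `toy_index`, and `toy_query`, along with all the numbers, are hypothetical and do not reflect Terrier's internals.

```python
def tf_score(query_weights, doc_tf):
    """Weighted sum of stored term frequencies over the query terms."""
    return sum(w * doc_tf.get(term, 0) for term, w in query_weights.items())


def retrieve(query_weights, index):
    """Score every document, drop non-matching ones, sort by score."""
    scored = [(docno, tf_score(query_weights, doc_tf))
              for docno, doc_tf in index.items()]
    matched = [(docno, score) for docno, score in scored if score > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


# Toy "index": docno -> {term: pseudo term frequency}
toy_index = {
    "d1": {"cat": 152, "feline": 47, "sat": 31},
    "d2": {"dog": 140, "sat": 25},
    "d3": {"car": 120},
}

# Toy encoded query: term -> weight
toy_query = {"cat": 2.0, "feline": 1.0, "sat": 0.5}

results = retrieve(toy_query, toy_index)
```

In this sketch no additional term weighting (such as IDF or length normalisation) is applied at retrieval time, since the weights stored at indexing time already encode term importance.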
| ### References & Credits | |
| This package uses [Naver's SPLADE repository](https://github.com/naver/splade). | |
| - Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720). SIGIR 2021. | |
| - Craig Macdonald, Nicola Tonellotto, Sean MacAvaney, Iadh Ounis. [PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval](https://dl.acm.org/doi/abs/10.1145/3459637.3482013). CIKM 2021. | |