---
license: mit
---
# Translation Tables for Probabilistic Structured Queries

This repository contains the raw translation tables for the package [`fast_psq`](https://github.com/hltcoe/PSQ).
Please refer to the GitHub repository for more information.
The following is a brief example of how to use the tables.

## Get started

`fast_psq` is available on PyPI.
```bash
pip install fast_psq ir_datasets ir_measures
```

The following is an example indexing command.
```bash
python -m fast_psq.index \
  --doc_file irds:neuclir/1/zh/trec-2022 \
  --lang zh \
  --psq_file hltcoe/psq_translation_tables:zh.table.dict.gz \
  --min_translation_prob 0.00010 \
  --max_translation_alternatives 64 \
  --max_translation_cdf 0.99 \
  --docid doc_id \
  --title title \
  --body text \
  --output_dir ./indexes/neuclir-zh.f32/ \
  --compression \
  --nworkers 64
```

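
The three pruning flags in the indexing command (`--min_translation_prob`, `--max_translation_alternatives`, `--max_translation_cdf`) are not explained here. The following is a minimal sketch of how such thresholds are commonly applied to the translation alternatives of a single source term. The function `prune_alternatives`, its parameter names, the order of the checks, and the final renormalization are all assumptions made for illustration; they are not taken from the `fast_psq` source, so check the GitHub repository for the actual behavior.

```python
from typing import Dict


def prune_alternatives(
    probs: Dict[str, float],
    min_prob: float = 1e-4,
    max_alternatives: int = 64,
    max_cdf: float = 0.99,
) -> Dict[str, float]:
    """Keep the most probable translations of one source term.

    Assumed semantics (hypothetical, not confirmed against fast_psq):
      1. drop alternatives with probability below `min_prob`;
      2. keep at most `max_alternatives`, most probable first;
      3. stop once the kept probabilities sum to at least `max_cdf`;
      4. renormalize what is left so it sums to 1.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: Dict[str, float] = {}
    mass = 0.0
    for term, p in ranked:
        # Probabilities are sorted in descending order, so the first
        # failing check also rules out every remaining alternative.
        if p < min_prob or len(kept) >= max_alternatives or mass >= max_cdf:
            break
        kept[term] = p
        mass += p
    return {t: p / mass for t, p in kept.items()} if mass > 0 else {}


# Toy distribution for one source term (made-up numbers).
toy = {"bank": 0.60, "shore": 0.30, "bench": 0.095, "river": 0.00495, "x": 0.00005}
pruned = prune_alternatives(toy)
print(sorted(pruned))
```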
The following is an example search command.
```bash
python -m fast_psq.search \
  --query_source irds:neuclir/1/zh/trec-2022 \
  --query_field title \
  --index_dir ./indexes/neuclir-zh.f32/ \
  --qrels irds:neuclir/1/zh/trec-2022 \
  --query_lang en \
  --output_file ./neuclir-zh.en.title.f32.trec
```

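
To inspect the search output, a small parser can help. The sketch below assumes that the `.trec` file written by `--output_file` uses the standard six-column TREC run format (`qid Q0 docid rank score tag`); this README does not state the format, so treat that as an assumption. The helper `parse_trec_run` and the sample lines are hypothetical and use only the Python standard library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_trec_run(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group `qid Q0 docid rank score tag` lines by query, best score first."""
    run: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, docid, _, score, _ = parts
        run[qid].append((docid, float(score)))
    for ranking in run.values():
        ranking.sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)


# Made-up sample lines; a real run could be read with
# open("./neuclir-zh.en.title.f32.trec", encoding="utf-8").
sample = [
    "1 Q0 docA 1 12.5 psq",
    "1 Q0 docB 2 11.0 psq",
    "2 Q0 docC 1 9.75 psq",
]
top = {qid: ranking[0][0] for qid, ranking in parse_trec_run(sample).items()}
print(top)
```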
## Citation

```bibtex
@article{psq-repro,
  title = {Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval},
  author = {Eugene Yang and Suraj Nair and Dawn Lawrie and James Mayfield and Douglas W. Oard and Kevin Duh},
  journal = {arXiv preprint arXiv},
  year = {2024}
}
```