Buckets:
| dataset_info: | |
| features: | |
| - name: prompt | |
| dtype: string | |
| - name: answer | |
| dtype: string | |
| - name: labels | |
| list: | |
| - name: end | |
| dtype: int64 | |
| - name: label | |
| dtype: string | |
| - name: start | |
| dtype: int64 | |
| - name: split | |
| dtype: string | |
| - name: task_type | |
| dtype: string | |
| - name: dataset | |
| dtype: string | |
| - name: language | |
| dtype: string | |
| splits: | |
| - name: train | |
| num_bytes: 66155103 | |
| num_examples: 17790 | |
| download_size: 17097014 | |
| dataset_size: 66155103 | |
| configs: | |
| - config_name: default | |
| data_files: | |
| - split: train | |
| path: data/train-* | |
| license: mit | |
| task_categories: | |
| - question-answering | |
| - text-generation | |
| - summarization | |
| - text-classification | |
| - text-retrieval | |
| language: | |
| - tr | |
| tags: | |
| - turkish | |
| - hallucination-detection | |
| - rag | |
| - low-resource | |
| - multilingual | |
| # RAGTruth-TR | |
| `newmindai/RAGTruth-TR` is a Turkish-translated version of the [`wandb/RAGTruth-processed`](https://huggingface.co/datasets/wandb/RAGTruth-processed) dataset. | |
| It is designed for evaluating **Retrieval-Augmented Generation (RAG)** systems in Turkish, enabling research in hallucination detection, fact-checking, and response quality assessment. | |
| --- | |
| ## Dataset Summary | |
| * **Source Dataset:** [`wandb/RAGTruth-processed`](https://huggingface.co/datasets/wandb/RAGTruth-processed) | |
| * **Target Language:** Turkish | |
| * **Purpose:** Hallucination detection and RAG evaluation in Turkish NLP systems | |
| * **License:** MIT (inherits from the original dataset) | |
| This dataset follows the same schema as the original RAGTruth-processed dataset but provides high-quality Turkish translations. | |
| --- | |
| ## Dataset Creation | |
| The dataset was originally derived from the **RAGTruth** dataset, which contains responses from retrieval-augmented generation models annotated for hallucinations. | |
| For this version: | |
| * The English dataset was **translated into Turkish**. | |
| * Translation was performed using **Gemma 3 27B**. | |
| * We used a **single NVIDIA L40S GPU** and served the model through **vLLM** for efficient translation. | |
| This ensures **consistent terminology** and **domain-specific fidelity** in the Turkish version. | |
| --- | |
| ## Languages | |
| * **Turkish (tr)** | |
| * Original dataset: **English (en)** | |
| --- | |
| ## Dataset Structure | |
| The structure mirrors the original `wandb/RAGTruth-processed`. | |
| Each example includes: | |
| * `id`: Unique identifier | |
| * `question`: The user query (translated to Turkish) | |
| * `context`: Retrieved passages (translated to Turkish) | |
| * `answer`: Generated response (translated to Turkish) | |
| * `hallucination_label`: Annotation for hallucinations (inherited, unchanged) | |
| --- | |
| ## Example | |
| ```json | |
| { | |
| "id": "12345", | |
| "question": "Türkçeye çevrilmiş bir örnek soru", | |
| "context": [ | |
| "Türkçeye çevrilmiş bir bağlam pasajı." | |
| ], | |
| "answer": "Türkçeye çevrilmiş model cevabı.", | |
| "hallucination_label": "no_hallucination" | |
| } | |
| ``` | |
| --- | |
| ## Intended Uses | |
| * Evaluate **hallucination detection** methods in Turkish. | |
| * Fine-tune or evaluate **RAG models** in Turkish. | |
| * Use as a **benchmark dataset** for multilingual hallucination detection. | |
| --- | |
| ## Citation | |
| If you use this dataset, please cite both the original and this translated version, and our paper: | |
| ```bibtex | |
| @misc{ragtruth-tr, | |
| author = {newmindai}, | |
| title = {RAGTruth-TR: Turkish Translation of RAGTruth}, | |
| year = {2025}, | |
| howpublished = {Hugging Face}, | |
| url = {https://huggingface.co/datasets/newmindai/RAGTruth-TR} | |
| } | |
| @misc{ragtruth-processed, | |
| author = {Weights & Biases}, | |
| title = {RAGTruth-processed}, | |
| year = {2025}, | |
| howpublished = {Hugging Face}, | |
| url = {https://huggingface.co/datasets/wandb/RAGTruth-processed} | |
| } | |
| @article{turklettuceDetect2025, | |
| title={Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications}, | |
| author={Selva Taş, Mahmut El Huseyni, Özay Ezerceli, Reyhan Bayraktar, Fatma Betül Terzioğlu}, | |
| journal={arXiv preprint arXiv:2509.17671}, | |
| year={2025} | |
| } | |
| ``` |
Xet Storage Details
- Size:
- 3.98 kB
- Xet hash:
- d69566901e06fe4493473f9ea9921d14c933df52024d90406c9cb4269ad519b0
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.