# DataMorgana Overview

DataMorgana is an innovative tool designed to generate diverse and customizable synthetic benchmarks for Retrieval-Augmented Generation (RAG) systems. Its key innovation lies in its ability to create highly varied question-answer pairs that more realistically represent how different types of users might interact with a system.

The tool operates through two main stages: **configuration** and **generation**.

---
## Configuration Stage

The configuration stage allows for the definition of detailed categorizations and associated categories for both questions and end-users, which provide high-level information on the expected traffic of the RAG application. A **categorization** is a list of mutually exclusive question or user categories along with their desired distribution within the generated benchmark.

For example, a question categorization might distinguish **search queries vs. natural language questions**, while a user categorization might distinguish **novice vs. expert users**. There can be as many categorizations of questions and users as needed, and they can be easily defined to address the specific requirements of the application scenario. For instance:

- In a **healthcare RAG application**, a user categorization could consist of **patient, doctor, and public health authority**.
- In a **RAG-based embassy chatbot**, a categorization might include **diplomat, student, worker, and tourist**.
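The overview does not show DataMorgana's actual configuration format, so the following is only a hypothetical sketch of what such a configuration could look like: all field names (`question_categorizations`, `probability`, etc.) are illustrative assumptions, not the tool's real schema. It also checks the one constraint the text does state: categories within a categorization are mutually exclusive, so their distribution should sum to 1.

```python
# Hypothetical configuration sketch -- field names are illustrative only,
# not DataMorgana's real schema.
config = {
    "question_categorizations": [
        {
            "name": "question_formulation",
            "categories": [
                {"name": "search_query", "probability": 0.4},
                {"name": "natural_language_question", "probability": 0.6},
            ],
        },
    ],
    "user_categorizations": [
        {
            "name": "expertise",
            "categories": [
                {"name": "novice", "probability": 0.5},
                {"name": "expert", "probability": 0.5},
            ],
        },
    ],
}

# Categories in a categorization are mutually exclusive, so their
# desired distribution should sum to 1.
for group in config["question_categorizations"] + config["user_categorizations"]:
    total = sum(c["probability"] for c in group["categories"])
    assert abs(total - 1.0) < 1e-9
```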
---
## Generation Stage

At the generation stage, DataMorgana leverages state-of-the-art **LLMs** (e.g., Claude 3.5 Sonnet) to incrementally build a benchmark of Q&A pairs. Each pair is generated by following the procedure depicted in **Figure 1**.



<center><b>Fig. 1: DataMorgana Generation Stage.</b> In the configuration, we provide an end-user categorization and two question categorizations, namely question formulations and question types.</center>
More specifically, the DataMorgana generation process follows these steps:

1. **Category Selection:**
   - It selects a **user/question category** for each categorization according to the probability distributions specified in the configuration file.
   - These are automatically combined to create a unique prompt.
2. **Document Selection:**
   - It randomly selects **documents** from the target corpus and adds them to the prompt.
3. **Question-Answer Generation:**
   - The chosen **LLM** is invoked with the instantiated prompt to generate *n* **candidate question-answer pairs** about the selected documents.
4. **Filtering and Verification:**
   - A final filtering stage verifies that these candidate pairs:
     - Adhere to the specified **categories**.
     - Are **faithful** to the selected documents.
     - Satisfy general constraints (e.g., be **context-free**).
   - If multiple pairs satisfy the quality requirements, **one is sampled**.
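The four steps above can be sketched as a single generation loop. This is a minimal illustration, not DataMorgana's implementation: the configuration schema, the prompt format, the `llm` callable, and the `passes_filters` stub are all assumptions introduced here for clarity.

```python
import random

def passes_filters(candidate, chosen_categories, docs):
    """Placeholder for the filtering stage. A real implementation would check
    category adherence, faithfulness to the documents, and context-freeness
    (e.g., via further LLM calls); here we only reject empty candidates."""
    return bool(candidate)

def generate_pair(config, corpus, llm, n_candidates=3, seed=None):
    """One iteration of a DataMorgana-style generation loop (illustrative)."""
    rng = random.Random(seed)

    # 1. Category selection: sample one category per categorization,
    #    following the probabilities given in the configuration.
    chosen = {}
    for group in config["question_categorizations"] + config["user_categorizations"]:
        names = [c["name"] for c in group["categories"]]
        weights = [c["probability"] for c in group["categories"]]
        chosen[group["name"]] = rng.choices(names, weights=weights, k=1)[0]

    # 2. Document selection: pick random documents from the target corpus.
    docs = rng.sample(corpus, k=min(2, len(corpus)))

    # 3. Invoke the LLM on the instantiated prompt for n candidate pairs.
    prompt = f"Categories: {chosen}\nDocuments: {docs}"
    candidates = [llm(prompt) for _ in range(n_candidates)]

    # 4. Filtering and verification; sample one surviving pair.
    valid = [c for c in candidates if passes_filters(c, chosen, docs)]
    return rng.choice(valid) if valid else None
```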
---
## Key Advantages

The rich and easy-to-use configurability of DataMorgana allows for **fine-grained control** over question and user characteristics. Furthermore, by jointly using multiple categorizations, DataMorgana can achieve a **combinatorial number of possibilities** to define Q&A pairs. This leads to more **diverse benchmarks** compared to existing tools, which typically use a predefined list of possible question types.
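To see the combinatorial effect concretely, the number of distinct category combinations is the product of the category counts across categorizations. The counts below are made up for illustration; the source does not specify any particular configuration size.

```python
from math import prod

# Illustrative counts: three categorizations with 4, 2, and 3 categories each.
category_counts = [4, 2, 3]
distinct_profiles = prod(category_counts)
print(distinct_profiles)  # 4 * 2 * 3 = 24 distinct category combinations
```

Adding one more categorization multiplies this count again, which is why joint categorizations yield far more variety than a single flat list of question types.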
Further details about DataMorgana, as well as **experimental results demonstrating its superior diversity**, are available in this [paper](Generating_Diverse_Q&A_Benchmarks_for_RAG_Evaluation_with_DataMorgana.pdf).