Commit 4e4f6b0 · 1 Parent(s): 30d6e30
added dataset
README.md CHANGED
@@ -21,7 +21,7 @@ Local laws are often written in dense legal terminology that the average person
 
 ## 2. Data
 
-For the RAG pipeline, the knowledge base consists of the unedited Charlottesville Municipal Code text, scraped and pre-processed from [Municode](https://library.municode.com/va/charlottesville/codes/code_of_ordinances). These chunks were not rephrased, ensuring that the retrieval mechanism pulls the exact letter of the law. To evaluate the RAG pipeline, I utilized a set of questions and answers generated from the original sections of the municipal code to validate retrieval accuracy (checking if the retrieved node matched the ground truth node for a given query).
+For the RAG pipeline, the knowledge base consists of the unedited Charlottesville Municipal Code text, scraped and pre-processed from [Municode](https://library.municode.com/va/charlottesville/codes/code_of_ordinances). These chunks were not rephrased, ensuring that the retrieval mechanism pulls the exact letter of the law. To evaluate the RAG pipeline, I utilized a set of questions and answers generated from the original sections of the municipal code to validate retrieval accuracy (checking if the retrieved node matched the ground truth node for a given query). This dataset can be found at [jme-datasci/charlottesville_qa](https://huggingface.co/datasets/jme-datasci/charlottesville_qa/tree/main).
 
 ## 3. Methodology
 
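The retrieval-accuracy check described in the Data section (whether the retrieved node matches the ground-truth node for a given query) could be sketched roughly as below. This is a minimal illustration and not the author's evaluation code: the function name `retrieval_accuracy`, the `retrieve` callable, the `top_k` parameter, and the toy node IDs are all hypothetical, and the real dataset's column names are not assumed here.

```python
from typing import Callable, Iterable, Sequence, Tuple


def retrieval_accuracy(
    examples: Iterable[Tuple[str, str]],
    retrieve: Callable[[str], Sequence[str]],
    top_k: int = 1,
) -> float:
    """Fraction of queries whose ground-truth node ID is among the top_k retrieved IDs.

    `examples` yields (query, ground_truth_node_id) pairs; `retrieve` is any
    function mapping a query to a ranked sequence of node IDs (hypothetical
    interface, standing in for the actual retriever).
    """
    hits = 0
    total = 0
    for query, truth_id in examples:
        retrieved_ids = list(retrieve(query))[:top_k]
        if truth_id in retrieved_ids:
            hits += 1
        total += 1
    # Avoid division by zero when no examples are supplied.
    return hits / total if total else 0.0


# Toy stand-in for a retriever: a fixed ranking per query (made-up IDs).
toy_rankings = {
    "q1": ["node-a", "node-b"],
    "q2": ["node-c", "node-a"],
    "q3": ["node-b"],
}


def toy_retrieve(query: str) -> Sequence[str]:
    return toy_rankings.get(query, [])


toy_examples = [("q1", "node-a"), ("q2", "node-a"), ("q3", "node-c")]

top1 = retrieval_accuracy(toy_examples, toy_retrieve, top_k=1)
top2 = retrieval_accuracy(toy_examples, toy_retrieve, top_k=2)
print(f"top-1 accuracy: {top1:.2f}, top-2 accuracy: {top2:.2f}")
```

With `top_k=1` this corresponds to the strict "retrieved node equals ground-truth node" check; a larger `top_k` gives a looser hit-rate variant, which is an extension beyond what the README states.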