jme-datasci committed on
Commit 4e4f6b0 · 1 Parent(s): 30d6e30

added dataset

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ Local laws are often written in dense legal terminology that the average person
 
 ## 2. Data
 
-For the RAG pipeline, the knowledge base consists of the unedited Charlottesville Municipal Code text, scraped and pre-processed from [Municode](https://library.municode.com/va/charlottesville/codes/code_of_ordinances). These chunks were not rephrased, ensuring that the retrieval mechanism pulls the exact letter of the law. To evaluate the RAG pipeline, I utilized a set of questions and answers generated from the original sections of the municipal code to validate retrieval accuracy (checking if the retrieved node matched the ground truth node for a given query).
+For the RAG pipeline, the knowledge base consists of the unedited Charlottesville Municipal Code text, scraped and pre-processed from [Municode](https://library.municode.com/va/charlottesville/codes/code_of_ordinances). These chunks were not rephrased, ensuring that the retrieval mechanism pulls the exact letter of the law. To evaluate the RAG pipeline, I utilized a set of questions and answers generated from the original sections of the municipal code to validate retrieval accuracy (checking if the retrieved node matched the ground truth node for a given query). This dataset can be found at [jme-datasci/charlottesville_qa](https://huggingface.co/datasets/jme-datasci/charlottesville_qa/tree/main).
 
 ## 3. Methodology
 
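
The retrieval-accuracy check described in the Data section (whether the retrieved node matches the ground-truth node for a given query) could be sketched roughly as follows. This is a minimal illustration, not the author's evaluation code: the function name `retrieval_hit_rate`, the query keys, and the node IDs are all hypothetical, and the actual schema of the question-and-answer dataset is not shown in this commit.

```python
from typing import Dict, List


def retrieval_hit_rate(
    retrieved: Dict[str, List[str]],
    ground_truth: Dict[str, str],
    k: int = 1,
) -> float:
    """Return the fraction of queries whose ground-truth node ID
    appears among the top-k retrieved node IDs."""
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one query")
    hits = 0
    for query, expected_node in ground_truth.items():
        # A query with no retrieval results counts as a miss.
        top_k = retrieved.get(query, [])[:k]
        if expected_node in top_k:
            hits += 1
    return hits / len(ground_truth)


# Hypothetical data: query ID -> ground-truth node ID (assumed format).
ground_truth = {"q1": "node-A", "q2": "node-B", "q3": "node-C"}

# Hypothetical data: query ID -> ranked list of retrieved node IDs.
retrieved = {
    "q1": ["node-A", "node-D"],
    "q2": ["node-D", "node-B"],
    "q3": ["node-E", "node-F"],
}

top1 = retrieval_hit_rate(retrieved, ground_truth, k=1)
top2 = retrieval_hit_rate(retrieved, ground_truth, k=2)
print(f"hit rate @1: {top1:.2f}, hit rate @2: {top2:.2f}")
```

With `k=1` this reduces to the exact-match check the README describes; a larger `k` gives a looser measure that credits the retriever whenever the correct node appears anywhere in the top results.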