---
title: README
emoji: 🐨
colorFrom: purple
colorTo: indigo
sdk: static
pinned: false
---

This organization and its dataset are no longer actively maintained. Still, you are invited to add similar datasets to it.

Feel free to join the organization if you want to add a dataset with a similar purpose :) Please [tell me](https://tillwenke.github.io/about/) about your dataset before asking to join the org.

To test your RAG (retrieval-augmented generation) and other semantic information retrieval solutions, it helps to have a dataset that consists of a text corpus,
correct responses to queries (e.g. question-answer pairs) to test the solution end-to-end, and possibly even a set of relevant passages
from the text corpus for each query, so the retrieval component can be tested separately as well.
We call this a question-answer-passages dataset.
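Assuming a retriever returns a ranked list of passage ids and the generator returns a plain string, the two kinds of tests described above could be scored roughly as follows. Recall@k (for the retrieval component) and normalized exact match (for end-to-end answers) are common choices picked here for illustration, not metrics prescribed by this organization; the function names are hypothetical.

```python
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[int], relevant_ids: Sequence[int], k: int) -> float:
    """Retrieval test: fraction of the relevant passages found among the top-k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    hits = sum(1 for pid in relevant if pid in top_k)
    return hits / len(relevant)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(predicted: str, reference: str) -> bool:
    """End-to-end test: does the generated answer match the reference short answer?"""
    return normalize(predicted) == normalize(reference)


if __name__ == "__main__":
    # Hypothetical example: the retriever returned passages 7, 3, 12, 5 for one question,
    # and the dataset lists passages 3 and 5 as relevant.
    print(recall_at_k([7, 3, 12, 5], [3, 5], k=2))  # 0.5
    print(recall_at_k([7, 3, 12, 5], [3, 5], k=4))  # 1.0
    print(exact_match("  Paris ", "paris"))  # True
```

Exact match is deliberately strict; for longer free-form answers a softer comparison may be more informative.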

There are plenty of large-scale datasets of this kind, such as [Google's Natural Questions](https://ai.google.com/research/NaturalQuestions/).

Still, we lack datasets of this kind that are small-scale and narrow-domain, for quickly testing a RAG solution or seeing how it performs
in a specific domain.

We created this space to build a collection of such datasets to boost the development of RAG solutions, and we welcome any feedback on what your ideal RAG dataset would look like. :)

Datasets consist of:
* A text corpus already split into passages, each referenced by an id.
* A test dataset consisting of:
  * A question, and one or ideally both of the following:
    * A correct short answer.
    * A list of the ids of the passages relevant to answering the question.
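The structure listed above could be modeled in Python roughly like this. The class and field names (`Passage`, `TestExample`, `relevant_passage_ids`) and the `validate` helper are hypothetical, chosen for illustration; the actual datasets in this organization may use different column names. The check mirrors the two requirements from the list: every question needs a short answer, relevant passage ids, or ideally both, and every referenced id must exist in the corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Passage:
    id: int
    text: str


@dataclass
class TestExample:
    question: str
    answer: Optional[str] = None  # correct short answer, if available
    relevant_passage_ids: List[int] = field(default_factory=list)  # ids into the corpus


def validate(corpus: List[Passage], tests: List[TestExample]) -> List[str]:
    """Return the problems found; an empty list means the dataset looks consistent."""
    problems = []
    ids = [p.id for p in corpus]
    if len(ids) != len(set(ids)):
        problems.append("duplicate passage ids in corpus")
    known = set(ids)
    for i, t in enumerate(tests):
        # Each question needs at least one of the two references.
        if t.answer is None and not t.relevant_passage_ids:
            problems.append(f"test {i}: needs an answer, relevant passage ids, or both")
        # Every referenced passage id must point into the corpus.
        for pid in t.relevant_passage_ids:
            if pid not in known:
                problems.append(f"test {i}: unknown passage id {pid}")
    return problems


if __name__ == "__main__":
    # Hypothetical toy data, only to exercise the checks.
    corpus = [Passage(0, "Paris is the capital of France."), Passage(1, "The Seine flows through Paris.")]
    tests = [
        TestExample("What is the capital of France?", answer="Paris", relevant_passage_ids=[0]),
        TestExample("Which river flows through Paris?", relevant_passage_ids=[1, 5]),
        TestExample("A question with no references?"),
    ]
    for problem in validate(corpus, tests):
        print(problem)
```

With the toy data, the second test is flagged for the unknown id 5 and the third for having neither an answer nor relevant passage ids.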