---
title: README
emoji: 📈
colorFrom: red
colorTo: gray
sdk: static
pinned: false
---

## Chroma Datasets

Making it easy to load data into Chroma since 2023

```
pip install chroma_datasets
```

### Current Datasets

- State of the Union `from chroma_datasets import StateOfTheUnion`
- Paul Graham Essay `from chroma_datasets import PaulGrahamEssay`
- Glue `from chroma_datasets import Glue`
- SciPy `from chroma_datasets import SciPy`

`chroma_datasets` is generally backed by Hugging Face datasets, but this is not a requirement.

### How to use

The following will:

1. Download the 2022 State of the Union
2. Chunk it up for you
3. Embed it using Chroma's default open-source embedding function
4. Import it into Chroma

```python
import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

# In-memory Chroma client; data is not persisted between runs
chroma_client = chromadb.Client()

# Download, chunk, embed, and load the dataset into a collection
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)

# Query the collection with a natural-language string
result = collection.query(query_texts=["The United States of America"])
print(result)
```

Learn how to create and contribute a package at [chroma-core/chroma_datasets](https://github.com/chroma-core/chroma_datasets).
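
To give a feel for the chunking step listed under "How to use", here is a minimal sketch in plain Python. It is not the code `chroma_datasets` uses: the helper names `chunk_text` and `to_records`, the character-window strategy, and the default sizes are all assumptions made for illustration.

```python
# Illustrative sketch only: chunk_text and to_records are hypothetical helpers,
# not part of chroma_datasets or chromadb.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


def to_records(chunks: list[str], source: str):
    """Build parallel lists of ids, documents, and metadata for the chunks."""
    ids = [f"{source}-{i}" for i in range(len(chunks))]
    metadatas = [{"source": source, "chunk_index": i} for i in range(len(chunks))]
    return ids, chunks, metadatas


if __name__ == "__main__":
    ids, documents, metadatas = to_records(
        chunk_text("abcdefghij", chunk_size=4, overlap=1), source="demo"
    )
    print(ids, documents, metadatas)
```

Parallel lists shaped like these could then be passed to a Chroma collection with `collection.add(ids=ids, documents=documents, metadatas=metadatas)`, which is roughly the work `import_into_chroma` saves you from doing by hand.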