Buckets:
| # PostgreSQL | |
| [PostgreSQL](https://www.postgresql.org/docs/) is a powerful, open source object-relational database system. It is the most [popular](https://survey.stackoverflow.co/2024/technology#most-popular-technologies-database) database by application developers for a few years running. [pgai](https://github.com/timescale/pgai) is a PostgreSQL extension that allows you to easily ingest huggingface datasets into your PostgreSQL database. | |
| ## Run PostgreSQL with pgai installed | |
| You can easily run a docker container containing PostgreSQL with pgai. | |
| ```bash | |
| docker run -d --name pgai -p 5432:5432 \ | |
| -v pg-data:/home/postgres/pgdata/data \ | |
| -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg17 | |
| ``` | |
| Then run the following command to install pgai into the database. | |
| ```bash | |
| docker exec -it pgai psql -c "CREATE EXTENSION ai CASCADE;" | |
| ``` | |
| You can then connect to the database using the `psql` command line tool in the container. | |
| ```bash | |
| docker exec -it pgai psql | |
| ``` | |
| or using your favorite PostgreSQL client using the following connection string: `postgresql://postgres:password@localhost:5432/postgres | |
| ` | |
| Alternatively, you can install pgai into an existing PostgreSQL database. For instructions on how to install pgai into an existing PostgreSQL database, follow the instructions in the [github repo](https://github.com/timescale/pgai). | |
| ## Create a table from a dataset | |
| To load a dataset into PostgreSQL, you can use the `ai.load_dataset` function. This function will create a PostgreSQL table, and load the dataset from the Hugging Face Hub | |
| in a streaming fashion. | |
| ```sql | |
| select ai.load_dataset('rajpurkar/squad', table_name => 'squad'); | |
| ``` | |
| You can now query the table using standard SQL. | |
| ```sql | |
| select * from squad limit 10; | |
| ``` | |
| Full documentation for the `ai.load_dataset` function can be found [here](https://github.com/timescale/pgai/blob/main/docs/load_dataset_from_huggingface.md). | |
| ## Import only a subset of the dataset | |
| You can also import a subset of the dataset by specifying the `max_batches` parameter. | |
| This is useful if the dataset is large and you want to experiment with a smaller subset. | |
| ```sql | |
| SELECT ai.load_dataset('rajpurkar/squad', table_name => 'squad', batch_size => 100, max_batches => 1); | |
| ``` | |
| ## Load a dataset into an existing table | |
| You can also load a dataset into an existing table. | |
| This is useful if you want more control over the data schema or want to predefine indexes and constraints on the data. | |
| ```sql | |
| select ai.load_dataset('rajpurkar/squad', table_name => 'squad', if_table_exists => 'append'); | |
| ``` | |
Xet Storage Details
- Size:
- 2.57 kB
- Xet hash:
- 701f09e9becc91a273834294fa4f51c7d6c501c975a64d25e70a186dc050013c
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.