Example Usage

This guide demonstrates how to use Schema Descriptor to generate descriptions for BigQuery datasets.

Prerequisites

Before you begin, make sure you have:

A Google Cloud service account with access to BigQuery
An OpenAI API key
Schema Descriptor installed and configured (see the README.md)

Basic Usage

Step 1: Start the application

streamlit run app.py

This starts a local Streamlit server and opens the application in your web browser (by default at http://localhost:8501).

Step 2: Configure Authentication

Enter your OpenAI API key in the sidebar
Upload your Google Cloud service account JSON key file
The application will verify your credentials
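
To make the credential check more concrete, here is a minimal sketch of a pre-flight check you could run on a key file before uploading it. It only confirms that the file is valid JSON and carries the fields a standard service account key contains; it does not contact Google Cloud, and it is not necessarily how the application verifies credentials. The function name `check_service_account_key` is hypothetical.

```python
import json

# Fields found in a standard Google Cloud service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key file text (empty if none)."""
    try:
        info = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not info.get(name)
    ]
    # A key of another type (for example a user credential) is not a service account key.
    if info.get("type") and info["type"] != "service_account":
        problems.append(f"unexpected key type: {info['type']}")
    return problems
```

An empty list means the file looks structurally sound; whether the account actually has BigQuery access can only be confirmed by the application or by BigQuery itself.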

Step 3: Select Project and Dataset

Enter your Google Cloud project ID
Select a dataset from the dropdown menu
Verify that the tables are displayed correctly

Step 4: Configure Sampling Parameters

Adjust the "Sample Size" slider to control how many rows to sample per table
If your tables are partitioned, set date filters to sample a specific range
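
The two settings above map naturally onto a sampling query: the sample size becomes a row limit and the date filters become a condition on the partition column. The sketch below shows one way such a query could be assembled. It is an illustration under assumptions, not the application's actual query: `build_sample_query` and its parameters are hypothetical, the partition column is assumed to be of type DATE, and the table and column names are assumed to come from a trusted source.

```python
from datetime import date
from typing import Optional


def build_sample_query(
    table_id: str,
    sample_size: int,
    partition_column: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> str:
    """Build a query that returns at most `sample_size` rows from `table_id`."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")

    query = f"SELECT * FROM `{table_id}`"
    if partition_column and start and end:
        # Filtering on the partition column lets BigQuery skip partitions
        # outside the requested date range.
        query += (
            f" WHERE {partition_column} BETWEEN DATE '{start.isoformat()}'"
            f" AND DATE '{end.isoformat()}'"
        )
    return f"{query} LIMIT {sample_size}"
```

Keep in mind that in BigQuery a `LIMIT` clause caps the rows returned but does not by itself reduce the bytes scanned on a non-clustered table, so the date filter is what keeps sampling cheap on large partitioned tables.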

Step 5: Generate Descriptions

Click "Check Cost" to see an estimate of the BigQuery usage (optional)
Click "Create Data Descriptions" to start the process
Watch the progress indicators as the application:
- Samples data from each table
- Sends information to the LLM
- Generates descriptions for the dataset, tables, and columns
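
As a rough illustration of what a cost check involves: BigQuery can report how many bytes a query would scan without running it (with the google-cloud-bigquery client, a dry run requested through `QueryJobConfig(dry_run=True)` exposes `total_bytes_processed` on the resulting job). The sketch below covers only the arithmetic that turns such a byte count into a cost figure. `estimate_query_cost` is a hypothetical helper, and `price_per_tib` is deliberately a required parameter because this guide does not state a price; take it from the current BigQuery pricing page for your region. The application's own estimate may be computed differently.

```python
BYTES_PER_TIB = 2**40  # on-demand query pricing is quoted per tebibyte scanned


def estimate_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Convert a dry-run byte count into an estimated on-demand query cost."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must not be negative")
    return bytes_processed / BYTES_PER_TIB * price_per_tib


# Example: a query that would scan 512 GiB, at an assumed price of 5.0 per TiB.
half_tib_cost = estimate_query_cost(512 * 2**30, price_per_tib=5.0)
```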

Step 6: Review and Edit

Review the automatically generated descriptions
Edit any descriptions that need improvement or correction
The editor supports markdown formatting for better readability

Step 7: Save to BigQuery

When you're satisfied with the descriptions, click "Commit Changes to BigQuery"
The application will update your BigQuery metadata with the new descriptions
You'll see a confirmation message when complete
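
For readers who want to see what this commit amounts to at the API level, the following is a minimal sketch of updating a single table description. It assumes `client` is a `google.cloud.bigquery.Client` (whose `get_table` and `update_table` methods are used here); the helper name `commit_table_description` is hypothetical, and the application may batch or structure its updates differently. Column descriptions live in the table schema and are not covered by this sketch.

```python
def commit_table_description(client, table_id: str, description: str):
    """Set the description of one table and send only that field to BigQuery.

    `client` is assumed to be a google.cloud.bigquery.Client;
    `table_id` has the form "project.dataset.table".
    """
    table = client.get_table(table_id)  # fetch the current table metadata
    table.description = description  # change the field locally
    # Listing the field explicitly means only the description is patched.
    return client.update_table(table, ["description"])
```

Passing the list of changed fields keeps the update narrow, so other table properties are left untouched.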

Advanced Features

Custom Instructions

You can provide custom instructions to the LLM by entering them in the "Additional Instructions" field. For example:

"Focus on data governance aspects"
"Highlight PII and sensitive data fields"
"Use technical terminology appropriate for financial data"

Error Handling

If you encounter errors:

Check the logs in the console where you started Streamlit
Verify that your service account has the correct permissions
For OpenAI API errors, check your rate limits and API key status

Caching

The application caches LLM responses to save costs. If you want to regenerate descriptions:

Clear the cache by restarting the application
Or use the "Force Refresh" option, if your version of the application provides one
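
The behaviour described above (responses reused until the application restarts, with an optional forced refresh) is consistent with a simple in-memory cache keyed by the prompt. The sketch below illustrates that idea only; it is an assumption about the general technique, not the application's actual implementation. `cached_completion`, `call_llm`, and `force_refresh` are hypothetical names, and `call_llm` stands in for whatever function sends the prompt to the LLM.

```python
import hashlib
from typing import Callable

# In-memory cache: contents are lost when the process restarts.
_cache: dict[str, str] = {}


def cached_completion(
    prompt: str,
    call_llm: Callable[[str], str],
    force_refresh: bool = False,
) -> str:
    """Return the cached response for `prompt`, calling the LLM only when needed."""
    # Hash the prompt so that long prompts produce short, fixed-size keys.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if force_refresh or key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]
```

Because the key is derived from the full prompt text, any change to the sampled data or to the additional instructions would produce a new key and therefore a fresh LLM call.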

Example Outputs

Below is an example of how your descriptions might look in BigQuery after using Schema Descriptor:

Dataset Description

Sales Data Warehouse (SDW)

This dataset contains comprehensive sales transaction data from our e-commerce platform. It includes customer information, product details, orders, and shipping data from January 2020 to present.

The data is refreshed daily through an ETL process and is used for sales reporting, customer analysis, and inventory management.

Table Description

Customer Orders Table

This table records all customer orders with associated metadata. Each row represents a unique order with details about the customer, timing, payment method, and order status.

The table is partitioned by order_date for efficient querying of specific time periods.

Column Descriptions

- customer_id: Unique identifier for the customer who placed the order
- order_date: Date the order was placed (YYYY-MM-DD format)
- payment_method: Method used for payment (e.g., "credit_card", "paypal", "gift_card")
- order_total: Total monetary value of the order in USD, excluding tax and shipping

Conclusion

Schema Descriptor makes it easy to maintain comprehensive, accurate documentation for your BigQuery resources with minimal manual effort.

For more details on the application's features and configuration options, refer to the README.md.