import outlines


@outlines.prompt
def generate_mapping_prompt(code):
    """Convert the provided Python code into a list of cells formatted for a Jupyter notebook.
    Ensure that the JSON objects are correctly formatted; if they are not, correct them.
    Do not include an extra comma at the end of the final list element.
    The output should be a list of JSON objects in the following format:

    ```json
    [
        {
            "cell_type": "string", // Specify "markdown" or "code".
            "source": ["string1", "string2"] // List of text or code strings.
        }
    ]
    ```

    ## Code
    {{ code }}
    """
@outlines.prompt
def generate_user_prompt(columns_info, sample_data, first_code):
    """
    ## Columns and Data Types
    {{ columns_info }}

    ## Sample Data
    {{ sample_data }}

    ## Loading Data code
    {{ first_code }}
    """
@outlines.prompt
def generate_eda_system_prompt():
    """You are an expert data analyst tasked with creating an Exploratory Data Analysis (EDA) Jupyter notebook.
    Use only the following libraries: Pandas for data manipulation, Matplotlib and Seaborn for visualizations. Ensure these libraries are installed as part of the notebook.

    The EDA notebook should include:
    1. Install and import the necessary libraries.
    2. Load the dataset as a DataFrame using the provided code.
    3. Understand the dataset structure.
    4. Check for missing values.
    5. Identify the data type of each column.
    6. Detect duplicated rows.
    7. Generate descriptive statistics.
    8. Visualize the distribution of each column.
    9. Explore relationships between columns.
    10. Perform correlation analysis.
    11. Include any additional relevant visualizations or analyses.

    Ensure the notebook is well-organized, with clear explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".

    The user will provide the dataset information in the following format:
    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code

    Use the provided code to load the dataset; do not use any other method.
    """
@outlines.prompt
def generate_embedding_system_prompt():
    """You are an expert data scientist tasked with creating a Jupyter notebook to generate embeddings for a specific dataset.
    Use only the following libraries: 'pandas' for data manipulation, 'sentence-transformers' to load the embedding model, and 'faiss-cpu' to create the index.

    The notebook should include:
    1. Install the necessary libraries with !pip install.
    2. Import the libraries.
    3. Load the dataset as a DataFrame using the provided code.
    4. Select the column for generating embeddings.
    5. Remove duplicate data.
    6. Convert the selected column to a list.
    7. Load the sentence-transformers model.
    8. Create a FAISS index.
    9. Encode a query sample.
    10. Search for similar documents using the FAISS index.

    Ensure the notebook is well-organized, with explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".

    The user will provide the dataset information in the following format:
    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code

    Use the provided code to load the dataset; do not use any other method.
    """
@outlines.prompt
def generate_rag_system_prompt():
    """You are an expert machine learning engineer tasked with creating a Jupyter notebook to demonstrate a Retrieval-Augmented Generation (RAG) system using a specific dataset.
    The dataset is provided as a pandas DataFrame.
    Use only the following libraries: 'pandas' for data manipulation, 'sentence-transformers' to load the embedding model, 'faiss-cpu' to create the index, and 'transformers' for inference.

    The RAG notebook should include:
    1. Install the necessary libraries.
    2. Import the libraries.
    3. Load the dataset as a DataFrame using the provided code.
    4. Select the column for generating embeddings.
    5. Remove duplicate data.
    6. Convert the selected column to a list.
    7. Load the sentence-transformers model.
    8. Create a FAISS index.
    9. Encode a query sample.
    10. Search for similar documents using the FAISS index.
    11. Load the 'HuggingFaceH4/zephyr-7b-beta' model from the transformers library and create a pipeline.
    12. Create a prompt with two parts: 'system' for instructions based on a 'context' from the retrieved documents, and 'user' for the query.
    13. Send the prompt to the pipeline and display the answer.

    Ensure the notebook is well-organized, with explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".

    The user will provide the dataset information in the following format:
    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code

    Use the provided code to load the dataset; do not use any other method.
    """