Spaces:
Sleeping
Sleeping
| title: Llm Classifier | |
| emoji: π | |
| colorFrom: green | |
| colorTo: purple | |
| sdk: streamlit | |
| sdk_version: 1.40.2 | |
| app_file: app/main.py | |
| pinned: false | |
| license: mit | |
| short_description: Zero-Shot Classifier | |
| This app allows you to use large language models (LLMs) for text classification on your custom datasets. | |
| ## π Key Features | |
| 1. **Custom Labels and Descriptions** | |
| - The system allows end-users to define their own labels and provide descriptive text for each label. | |
| 2. **Binary and Multi-Class Classification** | |
| - The system supports both **binary classification** (e.g., spam vs. not spam) and **multi-class classification** (e.g., positive, negative, neutral). | |
| 3. **Few-Shot Learning** | |
| - Users can enable **few-shot prompting** by selecting example rows from the dataset to guide the model's understanding. | |
| - The system automatically selects and excludes these examples from the main dataset to improve prediction accuracy without affecting evaluation. | |
| 4. **Additional Utility Features** | |
| - **Cost-aware**: Limits max tokens generated by the LLM and sends only as many rows as the user specifes to minimize costs during experimentation. | |
| - **Inference Mode**: Automatically adapts when no target column is specified, providing label distribution statistics instead of evaluation metrics. | |
| - **Verbose Mode**: Users can inspect raw prompts sent to the LLM and responses received, enabling transparency and debugging. | |
| - **Progress Tracking**: A progress bar shows the classification status row-by-row. | |
| ## π¦ How It Works | |
| 1. **Upload Data**: Drag and drop a CSV file to load data into the system. | |
| 2. **Select Target Column**: Choose the column to classify or run in inference mode (no target column). | |
| 3. **Define Labels**: Add custom labels and their descriptions to guide classification. | |
| 4. **Choose Features**: Select the features (columns) that should be used for classification. | |
| 5. **Few-Shot Examples**: Optionally enable few-shot learning by providing examples from the dataset. | |
| 6. **Run Classification**: View predictions, evaluate metrics (if labels are provided), or analyze label distribution (in inference mode). | |
| ## Example Datasets | |
| 1. (Binary) https://www.kaggle.com/datasets/ozlerhakan/spam-or-not-spam-dataset | |
| 2. (Multi-class) https://www.kaggle.com/datasets/mdismielhossenabir/sentiment-analysis | |
| 3. (Multi-class) https://www.kaggle.com/datasets/pashupatigupta/emotion-detection-from-text | |
| ## π Notes | |
| - Ensure your **OpenAI API** key is valid and has sufficient quota. | |
| - If your CSV includes a target column you can take advantage of few-shot prompting. | |
| ## π‘ Ideas for future | |
| - (**Clustering + LLM hybrid**) I was considering implementing clustering (say with K-Means) and a specific k and then asking the LLM to associate provided labels with those k clusters. | |
| - (**Multi-modal** support) Would be nice to support images, audio, etc. so the beloved cats vs dogs classification could be feasible. We'd use one of the multi-modal LLMs from OpenAI to base64-encode the image and send it along with the rest of the conversation. | |
| ## π₯ Needs work | |
| - **Evaluation** needs a lot of work. If I had more time, I'd start there. We'd have to both show that the selected LLM configuration + PROMPT achives good performance on standard classification datasets. The hope is then it will do well on datasets with explicit supervision signal. | |
| - The UI is still pretty clunky. There is a lot of logic that's mixed in with visual elements. | |
| - Tests, which I've generated entirely with an LLM, are not at all sufficient. | |
| - The system prompt can be improved, I didn't make any modifications from the initial one. | |