---
title: Business Category Description Generator
emoji: 🏢
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---
# Business Category Description Generator
A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.
## Features
- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
- 📊 **Batch Processing**: Automatically processes all unique categories from your files
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
- 🔄 **Automatic Retry Logic**: 3 attempts per category with intelligent error recovery
- ✅ **Validation**: JSON validation and quality checks for every description
- 📈 **Progress Tracking**: Real-time progress updates with success/failure reporting
- 💾 **Automatic Saving**: Output files with Status column showing results
- 📥 **Easy Download**: Download all processed files directly from the interface
- ⚡ **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration
## How to Use
### 1. Deploy to Hugging Face Spaces
1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
   - Click "Save"
6. **Optional: Enable Zero GPU for Faster Processing**:
   - Zero GPU provides free GPU acceleration
   - No Pro subscription required
   - Space will automatically use GPU when available
   - Significantly speeds up processing for large batches
7. The Space then builds, deploys, and restarts automatically.
### 2. Prepare Your CSV Files
Your CSV files should contain a column with business category keywords. For example:
```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```
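As a minimal sketch of how such a file could be read, the snippet below pulls the distinct values out of the chosen column using only the standard library. The function name `unique_categories` is hypothetical and is not taken from `app.py`; the real app may read files differently (for example with pandas).

```python
import csv
import io


def unique_categories(csv_text: str, column: str = "category") -> list[str]:
    """Return the distinct, non-empty values of `column` in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        # Column names are matched exactly, including case.
        raise KeyError(f"Column {column!r} not found. Available columns: {fieldnames}")
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops duplicates
    for row in reader:
        value = (row[column] or "").strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)


sample = """category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Mehandi,additional_data
"""
print(unique_categories(sample))
```

Passing a column name that does not match exactly, such as `"Category"` for the sample above, raises an error that lists the available columns.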
### 3. Use the Application
1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions
*Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.*
## Output Format
Each output CSV file contains:
| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success" or "Failed" with error details |
## Example Output
```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```
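The doubled quotes in `Raw_Response` above are ordinary CSV escaping of a JSON string. The following sketch shows one way rows in this shape could be written with the standard `csv` module, plus a file name following the `output_{original_name}_{timestamp}.csv` pattern listed under Technical Details. The helper names, the illustrative row, the dropped file extension, and the timestamp format are all assumptions, not details confirmed by `app.py`.

```python
import csv
import io
import json
from datetime import datetime
from pathlib import Path

FIELDS = ["Category", "Description", "Raw_Response", "Status"]


def output_name(original_name: str, when: datetime) -> str:
    # Pattern from this README: output_{original_name}_{timestamp}.csv
    # Assumed here: the extension is dropped and the timestamp is YYYYMMDD_HHMMSS.
    return f"output_{Path(original_name).stem}_{when:%Y%m%d_%H%M%S}.csv"


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize result rows; the csv module quotes commas and doubles inner quotes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative row only; the description text is made up for this example.
description = "henna being applied to a hand, close-up"
row = {
    "Category": "Mehandi",
    "Description": description,
    "Raw_Response": json.dumps({"Category": "Mehandi", "Description": description}),
    "Status": "Success",
}
text = rows_to_csv([row])
print(output_name("categories.csv", datetime.now()))
print(text)
```

Reading the text back with `csv.DictReader` recovers the original JSON string in `Raw_Response`, so it can be passed to `json.loads` when debugging.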
## Model Settings
- **Max Tokens**: Controls the maximum length of generated descriptions (default: 256)
- **Temperature**: Controls output consistency (default: 0.3)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling parameter, controls diversity (default: 0.9)
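To make the ranges concrete, here is a small sketch that validates the three settings before they are sent to the model. The class `GenerationSettings` is hypothetical. The ranges come from the sliders described under "Use the Application", and the defaults are the ones listed in this section. Whether `app.py` passes the values under these exact keyword names is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    max_tokens: int = 256      # allowed: 64-512
    temperature: float = 0.3   # allowed: 0.1-1.0
    top_p: float = 0.9         # allowed: 0.1-1.0

    def __post_init__(self) -> None:
        if not 64 <= self.max_tokens <= 512:
            raise ValueError("max_tokens must be between 64 and 512")
        if not 0.1 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.1 and 1.0")
        if not 0.1 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0.1 and 1.0")

    def as_kwargs(self) -> dict:
        # Key names follow common chat-completion parameters (assumed).
        return {
            "max_tokens": self.max_tokens,
            "temperature": self.temperature,
            "top_p": self.top_p,
        }


print(GenerationSettings().as_kwargs())
print(GenerationSettings(temperature=0.2, max_tokens=128).as_kwargs())
```

Rejecting out-of-range values early keeps a typo in a setting from turning into a failed request for every category.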
## Technical Details
- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: 3 attempts per category with 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **Zero GPU Support**: Free GPU acceleration available for Spaces
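The retry and validation behaviour listed above can be sketched roughly as follows. This is not the code from `app.py`: the function names, the `generate` callable standing in for the model call, the expected `Description` key, and the 20-character minimum are assumptions made for illustration. Only the 3 attempts and the 1-second delay come from this README.

```python
import json
import time
from typing import Callable

MAX_ATTEMPTS = 3            # from this README
RETRY_DELAY_S = 1.0         # from this README
MIN_DESCRIPTION_CHARS = 20  # assumed threshold; the real minimum is not documented


def validate(raw: str) -> str:
    """Parse the model response and return the description, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("Description"), str):
        raise ValueError("expected a JSON object with a string 'Description'")
    description = data["Description"].strip()
    if len(description) < MIN_DESCRIPTION_CHARS:
        raise ValueError("description is too short")
    return description


def describe_with_retry(
    category: str,
    generate: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `generate` up to MAX_ATTEMPTS times and return one output row."""
    raw, error = "", ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            raw = generate(category)
            return {"Category": category, "Description": validate(raw),
                    "Raw_Response": raw, "Status": "Success"}
        except Exception as exc:  # broad on purpose: API errors and bad output both retry
            error = f"attempt {attempt}: {exc}"
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_DELAY_S)
    return {"Category": category, "Description": "",
            "Raw_Response": raw, "Status": f"Failed: {error}"}


def fake_generate(category: str) -> str:
    # Stand-in for the model call so the sketch runs offline.
    return json.dumps({"Category": category,
                       "Description": f"a storefront offering {category.lower()} services"})


print(describe_with_retry("Photographer", fake_generate)["Status"])
```

Passing `sleep` as a parameter keeps the delay easy to stub out in tests without changing the retry logic.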
## Troubleshooting
### "HF_TOKEN not found" error
- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- Name must be exactly: `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- The Space normally restarts on its own after you add the secret; restart it manually if it does not
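
A startup check along these lines would produce this kind of error. It is a sketch only: `load_hf_token` is a hypothetical helper, and the exact message raised by `app.py` may differ.

```python
import os
from typing import Mapping


def load_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the token from HF_TOKEN, or fail with an actionable message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found. Add it as a secret in your Space settings, "
            "or export it in your shell when running locally."
        )
    return token


# The variable name is case-sensitive: "hf_token" would not be picked up.
print(load_hf_token({"HF_TOKEN": "your_hf_token_here"}))
```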
### "Column not found" error
- Check that the column name matches exactly (case-sensitive)
- View the error message to see available columns
### Authentication errors
- Ensure your HF token has proper permissions (Read access minimum)
- Check that your account has access to the Inference API
- Verify the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens
### Inconsistent or incomplete output
- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the Status column in output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately
- Zero GPU will provide more reliable processing with better resources
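
One possible way to extract failed categories for reprocessing is sketched below. It assumes the output columns described in Output Format and treats any `Status` other than exactly "Success" as a failure. The helper names and the wording of the failure message in the example are hypothetical.

```python
import csv
import io


def failed_categories(output_csv: str) -> list[str]:
    """Return the Category of every row whose Status is not exactly 'Success'."""
    reader = csv.DictReader(io.StringIO(output_csv))
    return [row["Category"] for row in reader if row["Status"].strip() != "Success"]


def retry_csv(categories: list[str], column: str = "category") -> str:
    """Build a one-column CSV that can be uploaded again for reprocessing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([c] for c in categories)
    return buffer.getvalue()


# Illustrative output file; the failure message format is an assumption.
example = (
    "Category,Description,Raw_Response,Status\n"
    "Photographer,a person holding a camera,{},Success\n"
    "Equipment,,,Failed: invalid JSON\n"
)
print(retry_csv(failed_categories(example)))
```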
### Slow processing
- The model processes each unique category individually (includes retries)
- Large files with many unique categories will take longer
- Consider splitting very large files into smaller batches
- Zero GPU acceleration is automatically available for your Space
- Each category has a 0.5s delay to prevent rate limiting
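
If you do split a large file, a generic batching helper like this one is enough. It is a sketch that works on any list of categories; `batches` is a hypothetical name and the batch size is up to you.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


categories = ["Car Rental For Self Driven", "Mehandi", "Photographer", "Equipment"]
for number, chunk in enumerate(batches(categories, 3), start=1):
    print(number, chunk)
```

Each batch can then be saved as its own CSV and uploaded separately, so a failure in one batch does not hold up the others.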
## Local Development
To run locally:
```bash
# Install dependencies
pip install -r requirements.txt
# Set your Hugging Face token as an environment variable
# Windows (PowerShell):
$env:HF_TOKEN="your_hf_token_here"
# Linux/Mac:
export HF_TOKEN="your_hf_token_here"
# Run the app
python app.py
```
Get your token from: https://huggingface.co/settings/tokens
## License
This project uses the GPT-OSS-20B model via Hugging Face Inference API.